[LeetCode] 819. Most Common Word

Jadon · December 29, 2021

Given a string paragraph and a string array of the banned words banned, return the most frequent word that is not banned. It is guaranteed there is at least one word that is not banned, and that the answer is unique.

The words in paragraph are case-insensitive and the answer should be returned in lowercase.

Example 1:

Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]
Output: "ball"
Explanation: 
"hit" occurs 3 times, but it is a banned word.
"ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. 
Note that words in the paragraph are not case sensitive,
that punctuation is ignored (even if adjacent to words, such as "ball,"), 
and that "hit" isn't the answer even though it occurs more because it is banned.

Example 2:

Input: paragraph = "a.", banned = []
Output: "a"

Constraints:

- 1 <= paragraph.length <= 1000
- paragraph consists of English letters, space ' ', or one of the symbols: "!?',;.".
- 0 <= banned.length <= 100
- 1 <= banned[i].length <= 10
- banned[i] consists of only lowercase English letters.

My Solution

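import collections
import re
from typing import List

class Solution:
    def mostCommonWord(self, paragraph: str, banned: List[str]) -> str:
        # Replace non-word characters with spaces, lowercase, split into
        # words, and drop any banned word.
        words = [word for word in re.sub(r'[^\w]', ' ', paragraph)
                 .lower().split()
                 if word not in banned]
        counts = collections.Counter(words)

        # most_common(1) returns [(word, count)]; take the word.
        return counts.most_common(1)[0][0]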
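
For comparison, here is a variant sketch that extracts words directly with re.findall instead of substituting punctuation, and converts banned to a set for faster membership checks. This is not the solution above: the function name most_common_word and the variable banned_set are hypothetical, and the pattern [a-z]+ assumes the input contains only English letters, as the constraints state.

```python
import re
from collections import Counter
from typing import List


def most_common_word(paragraph: str, banned: List[str]) -> str:
    # Hypothetical helper: a set gives O(1) membership checks.
    banned_set = set(banned)

    # Lowercase first, then pull out every run of letters.
    words = re.findall(r'[a-z]+', paragraph.lower())

    # Count only the words that are not banned.
    counts = Counter(w for w in words if w not in banned_set)
    return counts.most_common(1)[0][0]


example_1 = most_common_word(
    "Bob hit a ball, the hit BALL flew far after it was hit.", ["hit"])
example_2 = most_common_word("a.", [])
print(example_1)  # ball
print(example_2)  # a
```

Both calls reproduce the expected outputs of Example 1 and Example 2.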

Regular Expression

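  • re.sub(pattern, replacement, string)
  • r'': denotes a raw string, in which backslashes are kept as literal characters instead of being treated as escape sequences.
  • r'[^\w]': \w matches a word character (a letter, digit, or underscore), and ^ inside the brackets negates the set. Passed to re.sub with ' ' as the replacement, this pattern substitutes a space for every character that is not a word character.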
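
To make the substitution concrete, here is a small sketch that applies the same pattern to the paragraph from Example 1. The variable names cleaned and tokens are illustrative only and do not appear in the solution.

```python
import re

paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."

# Replace every non-word character, including ',' and '.', with a space.
cleaned = re.sub(r'[^\w]', ' ', paragraph)

# Lowercase the text, then split on runs of whitespace.
tokens = cleaned.lower().split()
print(tokens)
```

Because split() with no arguments collapses consecutive whitespace, the double space left behind by "ball, " does not produce an empty token.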

Counter

  • Counts how many times each string occurs in the list words and stores the result in dict form (Counter is a subclass of dict).
  • e.g. {'apple': 3, 'banana': 2, 'dog': 1}
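
As a minimal sketch of how the counts above arise and how most_common(1)[0][0] picks out the answer, consider the following. The sample list is made up to match the e.g. line.

```python
from collections import Counter

# Made-up sample data matching the counts in the example above.
words = ['apple', 'banana', 'apple', 'dog', 'banana', 'apple']
counts = Counter(words)

# most_common(1) returns a list holding one (word, count) pair.
top = counts.most_common(1)

# The first [0] selects the pair, the second [0] selects the word.
most_frequent = top[0][0]
print(counts)
print(top)
print(most_frequent)
```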
