[LeetCode] Most Common Word

Inhwan98 · January 8, 2023

PTU STUDY_leetcode


Problem

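Return the most frequent word in the paragraph that is not in the banned list.
Words are case-insensitive, and punctuation (periods, commas, etc.) is ignored. The problem guarantees that at least one word is not banned and that the answer is unique.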

Example 1:

Input: paragraph = "Bob hit a ball, the hit BALL flew far after it was hit.", banned = ["hit"]

Output: "ball"

Explanation: 
"hit" occurs 3 times, but it is a banned word.
"ball" occurs twice (and no other word does), so it is the most frequent non-banned word in the paragraph. 
Note that words in the paragraph are not case sensitive,
that punctuation is ignored (even if adjacent to words, such as "ball,"), 
and that "hit" isn't the answer even though it occurs more because it is banned.

Example 2:

Input: paragraph = "a.", banned = []
Output: "a"

Code

class Solution {
public:
    string mostCommonWord(string paragraph, vector<string>& banned) {
        string result;
        multiset<string> ms;  // every extracted word, kept in sorted order

        int len = 0;

        // Sentinel space so the last word is also followed by a delimiter
        paragraph.append(" ");

        int start = 0, end = 0;
        for (int i = 0; i < paragraph.size(); i++)
        {
            paragraph[i] = tolower(paragraph[i]);
            if (isalpha(paragraph[i]) == false)  // a non-letter ends the current word
            {
                end = i;
                len = end - start;
                string a = paragraph.substr(start, len);
                if (isalpha(a[0]) != 0) ms.insert(a);  // skip empty pieces such as the gap in ", "
                start = end + 1;
            }
        }

        // erase(key) removes every copy of a banned word
        for (string a : banned) ms.erase(a);

        // Scan for the word with the highest count
        multiset<string>::iterator iter;
        int MaxCount = ms.count(*(ms.begin()));
        for (iter = ms.begin(); iter != ms.end(); iter++)
        {
            if (MaxCount <= ms.count(*iter))
            {
                MaxCount = ms.count(*iter);
                result = *iter;
            }
        }

        return result;
    }
};
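The solution above calls ms.count for every element, including each repeated copy of the same word, so the final scan can do redundant work when a word repeats many times. As a hedged alternative (not the author's approach), a hash map can count each word in a single pass. This is a minimal sketch, assuming ASCII input that follows the problem's rules (letters, spaces, and punctuation); the free function name `mostCommonWordHashed` is hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical helper: counts words with a hash map instead of a multiset.
std::string mostCommonWordHashed(const std::string& paragraph,
                                 const std::vector<std::string>& banned) {
    // Banned words can be looked up in O(1) on average.
    std::unordered_set<std::string> bannedSet(banned.begin(), banned.end());
    std::unordered_map<std::string, int> freq;

    std::string word;
    std::string best;
    int bestCount = 0;

    // Moves the current word (if any) into the frequency table.
    auto flush = [&]() {
        if (word.empty()) return;
        if (bannedSet.count(word) == 0) {
            int c = ++freq[word];
            if (c > bestCount) {  // track the leader while counting
                bestCount = c;
                best = word;
            }
        }
        word.clear();
    };

    for (char ch : paragraph) {
        unsigned char uc = static_cast<unsigned char>(ch);
        if (std::isalpha(uc)) {
            word += static_cast<char>(std::tolower(uc));
        } else {
            flush();  // any non-letter ends the current word
        }
    }
    flush();  // the last word may not be followed by a delimiter

    return best;
}
```

For example, `mostCommonWordHashed("Bob hit a ball, the hit BALL flew far after it was hit.", {"hit"})` should return "ball". Each character is visited once and each hash operation is O(1) on average, whereas the multiset version pays O(log n) per insertion plus the repeated count calls.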

Walkthrough

1.

multiset <string> ms;

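Extract the words and insert them into a multiset. A multiset keeps duplicate entries and stores its elements in sorted order, so all copies of the same word end up next to each other. At the end, the multiset's count function is used to find the word that appears most often.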
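A small illustration of the multiset properties this approach relies on: duplicates are kept, iteration visits keys in sorted order, and count returns how many copies of a key are stored. The helper name `buildWordSet` is hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: builds a multiset from a list of (already lowercased) words.
std::multiset<std::string> buildWordSet(const std::vector<std::string>& words) {
    // Unlike std::set, std::multiset keeps every copy of a repeated key,
    // and its elements are always ordered (lexicographically for std::string).
    return std::multiset<std::string>(words.begin(), words.end());
}
```

For the input {"hit", "ball", "bob", "ball", "hit", "hit"}, iteration yields ball, ball, bob, hit, hit, hit, and `count("hit")` is 3. Because equal keys are adjacent, all copies of one word form a contiguous group.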

2.

paragraph.append(" ");

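A word is cut out of the string only when a non-letter character follows it.
If the paragraph ends with a letter and has no trailing space or punctuation (for example "bob hit ball"), the last word would never be extracted, so a space is appended as a sentinel.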
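To see why the trailing space matters, here is a minimal sketch of the same split rule (a word ends at the first non-letter) with the sentinel made optional. The function `splitWords` and its `appendSentinel` flag are hypothetical, introduced only for this comparison.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits on non-letters the same way the solution does.
// A word is only emitted when a non-letter character is reached.
std::vector<std::string> splitWords(std::string text, bool appendSentinel) {
    if (appendSentinel) text += ' ';  // guarantees the last word is followed by a delimiter

    std::vector<std::string> words;
    std::size_t start = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (!std::isalpha(static_cast<unsigned char>(text[i]))) {
            if (i > start) words.push_back(text.substr(start, i - start));
            start = i + 1;
        }
    }
    return words;  // without the sentinel, letters after the last delimiter are dropped
}
```

`splitWords("bob hit ball", false)` yields only "bob" and "hit", while passing `true` also yields "ball". Input such as "a." already ends with a delimiter, so it works either way.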

3.

	int start = 0, end = 0;
	for (int i = 0; i < paragraph.size(); i++)
	{
		//1.
		paragraph[i] = tolower(paragraph[i]);
		//2.
		if (isalpha(paragraph[i]) == false)
		{
			//3.
			end = i;
			len = end - start;
			string a = paragraph.substr(start, len);
			//4.
			if (isalpha(a[0]) != 0) ms.insert(a);
			start = end + 1;
		}
	}

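Convert the current character of paragraph to lowercase, so that "Bob" and "bob" count as the same word. The characters are lowercased one at a time as the loop advances.
If paragraph[i] is not a letter, the current word has ended.
Store the current index in end. end - start is the length len of the run of characters since start, and paragraph.substr(start, len) copies that run into a, which is one word.
If a starts with a letter, insert it into ms. When two non-letters are adjacent, as in ", ", a is an empty string; since C++11, a[0] on an empty std::string returns the null character, so the isalpha check fails and nothing is inserted. Finally, move start to end + 1, the index right after the delimiter, where the next word can begin.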
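The four steps above can be traced with a version of the loop that returns the extracted words instead of inserting them into a multiset. This is a sketch for illustration; the function name `tokenize` is hypothetical, and the casts to unsigned char are added so that tolower and isalpha are always given valid arguments.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: mirrors the loop above and returns the words in order.
std::vector<std::string> tokenize(std::string paragraph) {
    paragraph.append(" ");  // sentinel, as in the solution
    std::vector<std::string> words;
    std::size_t start = 0;
    for (std::size_t i = 0; i < paragraph.size(); ++i) {
        // 1. lowercase the current character
        paragraph[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(paragraph[i])));
        // 2. a non-letter marks the end of a word
        if (!std::isalpha(static_cast<unsigned char>(paragraph[i]))) {
            // 3. cut out the characters since start
            std::string a = paragraph.substr(start, i - start);
            // 4. an empty piece (two delimiters in a row) has a[0] == '\0' and is skipped
            if (std::isalpha(static_cast<unsigned char>(a[0]))) words.push_back(a);
            start = i + 1;
        }
    }
    return words;
}
```

For Example 1 this produces 13 words, beginning "bob", "hit", "a", "ball", with no empty strings from the ", " gap.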

4.

for (string a : banned) ms.erase(a);

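Remove the banned words from ms. Called with a key, multiset::erase removes every copy of that key, so all occurrences of a banned word disappear at once.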
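multiset::erase behaves differently depending on its argument: passing a key removes every copy of that key and returns how many were removed, while passing an iterator removes only the single element it points to. The loop above relies on the key form. A small check, using a hypothetical helper `removeBanned`:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: removes every occurrence of each banned word
// and returns the total number of elements erased.
std::size_t removeBanned(std::multiset<std::string>& ms,
                         const std::vector<std::string>& banned) {
    std::size_t removed = 0;
    for (const std::string& word : banned) {
        removed += ms.erase(word);  // erase(key) removes all copies of key
    }
    return removed;
}
```

With ms = {"ball", "ball", "bob", "hit", "hit", "hit"}, removing {"hit"} erases 3 elements. By contrast, `ms.erase(ms.find("hit"))` would remove just one copy.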

5.

	//1.
	multiset<string>::iterator iter;
	//2.
	int MaxCount = ms.count(*(ms.begin()));
	for (iter = ms.begin(); iter != ms.end(); iter++)
	{
		//3.
		if (MaxCount <= ms.count(*iter))
		{
			MaxCount = ms.count(*iter);
			result = *iter;
		}
	}
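In this scan, ms.count(*iter) is evaluated for every element, so a word that appears k times is counted k times over. A hedged variant (not the author's code) visits each distinct key once by jumping to ms.upper_bound(key), which returns the first element greater than that key. The helper name `mostFrequent` is hypothetical. Unlike the original, which dereferences ms.begin() and therefore needs a non-empty set (guaranteed by the problem), this version simply returns an empty string for an empty multiset.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper: returns the key with the most copies in the multiset.
// Each distinct key is counted once; upper_bound skips over its remaining copies.
std::string mostFrequent(const std::multiset<std::string>& ms) {
    std::string result;
    std::size_t maxCount = 0;
    for (auto it = ms.begin(); it != ms.end(); it = ms.upper_bound(*it)) {
        std::size_t c = ms.count(*it);
        if (c > maxCount) {
            maxCount = c;
            result = *it;
        }
    }
    return result;
}
```

Using `>` instead of the original `<=` means ties go to the alphabetically smallest word rather than the largest; since the problem guarantees a unique answer, both return the same result.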