30. Substring with Concatenation of All Words

LONGNEW·2023년 7월 18일

2023.07.18 딕셔너리 맵 투포인터 해시

CP

목록 보기

127/155

https://leetcode.com/problems/substring-with-concatenation-of-all-words/?envType=featured-list&envId=top-google-questions

input :

s, words

output :

s의 부분문자열이 words의 모든 원소를 담고 있는 경우에 해당 idx를 반환하시오.

조건 :

words의 모든 원소의 길이는 동일하다.
words의 원소들의 순열의 형태가 부분문자열이어야 한다.

Solution explain : Solution1

idea

dict를 사용하여 탐색의 시간 복잡도를 줄이자.
우선적으로 words의 원소 개수를 카운팅한다.
투포인터의 시작지점(start)은 0 ~ len(words[0])까지이다. => 등장할 수 있는 부분문자열의 모든 경우의 수를 따지기 위해 0, 1, 2, ... , len(words[0]) 까지만 돌게한다.
모든 투포인터 시작전에는 temp_data를 초기화하는 과정이 필요하다.
반복문의 종료 조건은 end가 s의 끝에 다다랐을 때이다.
현재 찾은 단어가 words에 없을 때, start 포인터를 가져오며 remove_word를 찾아 개수를 뺀다.
현재 찾은 단어가 data의 개수와 동일하다면 이거보다 작아질 때 까지 반복문을 수행한다. 하면서 start 포인터를 땅긴다.
반복문의 마지막에는 temp_cnt가 word_cnt와 동일한지 확인하는 코드를 통해 정답을 추가한다.

주의

del와 같이 삭제를 하면서 진행을 한다면 이후에 반복문에서 다시 제작을 해야 하니 2개의 딕셔너리를 비교하는 것이 좋다.
투포인터 형식을 생각하는 것이 시간 복잡도에서 이득을 볼 수 있다. => 어차피 반복문을 돌리나 start 포인터를 땡기나 결과물로 얻을 words의 원소 카운팅은 동일하니 포인터를 사용해야 한다.
단어가 words에 없는 경우, 현재까지 등장한 단어가 words보다 많은 경우에 start 포인터를 당겨줘야 한다. 그리고 초기화 하는 과정도 필요하다.

from typing import List


class Solution:
    def findSubstring(self, s: str, words: List[str]) -> List[int]:
        ret = []
        word_len = len(words[0])
        word_cnt = len(words)

        data, temp_data = dict(), dict()
        for item in words:
            temp_data[item] = 0
            if item not in data:
                data[item] = 1
            else:
                data[item] += 1

        for idx in range(word_len):
            start, end = idx, idx + word_len
            temp_cnt = 0

            for key in temp_data.keys():
                temp_data[key] = 0

            while end <= len(s):
                temp_word = s[end - word_len:end]
                if temp_word not in data:
                    while start < end:
                        remove_word = s[start:start + word_len]
                        if remove_word in temp_data:
                            temp_data[remove_word] -= 1
                            temp_cnt -= 1
                        start += word_len
                    end += word_len
                    continue

                while temp_data[temp_word] >= data[temp_word]:
                    remove_word = s[start:start + word_len]
                    if remove_word in temp_data:
                        temp_data[remove_word] -= 1
                        temp_cnt -= 1
                    start += word_len

                temp_data[temp_word] += 1
                temp_cnt += 1
                end += word_len

                if temp_cnt == word_cnt:
                    ret.append(start)

        return ret

s = Solution()
print(s.findSubstring("aaaaaaaaaaaaaa", ["aa","aa"]))
print(s.findSubstring("a", ["a"]))
print(s.findSubstring("barfoofoobarthefoobarman", ["bar","foo","the"]))
print(s.findSubstring("wordgoodgoodgoodbestword", ["word","good","best","word"]))
print(s.findSubstring("barfoothefoobarman", ["foo","bar"]))