Codility GenomicRangeQuery

멍진이·2022년 10월 30일
0

문제

A DNA sequence can be represented as a string consisting of the letters A, C, G and T, which correspond to the types of successive nucleotides in the sequence. Each nucleotide has an impact factor, which is an integer. Nucleotides of types A, C, G and T have impact factors of 1, 2, 3 and 4, respectively. You are going to answer several queries of the form: What is the minimal impact factor of nucleotides contained in a particular part of the given DNA sequence?

The DNA sequence is given as a non-empty string S = S[0]S[1]...S[N-1] consisting of N characters. There are M queries, which are given in non-empty arrays P and Q, each consisting of M integers. The K-th query (0 ≤ K < M) requires you to find the minimal impact factor of nucleotides contained in the DNA sequence between positions P[K] and QK.

For example, consider string S = CAGCCTA and arrays P, Q such that:

P[0] = 2    Q[0] = 4
P[1] = 5    Q[1] = 5
P[2] = 0    Q[2] = 6

The answers to these M = 3 queries are as follows:

The part of the DNA between positions 2 and 4 contains nucleotides G and C (twice), whose impact factors are 3 and 2 respectively, so the answer is 2.
The part between positions 5 and 5 contains a single nucleotide T, whose impact factor is 4, so the answer is 4.
The part between positions 0 and 6 (the whole string) contains all nucleotides, in particular nucleotide A whose impact factor is 1, so the answer is 1.
Write a function:

def solution(S, P, Q)

that, given a non-empty string S consisting of N characters and two non-empty arrays P and Q consisting of M integers, returns an array consisting of M integers specifying the consecutive answers to all queries.

Result array should be returned as an array of integers.

For example, given the string S = CAGCCTA and arrays P, Q such that:

P[0] = 2    Q[0] = 4
P[1] = 5    Q[1] = 5
P[2] = 0    Q[2] = 6

the function should return the values [2, 4, 1], as explained above.

Write an efficient algorithm for the following assumptions:

N is an integer within the range [1..100,000];
M is an integer within the range [1..50,000];
each element of arrays P and Q is an integer within the range [0..N - 1];
P[K] ≤ Q[K], where 0 ≤ K < M;
string S consists only of upper-case English letters A, C, G, T.

처음 접근

  • 전체 문자열 배열을 숫자로 치환하고 매번 min 값을 구하도록 함
  • 시간 복잡도 초과

최종 접근

def solution(S, P, Q):
    # write your code in Python 3.8.10
    listA=[0]
    listC = [0]
    listG = [0]

    idxA = 0
    idxC = 0
    idxG = 0 

    for ch in S:
        if ch == "A":
            idxA +=1
            listA.append(idxA)
            listC.append(idxC)
            listG.append(idxG)
        elif ch == "C":
            idxC +=1
            listA.append(idxA)
            listC.append(idxC)
            listG.append(idxG)
        elif ch == "G":
            idxG +=1
            listA.append(idxA)
            listC.append(idxC)
            listG.append(idxG)
        else:
            listA.append(idxA)
            listC.append(idxC)
            listG.append(idxG)

    #print(listA,listC,listG)

    total = []
    for i in range(len(P)):
        left = P[i]
        right = Q[i]+1
        num = 4
        if listA[right] - listA[left] >0:
            num =1
        elif listC[right] -listC[left]>0:
            num =2
        elif listG[right] - listG[left]>0:
            num =3
        total.append(num)
    
    return total
  • A,C,G 에 대한 리스트를 만든다.
  • 문자열에서 해당하는 문자 값이 나오면 list에 값을 추가해서 더한다.
  • 최종적으로 검사할때는 해당 범위내에서 변화가 있는지만 감지하면 된다.
  • 변화가 없으면 default인 4
  • 시간 복잡도는 O(m*n)
profile
개발하는 멍멍이

0개의 댓글