[๐Ÿ””TIL] ๋งž์ถค๋ฒ• ๊ฒ€์‚ฌ ์‹œ์Šคํ…œ์„ ์„ค๊ณ„ํ•˜๋ ค๋ฉด ์–ด๋–ป๊ฒŒ ํ•ด์•ผ ํ• ๊นŒ?

์ดํ•˜์–€ยท2024๋…„ 8์›” 13์ผ
0

๐Ÿƒ๐Ÿปโ€โ™€๏ธ Weekly Article

๋ชฉ๋ก ๋ณด๊ธฐ
1/2

๋ณธ ๋‚ด์šฉ์˜ ์ถœ์ฒ˜๋Š” 262๊ฐ€์ง€ ๋ฌธ์ œ๋กœ ์ •๋ณตํ•˜๋Š” ์ฝ”๋”ฉ ์ธํ„ฐ๋ทฐ in Java์— ์žˆ์œผ๋ฉฐ, ์ถ”๊ฐ€ํ•œ ๋‚ด์šฉ์ด ์žˆ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.


Day1


โ˜๏ธ ๋ ˆ๋ฒค์Šˆํƒ€์ธ ๊ฑฐ๋ฆฌ์— ๋Œ€ํ•ด์„œ ์•Œ์•„๋ณด์ž.

๋จผ์ €, ๋งž์ถค๋ฒ• ๊ฒ€์‚ฌ ์‹œ์Šคํ…œ์„ ์„ค๊ณ„ํ•˜๋ ค๋ฉด ๋ ˆ๋ฒค์Šˆํƒ€์ธ ๊ฑฐ๋ฆฌ๋ฅผ ๊ตฌํ•  ์ˆ˜ ์žˆ์–ด์•ผ ํ•œ๋‹ค.

๋ ˆ๋ฒค์Šˆํƒ€์ธ ๊ฑฐ๋ฆฌ?

  • ์ฒ ์ž๊ฐ€ ํ‹€๋ฆฐ ๋ฌธ์ž์—ด์ด ์ฃผ์–ด์กŒ์„ ๋•Œ, ๋งž์ถค๋ฒ• ๊ฒ€์‚ฌ๊ธฐ๋Š” ์‚ฌ์ „์—์„œ ํ•ด๋‹น ๋ฌธ์ž์—ด๊ณผ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ๋‹จ์–ด๋ฅผ ๋ฐ˜ํ™˜
  • ๋ฌธ์ž์—ด์ด ์–ผ๋งˆ๋‚˜ ๋น„์Šทํ•œ์ง€๋ฅผ ๊ณ„์‚ฐํ•˜๋Š” ๊ฒƒ, ํŽธ์ง‘ ๊ฑฐ๋ฆฌ๋ผ๊ณ ๋„ ๋ถ€๋ฆ„.
  • ๋‘ ๋‹จ์–ด์˜ ๊ฑฐ๋ฆฌ = ์ตœ์†Œ ํŽธ์ง‘ ํšŸ์ˆ˜
    • ์ฒ ์ž๊ฐ€ ํ‹€๋ฆฐ ๋‹จ์–ด์—์„œ ์ฒ ์ž๊ฐ€ ์˜ฌ๋ฐ”๋ฅธ ๋‹จ์–ด๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๋ฐ ํ•„์š”ํ•œ ํšŸ์ˆ˜

ํŽธ์ง‘?

  • ๋ฌธ์ž์˜ ์‚ฝ์ž…, ์‚ญ์ œ, ์น˜ํ™˜์„ ์ผ์ปฌ์Œ.
  • ex) "Saturday"์—์„œ "Sundays"๋กœ ๋ณ€ํ™˜
    • ์ฒซ ๋ฒˆ์งธ 'a'์™€ 't'๋ฅผ ์‚ญ์ œ, 'r'์„ 'n'์œผ๋กœ ์น˜ํ™˜, ๋งˆ์ง€๋ง‰์— 's'๋ฅผ ์‚ฝ์ž…ํ•ด์•ผ ํ•จ.
    • ์ฆ‰, ์ด ๋‘˜ ์‚ฌ์ด์˜ ๋ ˆ๋ฒค์Šˆํƒ€์ธ ๊ฑฐ๋ฆฌ๋Š” 4๊ฐ€ ๋จ.


Day2


โ˜๏ธ ๋ ˆ๋ฒค์Šˆํƒ€์ธ ๊ฑฐ๋ฆฌ ๊ตฌํ•ด๋ณด๊ธฐ

๐Ÿ”Ž ๋ฌธ์ž์—ด 2๊ฐœ๊ฐ€ ์ž…๋ ฅ์œผ๋กœ ์ฃผ์–ด์กŒ์„ ๋•Œ, ์ฒซ ๋ฒˆ์งธ ๋ฌธ์ž์—ด์—์„œ ๋‘ ๋ฒˆ์งธ ๋ฌธ์ž์—ด๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๋ฐ ํ•„์š”ํ•œ ์ตœ์†Œ ํŽธ์ง‘ ํšŸ์ˆ˜๋ฅผ ๊ตฌํ•˜๋ผ.

Hint : ๋ฒ”์œ„๋ฅผ ์ขํ˜€ ๋‘ ๋ฌธ์ž์—ด์˜ ์ ‘๋‘์‚ฌ๋ฅผ ๋Œ€์ƒ์œผ๋กœ ๋™์ผํ•œ ๋ฌธ์ œ ํ’€์–ด๋ณด๊ธฐ

  • Solution 01
    • ๋‘ ๋ฒˆ์งธ ๋ฌธ์ž์—ด๊ณผ ๊ฐ™์€ ๋ฌธ์ž์—ด์„ ์ฐพ์„ ๋•Œ๊นŒ์ง€ ์ฒซ ๋ฒˆ์งธ ๋ฌธ์ž์—ด๊ณผ ๊ฑฐ๋ฆฌ๊ฐ€ 1, 2, 3,..์ธ ๋ชจ๋“  ๋ฌธ์ž์—ด์„ ๋‚˜์—ด
    • ๋‚˜์—ดํ•ด์•ผ ํ•  ๋ฌธ์ž์—ด์˜ ๊ฐœ์ˆ˜๊ฐ€ ๋ฐฉ๋Œ€ํ•ด์ง€๋Š” ๋ฌธ์ œ๊ฐ€ ์žˆ์Œ.
    • ex) ์ฒซ ๋ฒˆ์งธ ๋ฌธ์ž์—ด์˜ ๊ธธ์ด๊ฐ€ n์ธ 0์ด๊ณ  ๋‘ ๋ฒˆ์งธ ๋ฌธ์ž์—ด์ด ๊ธธ์ด๊ฐ€ n์ธ 1์ด๋ผ๊ณ  ํ–ˆ์„ ๋•Œ, ๋ชจ๋“  ๊ฐ€๋Šฅํ•œ 2n2^n๊ฐœ์˜ ๋ฌธ์ž์—ด์„ ์‚ดํŽด๋ด์•ผ ํ•จ.

  • Solution 02

    • ํƒ์ƒ‰ ๋„์ค‘ '๊ฐ€์ง€ ์น˜๋Š”' ๋ฐฉ๋ฒ•
    • ์ฒซ ๋ฒˆ์งธ ๋ฌธ์ž์—ด์˜ ๋งˆ์ง€๋ง‰ ๋ฌธ์ž๊ฐ€ ๋‘ ๋ฒˆ์งธ ๋ฌธ์ž์—ด์˜ ๋งˆ์ง€๋ง‰ ๋ฌธ์ž์™€ ๊ฐ™๋‹ค๋ฉด, ์ด ๋ฌธ์ž๋ฅผ ๋ฌด์‹œํ•ด๋„ ๋จ.
    • ๋‹ค๋ฅด๋‹ค๋ฉด, ๊ธฐ์กด ๋ฌธ์ž์—ด์— ์ดˆ์ ์„ ๋งž์ถฐ ์ตœ์ข… ํŽธ์ง‘์„ ์ˆ˜ํ–‰ํ•˜๋ฉด ๋จ.
    • ์ตœ์ข… ํŽธ์ง‘์€ ์‚ฝ์ž…, ์‚ญ์ œ, ์น˜ํ™˜ ์ค‘์˜ ํ•˜๋‚˜๊ฐ€ ๋จ.

  • a์™€ b๊ฐ€ ๊ฐ๊ฐ ๋ฌธ์ž์—ด A์™€ B์˜ ๊ธธ์ด๋ผ๊ณ  ํ–ˆ์„ ๋•Œ์˜ ๋ ˆ๋ฒค์Šˆํƒ€์ธ ๊ฑฐ๋ฆฌ

    • E(A[0,aโˆ’1],B[0,bโˆ’1])E(A[0, a - 1], B[0, b - 1])

  • ์—ฌ๊ธฐ์„œ ์•Œ ์ˆ˜ ์žˆ๋Š” ์‚ฌ์‹ค

    • A์˜ ๋งˆ์ง€๋ง‰ ๋ฌธ์ž๊ฐ€ B์˜ ๋งˆ์ง€๋ง‰ ๋ฌธ์ž์™€ ๊ฐ™๋‹ค๋ฉด

      • E(A[0,aโˆ’1],B[0,bโˆ’1])=E(A[0,aโˆ’2],B[0,bโˆ’2])E(A[0, a - 1], B[0, b - 1]) = E(A[0, a - 2], B[0, b - 2])
    • A์˜ ๋งˆ์ง€๋ง‰ ๋ฌธ์ž๊ฐ€ B์˜ ๋งˆ์ง€๋ง‰ ๋ฌธ์ž์™€ ๋‹ค๋ฅด๋‹ค๋ฉด
      E(A[0,aโˆ’1],B[0,bโˆ’1])=1+min(E(A[0,aโˆ’2],B[0,bโˆ’2]),E(A[0,aโˆ’1],B[0,bโˆ’2]),E(A[0,aโˆ’2],B[0,bโˆ’1],)E(A[0, a - 1], B[0, b - 1]) = 1 + min( E(A[0, a - 2], B[0, b - 2]), E(A[0, a - 1], B[0, b - 2]), E(A[0, a - 2], B[0, b - 1], )


  • A์—์„œ B๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๋ฐฉ๋ฒ• 3๊ฐ€์ง€(A[0,aโˆ’1]A[0, a - 1]์—์„œ B[0,bโˆ’1]B[0, b - 1]๋กœ ๋ณ€ํ™˜)
    • A[0,aโˆ’2]A[0, a - 2]์—์„œ B[0,bโˆ’2]B[0, b - 2]๋กœ ๋ณ€ํ™˜ํ•œ ๋’ค์—, A์˜ ๋งˆ์ง€๋ง‰ ๋ฌธ์ž๋ฅผ B์˜ ๋งˆ์ง€๋ง‰ ๋ฌธ์ž๋กœ ์น˜ํ™˜
    • A[0,aโˆ’1]A[0, a - 1]์—์„œ B[0,bโˆ’2]B[0, b -2]๋กœ ๋ณ€ํ™˜ํ•œ ๋’ค์—, B์˜ ๋งˆ์ง€๋ง‰ ๋ฌธ์ œ๋ฅผ ๋’ค์— ์ถ”๊ฐ€
    • A[0,aโˆ’2]A[0, a - 2]์—์„œ B[0,bโˆ’1]B[0, b - 1]๋กœ ๋ณ€ํ™˜ํ•œ ๋’ค์—, A์˜ ๋งˆ์ง€๋ง‰ ๋ฌธ์ž๋ฅผ ์‚ญ์ œ

  • ์ฆ๋ช…
  • ์ž„์˜์˜ ์ตœ์ ํ•ด๋ฅผ ์„ ํƒํ•œ ๋’ค, ๊ทธ ์ˆœ์„œ๋ฅผ ์žฌ๋ฐฐ์น˜ํ•˜๋Š” ๋ฐฉ๋ฒ•์— ๊ธฐ๋ฐ˜์„ ๋‘๊ณ  ์žˆ์Œ.
  • ๋™์  ํ”„๋กœ๊ทธ๋ž˜๋ฐ์œผ๋กœ ํ•ด๊ฒฐ -> E(A[0,aโˆ’1],B[0,bโˆ’1])E(A[0, a - 1], B[0, b - 1])์„ ๊ตฌํ•˜๋Š” ์ค‘๊ฐ„ ๊ฒฐ๊ณผ๋ฅผ ์บ์‹œ์— ์ €์žฅ

  • ์‹ค์ œ ์˜ˆ์‹œ ํ™•์ธํ•˜๊ธฐ
    • ๋‹จ์–ด "carthorse"์™€ "Orchestra"์˜ ๋ ˆ๋ฒค์Šˆํƒ€์ธ ๊ฑฐ๋ฆฌ(E)
    • ํ’€์ด : ๋Œ€๋ฌธ์ž์™€ ์†Œ๋ฌธ์ž๋Š” ๋‹ค๋ฅธ ๋ฌธ์ž๋กœ ์ทจ๊ธ‰, ๋ ˆ๋ฒค์Šˆํƒ€์ธ ๊ฑฐ๋ฆฌ๋Š” 8

  • ์ฝ”๋“œ๋กœ ํ™•์ธํ•˜๊ธฐ

    • k < a, l < b์— ๋Œ€ํ•ด E(A[0.k],B[0,l])E(A[0.k], B[0, l])์„ ์ด๋ฏธ ์•Œ๊ณ  ์žˆ๋‹ค๋ฉด -> E(A[0,aโˆ’1],B[0,bโˆ’1])E(A[0, a - 1], B[0, b - 1])์„ ๊ตฌํ•˜๋Š” ์‹œ๊ฐ„์€ O(1)
    • ์ฆ‰, ํ•ด๋‹น ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ์ „์ฒด ์‹œ๊ฐ„ ๋ณต์žก๋„ ๋ฐ ๊ณต๊ฐ„ ๋ณต์žก๋„ = O(ab)O(ab)
    public static int levenshteinDistance(String A, String B) {
        int[][] distanceBetweenPrefixes = new int[A.length()][B.length()];
        for(int[] row : distanceBetweenPrefixes) {
            Arrays.fill(row, -1);
        }
        return computeDistanceBetweenPrefixes(A, A.length() - 1, B, B.length() - 1, distanceBetweenPrefixes);
    }
    
    private static int computeDistanceBetweenPrefixes(String A, int A_idx, String B, int B_idx, int[][] distanceBetweenPrefixes) {
    
        if(A_idx < 0) {
            // A๊ฐ€ ๋น„์–ด์žˆ์œผ๋ฏ€๋กœ B์˜ ๋ชจ๋“  ๋ฌธ์ž๋ฅผ ์ถ”๊ฐ€ํ•œ๋‹ค.
            return B_idx + 1;
        } else if (B_idx < 0) {
            //B๊ฐ€ ๋น„์–ด์žˆ์œผ๋ฏ€๋กœ A์˜ ๋ชจ๋“  ๋ฌธ์ž๋ฅผ ์‚ญ์ œํ•œ๋‹ค.
            return A_idx + 1;
        }
    
        if(distanceBetweenPrefixes[A_idx][B_idx] == -1) {
            if(A.charAt(A_idx) == B.charAt(B_idx)) {
                distanceBetweenPrefixes[A_idx][B_idx] = computeDistanceBetweenPrefixes(A, A_idx - 1, B, B_idx - 1, distanceBetweenPrefixes);
            }
    
            else {
                int substituteLast = computDistanceBetweenPrefixes(A, A_idx - 1, B, B_idx - 1, distanceBetweenPrefixes);
    
                int addLast = computDistanceBetweenPrefixes(A, A_idx - 1, B, B_idx, distanceBetweenPrefixes);
    
                distanceBetweenPrefixes[A_idx][B_idx] = 1 + Math.min(substituteLast, Math.min(addList, deleteLast));
            }
        }
        return distanceBetweenPrefixes[A_idx][B_idx];
    }

๐Ÿ˜Ž ์‘์šฉ 5๋ฌธ์ œ
1. O(min(a,b)O(min(a,b) ๊ณต๊ฐ„ ๋ณต์žก๋„์™€ O(ab)O(ab) ์‹œ๊ฐ„ ๋ณต์žก๋„์˜ ๋ ˆ๋ฒค์Šˆํƒ€์ธ ๊ฑฐ๋ฆฌ๋ฅผ ๊ตฌํ•˜๋ผ.
2. A์™€ B๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ, ๊ฐ€์žฅ ๊ธด A์™€ B์˜ ๋ถ€๋ถ„ ์ˆ˜์—ด์„ ๊ตฌํ•˜๋ผ. ์˜ˆ๋ฅผ ๋“ค์–ด ์œ„ ๊ทธ๋ฆผ(ํ‘œ)์— ๋‚˜์˜จ ๋ฌธ์ž์—ด์—์„œ ๊ฐ€์žฅ ๊ธด ๋ถ€๋ถ„ ์ˆ˜์—ด์€ <r,h,s><r, h, s>์ด๋‹ค.
3. ๋ฌธ์ž์—ด A๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ, ํšŒ๋ฌธ์„ ๋งŒ๋“ค๊ธฐ ์œ„ํ•ด ์‚ญ์ œํ•ด์•ผ ํ•˜๋Š” ์ตœ์†Œ ๋ฌธ์ž์˜ ๊ฐœ์ˆ˜๋ฅผ ๊ตฌํ•˜๋ผ.
4. ๋ฌธ์ž์—ด A์™€ ์ •๊ทœํ‘œํ˜„์‹ rr์ด ์ฃผ์–ด์กŒ์„ ๋•Œ, rr๋กœ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๋Š” ๋ฌธ์ž์—ด ์ค‘์—์„œ A์™€ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ๋ฌธ์ž์—ด์€ ๋ฌด์—‡์ธ๊ฐ€? ๋ฌธ์ž์—ด ์‚ฌ์ด์˜ ๊ฑฐ๋ฆฌ๋Š” ๋ ˆ๋ฒค์Šˆํƒ€์ธ ๊ฑฐ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค.
5. ๋ฌธ์ž์—ด tt๋Š” ๋ฌธ์ž์—ด s1s_1๊ณผ S2S_2๋ฅผ ์™ผ์ชฝ์—์„œ ์˜ค๋ฅธ์ชฝ ์ˆœ์„œ๋Œ€๋กœ ๋ฒˆ๊ฐˆ์•„ ๋ฐฐ์น˜์‹œ์ผœ์„œ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋Š” ๋ฌธ์ž์—ด์„ ๋งํ•œ๋‹ค.
์˜ˆ๋ฅผ ๋“ค์–ด, s1s_1์ด "gtaa"์ด๊ณ , s2s_2๊ฐ€ "atc"๋ผ๊ณ  ํ–ˆ์„ ๋•Œ, "gattaca"์™€ "gtataac"๋Š” s1s_1๊ณผ s2s_2๋ฅผ ๋ฒˆ๊ฐˆ์•„ ๋ฐฐ์น˜์‹œ์ผœ์„œ ๋งŒ๋“ค ์ˆ˜ ์žˆ์ง€๋งŒ, "gatacta"๋Š” ๋งŒ๋“ค ์ˆ˜ ์—†๋‹ค.
๋ฌธ์ž์—ด s1s_1, s2s_2, tt๊ฐ€ ์ฃผ์–ด์กŒ์„ ๋•Œ, tt๊ฐ€ ๋ฌธ์ž์—ด s1s_1, s2s_2๋ฅผ ๋ฒˆ๊ฐˆ์•„ ๋ฐฐ์น˜์‹œ์ผœ์„œ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋Š”์ง€ ํ™•์ธํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์„ค๊ณ„ํ•˜๋ผ.



Day3


โ˜๏ธ ๋งž์ถค๋ฒ• ๊ฒ€์‚ฌ ์„ค๊ณ„

  • ๋งž์ถค๋ฒ• ๊ฒ€์‚ฌ๊ธฐ์—์„œ๋Š” ์ „์ฒด ์‚ฌ์ „์—์„œ ์ฃผ์–ด์ง„ ๋ฌธ์ž์—ด๊ณผ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ๋‹จ์–ด์˜ ์ง‘ํ•ฉ์„ ๋ฐ˜๋“œ์‹œ ์ฐพ์•„์•ผ ํ•จ.
  • ๋ ˆ๋ฒค์Šˆํƒ€์ธ ๊ฑฐ๋ฆฌ๋Š” ๋งž์ถค๋ฒ• ๊ฒ€์‚ฌ๊ธฐ์— ์ ํ•ฉํ•œ ๊ฑฐ๋ฆฌ ์ธก์ • ํ•จ์ˆ˜๊ฐ€ ์•„๋‹ ์ˆ˜๋„ ์žˆ์Œ.
  • ํ”ํ•œ ์ฒ ์ž ์˜ค๋ฅ˜ ํ˜น์€ ํ‚ค๋ณด๋“œ ์œ„์—์„œ์˜ ๊ฐ€๊นŒ์šด ๋ฌธ์ž์˜ ์ƒํ™ฉ์„ ๊ณ ๋ คํ•˜์ง€ ์•Š์Œ.

๐Ÿ”Ž ์ด๋Ÿฌํ•œ ์กฐ๊ฑด์„ ๊ณ ๋ คํ•  ๋•Œ, ๋งž์ถค๋ฒ• ๊ฒ€์‚ฌ ์‹œ์Šคํ…œ์€ ์–ด๋–ป๊ฒŒ ๋งŒ๋“ค ๊ฒƒ์ธ๊ฐ€?

Hint : ๋‘ ๋‹จ์–ด ์‚ฌ์ด์˜ ๊ฑฐ๋ฆฌ์— ๋Œ€ํ•œ ๊ฐœ๋…์„ ๋จผ์ € ์ •์˜ํ•˜๊ธฐ

โœจ IDEA : ๋งž์ถค๋ฒ• ๊ฒ€์‚ฌ ์‹œ์Šคํ…œ์€ ๋Œ€๋ถ€๋ถ„ ์ฒ ์ž ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•œ ๋‹จ์–ด์™€ ์˜๋„ํ•œ ๋‹จ์–ด ์‚ฌ์ด์˜ ๋ ˆ๋ฒค์Šˆํƒ€์ธ ๊ฑฐ๋ฆฌ๊ฐ€ ์ž‘์€(1 or 2๊ฐœ ์ฐจ์ด) ๊ฒฝํ–ฅ์ด ์žˆ๋‹ค.

  • ํ•ด์‹œ ํ…Œ์ด๋ธ”์— ์‚ฌ์ „ ์•ˆ์— ์žˆ๋Š” ๋ชจ๋“  ๋‹จ์–ด ๋„ฃ๊ธฐ + ๋ ˆ๋ฒค์Šˆํƒ€์ธ ๊ฑฐ๋ฆฌ๊ฐ€ 2์ธ ๋‹จ์–ด๋งŒ ํƒ์ƒ‰ -> ๋†’์€ ํ™•๋ฅ ๋กœ ์˜๋„ํ•œ ๋‹จ์–ด๋ฅผ ์ฐพ์„ ์ˆ˜ ์žˆ์Œ.
    • ํ•„์š”ํ•œ ํ•ด์‹œ ํ…Œ์ด๋ธ” ํƒ์ƒ‰ ํšŸ์ˆ˜ : ์ฃผ์–ด์ง„ ์•ŒํŒŒ๋ฒณ ๊ฐœ์ˆ˜๊ฐ€ m์ด๊ณ , ํƒ์ƒ‰ํ•  ๋ฌธ์ž์—ด์˜ ๊ธธ์ด๊ฐ€ n์ด๋ผ๋ฉด O(n2m2)O(n^2m^2)

  • ์ฃผ์–ด์ง„ ๋ฌธ์ž์—ด ๊ธธ์ด๊ฐ€ n์ด๋ผ๋ฉด -> ์ž„์˜์˜ ๋ฌธ์ž ๋‘ ๊ฐœ๋ฅผ ์„ ํƒํ•œ ๋’ค ๊ฐ ๋ฌธ์ž๋ฅผ ์ž„์˜์˜ ๋‹ค๋ฅธ ์•ŒํŒŒ๋ฒณ์œผ๋กœ ์น˜ํ™˜ํ•ด๋ณด๊ธฐ
    • ์ž„์˜์˜ ๋‘ ๋ฌธ์ž๋ฅผ ์„ ํƒํ•˜๋Š” ๋ฐฉ๋ฒ•์˜ ๊ฐœ์ˆ˜ : ์ด n(nโˆ’1)/2n(n-1)/2์ด๋ฉฐ, ๊ฐ ๋ฌธ์ž๋ฅผ (mโˆ’1)(m - 1)๊ฐœ์˜ ๋‹ค๋ฅธ ๋ฌธ์ž๋กœ ๋ฐ”๊พธ๋Š” ์‹œ๋„๋ฅผ ํ•ด์•ผํ•˜๋ฏ€๋กœ ์ด n(nโˆ’1)(mโˆ’1)2/2n(n-1)(m-1)^2 / 2 ๋งŒํผ ํ•ด์‹œ ํ…Œ์ด๋ธ” ํƒ์ƒ‰

  • ๋ ˆ๋ฒค์Šˆํƒ€์ธ ๊ฑฐ๋ฆฌ๊ฐ€ 2 ์ดํ•˜์ธ ๋ชจ๋“  ๋ฌธ์ž์—ด ์ง‘ํ•ฉ๊ณผ ์‚ฌ์ „ ๋‚ด์˜ ๋ชจ๋“  ๋ฌธ์ž์—ด ์ง‘ํ•ฉ ์‚ฌ์ด์˜ ๊ต์ง‘ํ•ฉ์€ ๋งค์šฐ ํด ์ˆ˜ ์žˆ์Œ.
  • ๊ฐ ํ›„๋ณด ๋ฌธ์ž์—ด์— ๋žญํ‚น์„ ๋งค๊ธฐ๊ณ , ๊ฐ€์žฅ ๊ทธ๋Ÿด๋“ฏํ•œ ๋ฌธ์ž์—ด๋ถ€ํ„ฐ ์ฐจ๋ก€๋Œ€๋กœ ๋ณด์—ฌ์ฃผ๋Š” ๊ฒƒ์ด ์ข‹์Œ.

1๏ธโƒฃ ํƒ€์ดํ•‘ ์˜ค๋ฅ˜ ๋ชจ๋ธ

  • ํƒ€์ดํ•‘ ์˜ค๋ฅ˜๋Š” ํ‚ค๋ณด๋“œ ์žํŒ ๋ฐฐ์—ด์— ๋”ฐ๋ผ ๋ชจ๋ธ๋ง ๊ฐ€๋Šฅ

2๏ธโƒฃ ์Œ์„ฑ ๋ชจ๋ธ๋ง

  • ๋‹จ์–ด๋ฅผ ๋“ค์„ ๋•Œ์™€ ์‹ค์ œ ๋งž์ถค๋ฒ• ์‚ฌ์ด์˜ ์ฐจ์ด์—์„œ ์˜ค๋Š” ๊ฒฝ์šฐ์— ํ•ด๋‹น
  • ๋ฌธ์žฅ์˜ ์Œ์†Œ์™€ ๊ฐ™์€ ์Œ์„ฑ์˜ ๋‹จ์–ด ๋ฆฌ์ŠคํŠธ๋ฅผ ๋งคํ•‘ํ•˜๊ธฐ

3๏ธโƒฃ ์ˆ˜์ •ํ–ˆ๋˜ ๊ธฐ๋ก

  • ์‚ฌ์šฉ์ž๋“ค์˜ ๊ฒฝ์šฐ, ์ž˜๋ชป ์ž…๋ ฅํ•œ ๋’ค ์ˆ˜์ •ํ•˜๋Š” ๊ฒฝ์šฐ๋„ ๋งŽ์Œ.
  • ์ด๋ฅผ ๋ชจ์•„์„œ -> ๊ฐ€์žฅ ๋งŽ์ด ํ‹€๋ฆฌ๋Š” ์ฒ ์ž์— ๋Œ€ํ•œ ๋ฐฉ๋Œ€ํ•œ ๋ฐ์ดํ„ฐ๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ์Œ.

4๏ธโƒฃ stemming(์Šคํ…Œ๋ฐ)

  • ๊ฐ ๋‹จ์–ด๋ฅผ ํ˜•ํƒœ์†Œ๋ณ„๋กœ ์ €์žฅ -> ์‚ฌ์ „ ํฌ๊ธฐ ์ค„์ผ ์ˆ˜ ์žˆ์Œ.
  • ์ด ๊ฒฝ์šฐ, ์ž…๋ ฅ์œผ๋กœ ๋“ค์–ด์˜จ ๋ฌธ์žฅ์—๋„ ์Šคํ…Œ๋ฐ์„ ์ ์šฉํ•ด์•ผ ํ•จ.


Day4-5


โ˜๏ธ ์Šคํ…Œ๋ฐ ๋ฌธ์ œ์˜ ํ•ด๋ฒ• ์„ค๊ณ„ํ•˜๊ธฐ

์Šคํ…Œ๋ฐ์ด๋ž€?

  • ์ฃผ์–ด์ง„ ๋‹จ์–ด์˜ ํ™œ์šฉํ˜•์„ ์ฟผ๋ฆฌ ๋ฌธ์ž์—ด๊ณผ ๋ฌธ์„œ ๋ชจ๋‘์—์„œ ํ•˜๋‚˜์˜ ๊ณตํ†ต๋œ ์›ํ˜•์œผ๋กœ ์ค„์ด๋Š” ๊ฒƒ
    ex) {computers, computer, compute} -> compute๋ผ๋Š” ๋‹จ์–ด๋กœ ๋งคํ•‘ ๊ฐ€๋Šฅ

๐Ÿ”Ž ๋น ๋ฅด๊ณ  ํšจ๊ณผ์ ์ธ ์Šคํ…Œ๋ฐ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์„ค๊ณ„ํ•˜๋ผ.

Hint : ์ฃผ์–ด์ง„ ์˜ˆ์ œ ("computation, "computers", "compute", "computing")์„ ํ†ตํ•ด ์ผ๋ฐ˜์ ์ธ ๊ทœ์น™์„ ์ฐพ๊ธฐ

  • ๋Œ€๋ถ€๋ถ„์˜ ์Šคํ…Œ๋ฐ ์‹œ์Šคํ…œ์€ ๊ฐ„๋‹จํ•œ ์žฌ์ž‘์„ฑ ๊ทœ์น™์„ ๋ฐ”ํƒ•์œผ๋กœ ํ•˜๊ณ  ์žˆ์Œ.
  • ex1) "es", "s", "ation"๊ณผ ๊ฐ™์€ ์ ‘๋ฏธ์‚ฌ๋Š” ์ œ๊ฑฐ
  • ex2) ๋ณต์ˆ˜ํ˜• ํ‘œํ˜„์ด ๋‹ค๋ฅธ ๊ฒฝ์šฐ -> wolves์˜ ์Šคํ…Œ๋ฐ ๊ฒฐ๊ณผ๋Š” wolf์ด๊ธฐ ๋•Œ๋ฌธ์— "ves"๋Š” "f"๋กœ ์น˜ํ™˜ํ•˜๋Š” ๊ทœ์น™๋„ ํ•„์š”ํ•จ.

  • ๋ฐฉ๋ฒ• 1: ๋ชจ๋“  ๊ทœ์น™์— ๋Œ€ํ•ด ์œ ํ•œ ์ƒํƒœ ๋จธ์‹ (finite state machine) ์„ ๋งŒ๋“œ๋Š” ๊ฒƒ
    • ๋Œ€๋ถ€๋ถ„์˜ ๊ฒฝ์šฐ, ์–ด๋–ค ์ ‘๋ฏธ์‚ฌ ์ง‘ํ•ฉ์„ ๋งŒ๋‚˜๋ฉด ๊ทธ์— ์ƒ์‘ํ•˜๋Š” ๋‹ค๋ฅธ ๋ฌธ์ž์—ด๋กœ ๋Œ€์ฒดํ•˜๋Š” ๊ฒƒ

  • ๋ฐฉ๋ฒ• 2: Martin Porter๊ฐ€ ๊ฐœ๋ฐœํ•œ ํฌํ„ฐ ์Šคํ…Œ๋จธ(stemmer) ์‚ฌ์šฉ
    • ์ผ๋ฐ˜์ ์ธ ๊ทœ์น™์—์„œ ๋ฒ—์–ด๋‚˜๋Š” ๋‹จ์–ด๋ฅผ ์˜ˆ์™ธ์ƒํ™ฉ์œผ๋กœ ์ฒ˜๋ฆฌํ•จ.
    • ์˜์–ด๋ฅผ ์Šคํ…Œ๋ฐํ•˜๋Š” ๊ถŒ์œ„ ์žˆ๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ค‘ ํ•˜๋‚˜์ž„.
    • ๋ชจ์Œ๊ณผ ์ž์Œ์˜ ํŒจํ„ด์„ ๋ฐ”ํƒ•์œผ๋กœ ์—ฌ๋Ÿฌ ๊ฐ€์ง€ ๊ทœ์น™์„ ๋งŒ๋“ค์—ˆ์Œ.

  • ๋ฐฉ๋ฒ• 3: ์Šคํ† ์บ์Šคํ‹ฑ(stochastic) ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•ด ๊ทœ์น™์„ ์žฌ์ž‘์„ฑ
    • M-๊ทธ๋žจ ๋ฐฉ๋ฒ• -> ์ฃผ๋ณ€ ๋‹จ์–ด๋ฅผ ํ†ตํ•ด ํ˜„์žฌ ๋‹จ์–ด์˜ ์˜ฌ๋ฐ”๋ฅธ ์Šคํ…Œ๋ฐ ๊ฒฐ๊ณผ๋ฅผ ์ฐพ์Œ.


๐Ÿฆ• ๋˜ ๋‹ค๋ฅธ ๋งž์ถค๋ฒ• ๊ฒ€์‚ฌ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์•Œ์•„๋ณด๊ธฐ

1๏ธโƒฃ Symspell ์•Œ๊ณ ๋ฆฌ์ฆ˜

2๏ธโƒฃ Deep-speller

3๏ธโƒฃ ํ•œ๊ธ€ ๋งž์ถค๋ฒ• ๊ฒ€์‚ฌ ๊ด€๋ จ

4๏ธโƒฃ SymSpell ์•Œ๊ณ ๋ฆฌ์ฆ˜ + ํ•œ๊ธ€

5๏ธโƒฃ ๊ฒ€์ƒ‰ ์‹œ์Šคํ…œ์—์„œ์˜ ์˜คํƒ€ ๊ต์ • ๋ฐ ๋ณด์ • ๊ด€๋ จ

๋ฒˆ์™ธ) ๋งž์ถค๋ฒ• ๊ด€๋ จ ๋…ผ๋ฌธ ์ •๋ฆฌ



Day6


โ˜๏ธ Symspell ์•Œ๊ณ ๋ฆฌ์ฆ˜ ํ†บ์•„๋ณด๊ธฐ

๋‚ด์šฉ ์ถœ์ฒ˜

  • Symspell ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๋ž€?

    • ํŽธ์ง‘ ๊ฑฐ๋ฆฌ ์•Œ๊ณ ๋ฆฌ์ฆ˜์—์„œ ์‚ญ์ œ(delection)๋งŒ ์‚ฌ์šฉ
    • ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๊ธฐ ์œ„ํ•œ ๋ชฉ์ ์œผ๋กœ, ์ตœ์†Œํ•œ์˜ ์—ฐ์‚ฐ๋งŒ ์‚ฌ์šฉํ•˜๋ ค๋Š” ๊ฒƒ
    • ํŽธ์ง‘ ๊ฑฐ๋ฆฌ๋ฅผ ํ†ตํ•ด ๊ตฌํ•œ ์–ด์ ˆ๊ณผ ์›๋ž˜์˜ ๋‹จ์–ด๋ฅผ ํ•ด์‹œ ํ…Œ์ด๋ธ”๊ณผ ๊ฐ™์€ ๊ฒ€์ƒ‰์ด ๋น ๋ฅธ ์ž๋ฃŒ๊ตฌ์กฐ ํ˜•์‹์œผ๋กœ ์ €์žฅํ•จ.
    • ์‚ฌ์ „์— ์žˆ๋Š” ๋‹จ์–ด์˜ delection์„ ๊ณ„์‚ฐํ•ด ์ž๋ฃŒ ๊ตฌ์กฐ๋กœ ์ €์žฅ
    • ex) "apple"๊ณผ "apply"
      • ํŽธ์ง‘ ๊ฑฐ๋ฆฌ๊ฐ€ 1์ธ delection์€ "appl", "pply" ๋“ฑ์ด ์žˆ์Œ.
      • ํŽธ์ง‘ ๊ฑฐ๋ฆฌ๊ฐ€ 2์ธ delection์€ "app", "ppl" ๋“ฑ์ด ์žˆ์Œ.

  • ์‚ฌ์šฉํ•˜๋Š” ๋ฐฉ๋ฒ•์€? (Python ver.)

  1. ์˜คํ”ˆ์†Œ์Šค ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํ†ตํ•ด pip ์„ค์น˜
    pip install symspellpy

  1. ์•ž์„œ ๋งŒ๋“  ํ‚ค์›Œ๋“œ ์‚ฌ์ „์„ ๋ถˆ๋Ÿฌ์™€ ์‚ฌ์ „ ๊ตฌ์ถ• & lookup ๋ฉ”์„œ๋“œ๋ฅผ ํ†ตํ•œ ์œ ์‚ฌ ์–ด์ ˆ ์ฐพ๊ธฐ
    a. Verbosity.ALL์„ ์ด์šฉํ•ด ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด ํŽธ์ง‘ ๊ฑฐ๋ฆฌ์™€ ๊ฐ€์žฅ ํฐ ๋นˆ๋„์ˆ˜๋ฅผ ๊ฐ–๋Š” ์–ด์ ˆ์„ ๋ชจ๋‘ ๋ฐ˜ํ™˜
    b. ๋” ์ž์„ธํ•œ ์‚ฌ์šฉ๋ฒ•์€ API ๋ฌธ์„œ ํ™•์ธ

  1. ํ•œ๊ธ€์˜ ํŠน์„ฑ์— ๋งž์ถฐ ์ž์†Œ ๋‹จ์œ„๋กœ ๋ถ„๋ฆฌํ•ด ์ •ํ™•ํ•œ ์˜คํƒ€ ๊ต์ •ํ•˜๊ธฐ
    a. ํ•œ๊ธ€ ์ž์†Œ ๋ถ„ํ•ด : ํŒŒ์ด์ฌ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ hangul-utils ์‚ฌ์šฉ
word = split_syllables(word)
corrections = sym_spell.lookup(word, Verbosity.ALL, max_edit_distance=1)
for corr in corrections:
    print(corr.term, join_jamos(corr.term), corr.distance, corr.count)
if corrections:
    corrected_word = join_jamos(corrections[0].term)
    word = corrected_word
    print(f"์˜คํƒ€ ๊ต์ • ๊ฒฐ๊ณผ: {word}")
else:
    print("์˜คํƒ€ ๊ต์ • ๊ฒฐ๊ณผ๋ฅผ ์ฐพ์„ ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค.")


Day7


โ˜๏ธ ์ตœ์ข… ์ •๋ฆฌ

Q. ๋งž์ถค๋ฒ• ๊ฒ€์‚ฌ ์‹œ์Šคํ…œ์˜ ์„ค๊ณ„๋Š” ์–ด๋–ป๊ฒŒ ํ•˜๋Š” ๊ฒƒ์ด ์ข‹์„๊นŒ?


A. ๋ ˆ๋ฒค์Šˆํƒ€์ธ ๊ฑฐ๋ฆฌ๋ฅผ ํ†ตํ•ด ํŽธ์ง‘ ๊ฑฐ๋ฆฌ๋ฅผ ๊ตฌํ•˜๊ณ , ๊ฐ ํ›„๋ณด ๋ฌธ์ž์—ด์— ๋žญํ‚น์„ ๋งค๊ฒจ, ๊ฐ€์žฅ ๊ทธ๋Ÿด ๋“ฏํ•œ ๋ฌธ์ž์—ด๋ถ€ํ„ฐ ์ฐจ๋ก€๋Œ€๋กœ ๋ณด์—ฌ์ฃผ๋Š” ๋ฐฉ์‹์ด ๊ฐ€์žฅ ์ข‹๋‹ค.

  • Solution 01 : ๋ฌธ์ž์—ด์„ ๋‚˜์—ดํ•ด ๋‘ ๋ฒˆ์งธ ๋ฌธ์ž์—ด๊ณผ ๊ฐ™์€ ๋ฌธ์ž์—ด์„ ์ฐพ์„ ๋•Œ๊นŒ์ง€ ์ฒซ ๋ฒˆ์งธ ๋ฌธ์ž์—ด๊ณผ ๊ฑฐ๋ฆฌ๊ฐ€ 1, 2, 3, ...์ธ ๋ชจ๋“  ๋ฌธ์ž์—ด์„ ๋‚˜์—ดํ•˜๋Š” ๋ฐฉ๋ฒ•
    • 2n2^n๊ฐœ์˜ ๋ฌธ์ž์—ด์„ ์‚ดํŽด๋ด์•ผ ํ•˜๋Š” ๋ฌธ์ œ์ ์ด ์žˆ์Œ.

  • Solution 02 : ํƒ์ƒ‰ ๋„์ค‘ '๊ฐ€์ง€ ์น˜๋Š”' ๋ฐฉ๋ฒ•
    • ์ฒซ ๋ฒˆ์งธ ๋ฌธ์ž์—ด์˜ ๋งˆ์ง€๋ง‰ ๋ฌธ์ž๊ฐ€ ๋‘ ๋ฒˆ์งธ ๋ฌธ์ž์—ด์˜ ๋งˆ์ง€๋ง‰ ๋ฌธ์ž์™€ ๊ฐ™๋‹ค๋ฉด "๋ฌธ์ž ๋ฌด์‹œ", ๋‹ค๋ฅด๋‹ค๋ฉด ์ตœ์ข… ํŽธ์ง‘ ์ˆ˜ํ–‰
    • ์ตœ์ข… ํŽธ์ง‘์€ ์‚ฝ์ž…, ์‚ญ์ œ, ์น˜ํ™˜ ์ค‘์˜ ํ•˜๋‚˜


profile
์–ธ์  ๊ฐ€ ๋‚ด ์ฝ”๋“œ๋กœ ์„ธ์ƒ์— ๊ธฐ์—ฌํ•  ์ˆ˜ ์žˆ๋„๋ก, BE&Data Science ๊ฐœ๋ฐœ ๊ธฐ๋ก ๋…ธํŠธโ˜˜๏ธ

0๊ฐœ์˜ ๋Œ“๊ธ€