(다시 풀어야 할) HackerRank/LeetCode SQL 문제 풀이

hyeh·2022년 8월 31일

알고리즘 문제풀이

목록 보기

15/15

SQL Project Planning
Write a query to output the start and end dates of projects listed by the number of days it took to complete the project in ascending order. If there is more than one project that have the same number of completion days, then order by the start date of the project.

스타트 데이트와 엔드 데이트, 그리고 얼마나 걸리고 하나 이상 같은 소요된 날짜가 있으면 시작 날짜로 정렬하는 문제다.

DATEDIFF 함수 다시 보기
샘플을 보며 특징을 잡는 게 중요하다

-- 1) end_date로 row number 매기기 : 프로젝트 하나가 끝나려면 엔드 데이터가 스타트 데이트에 없으면 된다
SELECT end_date,
    ROW_NUMBER() OVER(ORDER BY end_date) rnk
FROM projects
WHERE end_date NOT IN (SELECT DISTINCT start_date FROM projects) p_end_date
-- 2) start_date로 row number 매기기 : 스타트 데이트가 엔드 데이트에 없으면 프로젝트가 싲가하는 날짜
SELECT start_date,
    ROW_NUMBER() OVER(ORDER BY sart_date) rnk
FROM projects
WHERE start_date NOT IN (SELECT DISTINCT end_date FROM projects) p_start_date
-- 3) 위의 두 특징을 가지고(프로젝트 시작날짜와 끝나는 날짜를 알게 됨) 테이블 2개를 만들어서 조인을 시킨다 
SELECT start_date, end_date
FROM (
    SELECT start_date,
        ROW_NUMBER() OVER(ORDER BY start_date) rnk
    FROM projects
    WHERE start_date NOT IN (SELECT DISTINCT end_date FROM projects)) p_start_date
    INNER JOIN (
        SELECT end_date,
            ROW_NUMBER() OVER(ORDER BY end_date) rnk
        FROM projects
        WHERE end_date NOT IN (SELECT DISTINCT start_date FROM projects)) p_end_date
    ON p_start_date.rnk = p_end_date.rnk
ORDER BY DATEDIFF(end_date, start_date), start_date

Weather Observation Station 11
Query the list of CITY names from STATION that either do not start with vowels or do not end with vowels. Your result cannot contain duplicates.

정규표현식을 다시 짚기 위한 문제

-- 1) LEFT(), RIGHT() 사용
SELECT DISTINCT city
FROM station
WHERE LEFT(city, 1) NOT IN ('A', 'E', 'I', 'O', 'U')
    OR RIGHT(city, 1) NOT IN ('A', 'E', 'I', 'O', 'U')
-- 2) SUBSTR() 사용
SELECT DISTINCT city
FROM station
WHERE SUBSTR(city, 1, 1) NOT IN ('A', 'E', 'I', 'O', 'U')
    OR SUBSTR(city, -1, 1) NOT IN ('A', 'E', 'I', 'O', 'U')
-- 3) 정규표현식 사용
SELECT DISTINCT city
FROM station
WHERE city REGEXP '^[^aeiou]'
    OR city REGEXP '[^aeiou]$';

Weather Observation Station 20

A median is defined as a number separating the higher half of a data set from the lower half. Query the median of the Northern Latitudes (LAT_N) from STATION and round your answer to 4 decimal places.

중앙값을 찾는 문제. MySQL은 메디안 함수가 없기 때문에 복잡한 것을 해줘야 할 때가 있다.
중앙값 찾는 것 : 홀수 일 때(cnt + 1) / 2

-- 1) ROW_NUMBER()로 전체 갯수를 구하고
SELECT ROW_NUMBER() OVER(ORDER BY LAT_N) rnk,
     COUNT(*) OVER() cnt,
     LAT_N
FROM station
-- 2) WHERE 절에서 CASE 문으로 짝수일 때와 홀수 일때의 작업을 해준다
SELECT ROUND(AVG(LAT_N), 4) -- 짝수일 땐 평균을 구해줘야 하기 때문에! (홀수일 땐 하나니까 상관 없음)
FROM (
    SELECT ROW_NUMBER() OVER(ORDER BY LAT_N) rnk,
        COUNT(*) OVER() cnt, -- 각 행에 집계 COUNT 하기 위해 윈도우 함수로 사용
        LAT_N
    FROM station) A
WHERE CASE WHEN MOD(cnt, 2) = 1 THEN rnk = (cnt + 1) / 2 -- 홀수 일때 중앙값
            ELSE rnk IN (cnt / 2, cnt / 2 + 1) END -- 짝수일 때 중앙값

윈도우 함수를 사용하지 않으면 각행에 COUNT()를 할 수 없다
COUNT() 윈도우 함수는 ()안에 값이 지정 되든 안되든, 정렬을 하든 안 하든 집계를 하는데는 영향을 미치지 않기 때문에 그냥 COUNT(*) OVER() cnt 해줘도 된다.

178. Rank Scores

Write an SQL query to rank the scores. The ranking should be calculated according to the following rules:
The scores should be ranked from the highest to the lowest.
If there is a tie between two scores, both should have the same ranking.
After a tie, the next ranking number should be the next consecutive integer value. In other words, there should be no holes between ranks.
Return the result table ordered by score in descending order.

랭킹 메기는 방법 3가지 중 하나 사용

RANK() : 1,1,3
DENSERNAK() : 1,1,2,3
ROW_RANK() : 1,2,3,4

SELECT score,
	DENSE_RANK() OVER(ORDER BY score DESC) `rnak` -- ``:는 함수명과 같은 것으로 별칭을 주려고 사용
FROM scores 
리트코드는 칼럼명을 맞춰야지 풀이가 통과된다

``:는 함수명과 같은 것으로 별칭을 주고자 할 때 조금 더 명확하게 표현하기 위해 칼럼명을 맞춰줄 수 있다.

262. Trips and Users
The cancellation rate is computed by dividing the number of canceled (by client or driver) requests with unbanned users by the total number of requests with unbanned users on that day.

Write a SQL query to find the cancellation rate of requests with unbanned users (both client and driver must not be banned) each day between "2013-10-01" and "2013-10-03". Round Cancellation Rate to two decimal points.

Return the result table in any order.

1) 캔슬 뒤에 뭐가 오던지 가져오겠다
2) 컴플리티드를 제외 시킨다 => 모든 캔슬을 가져올 수 있음
3) 밴니드를 찾으려면 따로따로 찾아와서 조인을

SELECT request_at,
    SUM(CASE WHEN status != 'completed' THEN 1 ELSE 0 END) cancel_cnt, -- 캔슬된 거 세기
    COUNT(*) total_cnt -- %를 만들기 위해 전체 갯수 가져오기
FROM trips T
INNER JOIN users U1
    ON T.client_id = U1.users_id
INNER JOIN users U2
    ON T.drivers_id = U2.users_id
WHERE U1.banned = 'No'
    AND U2.banned = 'No'
    AND T.requst_at BETWEEN '2013-10-01' AND '2013-10-03'
ORDER BY request_at

-- 그룹바이의 함정을 조심! 직접적으로는 셀렉트에 그룹바이 만들어갈 수 있고, 다른 것은 집계함수만 가능!
-- 나누기를 하기 위해 위에 값을 서브쿼리로 넣어줬다.
SELECT request_at Day,
    ROUND(cancel_cnt / total_cnt, 2) 'Cancellation Rate'
FROM (
    SELECT request_at,
        SUM(CASE WHEN status != 'completed' THEN 1 ELSE 0 END) cancel_cnt, 
        COUNT(*) total_cnt 
    FROM trips T
    INNER JOIN users U1
        ON T.client_id = U1.users_id
    INNER JOIN users U2
        ON T.driver_id = U2.users_id
    WHERE U1.banned = 'No'
        AND U2.banned = 'No'
        AND T.request_at BETWEEN '2013-10-01' AND '2013-10-03'
    ORDER BY request_at) A

626. Exchange Seats
Write an SQL query to swap the seat id of every two consecutive students. If the number of students is odd, the id of the last student is not swapped.

Return the result table ordered by id in ascending order.

전체 갯수를 정확하게 알 수 없을 때 홀수, 짝수 두 경우 모두 작동되도록 CASE문을 짜주는 게 좋다.

SELECT id,
    student,
    COUNT(*) OVER() total_cnt
FROM seat -- 마지막 학생은 id와 total_cnt가 같은 학생으로 찾을 수 있음
-- 이걸 서브쿼리로 사용
SELECT CASE WHEN MOD(id, 2) = 1 AND id != total_cnt THEN id + 1 -- 마지막 학생이 아닌 홀수인 학생
    	WEHN MOD(id, 2) = 0 THEN id - 1 -- 마지막 학생이 아닌 짝수 학생
    	ELSE id END id, -- 마지막 학생
    student
FROM(
    SELECT id,
        student,
        COUNT(*) OVER() total_cnt
    FROM seat) A
ORDER BY id;

윈도우 함수는 정말 하나의 창을 새로 만들어준다고 생각한다.
창을 하나로 만들어져서 따로 구동된다고
뭘로 집계가 될 건지 생각 / 윈도우 함수로 집계할 것인지, 그룹바이로 집계할 것인지 구분해서 사용해야 한다.

hyeh

좌충우돌 천방지축 룰루랄라 데이터 공부

이전 포스트

(다시 풀어야 할) HackerRank/LeetCode SQL 문제 풀이

알고리즘 문제풀이

(다시 풀어야 할) HackerRank SQL 문제 풀이

0개의 댓글