Retention Chart를 SQL으로 그려보자 !(mysql)

sir.YOO_HWAN·2022년 7월 5일
1

GrowthHacking

목록 보기
9/9
post-thumbnail

월간 리텐션

WITH montly_visit
     AS (SELECT r.customer_id,
                Date_format(c.first_visit_date, '%Y-%m-01') first_visit_month,  -- 데이트 포맷함수는일자를 원하는 포맷으로 변경해주는데, 이때날짜형식은파괴된다. 
                Date_format(r.reg_dttm, '%Y-%m-01')       visit_month -- 날짜 형식을 쓰고 싶으면 date함수를 사용한다. 
         FROM   records as r
                inner join (SELECT customer_id,  -- 고객 별 최초 방문일 
                                   Min(reg_dttm)          AS first_visit_date -- 첫 방문일을 구하기 위해 min을 해준다. 이때 min을 쓰기 때문에 distinct 는 필요없다. 
                            FROM   records -- 로그가 담겨져 있는 테이블 
                            GROUP  BY customer_id
                            ORDER  BY first_visit_date) AS as c
                        ON r.customer_id = c.customer_id)
-- CTE
SELECT first_visit_month --  첫 방문달을 기준으로 캐이스 구문으로 카운트 해준다. 이때 distinct 는 1,0을 계산하기 위함이다. 
     , COUNT(DISTINCT customer_id) 																				as 	month0
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_visit_month, INTERVAL 1 MONTH) = visit_month THEN customer_id END) AS month1
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_visit_month, INTERVAL 2 MONTH) = visit_month THEN customer_id END) AS month2
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_visit_month, INTERVAL 3 MONTH) = visit_month THEN customer_id END) AS month3
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_visit_month, INTERVAL 4 MONTH) = visit_month THEN customer_id END) AS month4
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_visit_month, INTERVAL 5 MONTH) = visit_month THEN customer_id END) AS month5
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_visit_month, INTERVAL 6 MONTH) = visit_month THEN customer_id END) AS month6
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_visit_month, INTERVAL 7 MONTH) = visit_month THEN customer_id END) AS month7
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_visit_month, INTERVAL 8 MONTH) = visit_month THEN customer_id END) AS month8
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_visit_month, INTERVAL 9 MONTH) = visit_month THEN customer_id END) AS month9
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_visit_month, INTERVAL 10 MONTH) = visit_month THEN customer_id END) AS month10
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_visit_month, INTERVAL 11 MONTH) = visit_month THEN customer_id END) AS month11
from montly_visit
GROUP BY first_visit_month
ORDER BY first_visit_month
  1. -- 고객 별 최초 방문일 구하기
  2. Date_format(c.first_visit_date, '%Y-%m-01') first_visit_month, -- 데이트 포맷함수는일자를 원하는 포맷으로 변경해주는데, 이때날짜형식은파괴된다.
  3. -- 첫 방문달을 기준으로 캐이스 구문으로 카운트 해준다. 이때 distinct 는 1,0을 계산하기 위함이다.
  4. INTERVAL 10 MONTH
    • first_visit_month 기준으로 10개월 동안 측정

주간 리텐션


WITH records_preprocessed
     AS (SELECT r.customer_id,
       Date_format(Date_sub(`first_order_date`, INTERVAL (Dayofweek(`first_order_date`) -1) day), '%Y-%m-%d') AS first_order_week,
       Date_format(Date_sub(`order_date`, INTERVAL (Dayofweek(`order_date`)-1) day), '%Y-%m-%d')              AS order_week
FROM   records r
       INNER JOIN (SELECT customer_id,
                          Min(order_date)          AS first_order_date,
                          Max(order_date)          AS last_order_date,
                          Count(DISTINCT order_id) AS cnt_orders,
                          Sum(sales)               AS sum_sales
                   FROM   records
                   GROUP  BY customer_id
                   ORDER  BY first_order_date) AS c
               ON r.customer_id = c.customer_id 
)
-- CTE
SELECT first_order_week
     , COUNT(DISTINCT customer_id) week0
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 1 WEEK) = order_week THEN customer_id END) AS week
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 2 WEEK) = order_week THEN customer_id END) AS week
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 3 WEEK) = order_week THEN customer_id END) AS week
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 4 WEEK) = order_week THEN customer_id END) AS week
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 5 WEEK) = order_week THEN customer_id END) AS week
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 6 WEEK) = order_week THEN customer_id END) AS week
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 7 WEEK) = order_week THEN customer_id END) AS week
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 8 WEEK) = order_week THEN customer_id END) AS week
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 9 WEEK) = order_week THEN customer_id END) AS week
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 10 WEEK) = order_week THEN customer_id END) AS week
     , COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 11 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 12 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 13 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 14 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 15 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 16 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 17 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 18 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 19 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 20 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 21 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 22 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 23 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 24 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 25 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 26 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 27 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 28 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 29 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 30 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 31 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 32 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 33 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 34 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 35 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 36 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 37 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 38 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 39 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 40 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 41 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 42 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 43 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 44 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 45 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 46 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 47 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 48 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 49 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 50 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 51 WEEK) = order_week THEN customer_id END) AS week
		, COUNT(DISTINCT CASE WHEN DATE_ADD(first_order_week, INTERVAL 52 WEEK) = order_week THEN customer_id END) AS week
from records_preprocessed
GROUP BY first_order_week
ORDER BY first_order_week
  • 주별로 리텐션을 측정하기 위해 CTE를 만들어준다. first_order_week, order_week 두가지를 만들어준다.

  • Dayofweek :  date를 넘겨주면, dayofweek는 수를, dayname은 요일 이름을 돌려줍니다.

    • 예를 들어서, date가 '2020-03-20'이라고 한다면, 후자의 리턴 값은 'Friday'가 됩니다. 문제는 dayofweek가 리턴하는 정수 값입니다.
    • 1 = Sunday, 2 = Monday,..... 7 = Saturday* 일요일이 0이 아닌 1 입니다.
  • DAYOFYEAR(date) : 해당 날짜의 1월 1일부터의 날수를 반환한다. 결과값은 1에서 366 까지이다.
    - 예 : select DAYOFYEAR('1998-02-03');

  • DATE_SUB(date,INTERVAL expr type) : 날짜를 뺀 날짜를 반환한다.

  • 추가

    	- [예제]

mysql> select date_add("1997-12-31 23:59:59",interval 1 second);
-> 1998-01-01 00:00:00
mysql> select date_add("1997-12-31 23:59:59",interval 1 day);
-> 1998-01-01 23:59:59
mysql> select date_add("1997-12-31 23:59:59",interval "1:1" minute_second);
-> 1998-01-01 00:01:00
mysql> select date_sub("1998-01-01 00:00:00",interval "1 1:1:1" day_second);
-> 1997-12-30 22:58:59
mysql> select date_add("1998-01-01 00:00:00",interval "-1 10" day_hour);
-> 1997-12-30 14:00:00
mysql> select date_sub("1998-01-02", interval 31 day);
-> 1997-12-02

출처: https://jang8584.tistory.com/7 [개발자의 길:티스토리]
profile
data analyst

0개의 댓글