[SQL] 문제풀기 0522

임재규·2023년 5월 22일
0

Data_Engineering_Track_22

목록 보기
12/24

DataBase : instacart 설명

  • orders (3.4m rows, 206k users):
    .
    order_id: order identifier
    user_id: customer identifier
    eval_set: which evaluation set this order belongs in (see SET described below)
    order_number: the order sequence number for this user (1 = first, n = nth)
    order_dow: the day of the week the order was placed on
    order_hour_of_day: the hour of the day the order was placed on
    days_since_prior: days since the last order, capped at 30 (with NAs for order_number = 1)
    .
  • products (50k rows):
    .
    product_id: product identifier
    product_name: name of the product
    aisle_id: foreign key
    department_id: foreign key

  • aisles (134 rows):
    .
    aisle_id: aisle identifier
    aisle: the name of the aisle

*deptartments (21 rows):
.
department_id: department identifier

department: the name of the department

  • order_products__SET (30m+ rows):
    .
    order_id: foreign key
    product_id: foreign key
    add_to_cart_order: order in which each product was added to cart
    reordered: 1 if this product has been ordered by this user in the past, 0 otherwise

where SET is one of the four following evaluation sets (eval_set in orders):
"prior": orders prior to that users most recent order (~3.2m orders)

문제

  1. 전체 주문 건수
  2. 구매자 수
  3. 상품별 주문 건수
  4. 카트에 가장 먼저 넣는 상품 10개
    .- order_products_prior의 add_to_cart_order라는 컬럼은 상품이 몇 번째로 장바구니에 담겼는지 의미
  5. 시간별 주문 건수

답안

1. 전체 주문 건수 
SELECT count(order_id) FROM orders;


2. 구매자 수 
- SELECT count(DISTINCT(user_id)) FROM orders ;


3. 상품별 주문 건수 
SELECT B.product_name, COUNT(DISTINCT A.order_id) F
FROM order_products__prior A 
LEFT JOIN 
products B 
ON A.product_id = B.product_id
GROUP BY 1;

#위 코드와 동일( having)
SELECT B.product_name, COUNT(DISTINCT A.order_id) F
FROM order_products__prior A 
LEFT JOIN 
products B 
USING (product_id)      # 키로 잡는 컬럼이 같다면 having을 써도 됨
GROUP BY 1;


4. 카트에 가장 먼저 넣는 상품 10- order_products_prior의 add_to_cart_order라는 컬럼은 상품이 몇 번째로 장바구니에 담겼는지 의미 

# add_to_cart_order가 1인 경우를 F_1st 칼럼에 1을 할당하고 아닌 경우는 0을 할당함
SELECT * FROM 
(SELECT product_id, 
case when add_to_cart_order = 1 then 1 ELSE 0 END F_1st
FROM order_products__prior) A 
WHERE F_1st = 1;


SELECT product_id, SUM(case when add_to_cart_order = 1 then 1 ELSE 0 END) F_1st
FROM order_products__prior 
GROUP BY product_id;


SELECT * FROM 
(SELECT *, ROW_NUMBER() OVER(ORDER BY F_1st DESC) rnk
FROM 
(SELECT product_id, SUM(case when add_to_cart_order = 1 then 1 ELSE 0 END) F_1st
FROM order_products__prior 
GROUP BY product_id) A LIMIT 10) B
LEFT JOIN products C
ON B.product_id = C.product_id
ORDER BY rnk ;


5. 시간별 주문 건수
SELECT order_hour_of_day, COUNT(DISTINCT order_id) F
FROM orders 
GROUP BY order_hour_of_day
ORDER BY F DESC;
profile
공부 기록

0개의 댓글