- 이 부분 사실일까 확인
- user_id: 유저 고유 번호
- created_at: 유저가 가입한 시간
- 유저의 상태(활성화, 보류)
- 유저가 활성화 한 시간
- 유저의 회사 아이디
- 유저가 고른 언어
- user_id: 이벤트를 기록하는 사용자의 ID. 다른 테이블의 사용자_id에 조인될 수 있다.
- occurred_at: 이벤트가 일어난 시간
- event_type: 일반 이벤트 유형.
- signup_flow : 사용자 인증 프로세스 중에 발생하는 모든 작업을 나타냄
- engagement: 사용자가 처음 가입한 후 일반적인 제품 사용량을 나타냄- event_name: 사용자가 수행한 특정 작업
- create_user: 가입 프로세스 중에 사용자가 Yammer의 DB에 추가됨
- enter_email: 사용자가 전자 메일 주소를 입력하여 등록 프로세스를 시작
- enter_info: 사용자가 등록 과정에서 자신의 이름과 개인 정보를 입력
- complete_signup: 사용자가 전체 등록/인증 프로세스를 완료
- home_page: 사용자가 홈 페이지를 로드합니다.
- like_message: 사용자가 다른 유저의 메시지를 like
- login: 사용자가 Yammer에 로그인
- search_incomplete: 사용자가 자동 완료 목록에서 검색 결과를 선택
- search_run: 사용자가 검색 쿼리를 실행하고 검색 결과 페이지로 이동
- search_click_result_X: 사용자가 결과 페이지에서 검색 결과 X를 클릭, X는 1에서 10 사이의 숫자입니다.
- send_message: 사용자가 메시지를 게시
- view_inbox: 사용자가 받은 편지함의 메시지- location: 이벤트가 기록된 국가(IP 주소를 통해 수집됨)
- device: 이벤트가 기록되는데 사용된 기기의 종류
Search use: 검색을 사용하는 사람이 있는지 여부 확인
Search frequency(검색 빈도): 사용자가 검색을 많이 하는 경우 주요 예외를 제외하고 with 기능에서 값을 얻고 있을 가능성이 높다. -> 사용자가 짧은 시간 내에 반복적으로 검색하면 처음에 원하는 항목을 찾지 못했기 때문에 용어를 수정하는 것일 수 있다.
Repeated terms(반복되는 용어): 검색 용어의 유사성을 비교하기.-> 사용자가 짧은 시간 내에 수행하는 검색 수를 계산하는 것(2. 검색빈도)보다 훨씬 느리고 실제로 수행하기가 더 어렵다.
Clickthroughs: 검색 순위 양호 여부 확인. 사용자가 검색 결과에서 많은 링크를 클릭하면 좋은 경험이 없을 가능성이 높다. -> 사용자가 낮은 결과를 자주 클릭하거나 추가 페이지로 스크롤하는 경우 순위 알고리즘을 조정해야 한다.
Autocomplete Clickthroughs(자동완성 클릭스루): 이거는 따로 성공여부를 별도로 측정해야 한다.
- 동료의 솔루션에서 session의 정의: 두 이벤트 사이에 10분 동안 쉬지 않고 사용자가 기록한 이벤트 문자열로 정의됨. 따라서 사용자가 이벤트를 기록하지 않고 10분 동안 진행하면 세션이 종료되고 다음 작업이 새 세션으로 간주됩니다.
- 사람들이 얼마나 자주 검색하는지 + 검색 빈도가 시간에 따라 변하는지 살펴보기
- 사용자는 검색 결과 페이지로 이동하는 검색을 실제로 실행하는 것보다 더 자주 자동 완성 기능을 활용
SELECT DATE_TRUNC('week',z.session_start) AS week,
COUNT(*) AS sessions,
COUNT(CASE WHEN z.autocompletes > 0 THEN z.session ELSE NULL END) AS with_autocompletes,
COUNT(CASE WHEN z.runs > 0 THEN z.session ELSE NULL END) AS with_runs
FROM (
SELECT x.session_start,
x.session,
x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
session.session,
session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (
SELECT bounds.*,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id,1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id,
event_type,
event_name,
occurred_at,
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE'
OR next_event >= INTERVAL '10 MINUTE'
OR last_event IS NULL
OR next_event IS NULL
) final
GROUP BY 1,2
) session
ON e.user_id = session.user_id
AND e.occurred_at >= session.session_start
AND e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement'
) x
GROUP BY 1,2,3
) z
GROUP BY 1
ORDER BY 1
- 자동 완성 기능은 세션의 약 25%에서 사용되며 검색 기능은 8% 정도에서만 사용됩니다.
- 자동완성기능의 25% 사용 -> 사용자가 Yammer 네트워크에서 정보를 찾아야 한다는 것을 나타난다. 다시 말해 사람들이 사용하는 기능이고 주목할 만한 가치가 있다.
SELECT DATE_TRUNC('week',z.session_start) AS week,
COUNT(CASE WHEN z.autocompletes > 0 THEN z.session ELSE NULL END)/COUNT(*)::FLOAT AS with_autocompletes,
COUNT(CASE WHEN z.runs > 0 THEN z.session ELSE NULL END)/COUNT(*)::FLOAT AS with_runs
FROM (
SELECT x.session_start,
x.session,
x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
session.session,
session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (
SELECT bounds.*,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id,1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id,
event_type,
event_name,
occurred_at,
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE'
OR next_event >= INTERVAL '10 MINUTE'
OR last_event IS NULL
OR next_event IS NULL
) final
GROUP BY 1,2
) session
ON e.user_id = session.user_id
AND e.occurred_at >= session.session_start
AND e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement'
) x
GROUP BY 1,2,3
) z
GROUP BY 1
ORDER BY 1
- 자동 완성은 일반적으로 세션당 한 번 또는 두 번 사용된다.
SELECT autocompletes,
COUNT(*) AS sessions
FROM (
SELECT x.session_start,
x.session,
x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
session.session,
session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (
SELECT bounds.*,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id,1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id,
event_type,
event_name,
occurred_at,
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE'
OR next_event >= INTERVAL '10 MINUTE'
OR last_event IS NULL
OR next_event IS NULL
) final
GROUP BY 1,2
) session
ON e.user_id = session.user_id
AND e.occurred_at >= session.session_start
AND e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement'
) x
GROUP BY 1,2,3
) z
WHERE autocompletes > 0
GROUP BY 1
ORDER BY 1
- 전체 검색을 실행하는 경우 일반적으로 단일 세션에서 여러 검색을 실행한다.
- 전체 검색이 거의 사용되지 않는 기능이라는 점을 고려할 때, 이는 검색 결과가 그다지 좋지 않거나 항상 검색 및 사용을 좋아하는 사용자 그룹이 매우 적다는 것을 나타낸다.
SELECT runs,
COUNT(*) AS sessions
FROM (
SELECT x.session_start,
x.session,
x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
session.session,
session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (
SELECT bounds.*,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id,1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id,
event_type,
event_name,
occurred_at,
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE'
OR next_event >= INTERVAL '10 MINUTE'
OR last_event IS NULL
OR next_event IS NULL
) final
GROUP BY 1,2
) session
ON e.user_id = session.user_id
AND e.occurred_at >= session.session_start
AND e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement'
) x
GROUP BY 1,2,3
) z
WHERE runs > 0
GROUP BY 1
ORDER BY 1
- 더 자세히 봤을 때, 검색이 잘 안 되는 게 분명하다. 왜냐하면 사용자가 검색하는 세션에서는 다음 결과를 거의 클릭하지 않는다.
SELECT clicks,
COUNT(*) AS sessions
FROM (
SELECT x.session_start,
x.session,
x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
session.session,
session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (
SELECT bounds.*,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id,1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id,
event_type,
event_name,
occurred_at,
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE'
OR next_event >= INTERVAL '10 MINUTE'
OR last_event IS NULL
OR next_event IS NULL
) final
GROUP BY 1,2
) session
ON e.user_id = session.user_id
AND e.occurred_at >= session.session_start
AND e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement'
) x
GROUP BY 1,2,3
) z
WHERE runs > 0
GROUP BY 1
ORDER BY 1
- 또한 지정된 세션에서 더 많은 검색을 수행해도 평균적으로 클릭 수가 더 많아지지 않는다.
SELECT runs,
AVG(clicks)::FLOAT AS average_clicks
FROM (
SELECT x.session_start,
x.session,
x.user_id,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs,
COUNT(CASE WHEN x.event_name LIKE 'search_click_%' THEN x.user_id ELSE NULL END) AS clicks
FROM (
SELECT e.*,
session.session,
session.session_start
FROM tutorial.yammer_events e
LEFT JOIN (
SELECT user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (
SELECT bounds.*,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id,1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id,
event_type,
event_name,
occurred_at,
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE'
OR next_event >= INTERVAL '10 MINUTE'
OR last_event IS NULL
OR next_event IS NULL
) final
GROUP BY 1,2
) session
ON e.user_id = session.user_id
AND e.occurred_at >= session.session_start
AND e.occurred_at <= session.session_end
WHERE e.event_type = 'engagement'
) x
GROUP BY 1,2,3
) z
WHERE runs > 0
GROUP BY 1
ORDER BY 1
- 사용자가 검색 결과를 클릭할 때 클릭 수가 결과 순서 전체에 고르게 분포된다. -> 순서가 좋지 않음을 나타냄. 검색이 잘 수행되고 있다면 상위 2개 또는 3개 결과에 가중치가 많이 부여될 것이다.
SELECT TRIM('search_click_result_' FROM event_name)::INT AS search_result,
COUNT(*) AS clicks
FROM tutorial.yammer_events
WHERE event_name LIKE 'search_click_%'
GROUP BY 1
ORDER BY 1
마지막으로 전체 검색을 실행하는 사용자는 다음 달 내에 다시 전체 검색을 실행하는 경우가 거의 없다.
SELECT searches,
COUNT(*) AS users
FROM (
SELECT user_id,
COUNT(*) AS searches
FROM (
SELECT x.session_start,
x.session,
x.user_id,
x.first_search,
COUNT(CASE WHEN x.event_name = 'search_run' THEN x.user_id ELSE NULL END) AS runs
FROM (
SELECT e.*,
first.first_search,
session.session,
session.session_start
FROM tutorial.yammer_events e
JOIN (
SELECT user_id,
MIN(occurred_at) AS first_search
FROM tutorial.yammer_events
WHERE event_name = 'search_run'
GROUP BY 1
) first
ON first.user_id = e.user_id
AND first.first_search <= '2014-08-01'
LEFT JOIN (
SELECT user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (
SELECT bounds.*,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id,1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id,
event_type,
event_name,
occurred_at,
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE'
OR next_event >= INTERVAL '10 MINUTE'
OR last_event IS NULL
OR next_event IS NULL
) final
GROUP BY 1,2
) session
ON session.user_id = e.user_id
AND session.session_start <= e.occurred_at
AND session.session_end >= e.occurred_at
AND session.session_start <= first.first_search + INTERVAL '30 DAY'
WHERE e.event_type = 'engagement'
) x
GROUP BY 1,2,3,4
) z
WHERE z.runs > 0
GROUP BY 1
) z
GROUP BY 1
ORDER BY 1
LIMIT 100
- 이에 비해 자동 완성 기능을 사용하는 사용자는 계속해서 더 빠른 속도로 자동 완성 기능을 사용한다.
SELECT searches,
COUNT(*) AS users
FROM (
SELECT user_id,
COUNT(*) AS searches
FROM (
SELECT x.session_start,
x.session,
x.user_id,
x.first_search,
COUNT(CASE WHEN x.event_name = 'search_autocomplete' THEN x.user_id ELSE NULL END) AS autocompletes
FROM (
SELECT e.*,
first.first_search,
session.session,
session.session_start
FROM tutorial.yammer_events e
JOIN (
SELECT user_id,
MIN(occurred_at) AS first_search
FROM tutorial.yammer_events
WHERE event_name = 'search_autocomplete'
GROUP BY 1
) first
ON first.user_id = e.user_id
AND first.first_search <= '2014-08-01'
LEFT JOIN (
SELECT user_id,
session,
MIN(occurred_at) AS session_start,
MAX(occurred_at) AS session_end
FROM (
SELECT bounds.*,
CASE WHEN last_event >= INTERVAL '10 MINUTE' THEN id
WHEN last_event IS NULL THEN id
ELSE LAG(id,1) OVER (PARTITION BY user_id ORDER BY occurred_at) END AS session
FROM (
SELECT user_id,
event_type,
event_name,
occurred_at,
occurred_at - LAG(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) AS last_event,
LEAD(occurred_at,1) OVER (PARTITION BY user_id ORDER BY occurred_at) - occurred_at AS next_event,
ROW_NUMBER() OVER () AS id
FROM tutorial.yammer_events e
WHERE e.event_type = 'engagement'
ORDER BY user_id,occurred_at
) bounds
WHERE last_event >= INTERVAL '10 MINUTE'
OR next_event >= INTERVAL '10 MINUTE'
OR last_event IS NULL
OR next_event IS NULL
) final
GROUP BY 1,2
) session
ON session.user_id = e.user_id
AND session.session_start <= e.occurred_at
AND session.session_end >= e.occurred_at
AND session.session_start <= first.first_search + INTERVAL '30 DAY'
WHERE e.event_type = 'engagement'
) x
GROUP BY 1,2,3,4
) z
WHERE z.autocompletes > 0
GROUP BY 1
) z
GROUP BY 1
ORDER BY 1
LIMIT 100
- 자동 완성이 상당히 잘 수행되고 있는 반면 검색 실행은 그렇지 못하다.
- 가장 주목해야 할 부분은 검색 결과의 순서이다.
- 사용자는 자동 완성에서 원하는 항목을 제공하지 않을 때 전체 검색을 실행할 가능성이 있다. -> 검색 순위 알고리즘을 변경하여 자동 완성 결과와 약간 다른 결과를 제공하는 것이 도움이 될 수 있다.
전체 검색 결과를 개선하는 데 초점을 맞춰야 한다.