📄 Neural Collaborative Filtering Paper Review

서은서 · August 31, 2023

Paper Review

0. Abstract

When modeling the key factor in collaborative filtering (the interaction between user and item features), existing methods rely on matrix factorization (MF), which applies an inner product to the latent features of users and items.

This paper proposes NCF (Neural Collaborative Filtering), a neural-network-based architecture.

📌 NCF

  • is a generic framework that can express and generalize MF.
  • uses a multi-layer perceptron to learn the user-item interaction function, which strengthens non-linear modeling.

1. Introduction

The key to a personalized recommender system is modeling users' preferences for items based on their past interactions (e.g., ratings and clicks).

MF is the most popular CF method. It projects users and items into a shared latent space using vectors of latent features, and it models a user-item interaction as the inner product of the two latent vectors. MF is effective for CF, but it is limited by its simple choice of interaction function.

Because the inner product combines latent vectors linearly, it is not effective at capturing the complex structure of user interaction data. To address this, the paper uses deep neural networks (DNNs).

DNNs have recently been applied to many recommendation tasks with promising results, but most of that work used DNNs to model auxiliary information. To combine user and item latent features, it still relied on MF.

To address the problem above, the paper focuses on implicit feedback.

❓ What is implicit feedback?
Feedback that reflects a user's preferences indirectly, through behavior such as:

  • watching videos
  • purchasing products
  • clicking items

Compared with explicit feedback, implicit feedback can be tracked automatically and is relatively easy to collect. It is harder to use, however, because user satisfaction is not observed and there is a natural scarcity of negative feedback.

The main contributions of the paper are as follows.

  • It presents NCF, a CF method based on neural networks.
  • It shows that MF is a special case of NCF.
  • It demonstrates the effectiveness of NCF through a range of experiments.

2. Preliminaries

This section first shows that applying inner-product-based MF to CF with implicit feedback has limitations.

2.1 Learning from Implicit Data

The user-item interaction matrix is denoted $Y \in R^{M \times N}$, where $M$ and $N$ are the numbers of users and items. Each entry is built from the user's implicit feedback: $y_{ui} = 1$ if an interaction between user $u$ and item $i$ is observed, and $y_{ui} = 0$ otherwise.

👉🏻 A value of 1 for $y_{ui}$ means that there is an interaction between user $u$ and item $i$. It does not mean that $u$ actually likes $i$.

Similarly, a value of 0 for $y_{ui}$ does not mean that $u$ actually dislikes $i$.
→ The user may simply be unaware of the item!

Because implicit data provides only noisy signals about users' preferences, learning from it is difficult.

The recommendation problem with implicit feedback reduces to predicting the probability that $y_{ui}$ is 1, which can be written as

$$\hat{y}_{ui} = f(u, i \mid \Theta)$$

where $\hat{y}_{ui}$ is the predicted score, $f$ is the interaction function, and $\Theta$ denotes the model parameters.

Objective function

An objective function is the function that training tries to optimize.

Two types of objective function are most commonly used in this area.

1️⃣ pointwise loss

Training minimizes the difference between the predicted value and the target value. This kind of loss is common in regression problems.

$$\text{Pointwise Loss} = \min \frac{1}{2} (\hat{y}_{u,i} - y_{u,i})^2$$

2️⃣ pairwise loss

It builds on the idea that observed entries should be ranked higher than unobserved ones.

Training maximizes the margin between the score of an observed entry ($\hat{y}_{u,i}$) and the score of an unobserved entry ($\hat{y}_{u,j}$).

$$\text{Pairwise Loss} = \max(0, \alpha - (f(\hat{y}_{u,i}) - f(\hat{y}_{u,j})))$$
$$\text{s.t.} \quad rank(\hat{y}_{u,i}) > rank(\hat{y}_{u,j})$$

Here $\alpha$ is the margin: the loss is zero once the observed entry outscores the unobserved one by at least $\alpha$.
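
To make the two objectives concrete, here is a minimal Python sketch. The function names, the default margin, and the sample scores are illustrative assumptions rather than anything from the paper, and the pairwise loss is written in the usual hinge form (zero once the observed item outscores the unobserved one by the margin).

```python
def pointwise_loss(y_hat: float, y: float) -> float:
    """Half the squared error between a predicted score and its target."""
    return 0.5 * (y_hat - y) ** 2


def pairwise_hinge_loss(score_pos: float, score_neg: float, margin: float = 1.0) -> float:
    """Hinge loss on a (observed, unobserved) pair of scores.

    Returns 0 when the observed item beats the unobserved one by `margin`.
    """
    return max(0.0, margin - (score_pos - score_neg))


# Hypothetical scores, only for illustration.
print(pointwise_loss(0.8, 1.0))
print(pairwise_hinge_loss(0.9, 0.2, margin=0.5))  # margin satisfied
print(pairwise_hinge_loss(0.4, 0.6, margin=0.5))  # wrong order, penalized
```

The pointwise loss looks at one (user, item) pair at a time, while the pairwise loss only cares about the relative order of two items for the same user.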

According to the paper, NCF parameterizes the interaction function $f$ with neural networks, so it naturally supports both pointwise and pairwise losses.

2.2 Matrix Factorization

MF predicts the interaction $y_{ui}$ as the inner product of $p_u$ and $q_i$:

$$\hat{y}_{ui} = f(u, i \mid p_u, q_i) = p_u^T q_i = \sum_{k=1}^{K} p_{uk} q_{ik}$$

Here, $p_u$ and $q_i$ are the latent vectors of user $u$ and item $i$.

  • $K$ : dimension of the latent space
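
As a small illustration of the MF prediction, the sketch below computes the score as an inner product in plain Python. The vectors and the name `mf_score` are made up for the example.

```python
from __future__ import annotations


def mf_score(p_u: list[float], q_i: list[float]) -> float:
    """Inner product of a user and an item latent vector (K = len(p_u))."""
    if len(p_u) != len(q_i):
        raise ValueError("latent vectors must share the same dimension K")
    return sum(p * q for p, q in zip(p_u, q_i))


p_u = [0.5, 1.0, -0.5]  # hypothetical user latent vector, K = 3
q_i = [1.0, 0.5, 2.0]   # hypothetical item latent vector, K = 3
print(mf_score(p_u, q_i))
```

Every latent dimension contributes with the same weight, which is exactly the "simple interaction function" the paper wants to replace.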

💡 Characteristics of MF

  1. It models the two-way interaction between user and item latent factors.
  2. It treats each latent dimension as independent and combines them linearly with the same weight.
  3. It can be regarded as a linear model of latent factors.

Limitations of MF

The paper points out a limitation of MF in recommender systems. MF maps users and items into the same latent space and measures similarity with the inner product.

1️⃣ ex) Compute the similarities among $u_1, u_2, u_3$ from the user-item matrix in (a)

▶︎ $s_{ij}$ denotes the similarity between user $i$ and user $j$.

$s_{23}(0.66) > s_{12}(0.50) > s_{13}(0.40)$

Represented geometrically in the latent space with $p_1, p_2, p_3$ as in (b), user 2 and user 3 are closest to each other, while user 1 and user 3 are farthest apart.

2️⃣ ex) Add $u_4$ to example 1 and compute the similarities

▶︎ $s_{ij}$ denotes the similarity between user $i$ and user $j$.

$s_{41}(0.60) > s_{43}(0.40) > s_{42}(0.20)$

This means user 4 is most similar to user 1, followed by user 3 and then user 2. In (b), however, there is no vector $p_4$ that is closest to $p_1$ while also being closer to $p_3$ than to $p_2$: placing $p_4$ closest to $p_1$ makes it closer to $p_2$ than to $p_3$.

This shows the limitation of MF in projecting complex user-item interactions into a low-dimensional latent space. The paper therefore addresses this limitation by learning the interaction function with DNNs.
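
The similarity values in the two examples can be reproduced with a few lines of Python. This sketch assumes the similarity is the Jaccard coefficient between the users' sets of interacted items, and the interaction sets below are hypothetical ones chosen to match the quoted values; neither is stated in this review.

```python
from itertools import combinations

# Hypothetical interaction sets (item indices 1..5), chosen so that the
# Jaccard coefficient matches the similarity values quoted in the examples.
interactions = {
    "u1": {1, 2, 3, 5},
    "u2": {2, 3},
    "u3": {2, 3, 4},
    "u4": {1, 3, 4, 5},
}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    return len(a & b) / len(a | b)


for x, y in combinations(interactions, 2):
    print(x, y, round(jaccard(interactions[x], interactions[y]), 2))
```

With these sets, user 4 comes out most similar to user 1, then user 3, then user 2, which is the ordering that a single $p_4$ cannot satisfy in the latent space of (b).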

3. Neural Collaborative Filtering

3.1 General Framework

As the paper's framework figure illustrates, a multi-layer representation is used to model the user-item interaction $y_{ui}$.

  • Input Layer

    • Consists of two feature vectors, $v^U_u$ and $v^I_i$, which describe user $u$ and item $i$.
    • Because the paper focuses on the pure collaborative filtering setting, it uses only the user and item IDs. Each ID is converted into a binarized sparse vector by one-hot encoding.
  • Embedding Layer

    • A fully connected layer that maps the sparse vector to a dense vector.
  • Neural CF Layer

    • Takes the user embedding and the item embedding as input.
    • Maps the latent vectors to a prediction score.
    • Each layer can be customized to discover certain latent structures of user-item interactions.
    • The dimension of the last hidden layer, $X$, determines the model's capability.
  • Output Layer

    • Predicts the score $\hat{y}_{ui}$.

    • Training minimizes the pointwise loss (the difference between $\hat{y}_{ui}$ and $y_{ui}$).

    • The NCF predictive model is formulated as follows, where $\Theta_f$ denotes the parameters of $f$.

      $$\hat{y}_{ui} = f(P^T v^U_u, Q^T v^I_i \mid P, Q, \Theta_f)$$

      • $P \in R^{M \times K}$, $Q \in R^{N \times K}$ : latent factor matrices for users and items

      • $f$ : multi-layer neural network
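
The embedding step can be sketched in plain Python: multiplying a one-hot ID vector by the latent factor matrix simply selects one row of that matrix. The matrix values and the helper names `one_hot` and `embed` are illustrative assumptions.

```python
def one_hot(index: int, size: int) -> list:
    """Binarized sparse vector with a single 1 at position `index`."""
    return [1.0 if k == index else 0.0 for k in range(size)]


def embed(P: list, v: list) -> list:
    """Compute P^T v for an M x K matrix P (list of rows) and a length-M vector v."""
    K = len(P[0])
    return [sum(P[m][k] * v[m] for m in range(len(P))) for k in range(K)]


# Hypothetical latent factor matrix: M = 3 users, K = 2 latent dimensions.
P = [[0.1, 0.2],
     [0.3, 0.4],
     [0.5, 0.6]]

v_u = one_hot(1, len(P))   # user with ID 1
p_u = embed(P, v_u)        # dense embedding of that user
print(p_u == P[1])
```

This is why, in practice, an embedding layer is usually implemented as a table lookup rather than as an explicit matrix product.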

3.1.1 Learning NCF

The paper presents a probabilistic approach for learning pointwise NCF that pays special attention to the binary property of implicit data.

  • $y_{ui} = 1$ means item $i$ is relevant to user $u$, and $y_{ui} = 0$ means the opposite.
  • The prediction score $\hat{y}_{ui}$ represents how relevant $i$ is to $u$ and lies in the range $[0,1]$. To obtain this, a probabilistic function such as the logistic or probit function is used as the activation function of the output layer ($\phi_{out}$).

▶︎ (6) likelihood function

$$p(\mathcal{Y}, \mathcal{Y}^- \mid P, Q, \Theta_f) = \prod_{(u,i) \in \mathcal{Y}} \hat{y}_{ui} \prod_{(u,j) \in \mathcal{Y}^-} (1 - \hat{y}_{uj})$$

▶︎ (7) negative logarithm of the likelihood

$$L = -\sum_{(u,i) \in \mathcal{Y}} \log \hat{y}_{ui} - \sum_{(u,j) \in \mathcal{Y}^-} \log(1 - \hat{y}_{uj}) = -\sum_{(u,i) \in \mathcal{Y} \cup \mathcal{Y}^-} \left[ y_{ui} \log \hat{y}_{ui} + (1 - y_{ui}) \log(1 - \hat{y}_{ui}) \right]$$

Here $\mathcal{Y}$ is the set of observed interactions and $\mathcal{Y}^-$ is the set of negative instances.

Equation (7) is used as the objective function, and SGD is used to find the parameters that minimize $L$.

Equation (7) is identical to the binary cross-entropy loss (log loss). In other words, by giving the NCF output a probabilistic treatment, the implicit feedback recommendation problem can be handled as a binary classification problem.
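
The log loss described here is short enough to write out. This is a minimal sketch, assuming each training instance is a `(label, predicted score)` pair; the clamping with a small `eps` is my own addition to avoid `log(0)`, and the sample batch is made up.

```python
import math


def log_loss(instances, eps: float = 1e-12) -> float:
    """Binary cross-entropy summed over (label, predicted score) pairs."""
    total = 0.0
    for y, y_hat in instances:
        y_hat = min(max(y_hat, eps), 1.0 - eps)  # keep log() finite
        total -= y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat)
    return total


# Hypothetical batch: two observed interactions and two negative instances.
batch = [(1, 0.9), (1, 0.6), (0, 0.2), (0, 0.5)]
print(log_loss(batch))
```

Observed pairs (label 1) push the score toward 1 and negative instances (label 0) push it toward 0, which is the binary classification view of the problem.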


3.2 Generalized Matrix Factorization (GMF)

This section shows how MF becomes a special case of NCF.

  • mapping function of the first neural CF layer

    $$\phi_1(p_u, q_i) = p_u \odot q_i$$

    • latent vector $p_u$ is $P^T v^U_u$
    • latent vector $q_i$ is $Q^T v^I_i$
    • $\odot$ denotes the element-wise product of vectors.
  • output layer

    $$\hat{y}_{ui} = a_{out}(h^T (p_u \odot q_i))$$

    • $a_{out}$ : activation function
    • $h$ : edge weights

If $a_{out}$ is the identity function and $h$ has the form $h^T = [1,...,1]_{1 \times k}$, the model is exactly the MF model.

If $a_{out}$ is a non-linear function, MF is generalized to a non-linear setting.

In particular, if $a_{out}$ is the sigmoid function ($\sigma(x) = 1/(1+e^{-x})$) and $h$ has the form $h^T = [h_1,...,h_k]_{1 \times k}$, the model is called GMF (Generalized Matrix Factorization). Here $h$ is learned from data with the log loss above.
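
The reduction from GMF to MF is easy to check numerically. In this sketch the vectors and the weights in `h` are hypothetical, and `gmf` is just an illustrative name: with an identity activation and all-ones `h` it returns the plain inner product, and with a sigmoid and non-uniform `h` it gives a GMF-style score in (0, 1).

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def identity(x: float) -> float:
    return x


def gmf(p_u, q_i, h, activation):
    """a_out(h^T (p_u ⊙ q_i)) for plain Python lists."""
    interaction = [p * q for p, q in zip(p_u, q_i)]  # element-wise product
    return activation(sum(w * z for w, z in zip(h, interaction)))


p_u = [0.5, 1.0, -0.5]  # hypothetical user embedding
q_i = [1.0, 0.5, 2.0]   # hypothetical item embedding

mf_value = gmf(p_u, q_i, [1.0, 1.0, 1.0], identity)  # reduces to MF
gmf_value = gmf(p_u, q_i, [0.2, -0.4, 0.1], sigmoid)  # weighted, non-linear
print(mf_value, gmf_value)
```

Letting `h` vary means the latent dimensions no longer have to contribute equally, which is the extra freedom GMF has over MF.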

3.3 Multi-Layer Perceptron(MLP)

Because NCF uses two pathways in the input layer, one for users and one for items, the two pathways need to be combined.

Unlike in other deep learning problems, simply concatenating the item vector and the user vector does not account for the latent interactions (latent structure) between users and items, so it is insufficient for modeling CF.

The paper therefore proposes adding hidden layers on top of the concatenated vector and using a standard MLP to learn the interaction between user and item latent features.

GMF relies on a fixed, linear element-wise product, so it cannot express complex relationships between users and items, whereas an MLP is non-linear and flexible and can express more complex relationships. The MLP model is parameterized as follows.

$$z_1 = \phi_1(p_u, q_i) = \begin{bmatrix} p_u \\ q_i \end{bmatrix}$$
$$\phi_2(z_1) = a_2(W_2^T z_1 + b_2)$$
$$\cdots$$
$$\phi_L(z_{L-1}) = a_L(W_L^T z_{L-1} + b_L)$$
$$\hat{y}_{ui} = \sigma(h^T \phi_L(z_{L-1}))$$

  • $W_x$ : weight matrix of the $x$-th layer
  • $b_x$ : bias vector of the $x$-th layer
  • $a_x$ : activation function of the $x$-th layer
  • $\phi_1$ : function of the first neural CF layer (Layer 1), which concatenates the user and item latent vectors
  • $\phi_x (x \ge 2)$ : neural network function that learns latent structure with a weight matrix and a bias vector
  • $\phi_L$ : the last ($L$-th) layer, which is also non-linear
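
A forward pass through this MLP can be sketched in a few lines of plain Python. ReLU is assumed as the hidden activation and a sigmoid as the output activation; all weights and embeddings below are toy values invented for the example, and each row of `W` is stored as the input weights of one output unit.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dense_relu(W, b, z):
    """One hidden layer a(W^T z + b) with ReLU.

    Each row of W holds the input weights of one output unit,
    i.e. W is stored already transposed.
    """
    return [max(0.0, sum(w * x for w, x in zip(row, z)) + bias)
            for row, bias in zip(W, b)]


def mlp_score(p_u, q_i, layers, h):
    z = list(p_u) + list(q_i)      # phi_1: concatenate the two embeddings
    for W, b in layers:            # phi_2 ... phi_L
        z = dense_relu(W, b, z)
    return sigmoid(sum(w * x for w, x in zip(h, z)))  # output layer


# Hypothetical toy parameters: K = 2, one hidden layer with 2 units.
layers = [([[0.5, -0.5, 0.25, 0.0],
            [0.0, 1.0, -1.0, 0.5]], [0.0, 0.1])]
h = [1.0, 0.5]
score = mlp_score([1.0, 2.0], [0.0, -1.0], layers, h)
print(score)
```

Unlike the element-wise product in GMF, every hidden unit here can mix any user dimension with any item dimension.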

3.4 Fusion of GMF and MLP

The paper presents NeuMF, a model that fuses GMF and MLP, as illustrated in its architecture figure. The fusion lets the two models build on each other's strengths and compensate for each other's weaknesses.

A straightforward solution is to let GMF and MLP share the same embedding layer and then combine the outputs of their interaction functions:

$$\hat{y}_{ui} = \sigma(h^T a(p_u \odot q_i + W \begin{bmatrix} p_u \\ q_i \end{bmatrix} + b))$$

This equation is the model that combines GMF with a one-layer MLP.

To give the fused model more flexibility, GMF and MLP instead learn separate embeddings, and the two models are combined by concatenating their last hidden layers.

💡 The formulation is as follows.

$$\phi^{GMF} = p^G_u \odot q^G_i$$
$$\phi^{MLP} = a_L(W_L^T(a_{L-1}(\cdots a_2(W_2^T \begin{bmatrix} p^M_u \\ q^M_i \end{bmatrix} + b_2) \cdots)) + b_L)$$
$$\hat{y}_{ui} = \sigma(h^T \begin{bmatrix} \phi^{GMF} \\ \phi^{MLP} \end{bmatrix})$$

  • $p^G_u, p^M_u$ are the user embeddings for GMF and MLP.
  • $q^G_i, q^M_i$ are the item embeddings for GMF and MLP.
  • To compute $\hat{y}_{ui}$, the outputs of the two models, $\phi^{GMF}$ and $\phi^{MLP}$, are concatenated and weighted by $h$ to produce the final value.
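
Putting the two branches together, a NeuMF-style forward pass might look like the sketch below. It assumes separate embeddings per branch, ReLU in the MLP branch, and a sigmoid output; the function name and every number are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def neumf_score(p_g, q_g, p_m, q_m, mlp_layers, h):
    """sigmoid(h^T [phi_GMF ; phi_MLP]) with separate embeddings per branch."""
    phi_gmf = [p * q for p, q in zip(p_g, q_g)]   # GMF branch
    z = list(p_m) + list(q_m)                      # MLP branch input
    for W, b in mlp_layers:                        # ReLU hidden layers
        z = [max(0.0, sum(w * x for w, x in zip(row, z)) + bias)
             for row, bias in zip(W, b)]
    fused = phi_gmf + z                            # concatenate both outputs
    if len(h) != len(fused):
        raise ValueError("h must have one weight per fused feature")
    return sigmoid(sum(w * x for w, x in zip(h, fused)))


# Hypothetical toy parameters: 2-d embeddings in both branches.
mlp_layers = [([[0.5, -0.5, 0.25, 0.0],
                [0.0, 1.0, -1.0, 0.5]], [0.0, 0.1])]
h = [0.5, -0.5, 1.0, 0.5]
score = neumf_score([1.0, 0.5], [0.5, 2.0], [1.0, 2.0], [0.0, -1.0], mlp_layers, h)
print(score)
```

Because the two branches have their own embeddings, their sizes do not have to match; only the length of `h` has to equal the combined output size.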

3.4.1 Pre-training

NeuMF is initialized using pretrained GMF and MLP models.

  • GMF and MLP are first trained from random initializations until convergence. Adam (Adaptive Moment Estimation) is used to optimize GMF and MLP.

  • The NeuMF parameters are then initialized with the pretrained parameters. The two models are joined at the output layer with a weight $\alpha$ that sets the trade-off between them: $h \leftarrow [\alpha h^{GMF}; (1 - \alpha) h^{MLP}]$. Vanilla SGD is used to optimize NeuMF.
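
The output-layer initialization can be sketched as follows, assuming the trade-off $\alpha$ is applied as a convex weighting of the two pretrained output-weight vectors before they are concatenated. The function name and the pretrained values are hypothetical.

```python
def init_neumf_output_weights(h_gmf, h_mlp, alpha: float = 0.5):
    """Concatenate pretrained output weights, scaled by alpha and (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is expected to lie in [0, 1]")
    return [alpha * w for w in h_gmf] + [(1.0 - alpha) * w for w in h_mlp]


# Hypothetical pretrained output weights of the two models.
h_gmf = [1.0, 2.0]
h_mlp = [4.0]
print(init_neumf_output_weights(h_gmf, h_mlp, alpha=0.5))
```

With `alpha = 0.5` both pretrained models start with equal influence on the fused prediction.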
