Machine Learning_10

YJ · May 25, 2023

▷ Today's study plan: machine learning lectures (23-25)

📖 Chapter 10_Recommender Systems

Content-based filtering recommender system
When a user likes a particular item, recommend other items that are similar to it.

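Nearest-neighbor collaborative filtering
Uses observed user behavior data to predict ratings for items a user has not rated yet (user-based / item-based).

In general, item-based collaborative filtering tends to be more accurate than user-based.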
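
To make the item-based variant concrete, here is a minimal sketch on a made-up 4x4 rating matrix. The data and the function names (`item_similarity`, `predict_item_based`) are my own assumptions for illustration, not from the lecture: items are compared by the cosine similarity of their rating columns, and a prediction is the similarity-weighted average of the ratings the user has already given.

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: items); 0 means "not rated".
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])

def item_similarity(r):
    """Cosine similarity between item columns."""
    norms = np.linalg.norm(r, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items nobody rated
    return (r.T @ r) / np.outer(norms, norms)

def predict_item_based(r, sim):
    """Similarity-weighted average of each user's existing ratings."""
    rated = (r > 0).astype(float)
    weighted_sum = r @ sim              # sum_j r[u, j] * sim[j, i]
    weight_total = rated @ np.abs(sim)  # sum of |sim[j, i]| over items j the user rated
    weight_total[weight_total == 0] = 1.0
    return weighted_sum / weight_total

item_sim = item_similarity(ratings)
pred = predict_item_based(ratings, item_sim)

# User 0 has not rated item 2; item 2 is most similar to item 3, which user 0 rated low.
print(round(pred[0, 2], 2))
```

Real implementations usually restrict the average to the top-N most similar items instead of using every item, which this sketch skips for brevity.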

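Latent factor collaborative filtering
Derives latent factors from the user-item rating matrix.
The rating matrix is factorized into latent factors for users and latent factors for items; multiplying the two factor matrices back together produces predicted ratings for items that have not been rated yet.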
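
The factorize-then-multiply idea can be sketched with plain stochastic gradient descent. This is only one possible way to do the factorization, and the toy matrix, the function name `factorize`, and all hyperparameters (`k`, `steps`, `lr`, `reg`) are assumptions chosen for illustration.

```python
import numpy as np

# Toy rating matrix; np.nan marks ratings that have not been given yet.
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, 5.0, 1.0, np.nan],
    [1.0, np.nan, 5.0, 4.0],
    [np.nan, 1.0, 4.0, 5.0],
])

def factorize(R, k=2, steps=2000, lr=0.01, reg=0.01, seed=0):
    """Learn user factors P and item factors Q so that P @ Q.T fits the observed ratings."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.5, size=(n_users, k))
    Q = rng.normal(scale=0.5, size=(n_items, k))
    observed = [(u, i) for u in range(n_users) for i in range(n_items)
                if not np.isnan(R[u, i])]
    for _ in range(steps):
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]
            p_u = P[u].copy()  # keep the old user vector for the item update
            P[u] += lr * (err * Q[i] - reg * p_u)
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q

P, Q = factorize(R)
pred = P @ Q.T  # full matrix, including the cells that were np.nan

mask = ~np.isnan(R)
rmse = np.sqrt(np.mean((R[mask] - pred[mask]) ** 2))
print("RMSE on observed ratings:", rmse)
print("Predicted rating for user 0, item 2:", pred[0, 2])
```

Only the observed cells drive the updates; the predictions for the missing cells fall out of the product `P @ Q.T`.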

Content-based filtering practice

TMDB5000 Movie Dataset

literal_eval: converting string data into Python objects
from ast import literal_eval
code = """(1,2,{'foo':'bar'})"""
literal_eval(code)  #(1, 2, {'foo': 'bar'})
# convert the string data in a specific DataFrame column
df['column_name'].apply(literal_eval)
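
As a small self-contained illustration of the column conversion, the sketch below assumes a `genres` column whose cells are strings holding a list of dicts, similar in spirit to the TMDB data. The sample rows and the `genre_names` column are made up for this example.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical sample: each cell is a *string* that looks like a list of dicts.
df = pd.DataFrame({
    "title": ["Movie A", "Movie B"],
    "genres": [
        '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]',
        '[{"id": 18, "name": "Drama"}]',
    ],
})

# Parse the strings into real Python lists of dicts.
df["genres"] = df["genres"].apply(literal_eval)

# Keep only the genre names from each parsed list.
df["genre_names"] = df["genres"].apply(lambda items: [d["name"] for d in items])
print(df["genre_names"].tolist())
```

`literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for data read from a file.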
Measuring cosine similarity
from sklearn.metrics.pairwise import cosine_similarity
genre_sim = cosine_similarity(genre_mat, genre_mat)
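
The notes do not show how `genre_mat` is built. One plausible way, assumed here, is to join each movie's genre names into a single string and vectorize it with `CountVectorizer`. The three sample strings below are invented.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical genre strings, one per movie.
genres_literal = [
    "Action Adventure Fantasy",
    "Action Adventure",
    "Drama Romance",
]

count_vect = CountVectorizer()
genre_mat = count_vect.fit_transform(genres_literal)  # sparse movie x genre-term counts

genre_sim = cosine_similarity(genre_mat, genre_mat)   # movie x movie similarity

# For each movie, indices of the other movies ordered from most to least similar.
genre_sim_sorted_ind = genre_sim.argsort()[:, ::-1]
print(genre_sim.round(3))
```

Movies 0 and 1 share two genres and get a high score, while movie 2 shares none with either and gets 0.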
Choosing weights for movie selection
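
The notes do not record which weighting was used. A common choice for this kind of data, assumed here, is an IMDB-style weighted rating that pulls the score of movies with few votes toward the overall mean: $WR = \frac{v}{v+m} R + \frac{m}{v+m} C$, where $v$ is the vote count, $R$ the movie's average rating, $m$ a minimum-votes threshold, and $C$ the mean rating over all movies. The sample data and the 0.6 quantile for `m` are arbitrary.

```python
import pandas as pd

# Hypothetical sample with TMDB-style column names.
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "vote_average": [9.0, 7.5, 8.0],
    "vote_count": [5, 2000, 800],
})

C = movies["vote_average"].mean()        # mean rating across all movies
m = movies["vote_count"].quantile(0.6)   # threshold; 0.6 is an arbitrary choice

def weighted_vote_average(record, m=m, C=C):
    """IMDB-style weighted rating for one row."""
    v = record["vote_count"]
    R = record["vote_average"]
    return (v / (v + m)) * R + (m / (v + m)) * C

movies["weighted_vote"] = movies.apply(weighted_vote_average, axis=1)
print(movies[["title", "vote_average", "vote_count", "weighted_vote"]])
```

Movie A has the highest raw average but only 5 votes, so its weighted score ends up close to the overall mean instead of 9.0.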

Good Books recommendations

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

books['authors']

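# min_df=1 (the default) behaves like the original min_df=0, which recent scikit-learn versions reject
tf = TfidfVectorizer(analyzer='word', ngram_range=(1,2),
                     min_df=1, stop_words='english')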
tfidf_matrix = tf.fit_transform(books['authors'])
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

tf1 = TfidfVectorizer(analyzer='word', ngram_range=(1,2),
                      min_df=1, stop_words='english')
tfidf_matrix1 = tf1.fit_transform(books_with_tags['tag_name'].head(10000))
cosine_sim1 = linear_kernel(tfidf_matrix1, tfidf_matrix1)

books + tag

tf_corpus = TfidfVectorizer(analyzer='word', ngram_range=(1,2),
                            min_df=1, stop_words='english')
tfidf_matrix_corpus = tf_corpus.fit_transform(books['corpus'])
cosine_sim_corpus = linear_kernel(tfidf_matrix_corpus, tfidf_matrix_corpus)
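
A side note on why `linear_kernel` is stored in variables named `cosine_sim`: `TfidfVectorizer` L2-normalizes each row by default (`norm='l2'`), so the plain dot product of two rows already equals their cosine similarity. The short check below uses invented author strings to confirm this.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical author strings standing in for books['authors'].
docs = [
    "suzanne collins",
    "j k rowling mary grandpre",
    "j k rowling",
    "stephenie meyer",
]

tfidf = TfidfVectorizer(analyzer="word", ngram_range=(1, 2)).fit_transform(docs)

# Every row has unit length because of the default norm='l2'.
row_norms = np.linalg.norm(tfidf.toarray(), axis=1)

# With unit-length rows, dot product and cosine similarity coincide.
same = np.allclose(linear_kernel(tfidf, tfidf), cosine_similarity(tfidf, tfidf))
print(row_norms, same)
```

`linear_kernel` skips the normalization step that `cosine_similarity` repeats, which saves some work on a large matrix.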

Recommendation function

titles = books['title']
indices = pd.Series(books.index, index=books['title'])

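def corpus_recommendations(title):
    # look up the row position of the book (uses `indices` defined above)
    idx = indices[title]
    sim_scores = list(enumerate(cosine_sim_corpus[idx]))
    # sort by similarity, highest first
    sim_scores = sorted(sim_scores, key=lambda x:x[1], reverse=True)
    # skip the first entry (the book itself) and keep the next 10
    sim_scores = sim_scores[1:11]
    book_indices = [i[0] for i in sim_scores]
    return titles.iloc[book_indices]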
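
To see the whole pipeline run end to end without the dataset, here is a toy version. The four books, their `corpus` text, and the names `toy_books` and `toy_recommendations` are all made up. Unlike the function above, it removes the query book explicitly instead of relying on it being first after sorting.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical mini catalogue.
toy_books = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "corpus": [
        "fantasy magic wizard school",
        "fantasy magic dragon quest",
        "romance vampire high school",
        "cooking recipes kitchen",
    ],
})

toy_matrix = TfidfVectorizer(analyzer="word", ngram_range=(1, 2)).fit_transform(
    toy_books["corpus"]
)
toy_sim = linear_kernel(toy_matrix, toy_matrix)
toy_indices = pd.Series(toy_books.index, index=toy_books["title"])

def toy_recommendations(title, top_n=2):
    idx = toy_indices[title]
    sim_scores = sorted(enumerate(toy_sim[idx]), key=lambda x: x[1], reverse=True)
    # Drop the query book explicitly rather than assuming it sorts first.
    book_indices = [i for i, _ in sim_scores if i != idx][:top_n]
    return toy_books["title"].iloc[book_indices]

result = toy_recommendations("Book A")
print(list(result))
```

Book B shares the most terms with Book A, so it ranks first; Book C shares only "school" and comes second.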

▷ Tomorrow's study plan: machine learning assignment

[This post was written using excerpts from the lecture materials of the Zerobase Data Career School.]
