Different embeddings + LDA + Jensen-Shannon distance 😊

jj · February 26, 2021

SS-hashtag-recommendation-project


LDA has many uses:

Understanding the variety of topics in a corpus (obviously)
Getting better insight into the types of documents in a corpus (news, Wikipedia articles, business documents, and so on)
Quantifying the most used / most important words in a corpus
Document similarity and recommendation

LDA is an unsupervised generative model that assigns a topic distribution to each document.
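Once every document has a topic distribution, two documents can be compared by measuring how far apart those distributions are, which is where the Jensen-Shannon distance from the title comes in. Below is a minimal sketch in plain Python. The three topic distributions are made-up values for illustration, not output from a trained model, and the function names are my own.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits. Terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2, so in [0, 1])."""
    # m is the midpoint distribution; m_i > 0 whenever p_i > 0 or q_i > 0,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


# Hypothetical topic distributions over 3 topics (each sums to 1).
doc_a = [0.7, 0.2, 0.1]
doc_b = [0.6, 0.3, 0.1]
doc_c = [0.1, 0.1, 0.8]

dist_ab = jensen_shannon_distance(doc_a, doc_b)
dist_ac = jensen_shannon_distance(doc_a, doc_c)
print(f"a vs b: {dist_ab:.3f}")
print(f"a vs c: {dist_ac:.3f}")
```

A smaller distance means a more similar topic mix, so for recommendation you would rank candidates by ascending distance to the query document. The distance is symmetric, unlike plain KL divergence, which is one reason it suits this kind of comparison.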

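At a high level, the model assumes that each document contains several topics, so topics can overlap across documents. → Likewise, the same words can be shared between topics.
The words in each document influence that document's topics. The topics themselves do not need to be defined in detail, but the number of topics must be specified in advance.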
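The assumptions above can be sketched as a toy version of the generative story behind LDA: fix the number of topics up front, give each document its own mixture over those topics, then draw every word from one of the topics. This is only an illustration in plain Python. The vocabulary, the two topics, their word probabilities, and the prior value `alpha` are all made up, and a real LDA implementation works in the opposite direction, inferring the topics from observed words.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word_probs, vocabulary, doc_length, alpha, rng):
    """Generate one document: pick a topic mixture, then a topic and word per position."""
    num_topics = len(topic_word_probs)  # the number of topics is fixed in advance
    theta = sample_dirichlet([alpha] * num_topics, rng)  # this document's topic mixture
    words = []
    for _ in range(doc_length):
        topic = rng.choices(range(num_topics), weights=theta)[0]
        word = rng.choices(vocabulary, weights=topic_word_probs[topic])[0]
        words.append(word)
    return theta, words


# Hypothetical vocabulary and topics. "win" and "team" have non-zero
# probability in both topics, so the topics share words.
vocabulary = ["goal", "match", "team", "vote", "election", "win"]
topic_word_probs = [
    [0.30, 0.30, 0.20, 0.00, 0.00, 0.20],  # a "sports"-like topic
    [0.00, 0.00, 0.10, 0.35, 0.35, 0.20],  # a "politics"-like topic
]

rng = random.Random(0)  # seeded so repeated runs give the same document
theta, words = generate_document(topic_word_probs, vocabulary, doc_length=12, alpha=0.5, rng=rng)
print("topic mixture:", [round(t, 3) for t in theta])
print("document:", " ".join(words))
```

Because each document gets its own mixture `theta`, two documents can lean on the same topic to different degrees, which is the overlap described above.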