[KaggleStudy] Spooky NLP and Topic Modelling tutorial

이하얀 · January 16, 2025

🐰 Kaggle-Transcription-Study


Notebook

Kaggle: Spooky NLP and Topic Modelling tutorial
transcription: Spooky NLP and Topic Modelling tutorial.ipynb

scipy.misc.imread ➡️ imageio

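scipy.misc.imread is not available in SciPy 1.3.0 and later, so the import is replaced with imageio, a library that provides a similar imread function.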

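# from scipy.misc import imread  # original import, no longer available in recent SciPy
import imageio  # replacement library; exposes imageio.imread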
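
As a rough sketch of the swap, the helper below looks up an imread function in imageio so that existing imread(...) calls in the notebook can stay unchanged. The name resolve_imread is made up for this note, and the sketch assumes imageio has been installed (for example with pip install imageio). It tries imageio.v2 first, since newer imageio releases expose the legacy API there, and falls back to the top-level imageio module.

```python
import importlib


def resolve_imread():
    """Return an imread-like callable from imageio, or None if none is found."""
    # Hypothetical helper: try the legacy v2 API first, then the top-level module.
    for module_name in ("imageio.v2", "imageio"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module (or imageio itself) is not installed
        reader = getattr(module, "imread", None)
        if callable(reader):
            return reader
    return None


# Drop-in name for code that used to call scipy.misc.imread.
imread = resolve_imread()
```

If imread ends up as None, imageio is missing from the environment and needs to be installed before any image can be loaded.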

LookupError

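Raised when NLTK looks for a resource named punkt_tab and cannot find it locally.
- punkt_tab is not part of the standard punkt tokenizer download, so it has to be downloaded separately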

LookupError: 
**********************************************************************
  Resource punkt_tab not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt_tab')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt_tab/english/

  Searched in:
    - '/root/nltk_data'
    - '/usr/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************

Solution

import nltk
nltk.download('punkt_tab')  # fetch the missing resource once
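
To avoid re-downloading every time the notebook runs, the same fix can be wrapped in a check, roughly as below. This is only a sketch: ensure_nltk_resource is a hypothetical helper name, and the find and download parameters exist purely so the logic can be tried without NLTK or a network connection. By default it falls back to nltk.data.find and nltk.download.

```python
def ensure_nltk_resource(resource_path, package, find=None, download=None):
    """Download `package` only if `resource_path` cannot be found locally.

    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be defined without NLTK

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)  # raises LookupError when the resource is missing
    except LookupError:
        download(package)
        return True
    return False


# In the notebook this would be called once before tokenizing:
# ensure_nltk_resource("tokenizers/punkt_tab", "punkt_tab")
```

The resource path matches the "Attempted to load tokenizers/punkt_tab/english/" line in the error message above.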

min_df=0

min_df must be either a float between 0 and 1 or an integer of at least 1. The notebook sets it to the integer 0, which raises an error.

from sklearn.feature_extraction.text import CountVectorizer

sentence = ["I love to eat Burgers", 
            "I love to eat Fries"]
vectorizer = CountVectorizer(min_df=1)  # changed from 0 to 1
sentence_transform = vectorizer.fit_transform(sentence)
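
The rule for min_df can be written out as a small check. The function below only mirrors the rule as stated in this note (a float between 0 and 1, or an integer of at least 1). It is not scikit-learn's own validation code, and is_valid_min_df is a name invented for this sketch.

```python
import numbers


def is_valid_min_df(value):
    """Check the rule described above: float in [0, 1] or integer >= 1."""
    if isinstance(value, numbers.Integral):
        return value >= 1  # integers are absolute document counts
    if isinstance(value, numbers.Real):
        return 0.0 <= value <= 1.0  # floats are proportions of documents
    return False


# Try a few candidate values, including the notebook's original 0.
for candidate in (0, 1, 0.0, 0.5, 2, 1.5):
    print(candidate, is_valid_min_df(candidate))
```

Under this reading, the integer 0 is rejected, while 1 (the value used in the fix) passes.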

get_feature_names() -> get_feature_names_out()

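The method was renamed in scikit-learn: get_feature_names() was deprecated in version 1.0 and removed in 1.2, and get_feature_names_out() is its replacement.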

print("The features are:\n {}".format(vectorizer.get_feature_names_out()))
print("\nThe vectorized array looks like:\n {}".format(sentence_transform.toarray()))
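
If the same notebook has to run on both older and newer scikit-learn versions, one possible approach is to pick whichever method the vectorizer actually has. The sketch below assumes that situation. feature_names is a hypothetical helper, and the two small classes are stand-ins used only to show the behavior without installing scikit-learn.

```python
def feature_names(vectorizer):
    """Return feature names using whichever method the vectorizer provides."""
    if hasattr(vectorizer, "get_feature_names_out"):
        return list(vectorizer.get_feature_names_out())  # newer scikit-learn
    return list(vectorizer.get_feature_names())  # older scikit-learn


class OldStyleVectorizer:
    """Stand-in for a vectorizer that only has the old method name."""

    def get_feature_names(self):
        return ["burgers", "eat"]


class NewStyleVectorizer:
    """Stand-in for a vectorizer that only has the new method name."""

    def get_feature_names_out(self):
        return ["burgers", "eat"]


print(feature_names(OldStyleVectorizer()))
print(feature_names(NewStyleVectorizer()))
```

With a real CountVectorizer, feature_names(vectorizer) would replace the direct method call in the print statement above.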
