😢 Study Notes (Machine Learning 9)

zoe · May 23, 2023

Boosting Algorithm

  • ์•™์ƒ๋ธ” ๊ธฐ๋ฒ• : Voting, Bagging, Boosting, ์Šคํƒœ๊น… ๋“ฑ์œผ๋กœ ๋‚˜๋ˆˆ๋‹ค. ๋ณดํŒ…๊ณผ ๋ฐฐ๊น…์€ ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋ถ„๋ฅ˜๊ธฐ๊ฐ€ ํˆฌํ‘œ๋ฅผ ํ†ตํ•ด ์ตœ์ข… ์˜ˆ์ธก ๊ฒฐ๊ณผ๋ฅผ ๊ฒฐ์ •ํ•˜๋Š” ๋ฐฉ์‹. ๋ณดํŒ…๊ณผ ๋ฐฐ๊น…์˜ ์ฐจ์ด์ ์€ ๋ณดํŒ…์€ ๊ฐ๊ฐ ๋‹ค๋ฅธ ๋ถ„๋ฅ˜๊ธฐ, ๋ฐฐ๊น…์€ ๊ฐ™์€ ๋ถ„๋ฅ˜๊ธฐ๋ฅผ ์‚ฌ์šฉ. ๋Œ€ํ‘œ์ ์ธ ๋ฐฐ๊น… ๋ฐฉ์‹์ด ๋žœ๋ค ํฌ๋ ˆ์ŠคํŠธ์ด๋‹ค

  • Boosting: multiple (weak) classifiers learn sequentially, with each classifier adding weight to the samples the previous one misclassified before continuing training. Its strong predictive performance makes it the leading ensemble approach → Gradient Boosting, XGBoost, LightGBM, etc.

  • Difference between bagging and boosting



  • AdaBoost:

AdaBoost - STEP 1) Assigns weights sequentially to obtain the final result. AdaBoost is a DecisionTree-based algorithm.

AdaBoost - STEP 2) Add weight to the + samples misclassified in Step 1 and decide the boundary again.

AdaBoost - STEP 3) Add weight to the - samples that are still missed and decide the boundary again.

AdaBoost - STEP 4) Combine the boundaries decided so far.
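The steps above can be sketched in code. The loop below is a minimal, illustrative AdaBoost (my own toy data and round count, not from the notes): decision stumps are fitted with sample weights, misclassified samples are up-weighted each round, and the final prediction is the weighted vote of all the boundaries.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# toy binary problem (illustrative parameters)
X, y = make_classification(n_samples=200, n_features=5, random_state=13)
y_pm = np.where(y == 1, 1, -1)            # labels as +1 / -1

n_rounds = 10
w = np.full(len(X), 1 / len(X))           # STEP 1) start with uniform sample weights
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)   # a weak learner
    stump.fit(X, y_pm, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y_pm].sum() / w.sum()         # weighted error rate
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))
    w *= np.exp(-alpha * y_pm * pred)             # STEPs 2-3) up-weight the misclassified samples
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# STEP 4) the final prediction is the weighted vote of all boundaries
agg = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
final_pred = np.sign(agg)
print('train accuracy:', (final_pred == y_pm).mean())
```

This is the classic two-class weight update; scikit-learn's AdaBoostClassifier wraps the same idea.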



  • Boosting techniques
    - GBM (Gradient Boosting Machine): similar to AdaBoost, but uses gradient descent when updating the weights
    - XGBoost: builds on GBM and adopts various techniques to use the machine's resources efficiently, giving it speed and efficiency
    - LightGBM: faster still than XGBoost
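To make the bagging-vs-boosting contrast concrete, here is a small runnable comparison (the synthetic dataset and settings are my own, purely illustrative): Random Forest averages many independently trained deep trees, while Gradient Boosting chains shallow trees sequentially.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# illustrative synthetic data
X, y = make_classification(n_samples=1000, n_features=20, random_state=13)

# bagging: the same deep-tree learner on many bootstrap samples, combined by vote
rf = RandomForestClassifier(n_estimators=100, random_state=13)
# boosting: shallow trees trained one after another, each correcting the last
gbm = GradientBoostingClassifier(n_estimators=100, random_state=13)

for name, model in [('RandomForest (bagging)', rf),
                    ('GradientBoosting (boosting)', gbm)]:
    scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
    print(name, scores.mean().round(4))
```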




wine ๋ฐ์ดํ„ฐ

# wine ๋ฐ์ดํ„ฐ
# ๋ฐ์ดํ„ฐ ์ฝ๊ธฐ, ์ปฌ๋Ÿผ ์ƒ์„ฑ

import pandas as pd

wine_url = 'https://raw.githubusercontent.com/PinkWink/ML_tutorial/master/dataset/wine.csv'

wine = pd.read_csv(wine_url, index_col=0)
wine.head()
wine['taste'] = [1 if grade > 5 else 0 for grade in wine['quality']]

X = wine.drop(['taste', 'quality'], axis=1)
y = wine['taste']
# ์ง์ ‘ StandardScaler ์ ์šฉ

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_sc = sc.fit_transform(X)
X_sc
# split the data after scaling

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_sc, y, test_size=0.2, random_state=13)
# inspect histograms of all columns
# well-distributed columns are often the useful ones
import matplotlib.pyplot as plt

%matplotlib inline

wine.hist(bins=10, figsize=(15, 10))
plt.show()
# check how the other features differ by quality

column_names = ['fixed acidity', 'volatile acidity', 'citric acid',
                'residual sugar', 'chlorides', 'free sulfur dioxide',
                'total sulfur dioxide', 'density', 'pH', 'sulphates', 'alcohol']

df_pivot_table = wine.pivot_table(column_names, ['quality'], aggfunc='median')
print(df_pivot_table)
# correlation of the remaining features with quality
# read correlations by absolute value

corr_matrix = wine.corr()

print(corr_matrix['quality'].sort_values(ascending=False))
# distribution of the taste column

import seaborn as sns

sns.countplot(x='taste', data=wine)
plt.show()
# test multiple models at once ★★★

from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

models = []
models.append(('RandomForestClassifier', RandomForestClassifier())) # append as (name, model) tuples
models.append(('DecisionTreeClassifier', DecisionTreeClassifier()))
models.append(('AdaBoostClassifier', AdaBoostClassifier()))
models.append(('GradientBoostingClassifier', GradientBoostingClassifier()))
models.append(('LogisticRegression', LogisticRegression()))

models
# store the results ★★★

from sklearn.model_selection import KFold, cross_val_score

results = []
names = []

for name, model in models:
    kfold = KFold(n_splits=5, random_state=13, shuffle=True) # shuffle=True : shuffle the data before splitting
    cv_results = cross_val_score(model, X_train, y_train, cv=kfold, scoring='accuracy')
    
    results.append(cv_results)
    names.append(name)
    
    print(name, cv_results.mean(), cv_results.std())
results
# review the cross-validation results at a glance
# for now, RandomForest looks favorable


fig = plt.figure(figsize=(14, 8))
fig.suptitle('Algorithm Comparison')
ax = fig.add_subplot(111)
plt.boxplot(results)
ax.set_xticklabels(names)
plt.show()
# evaluation on the test data

from sklearn.metrics import accuracy_score

for name, model in models:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(name, accuracy_score(y_test, pred))




kNN

  • kNN: given a new data point, the task is to classify which of the existing groups it belongs to. k sets how many of the nearest data points to consider.




  • Euclidean geometry: distance calculation
  • Distances can change with the measurement units → standardization is needed

  • Pros and cons
    - No training step is required before real-time prediction
    - So it is fast to deploy
    - Not suited to high-dimensional data (many rows or many columns)
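The unit problem above can be shown numerically. A minimal sketch with made-up values: the raw Euclidean distance is dominated by whichever feature has the largest scale, and standardizing removes that bias.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# two features on wildly different scales (made-up values)
X = np.array([[1.0, 100.0],
              [1.1, 900.0],
              [5.0, 110.0]])
query = np.array([[1.05, 120.0]])

# raw distances: the large-scale second feature dominates the result
d_raw = np.linalg.norm(X - query, axis=1)

# after standardization both features contribute comparably,
# and a different point becomes the nearest neighbor
sc = StandardScaler().fit(X)
d_sc = np.linalg.norm(sc.transform(X) - sc.transform(query), axis=1)

print('nearest neighbor (raw):   ', d_raw.argmin())
print('nearest neighbor (scaled):', d_sc.argmin())
```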


kNN practice - iris data

# iris ๋ฐ์ดํ„ฐ

from sklearn.datasets import load_iris

iris = load_iris()
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
                                                    test_size=0.2, random_state=13,
                                                    stratify=iris.target)
# train kNN

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
# accuracy

from sklearn.metrics import accuracy_score

pred = knn.predict(X_test)
print(accuracy_score(y_test, pred))
# a quick look at performance
# confusion_matrix : https://wikidocs.net/194464
# classification_report : https://wikidocs.net/193994
from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test, pred))
print(classification_report(y_test, pred))




GBM - Gradient Boosting Machine

  • GBM: a boosting algorithm trains several weak learners sequentially, weighting the wrongly predicted samples at each step to correct the errors as it goes. GBM's key difference is that it uses gradient descent when updating the weights.
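For squared-error regression, this "gradient descent on the errors" idea reduces to each new tree fitting the residuals of the current ensemble (the residual is the negative gradient of squared error). A minimal sketch on toy data of my own:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(13)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
pred = np.full_like(y, y.mean())     # start from the mean prediction
trees = []

for _ in range(100):
    residual = y - pred              # negative gradient of squared error
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residual)            # the next weak learner fits the current errors
    pred += learning_rate * tree.predict(X)
    trees.append(tree)

print('final MSE:', np.mean((y - pred) ** 2))
```

GradientBoostingRegressor in scikit-learn implements this loop (plus shrinkage and subsampling options) for you.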
# HAR ๋ฐ์ดํ„ฐ ์ฝ๊ธฐ

import pandas as pd

url = 'https://raw.githubusercontent.com/PinkWink/ML_tutorial/master/dataset/HAR_dataset/features.txt'

feature_name_df = pd.read_csv(url, sep='\s+', header=None,
                              names=['column_index', 'column_name'])
# sep='\s+' : when fields are separated by whitespace of unknown width, use the \s+ regular expression
# reference : https://datascienceschool.net/01%20python/04.02%20%EB%8D%B0%EC%9D%B4%ED%84%B0%20%EC%9E%85%EC%B6%9C%EB%A0%A5.html
# names= : set the column names

feature_name = feature_name_df.iloc[:, 1].values.tolist()
X_train_url = 'https://raw.githubusercontent.com/PinkWink/ML_tutorial/master/dataset/HAR_dataset/train/X_train.txt'
X_test_url = 'https://raw.githubusercontent.com/PinkWink/ML_tutorial/master/dataset/HAR_dataset/test/X_test.txt'

X_train = pd.read_csv(X_train_url, sep='\s+', header=None)
X_test = pd.read_csv(X_test_url, sep='\s+', header=None)
X_train.columns = feature_name
X_test.columns = feature_name


y_train_url = 'https://raw.githubusercontent.com/PinkWink/ML_tutorial/master/dataset/HAR_dataset/train/y_train.txt'
y_test_url = 'https://raw.githubusercontent.com/PinkWink/ML_tutorial/master/dataset/HAR_dataset/test/y_test.txt'

y_train = pd.read_csv(y_train_url, sep='\s+', header=None, names=['action'])
y_test = pd.read_csv(y_test_url, sep='\s+', header=None, names=['action'])
# import the required modules

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
import time
import warnings

warnings.filterwarnings('ignore')
# observed: ACC of 1 with ~388 s of computation (an ACC of 1 signals leakage: X_train had been read from the test-set URL)
# GBM is generally known to beat Random Forest on raw performance
# but scikit-learn's GBM is known to be very slow


start_time = time.time()
gb_clf = GradientBoostingClassifier(random_state=13)
gb_clf.fit(X_train, y_train)
gb_pred = gb_clf.predict(X_test)

print('ACC : ', accuracy_score(y_test, gb_pred))
print('Fit item : ', time.time() - start_time)
# GridSearch
# takes a long time!! ★★★★★

from sklearn.model_selection import GridSearchCV

params = {
    'n_estimators' : [100, 500],
    'learning_rate' : [0.05, 0.1]
}

start_time = time.time()
grid = GridSearchCV(gb_clf, param_grid=params, cv = 2, verbose=1, n_jobs=-1)
# cv : number of folds for cross-validation
# verbose : how much progress to print; verbose=0 (default) prints nothing, verbose=1 prints brief messages, verbose=2 prints per-hyperparameter messages
# source : https://www.inflearn.com/questions/62112/gridsearchcv%EC%97%90%EC%84%9C-verbose

grid.fit(X_train, y_train)
print('Fit time : ', time.time() - start_time)
# best parameters

grid.best_score_
grid.best_params_
# performance on the test data

accuracy_score(y_test, grid.best_estimator_.predict(X_test))




XGBoost

  • XGBoost: one of the most popular tree-based ensemble algorithms. It is GBM-based, but resolves GBM's slow speed through various optimizations, and is notably designed for parallel training. At each boosting round XGBoost can evaluate itself on held-out validation data, and it has an early-stopping feature that halts the rounds once the validation score stops improving.

  • Key parameters
    - nthread : number of CPU threads to use; the default is all threads
    - eta : the GBM learning rate
    - num_boost_round : equivalent to n_estimators
    - max_depth : maximum tree depth
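Early stopping of this kind is exercised below with xgboost's eval_set; as a dependency-free stand-in, scikit-learn's GradientBoostingClassifier implements the same idea through validation_fraction and n_iter_no_change (the toy data and settings here are my own):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, random_state=13)

# stop adding trees once the held-out validation score fails to improve
# for 10 consecutive rounds (analogous to early_stopping_rounds=10)
gb = GradientBoostingClassifier(
    n_estimators=400, learning_rate=0.1, max_depth=3,
    validation_fraction=0.1, n_iter_no_change=10, random_state=13)
gb.fit(X, y)

# n_estimators_ reports how many trees were actually kept
print('trees actually fitted:', gb.n_estimators_)
```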

# install
# pip install xgboost
# if that errors, try: conda install py-xgboost
# xgboost must be installed separately

!pip install xgboost
# check performance

from xgboost import XGBClassifier
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y_train = le.fit_transform(y_train)
y_test = le.transform(y_test)  # encode y_test the same way so accuracy_score compares matching labels
# why the encoding is needed : https://stackoverflow.com/questions/71996617/invalid-classes-inferred-from-unique-values-of-y-expected-0-1-2-3-4-5-got

start_time = time.time()
xgb = XGBClassifier(n_estimators = 400, learning_rate = 0.1, max_depth = 3)
xgb.fit(X_train.values, y_train)
print('Fit time : ', time.time()- start_time)
accuracy_score(y_test, xgb.predict(X_test.values))
# an early-stopping condition and a validation set can be specified

from xgboost import XGBClassifier

evals = [(X_test.values, y_test)]

start_time = time.time()
xgb = XGBClassifier(n_estimators = 400, learning_rate = 0.1, max_depth = 3)
xgb.fit(X_train.values, y_train, early_stopping_rounds=10, eval_set=evals)
print('Fit time : ', time.time() - start_time)
accuracy_score(y_test, xgb.predict(X_test.values))




LightGBM

  • LightGBM: along with XGBoost, one of the most popular boosting algorithms. LGBM's big advantage is speed. However, it does not suit small datasets (commonly said to need 10,000+ rows of data). A GPU version also exists.
# install for mac users

# brew install lightgbm

# pip install lightgbm
!pip install lightgbm
import numpy as np
from sklearn.preprocessing import LabelEncoder

# create a label encoder
encoder = LabelEncoder()

# X_train๋ฐ์ดํ„ฐ๋ฅผ ์ด์šฉ ํ”ผํŒ…ํ•˜๊ณ  ๋ผ๋ฒจ์ˆซ์ž๋กœ ๋ณ€ํ™˜ํ•œ๋‹ค
encoder.fit(X_train)
X_train_encoded = encoder.transform(X_train)

# X_test๋ฐ์ดํ„ฐ์—๋งŒ ์กด์žฌํ•˜๋Š” ์ƒˆ๋กœ ์ถœํ˜„ํ•œ ๋ฐ์ดํ„ฐ๋ฅผ ์‹ ๊ทœ ํด๋ž˜์Šค๋กœ ์ถ”๊ฐ€ํ•œ๋‹ค (์ค‘์š”!!!)
for label in np.unique(X_test):
    if label not in encoder.classes_: # unseen label ๋ฐ์ดํ„ฐ์ธ ๊ฒฝ์šฐ( )
        encoder.classes_ = np.append(encoder.classes_, label) # ๋ฏธ์ฒ˜๋ฆฌ ์‹œ ValueError๋ฐœ์ƒ
X_test_encoded = encoder.transform(X_test)
from lightgbm import LGBMClassifier, early_stopping

start_time = time.time()
lgbm = LGBMClassifier(n_estimators=400)
# recent LightGBM versions take early stopping as a callback rather than a fit() argument
lgbm.fit(X_train.values, y_train, eval_set=evals,
         callbacks=[early_stopping(stopping_rounds=100)])
print('Fit time : ', time.time() - start_time)
![](https://velog.velcdn.com/images/tjdgml1735/post/300fc594-75ae-4208-ae21-58a2aa0f3139/image.png)

Tough.. this doesn't run; I need to revisit it later ㅠㅠ

💻 Source : Zerobase Data Job School
