๐Ÿ˜ข ์Šคํ„ฐ๋””๋…ธํŠธ (Machine Learning 5)

zoeยท2023๋…„ 5์›” 17์ผ
0

Model Evaluation - ๋ชจ๋ธ ํ‰๊ฐ€ ๊ฐœ๋…

  • ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘/๊ฐ€๊ณต/๋ณ€ํ™˜ โ†’ ๋ชจ๋ธ ํ•™์Šต/์˜ˆ์ธก โ†’ ๋ชจ๋ธํ‰๊ฐ€ ๋ฐ˜๋ณต

  • ๋‹ค์–‘ํ•œ ๋ชจ๋ธ, ๋‹ค์–‘ํ•œ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๋‘๊ณ , ์ƒ๋Œ€์ ์œผ๋กœ ๋น„๊ตํ•œ๋‹ค.
    - ํšŒ๊ท€๋ชจ๋ธ๋“ค์€ ์‹ค์ œ ๊ฐ’๊ณผ ์—๋Ÿฌ์น˜๋ฅผ ๊ฐ€์ง€๊ณ  ๊ณ„์‚ฐ
    - ๋ถ„๋ฅ˜ ๋ชจ๋ธ์˜ ํ‰๊ฐ€ ํ•ญ๋ชฉ์ด ์กฐ๊ธˆ ๋งŽ๋‹ค (์ •ํ™•๋„, ์˜ค์ฐจํ–‰๋ ฌ, ์ •๋ฐ€๋„, ์žฌํ˜„์œจ, F1 score, ROC AUC ๋“ฑ๋“ฑ...)
    - ์ด์ง„ ๋ถ„๋ฅ˜ ๋ชจ๋ธ์˜ ํ‰๊ฐ€
    - TP True Positive : ์‹ค์ œ Positive๋ฅผ Positive๋ผ๊ณ  ๋งž์ถ˜ ๊ฒฝ์šฐ
    - FN False Negative : ์‹ค์ œ Positive๋ฅผ Negative๋ผ๊ณ  ํ‹€๋ฆฌ๊ฒŒ ์˜ˆ์ธกํ•œ ๊ฒฝ์šฐ
    - TN True Negative : ์‹ค์ œ Negative๋ฅผ Negative๋ผ๊ณ  ๋งž์ถ˜ ๊ฒฝ์šฐ
    - FP False Positive : ์‹ค์ œ Negative๋ฅผ Positive๋ผ๊ณ  ํ‹€๋ฆฌ๊ฒŒ ์˜ˆ์ธกํ•œ ๊ฒฝ์šฐ

  • Accuracy : ์ „์ฒด ๋ฐ์ดํ„ฐ ์ค‘์— ๋งž๊ฒŒ ์˜ˆ์ธกํ•œ ๊ฒƒ์˜ ๋น„์œจ

  • Precision : ์ฐธ(์–‘์„ฑ)์ด๋ผ๊ณ  ์˜ˆ์ธกํ•œ ๊ฒƒ ์ค‘์—์„œ ์‹ค์ œ ์ฐธ(์–‘์„ฑ)์˜ ๋น„์œจ (precision = TP / (TP + FP)) (ex) ์ŠคํŒธ๋ฉ”์ผ์ด๋ผ๊ณ  ์˜ˆ์ธกํ–ˆ๋Š”๋ฐ ์ŠคํŒธ์ด ์•„๋‹Œ ๊ฒฝ์šฐ ๊ณค๋ž€ํ•˜๋‹ค)

  • โ˜… RECALL(TPR TRUE POSITIVE RATIO) : ์ฐธ์ธ ๋ฐ์ดํ„ฐ๋“ค ์ค‘์—์„œ ์ฐธ์ด๋ผ๊ณ  ์˜ˆ์ธกํ•œ ๊ฒƒ (recall = TP / (TP + FN)) โ˜…

  • FALL_OUT(FPR FALSE POSITIVE RATIO) : ์‹ค์ œ ์–‘์„ฑ์ด ์•„๋‹Œ๋ฐ, ์–‘์„ฑ์ด๋ผ๊ณ  ์ž˜๋ชป ์˜ˆ์ธกํ•œ ๊ฒฝ์šฐ



  • ๋ถ„๋ฅ˜๋ชจ๋ธ์€ ๊ทธ ๊ฒฐ๊ณผ๋ฅผ ์†ํ•  ๋น„์œจ(ํ™•๋ฅ )์„ ๋ฐ˜ํ™˜ํ•œ๋‹ค.
    - ์ด์ง„๋ถ„๋ฅ˜์—์„œ๋Š” ๋น„์œจ์—์„œ threshold๋ฅผ 0.5๋ผ๊ณ  ํ•˜๊ณ  0, 1๋กœ ๊ฒฐ๊ณผ๋ฅผ ๋ฐ˜์˜ํ–ˆ๋‹ค.

  • โ˜… Recall๊ณผ Precision์€ ์„œ๋กœ ์˜ํ–ฅ์„ ์ฃผ๊ธฐ ๋•Œ๋ฌธ์— ํ•œ ์ชฝ์„ ๊ทน๋‹จ์ ์œผ๋กœ ๋†’๊ฒŒ ์„ค์ •ํ•ด์„œ๋Š” ์•ˆ๋œ๋‹ค.



F1 - Score

- F1 - Score : Recall๊ณผ Precision์„ ๊ฒฐํ•ฉํ•œ ์ง€ํ‘œ. Recall๊ณผ Precision์ด ์–ด๋А ํ•œ์ชฝ์œผ๋กœ ์น˜์šฐ์น˜์ง€ ์•Š๊ณ  ๋‘˜ ๋‹ค ๋†’์€ ๊ฐ’์„ ๊ฐ€์งˆ ์ˆ˜๋ก ๋†’์€ ๊ฐ’์„ ๊ฐ€์ง





Model Evaluation - ROC์™€ AUC

  • ROC ๊ณก์„  :
    โ€ป FPR : FALL_OUT(FPR FALSE POSITIVE RATIO)
    โ€ป TPR : RECALL(TPR TRUE POSITIVE RATIO)




  • AUC : ROC ๊ณก์„ ์˜ ์•„๋ž˜์˜ ๋ฉด์ . ์ผ๋ฐ˜์ ์œผ๋กœ 1์— ๊ฐ€๊นŒ์šธ์ˆ˜๋ก ์ข‹์€ ์ˆ˜์น˜. ๊ธฐ์šธ๊ธฐ๊ฐ€ 1์ธ ์ง์„  ์•„๋ž˜์˜ ๋ฉด์ ์ด 0.5 โ†’ AUC๋Š” 0.5๋ณด๋‹ค ์ปค์•ผ ํ•จ





Model Evaluation - ROC ์ปค๋ธŒ ๊ทธ๋ ค๋ณด๊ธฐ

# ๋‹ค์‹œ ์™€์ธ ๋ง› ๋ถ„๋ฅ˜ํ•˜๋˜ ๋ฐ์ดํ„ฐ๋กœ\
# ๋ฐ์ดํ„ฐ ์ฝ๊ธฐ

import pandas as pd

red_url = 'https://raw.githubusercontent.com/PinkWink/ML_tutorial/master/dataset/winequality-red.csv'
white_url = 'https://raw.githubusercontent.com/PinkWink/ML_tutorial/master/dataset/winequality-white.csv'

red_wine = pd.read_csv(red_url, sep=';')
white_wine = pd.read_csv(white_url, sep=';')

red_wine['color'] = 1
white_wine['color'] = 0

wine = pd.concat([red_wine, white_wine])
wine['taste'] = [1 if grade > 5 else 0 for grade in wine['quality']]

X = wine.drop(['taste', 'quality'], axis=1)
Y = wine['taste']
# ๊ฐ„๋‹จํžˆ ๊ฒฐ์ •๋‚˜๋ฌด ์ ์šฉํ•˜๊ธฐ

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=13)

wine_tree = DecisionTreeClassifier(max_depth=2, random_state=13)
wine_tree.fit(X_train, Y_train)

y_pred_tr = wine_tree.predict(X_train)
y_pred_test = wine_tree.predict(X_test)

print('Train Acc : ', accuracy_score(Y_train, y_pred_tr))
print('Test Acc : ', accuracy_score(Y_test, y_pred_test))
# ๊ฐ ์ˆ˜์น˜ ๊ตฌํ•ด๋ณด๊ธฐ

from sklearn.metrics import (accuracy_score, precision_score, 
                             recall_score, f1_score, roc_auc_score, roc_curve)

print('Accuracy : ', accuracy_score(Y_test, y_pred_test))
print('Recall : ', recall_score(Y_test, y_pred_test))
print('Precision : ', precision_score(Y_test, y_pred_test))
print('AUC Score : ', roc_auc_score(Y_test, y_pred_test))
print('F1 Score : ', f1_score(Y_test, y_pred_test))
wine_tree.predict_proba(X_test) # 0์ผ ํ™•๋ฅ , 1์ผ ํ™•๋ฅ 
roc_curve(Y_test, pred_proba)
# ROC ์ปค๋ธŒ ๊ทธ๋ฆฌ๊ธฐ

import matplotlib.pyplot as plt
%matplotlib inline

pred_proba = wine_tree.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(Y_test, pred_proba)
fpr
tpr
thresholds
plt.figure(figsize=(10, 8))
plt.plot([0, 1], [0, 1], 'r', ls = 'dashed') # 0, 0 ~ 1, 1 ์‚ฌ์ด์˜ ๊ทธ๋ž˜ํ”„๊ฐ€ ๊ทธ๋ ค์ง, ๋ณด์กฐ์„  ์—ญํ• 
plt.plot(fpr, tpr)
plt.grid()
plt.show()




์ˆ˜ํ•™์ด ๊ธฐ์ดˆ - ํ•จ์ˆ˜

  • ๋‹คํ•ญํ•จ์ˆ˜

# ๋‹คํ•ญํ•จ์ˆ˜

import numpy as np
import matplotlib.pyplot as  plt

x = np.linspace(-3, 2, 100)
y = 3*x**2 + 2
x
y
plt.figure(figsize=(12, 8))
plt.plot(x, y)
plt.xlabel('$x$')
plt.ylabel('$y$')
plt.show()
# style

import matplotlib as mpl

mpl.style.use('seaborn-whitegrid')

plt.figure(figsize=(12, 8))
plt.plot(x, y)
plt.xlabel('$x$', fontsize = 25)
plt.ylabel('$y$', fontsize = 25)
plt.show()
# ๋‹คํ•ญํ•จ์ˆ˜์˜ x ์ถ• ๋ฐฉํ–ฅ ์ด๋™

x = np.linspace(-5, 5, 100)
y1 = 3*x**2 + 2
y2 = 3*(x+1)**2 + 2
# plot

plt.figure(figsize=(12, 8))
plt.plot(x, y1, lw = 2, ls = 'dashed', label = '$y=3x^2 + 2$')
plt.plot(x, y2, label = '$y=3(x+1)^2 + 2$')
plt.legend(fontsize = 15)
plt.xlabel('$x$', fontsize = 25)
plt.ylabel('$y$', fontsize = 25)
plt.show()


  • ์ง€์ˆ˜ํ•จ์ˆ˜

# ์ง€์ˆ˜ํ•จ์ˆ˜
# ์ง€์ˆ˜ํ•จ์ˆ˜๋ฅผ ํŒŒ์•…ํ•˜๊ธฐ ์œ„ํ•ด ๋‹ค์–‘ํ•œ ๊ฒฝ์šฐ

x = np.linspace(-2, 2, 100)
a11, a12, a13 = 2, 3, 4
y11, y12, y13 = a11**x, a12**x, a13**x

a21, a22, a23 = 1/2, 1/3, 1/4
y21, y22, y23 = a21**x,  a22**x, a23**x
# ๊ทธ๋ž˜ํ”„

fig, ax = plt.subplots(1, 2, figsize=(12, 6))

ax[0].plot(x, y11, color = 'k', label = r'$2^x$')
ax[0].plot(x, y12, '--', color = 'k', label = r'$3^x$')
ax[0].plot(x, y13, ':', color = 'k', label = r'$4^x$')

ax[0].legend(fontsize = 20)

ax[1].plot(x, y21, color = 'k', label = r'$(1/2)^x$')
ax[1].plot(x, y22, '--', color = 'k', label = r'$(1/3)^x$')
ax[1].plot(x, y23, ':', color = 'k', label = r'$(1/4)^x$')

ax[1].legend(fontsize = 20)
x = np.linspace(0, 10)

plt.figure(figsize = (6, 6))

plt.plot(x, x**2, '--', color = 'k', label = r'$x^2$')
plt.plot(x, 2**x, color = 'k', label = r'$2^x$')

plt.legend(loc = 'center left', fontsize = 25)
plt.xlabel('$x$', fontsize = 25)
plt.ylabel('$y$', fontsize = 25)

plt.show()


x = np.array([10, 100, 1000, 10000, 100000])
(1 + 1/x)**x




  • ๋กœ๊ทธํ•จ์ˆ˜

# ๋กœ๊ทธํ•จ์ˆ˜๋ฅผ ๊ทธ๋ฆฌ๊ธฐ ์œ„ํ•œ ๋ฐ์ดํ„ฐ

def log(x, base):
    return np.log(x)/np.log(base)

x1 = np.linspace(0.0001, 5, 1000)
x2 = np.linspace(0.01, 5, 100)

y11, y12 = log(x1, 10), log(x2, np.e)
y21, y22 = log(x1, 1/10), log(x2, 1/np.e)
# ๊ทธ๋ฆฌ๊ธฐ ์œ„ํ•œ ์ค€๋น„

fig, ax = plt.subplots(1, 2, figsize = (12, 6))

ax[0].plot(x1, y11, label = r'$log_{10} x$', color = 'k')
ax[0].plot(x2, y12, '--', label = r'$log_{e} x$', color = 'k')

ax[0].set_xlabel('$x$', fontsize = 25)
ax[0].set_ylabel('$y$', fontsize = 25)
ax[0].legend(fontsize = 20, loc = 'lower right')

ax[1].plot(x1, y21, label = r'$log_{10} x$', color = 'k')
ax[1].plot(x2, y22, '--', label = r'$log_{1/e} x$', color = 'k')

ax[1].set_xlabel('$x$', fontsize = 25)
ax[1].set_ylabel('$y$', fontsize = 25)
ax[1].legend(fontsize = 20, loc = 'upper right')

plt.show()


  • ์‹œ๊ทธ๋ชจ์ด๋“œ Sigmoid : 0 ~ 1์‚ฌ์ด์˜ ๊ฐ’์„ ๊ฐ€์ง

# ์‹œ๊ทธ๋ชจ์ด๋“œ

z = np.linspace(-10, 10, 100)
sigma = 1/(1+np.exp(-z))

plt.figure(figsize=(12, 8))
plt.plot(z, sigma)
plt.xlabel('$z$', fontsize = 25)
plt.ylabel('$\sigma(z)$', fontsize = 25)
plt.show()




์ˆ˜ํ•™์ด ๊ธฐ์ดˆ - ํ•จ์ˆ˜์˜ ํ‘œํ˜„

  • ๋ฒกํ„ฐ์˜ ํ‘œํ˜„

  • ๋‹จ์ผ ๋ณ€์ˆ˜ ์Šค์นผ๋ผ ํ‘œํ˜„

  • ๋‹ค์ค‘ ๋ณ€์ˆ˜ ์Šค์นผ๋ผ ํ•จ์ˆ˜

  • ๋‹ค๋ณ€์ˆ˜ ๋ฒกํ„ฐ ํ•จ์ˆ˜

๋‹ค๋ฉด์ˆ˜ ๋ฒกํ„ฐํ•จ์ˆ˜ ์˜ˆ์ œ)

# ๋‹ค๋ณ€์ˆ˜ ๋ฒกํ„ฐํ•จ์ˆ˜ ์˜ˆ์ œ

u = np.linspace(0, 1, 30)
v = np.linspace(0, 1, 30)

U, V = np.meshgrid(u, v)
X = U
Y = V
U, V = np.meshgrid(u, v)
U, V
Z = (1 + U**2) + (V/(1+V**2))
z
t = np.linspace(0, 4, 3)
p = np.linspace(0, 4, 3)
T, P = np.meshgrid(t, p)
T, P
# ๊ทธ๋ฆฌ๊ธฐ

fig = plt.figure(figsize=(7, 7))
ax = plt.axes(projection = '3d')
ax.xaxis.set_tick_params(labelsize = 15)
ax.yaxis.set_tick_params(labelsize = 15)
ax.zaxis.set_tick_params(labelsize = 15)
ax.set_xlabel(r'$x$', fontsize = 20)
ax.set_ylabel(r'$y$', fontsize = 20)
ax.set_xlabel(r'$z$', fontsize = 20)

ax.scatter3D(U, V, Z, marker = '.', color = 'gray')

plt.show()
# ํ•จ์ˆ˜ ํ•ฉ์„ฑ ์˜ˆ์ œ

x = np.linspace(-4, 4, 100)
y = x**3 - 15*x + 30
z = np.log(y)
# ํ•จ์ˆ˜ ์ƒ๊น€์ƒˆ

fig, ax = plt.subplots(1, 2, figsize = (12, 6))

ax[0].plot(x, y, label=r'$x^3 - 15x + 30$', color = 'k')
ax[0].legend(fontsize = 18)

ax[1].plot(y, z, label=r'$\log(y)$', color = 'k')
ax[1].legend(fontsize = 18)

plt.show()
# ํ•ฉ์„ฑํ•œ ๊ฒƒ ์ƒ๊น€์ƒˆ

fig, ax = plt.subplots(1, 2, figsize = (12, 6))

ax[0].plot(x, z,'--', label=r'$\log(f(x))$', color = 'k')
ax[0].legend(fontsize = 18)

ax[1].plot(x, y, label=r'$x^3 - 15x + 30$', color = 'k')
ax[1].legend(fontsize = 18)

ax_tmp = ax[1].twinx() # x์ถ•์„ ํ•˜๋‚˜ ๋” ๋งŒ๋“ค์–ด๋ผ
ax_tmp.plot(x, z, '--', label = r'$\log(f(x))$', color = 'k') # ๋งŒ๋“  x์ถ•์— ๋งž์ถฐ ๊ทธ๋ž˜ํ”„๋ฅผ ๊ทธ๋ ค๋ผ

plt.show()




box plot

# ๊ฐ„๋‹จํ•œ ๋ฐ์ดํ„ฐ

sample = [1, 7, 9, 16, 36, 39, 45, 45, 46, 48, 51, 100, 101]
tmp_y = [1]*len(sample)
tmp_y
# plot

plt.figure(figsize=(12, 4))
plt.scatter(sample, tmp_y)
plt.grid()
plt.show()
# ์ง€ํ‘œ๋ฅผ ์ฐพ๋Š” ๋ฒ•

np.median(sample)
np.percentile(sample, 25)
np.percentile(sample, 75) - np.percentile(sample, 25)
iqr = np.percentile(sample, 75) - np.percentile(sample, 25)
iqr * 1.5
# ๊ทธ๋ฆฌ๊ธฐ

q1 = np.percentile(sample, 25)
q2 = np.median(sample)
q3 = np.percentile(sample, 75)
iqr = q3 - q1
upper_fence = q3 + iqr*1.5
lower_fence = q1 - iqr-1.5
plt.figure(figsize=(12, 4))
plt.scatter(sample, tmp_y)
plt.axvline(x=q1, color='black')
plt.axvline(x=q2, color='red')
plt.axvline(x=q3, color='black')
plt.axvline(x=upper_fence, color='black', ls = 'dashed')
plt.axvline(x=lower_fence, color='black', ls = 'dashed')
plt.grid()
plt.show()
# framework ์ด์šฉ

import seaborn as sns

plt.figure(figsize=(12, 4))
sns.boxenplot(sample)
plt.grid()
plt.show()

์–ด๋ ต..

๐Ÿ’ป ์ถœ์ฒ˜ : ์ œ๋กœ๋ฒ ์ด์Šค ๋ฐ์ดํ„ฐ ์ทจ์—… ์Šค์ฟจ

profile
#๋ฐ์ดํ„ฐ๋ถ„์„ #ํผํฌ๋จผ์Šค๋งˆ์ผ€ํŒ… #๋ฐ์ดํ„ฐ #๋””์ง€ํ„ธ๋งˆ์ผ€ํŒ…

0๊ฐœ์˜ ๋Œ“๊ธ€