Day 97 begins.... (KNN)

조동현 · November 25, 2022


[Education] Python ML


📊 KNN - K-Nearest Neighbors Algorithm

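📌 What is KNN?

Definition
- A supervised learning algorithm
- To classify a data point, it looks up the labels of the K other data points closest to it and assigns a class based on those labels.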
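
The definition above can be sketched in a few lines of plain Python. This is only a minimal illustration, not how scikit-learn implements it: the function name `knn_predict`, the toy points, and the labels "A"/"B" are all made up for this example, and a simple majority vote is assumed as the way the K labels are combined.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair the distance from the query to each training point with that point's label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep only the k closest points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The most common label among those k neighbors is the prediction
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two small clusters with labels "A" and "B"
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (2.0, 2.0), k=3))  # A
print(knn_predict(train_points, train_labels, (6.0, 7.0), k=3))  # B
```

Note that there is no real training step here: the "model" is just the stored data, and all the work happens at prediction time.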

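📌 KNN Distance Measurement

How distance is measured
- Euclidean distance
- As the vectors get larger (more features per data point), the distance calculation becomes more expensive.
→ A cause of performance degradation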
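
As a rough sketch of the Euclidean distance mentioned above, the helper below (a hypothetical `euclidean_distance`, written with NumPy for this example) takes the square root of the summed squared differences. Every feature adds one more term to that sum, and a brute-force search repeats the calculation for every stored training point, which is one way to see why larger vectors slow things down.

```python
import numpy as np

def euclidean_distance(a, b):
    """Square root of the sum of squared per-feature differences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))

# 2 features: the classic 3-4-5 right triangle
print(euclidean_distance([0, 0], [3, 4]))  # 5.0

# 30 features: the same formula, just with 30 terms in the sum
rng = np.random.default_rng(0)
p, q = rng.random(30), rng.random(30)
print(euclidean_distance(p, q))
```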

📊 KNN Practice

📌 KNN Practice

1. Import libraries

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

2. Prepare the data

breast = load_breast_cancer()
feature = breast.data
label = breast.target

3. Split into training and test data
stratify parameter: keeps the proportion of each unique label value the same in the training and test sets

# stratify: keeps the proportion of each unique label value the same in the training and test sets
x_train, x_test, y_train, y_test = train_test_split(feature, label, test_size=0.25, stratify=label, random_state=1)
print(x_train.shape, x_test.shape, y_train.shape, y_test.shape)
# (426, 30) (143, 30) (426,) (143,)
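
One way to check what `stratify` does is to compare the class proportions before and after the split. The sketch below repeats the same split and uses a small helper, `class_ratio`, which is a name invented for this example; the three printed ratios should come out nearly identical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

breast = load_breast_cancer()
feature, label = breast.data, breast.target

x_train, x_test, y_train, y_test = train_test_split(
    feature, label, test_size=0.25, stratify=label, random_state=1
)

def class_ratio(y):
    """Fraction of samples that belong to each class (0 and 1)."""
    counts = np.bincount(y)
    return counts / counts.sum()

print('full  :', class_ratio(label))
print('train :', class_ratio(y_train))
print('test  :', class_ratio(y_test))
```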

4. Train the model

model = KNeighborsClassifier(n_neighbors=5)
model.fit(x_train, y_train)
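
To connect the fitted model back to the definition of KNN, the sketch below looks up the 5 nearest training samples for the first test sample with `kneighbors` and takes a majority vote over their labels by hand. It assumes the default `weights='uniform'`, under which the hand-made vote is expected to agree with `model.predict`; the variable names `neighbor_labels` and `manual_vote` are made up for this example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

breast = load_breast_cancer()
x_train, x_test, y_train, y_test = train_test_split(
    breast.data, breast.target, test_size=0.25, stratify=breast.target, random_state=1
)

model = KNeighborsClassifier(n_neighbors=5)
model.fit(x_train, y_train)

# Distances to, and training-set indices of, the 5 nearest neighbors of the first test sample
distances, indices = model.kneighbors(x_test[:1])
neighbor_labels = y_train[indices[0]]

# Majority vote over the neighbor labels (no ties possible with 5 neighbors and 2 classes)
manual_vote = np.bincount(neighbor_labels).argmax()

print('neighbor distances :', distances[0])
print('neighbor labels    :', neighbor_labels)
print('majority vote      :', manual_vote)
print('model.predict      :', model.predict(x_test[:1])[0])
```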

5. Compare predicted and actual values

y_pred = model.predict(x_test)
print('Predicted : ', y_pred[:10])
print('Actual    : ', y_test[:10])
# Predicted :  [0 0 1 0 1 1 1 1 0 1]
# Actual    :  [0 0 1 0 0 1 1 1 0 1]

6. Evaluate model performance - accuracy

acc = accuracy_score(y_test, y_pred)
print('Model accuracy : ', acc)
# Model accuracy :  0.9300699300699301
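
Accuracy is simply the fraction of predictions that match the actual labels, so the score above corresponds to 133 correct predictions out of 143 test samples. As a small illustration, the sketch below recomputes it by hand for just the 10 values printed in step 5 (the array names `y_pred_head` and `y_test_head` are made up for this example); one of the ten predictions is wrong, so the accuracy of this small slice is 0.9.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# The first 10 predicted and actual values printed in step 5
y_pred_head = np.array([0, 0, 1, 0, 1, 1, 1, 1, 0, 1])
y_test_head = np.array([0, 0, 1, 0, 0, 1, 1, 1, 0, 1])

# Fraction of positions where the prediction equals the actual label
manual_acc = np.mean(y_pred_head == y_test_head)

print(manual_acc)                                # 0.9
print(accuracy_score(y_test_head, y_pred_head))  # 0.9
```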
