😄 [Day 14]: FUNDAMENTAL 15. Pokémon

백건 · January 21, 2022

Pokémon

For this one you need to know how to use numpy and pandas,
be familiar with pandas syntax and methods,
be able to visualize data with matplotlib,
and be able to split a dataset into train/test sets for training and validation.
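The last item, splitting a dataset into train and test sets, can be sketched with pandas alone. This is a minimal sketch, not the course's method: the toy DataFrame, the 80/20 ratio and random_state=42 are assumptions for illustration, and scikit-learn's train_test_split is the more common tool for this job.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset.
df = pd.DataFrame({"x": range(10), "y": [0, 1] * 5})

# Randomly pick 80% of the rows for training (fixed seed so the split is reproducible).
train = df.sample(frac=0.8, random_state=42)
# Every row whose index was not picked goes to the test set.
test = df.drop(train.index)

print(train.shape, test.shape)  # (8, 2) (2, 2)
```

Because test is built by dropping the training index, no row can end up in both sets.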

Exploratory Data Analysis (EDA)

  • Pokémon dataset
    • Pokémon names and types
    • Stats

Getting the data

  1. How many features are there?

Import (loading the libraries)

import numpy as np                                 # arrays and matrices
import pandas as pd                                # DataFrames (tabular data)
import seaborn as sns                              # statistical plots
import matplotlib.pyplot as plt                    # basic plotting
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
print('done')
done

Loading the dataset with pandas

import os
csv_path = "./pokemon_eda/data/Pokemon.csv"
original_data = pd.read_csv(csv_path)
print('Whoosh=3')
Whoosh=3
##################### modified (Korean variable names) ####################
import pandas as 피디

데이터파일경로 = "./pokemon_eda/data/Pokemon.csv"
모셔둘새로만들테이터프레임 = 피디.read_csv(데이터파일경로)
print('done')
done
pokemon = original_data.copy()
print(pokemon.shape)
pokemon.head()
(800, 13)
   #                   Name  Type 1  Type 2  Total  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
0  1              Bulbasaur   Grass  Poison    318  45      49       49       65       65     45           1      False
1  2                Ivysaur   Grass  Poison    405  60      62       63       80       80     60           1      False
2  3               Venusaur   Grass  Poison    525  80      82       83      100      100     80           1      False
3  3  VenusaurMega Venusaur   Grass  Poison    625  80     100      123      122      120     80           1      False
4  4             Charmander    Fire     NaN    309  39      52       43       60       50     65           1      False
변수ㅡ포켓몬 = 모셔둘새로만들테이터프레임.copy()
print(변수ㅡ포켓몬.shape) # -> (800, 13): 800 rows, 13 columns
변수ㅡ포켓몬.head()
(800, 13)
   #                   Name  Type 1  Type 2  Total  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
0  1              Bulbasaur   Grass  Poison    318  45      49       49       65       65     45           1      False
1  2                Ivysaur   Grass  Poison    405  60      62       63       80       80     60           1      False
2  3               Venusaur   Grass  Poison    525  80      82       83      100      100     80           1      False
3  3  VenusaurMega Venusaur   Grass  Poison    625  80     100      123      122      120     80           1      False
4  4             Charmander    Fire     NaN    309  39      52       43       60       50     65           1      False
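pokemon = original_data.copy() keeps the freshly loaded data untouched so you can experiment on the copy. Below is a minimal sketch of copy() versus plain assignment, using a tiny DataFrame built from two rows of the output above (only two columns kept).

```python
import pandas as pd

# Two rows taken from the head() output above (subset of columns).
original = pd.DataFrame({"Name": ["Bulbasaur", "Ivysaur"], "HP": [45, 60]})

alias = original            # plain assignment: both names refer to the same object
copied = original.copy()    # copy(): an independent DataFrame (deep copy by default)

copied.loc[0, "HP"] = 999   # change only the copy
print(original.loc[0, "HP"])  # 45, the original is unchanged

alias.loc[0, "HP"] = 0      # changing the alias changes the original too
print(original.loc[0, "HP"])  # 0
```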

Splitting by legendary status (legendary or not)

# legendary Pokémon dataset
legendary = pokemon[pokemon["Legendary"] == True].reset_index(drop=True)
print(legendary.shape)
legendary.head()
(65, 13)
     #                 Name    Type 1    Type 2  Total   HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
0  144             Articuno       Ice    Flying    580   90      85      100       95      125     85           1       True
1  145               Zapdos  Electric    Flying    580   90      90       85      125       90    100           1       True
2  146              Moltres      Fire    Flying    580   90     100       90      125       85     90           1       True
3  150               Mewtwo   Psychic       NaN    680  106     110       90      154       90    130           1       True
4  150  MewtwoMega Mewtwo X   Psychic  Fighting    780  106     190      100      154      100    130           1       True
# legendary Pokémon dataset
레전더리 = 변수ㅡ포켓몬[변수ㅡ포켓몬["Legendary"] == True].reset_index(drop=True)
print(레전더리.shape)
레전더리.head()
(65, 13)
     #                 Name    Type 1    Type 2  Total   HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
0  144             Articuno       Ice    Flying    580   90      85      100       95      125     85           1       True
1  145               Zapdos  Electric    Flying    580   90      90       85      125       90    100           1       True
2  146              Moltres      Fire    Flying    580   90     100       90      125       85     90           1       True
3  150               Mewtwo   Psychic       NaN    680  106     110       90      154       90    130           1       True
4  150  MewtwoMega Mewtwo X   Psychic  Fighting    780  106     190      100      154      100    130           1       True
# ordinary (non-legendary) Pokémon dataset
ordinary = pokemon[pokemon["Legendary"] == False].reset_index(drop=True)
print(ordinary.shape)
ordinary.head()
(735, 13)
   #                   Name  Type 1  Type 2  Total  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
0  1              Bulbasaur   Grass  Poison    318  45      49       49       65       65     45           1      False
1  2                Ivysaur   Grass  Poison    405  60      62       63       80       80     60           1      False
2  3               Venusaur   Grass  Poison    525  80      82       83      100      100     80           1      False
3  3  VenusaurMega Venusaur   Grass  Poison    625  80     100      123      122      120     80           1      False
4  4             Charmander    Fire     NaN    309  39      52       43       60       50     65           1      False
# ordinary (non-legendary) Pokémon dataset
일반더리 = 변수ㅡ포켓몬[변수ㅡ포켓몬["Legendary"] == False].reset_index(drop=True)
print(일반더리.shape)
일반더리.head()
(735, 13)
   #                   Name  Type 1  Type 2  Total  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
0  1              Bulbasaur   Grass  Poison    318  45      49       49       65       65     45           1      False
1  2                Ivysaur   Grass  Poison    405  60      62       63       80       80     60           1      False
2  3               Venusaur   Grass  Poison    525  80      82       83      100      100     80           1      False
3  3  VenusaurMega Venusaur   Grass  Poison    625  80     100      123      122      120     80           1      False
4  4             Charmander    Fire     NaN    309  39      52       43       60       50     65           1      False
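The split above relies on boolean indexing: pokemon["Legendary"] == True gives a True/False Series with one value per row, and indexing the DataFrame with it keeps only the True rows. reset_index(drop=True) then renumbers the result from 0 and throws the old index away instead of adding it as a column. The two parts should add back up to the whole, and they do in the real data (65 + 735 = 800). A small sketch with rows from the tables above (only two columns kept):

```python
import pandas as pd

# A few rows from the outputs above (subset of columns).
pokemon = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Articuno", "Zapdos", "Charmander"],
    "Legendary": [False, False, True, True, False],
})

mask = pokemon["Legendary"] == True                # boolean Series, one value per row
legendary = pokemon[mask].reset_index(drop=True)   # keep True rows, renumber from 0
ordinary = pokemon[~mask].reset_index(drop=True)   # ~ flips the mask

print(legendary)
print(len(legendary) + len(ordinary) == len(pokemon))  # True
```

Since the Legendary column is already boolean, pokemon[pokemon["Legendary"]] and pokemon[~pokemon["Legendary"]] would select the same rows.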

Checking for missing values

pokemon.isnull().sum()
#               0
Name            0
Type 1          0
Type 2        386
Total           0
HP              0
Attack          0
Defense         0
Sp. Atk         0
Sp. Def         0
Speed           0
Generation      0
Legendary       0
dtype: int64
변수ㅡ포켓몬.isnull().sum()
#               0
Name            0
Type 1          0
Type 2        386
Total           0
HP              0
Attack          0
Defense         0
Sp. Atk         0
Sp. Def         0
Speed           0
Generation      0
Legendary       0
dtype: int64
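isnull() returns a DataFrame of the same shape filled with True/False, and .sum() adds up each column, counting True as 1. Since single-type Pokémon have no Type 2, the 386 NaNs should correspond to single-type Pokémon. A sketch with three rows from the tables in this post:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander", "Wartortle"],
    "Type 1": ["Grass", "Fire", "Water"],
    "Type 2": ["Poison", np.nan, np.nan],   # single-type Pokémon have no Type 2
})

missing = df.isnull().sum()      # number of missing values per column
print(missing)

# Rows with only one type (Type 2 is missing)
single_type = df[df["Type 2"].isnull()]
print(single_type["Name"].tolist())  # ['Charmander', 'Wartortle']
```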

Understanding all the columns

print(len(pokemon.columns))
pokemon.columns
13
Index(['#', 'Name', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense',
       'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary'],
      dtype='object')
print(len(변수ㅡ포켓몬.columns))
변수ㅡ포켓몬.columns
13
Index(['#', 'Name', 'Type 1', 'Type 2', 'Total', 'HP', 'Attack', 'Defense',
       'Sp. Atk', 'Sp. Def', 'Speed', 'Generation', 'Legendary'],
      dtype='object')
  • # : Pokémon ID number. Entries that are the same Pokémon, e.g. differing only in gender, share the same # value. int
  • Name : Pokémon name. Each Pokémon is stored under its own name, and all 800 names are different (unique). str
  • Type 1 : First type. A Pokémon with only one type has it stored in Type 1. str
  • Type 2 : Second type. For Pokémon with only one type, Type 2 is NaN (missing). str
  • Total : Sum of all six stats. int
  • HP : The Pokémon's hit points. int
  • Attack : Physical attack (scratch, punch, etc.). int
  • Defense : Defense against physical attacks. int
  • Sp. Atk : Special attack (fire blast, bubble beam, etc.). int
  • Sp. Def : Defense against special attacks. int
  • Speed : Decides which Pokémon attacks first in a match (the one with higher Speed goes first). int
  • Generation : The Pokémon's generation. This data goes up to generation 6. int
  • Legendary : Whether the Pokémon is legendary. !! Target feature !! bool
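The claim that Total is the sum of the six stats is easy to check. A minimal sketch on the five rows shown in the head() output earlier in this post; on the full dataset the same comparison would be (pokemon[stats].sum(axis=1) == pokemon["Total"]).all().

```python
import pandas as pd

# The first five rows from the head() output above.
rows = [
    ("Bulbasaur", 318, 45, 49, 49, 65, 65, 45),
    ("Ivysaur", 405, 60, 62, 63, 80, 80, 60),
    ("Venusaur", 525, 80, 82, 83, 100, 100, 80),
    ("VenusaurMega Venusaur", 625, 80, 100, 123, 122, 120, 80),
    ("Charmander", 309, 39, 52, 43, 60, 50, 65),
]
stats = ["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]
df = pd.DataFrame(rows, columns=["Name", "Total"] + stats)

# Add the six stat columns row by row and compare with Total.
stat_sum = df[stats].sum(axis=1)
print((stat_sum == df["Total"]).all())  # True
print(df.dtypes)                        # Name is text, the stat columns are integers
```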

# column: ID numbers

len(set(pokemon["#"]))
721
len(set(변수ㅡ포켓몬["#"]))
721

There are only 721 distinct IDs for 800 rows, so some Pokémon must share the same ID.

pokemon[pokemon["#"] == 6]
   #                       Name  Type 1  Type 2  Total  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
6  6                  Charizard    Fire  Flying    534  78      84       78      109       85    100           1      False
7  6  CharizardMega Charizard X    Fire  Dragon    634  78     130      111      130       85    100           1      False
8  6  CharizardMega Charizard Y    Fire  Flying    634  78     104       78      159      115    100           1      False
# check the Pokémon that share the same ID
변수ㅡ포켓몬[변수ㅡ포켓몬["#"] == 6]
   #                       Name  Type 1  Type 2  Total  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
6  6                  Charizard    Fire  Flying    534  78      84       78      109       85    100           1      False
7  6  CharizardMega Charizard X    Fire  Dragon    634  78     130      111      130       85    100           1      False
8  6  CharizardMega Charizard Y    Fire  Flying    634  78     104       78      159      115    100           1      False
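Instead of guessing an ID like 6, pandas can list every row whose ID repeats: duplicated(keep=False) marks all occurrences of a repeated value, and value_counts() shows how many rows share each ID. A sketch on the "#" and Name values that appear in the outputs of this post:

```python
import pandas as pd

# "#" and Name values taken from the outputs above.
df = pd.DataFrame({
    "#": [1, 2, 3, 3, 4, 6, 6, 6],
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "VenusaurMega Venusaur",
             "Charmander", "Charizard", "CharizardMega Charizard X",
             "CharizardMega Charizard Y"],
})

print(len(df), df["#"].nunique())  # 8 5  (same idea as 800 rows vs 721 IDs)

# keep=False marks every row whose "#" appears more than once
shared = df[df["#"].duplicated(keep=False)]
print(shared)

# How many rows share each repeated ID
counts = df["#"].value_counts()
print(counts[counts > 1])
```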

Name column: names

  • Converting pokemon["Name"] to a set removes duplicates, so its length (len) is the number of unique names.
len(set(pokemon["Name"]))
800
len(set(변수ㅡ포켓몬["Name"]))
800

All 800 names are unique.
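pandas can answer the same question directly: Series.nunique() counts distinct values and Series.is_unique is True when no value repeats. A sketch on a few names and IDs from the outputs above:

```python
import pandas as pd

names = pd.Series(["Bulbasaur", "Ivysaur", "Venusaur", "Charmander"])

print(len(set(names)) == names.nunique())  # True: both count distinct names
print(names.is_unique)                     # True: no name appears twice

ids = pd.Series([3, 3, 6])
print(ids.is_unique)                       # False: ID 3 repeats
```

One difference to keep in mind: nunique() ignores NaN by default, while a Python set keeps it as a value.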

Type 1 & Type 2: Pokémon types

Looking at the data directly

pokemon.loc[[6, 10]]
    #       Name  Type 1  Type 2  Total  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
6   6  Charizard    Fire  Flying    534  78      84       78      109       85    100           1      False
10  8  Wartortle   Water     NaN    405  59      63       80       65       80     58           1      False
변수ㅡ포켓몬.loc[[6, 10]]
    #       Name  Type 1  Type 2  Total  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary
6   6  Charizard    Fire  Flying    534  78      84       78      109       85    100           1      False
10  8  Wartortle   Water     NaN    405  59      63       80       65       80     58           1      False
# count the distinct values in each type column
len(list(set(pokemon["Type 1"]))), len(list(set(pokemon["Type 2"])))
(18, 19)
# count the distinct values in each type column
len(list(set(변수ㅡ포켓몬["Type 1"]))), len(list(set(변수ㅡ포켓몬["Type 2"])))
(18, 19)

Type 2 has one more kind than Type 1. What could it be?
How can I check?
Let's remove the values both columns share and print only what differs.

Python set difference (set.difference)

Reference: https://www.w3schools.com/python/ref_set_difference.asp
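A minimal sketch of the idea: a - b (or a.difference(b)) keeps the elements of a that are not in b. The DataFrame below is hypothetical toy data, not the real Pokemon.csv, built so that every Type 2 value also appears in Type 1 except the missing value. unique() is used so that all the NaNs collapse into a single entry before building the sets.

```python
import numpy as np
import pandas as pd

# Set difference on plain Python sets: items in a that are not in b.
a = {"Grass", "Fire", "Poison"}
b = {"Grass", "Fire"}
print(a - b)   # {'Poison'}

# Hypothetical toy data (not the real dataset).
toy = pd.DataFrame({
    "Type 1": ["Grass", "Fire", "Water", "Grass", "Fire"],
    "Type 2": ["Water", np.nan, "Grass", "Fire", np.nan],
})

type1 = set(toy["Type 1"].unique())   # 3 types
type2 = set(toy["Type 2"].unique())   # 3 types + one NaN
extra = type2.difference(type1)       # what Type 2 has that Type 1 does not
print(len(type1), len(type2), extra)

# nunique() skips NaN unless dropna=False, so it does not count the extra value.
print(toy["Type 2"].nunique(), toy["Type 2"].nunique(dropna=False))  # 3 4
```

On the real data, set(pokemon["Type 2"]) - set(pokemon["Type 1"]) would presumably leave only nan, assuming every named type found in Type 2 also appears somewhere in Type 1.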