🐼Pandas

최지안 · October 6, 2023

Creating a DataFrame

pd.DataFrame(data, index, columns)

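#ex)
import numpy as np
import pandas as pd

# values 1..9 arranged as a 3x3 grid
df = pd.DataFrame(data=np.arange(1, 10).reshape(3, 3), index='A B C'.split(), columns='W X Y'.split())
   W  X  Y
A  1  2  3
B  4  5  6
C  7  8  9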
#ex)
df[['W', 'Y']]  # pass a list of labels to select several columns
   W  Y
A  1  3
B  4  6
C  7  9
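A minimal, self-contained sketch of the steps so far, assuming the values 1 to 9 as sample data. It also shows the difference between selecting with one label and selecting with a list of labels.

```python
import numpy as np
import pandas as pd

# Sample data (assumed): the values 1..9 arranged as a 3x3 grid.
data = np.arange(1, 10).reshape(3, 3)
df = pd.DataFrame(data=data, index="A B C".split(), columns="W X Y".split())

single = df["W"]          # one label -> Series
subset = df[["W", "Y"]]   # list of labels -> DataFrame

print(type(single).__name__)
print(subset)
```

Note that `df['W', 'Y']` (without the inner list) is looked up as a single key and raises a `KeyError` on a frame like this one.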

Adding columns together

#ex)
df['NEW'] = df['X'] + df['Y']
   W  X  Y  NEW
A  1  2  3    5
B  4  5  6   11
C  7  8  9   17

df.drop('name', axis=0 or 1, inplace=True)

  • axis: 0 → row, axis: 1 → column
  • inplace: whether to modify df itself (True) or return a modified copy and leave df unchanged (False, the default)
#ex)
df.drop('NEW', axis=1, inplace=True)
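The effect of `axis` and `inplace` is easiest to see side by side. The sketch below rebuilds the same 3x3 frame (sample values assumed) and compares the three calls; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 10).reshape(3, 3),
                  index=list("ABC"), columns=list("WXY"))
df["NEW"] = df["X"] + df["Y"]

# Without inplace, drop returns a new DataFrame and df keeps the column.
without_new = df.drop("NEW", axis=1)
still_has_new = "NEW" in df.columns

# axis=0 drops a row by its index label.
without_row_a = df.drop("A", axis=0)

# inplace=True modifies df itself and returns None.
result = df.drop("NEW", axis=1, inplace=True)

print(still_has_new, result, list(df.columns))
```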



data filtering

   W  X  Y
A  1  2  3
B  4  5  6
C  7  8  9

Filtering rows that satisfy a condition

#ex)
df['W'] > 4

# A False
# B False
# C True
#ex)
df[df['W'] > 4]
   W  X  Y
C  7  8  9
#ex)
df[df['W'] > 4]['Y']
   Y
C  9

Intersection (rows matching both conditions)

#ex)
cond1 = df['W'] > 2
cond2 = df['Y'] > 8

df[(cond1) & (cond2)]
   W  X  Y
C  7  8  9
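The filtering steps above can be sketched end to end as follows, again assuming the 1 to 9 sample frame. The `|` (union) case and the `.loc` form are not covered in the post; they are included here only as natural counterparts to `&` and to chained indexing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 10).reshape(3, 3),
                  index=list("ABC"), columns=list("WXY"))

mask = df["W"] > 4            # boolean Series, one entry per row
rows = df[mask]               # keeps only the rows where mask is True

# Row filter and column pick in a single step instead of df[...]['Y'].
y_values = df.loc[df["W"] > 4, "Y"]

# Each condition needs its own parentheses; use & and |, not `and` / `or`.
both = df[(df["W"] > 2) & (df["Y"] > 8)]
either = df[(df["W"] > 4) | (df["Y"] < 4)]

print(both)
print(either)
```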

index

#ex)
df.reset_index()
#df.set_index() lets you specify which column to use as the index
  index  W  X  Y
0     A  1  2  3
1     B  4  5  6
2     C  7  8  9
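A short round trip between `reset_index` and `set_index`, sketched on the same assumed sample frame. Both methods return a new DataFrame by default, so the original `df` is left untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 10).reshape(3, 3),
                  index=list("ABC"), columns=list("WXY"))

# The old index becomes a regular column named "index"; rows get 0, 1, 2.
reset = df.reset_index()

# set_index takes a column name and turns that column back into the index.
restored = reset.set_index("index")

print(reset)
print(restored)
```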



Viewing df information

df.info()

  • Shows the data types, number of items, column names, memory usage, and so on

df.dtypes

  • Shows the data type of each column (dtypes is an attribute, so it takes no parentheses)

df.describe()

  • Returns several summary statistics for df
  • Count, mean, standard deviation, min/max, quartiles
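The three inspection tools can be tried together on the small sample frame (values assumed). This is only a sketch of how they are called and what kind of object each one gives back.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 10).reshape(3, 3),
                  index=list("ABC"), columns=list("WXY"))

df.info()                # prints a summary and returns None
dtypes = df.dtypes       # attribute: a Series of per-column dtypes
summary = df.describe()  # DataFrame of summary statistics per column

print(dtypes)
print(summary)
```

The rows of `summary` are `count`, `mean`, `std`, `min`, `25%`, `50%`, `75%`, and `max`.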



Handling missing values

     A    B  C
0  1.0  5.0  1
1  2.0  NaN  2
2  NaN  NaN  3

df.dropna(axis, thresh)

  • Removes rows that contain missing values
     A    B  C
0  1.0  5.0  1
  • axis = 0 → drop rows
  • axis = 1 → drop columns
  • thresh → minimum number of non-missing values a row (or column) must have in order to be kept
  • ex) df.dropna(thresh=2)
     A    B  C
0  1.0  5.0  1
1  2.0  NaN  2
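The `dropna` variants above can be checked with a small frame that mirrors the A/B/C table. A minimal sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, np.nan],
                   "B": [5.0, np.nan, np.nan],
                   "C": [1, 2, 3]})

rows_complete = df.dropna()          # axis=0 (default): keep rows with no NaN
cols_complete = df.dropna(axis=1)    # keep columns with no NaN
at_least_two = df.dropna(thresh=2)   # keep rows with >= 2 non-missing values

print(rows_complete)
print(cols_complete)
print(at_least_two)
```

Row 2 has only one non-missing value (`C`), so `thresh=2` drops it while row 1 survives.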

df.fillna(value=' ')

  • Fills missing values with value
  • Typically used in a form such as df.fillna(df.mean())
  • df['A'].fillna(value=df['A'].mean())
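The `fillna` forms listed above, sketched on the same A/B/C sample frame. Filling with `0` is just an assumed example constant.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0, np.nan],
                   "B": [5.0, np.nan, np.nan],
                   "C": [1, 2, 3]})

filled_const = df.fillna(value=0)     # every NaN becomes 0
filled_mean = df.fillna(df.mean())    # each NaN becomes its own column's mean
a_filled = df["A"].fillna(value=df["A"].mean())  # one column only

print(filled_mean)
print(a_filled)
```

`df.mean()` skips NaN by default, so the mean of column `A` is computed from 1.0 and 2.0 only.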



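Other methods

df['column'].unique()

  • Returns an array of the unique values (a Series method; a DataFrame has no unique())

df.nunique()

  • Returns the number of unique values (per column when called on a DataFrame)

df['column'].value_counts()

  • Returns each unique value and the number of times it occurs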
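A quick sketch of the three methods on a single column. The frame and the column name `col` are hypothetical, chosen only so the counts are easy to verify by eye.

```python
import pandas as pd

# Hypothetical data: 1 appears once, 2 twice, 3 three times.
df = pd.DataFrame({"col": [1, 2, 2, 3, 3, 3]})

uniques = df["col"].unique()       # array of distinct values, in order of appearance
n_unique = df["col"].nunique()     # number of distinct values
counts = df["col"].value_counts()  # value -> occurrences, most frequent first
per_column = df.nunique()          # DataFrame version: one count per column

print(uniques, n_unique)
print(counts)
```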
