Task Type 1: Pokémon Information Data

SOOYEON · May 24, 2022

Big Data Analysis Engineer (빅데이터분석기사)

Pokémon Information Data

Q1.

The Legendary column indicates whether a Pokémon is legendary. Find the difference in mean HP between legendary and non-legendary Pokémon.

# Boolean-mask approach: mean HP of legendary minus mean HP of non-legendary
df[df['Legendary'] == True]['HP'].mean() - df[df['Legendary'] == False]['HP'].mean()

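# groupby approach: groups are sorted False, True, so .values[1] - .values[0]
# Select 'HP' before .mean() so non-numeric columns are not averaged
target = df.groupby('Legendary')['HP'].mean()
result = target.values[1] - target.values[0]
print(result)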
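As a quick sanity check, both approaches above can be compared on a tiny hand-made frame. The `toy` data below is hypothetical and is not the real Pokémon dataset; it only assumes that `Legendary` is a boolean column.

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({
    "Legendary": [True, False, True, False],
    "HP": [100, 40, 90, 60],
})

# Boolean-mask version: ~ negates the mask, so no "== False" is needed
diff_mask = toy.loc[toy["Legendary"], "HP"].mean() - toy.loc[~toy["Legendary"], "HP"].mean()

# groupby version: group keys are sorted, so False comes first and True second
hp_mean = toy.groupby("Legendary")["HP"].mean()
diff_group = hp_mean.iloc[1] - hp_mean.iloc[0]

print(diff_mask, diff_group)
```

Both expressions should agree, since each one subtracts the non-legendary mean from the legendary mean.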

Q2.

Type 1 is the primary type and Type 2 is the secondary type. Which secondary type is the most common?

df.value_counts('Type 2').index[0]

Q3.

For the most common Type 1, what is the mean Attack divided by the mean Defense?

# Filter once to the most common Type 1, then reuse the subset
target = df[df['Type 1'] == df['Type 1'].value_counts().index[0]]
target['Attack'].mean() / target['Defense'].mean()

# Store the most common Type 1 first, then filter twice
Max = df['Type 1'].value_counts().index[0]

result = df[df['Type 1']== Max].Attack.mean() /df[df['Type 1']== Max].Defense.mean()
print(result)

Q4. ✅

Which Pokémon generation (Generation) has the most Legendary Pokémon?

# groupby + count, sorted descending
df[df['Legendary']==True].groupby('Generation')['Legendary'].count().sort_values(ascending=False).index[0]

# .value_counts()
result =df[df.Legendary==True].Generation.value_counts().index[0]
result

Q5. 🌟

Among the correlation coefficients between 'HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', and 'Speed', find the pair of variables with the largest absolute value, along with that value.

# Sort all pairs by absolute correlation, then drop self-pairs by name
cols = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed']
target = df[cols].corr().unstack().reset_index().sort_values(by=0, key=abs, ascending=False)
target[target['level_0'] != target['level_1']].head(1)

# .unstack().reset_index()
target = df[cols].corr().unstack().reset_index().rename(columns={0: "corr"})
target = target[target['level_0'] != target['level_1']]  # drop self-correlations
result = target.loc[target['corr'].abs().idxmax()]  # largest |corr|, sign preserved
print(result)
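Because the question asks for the largest absolute value, a strongly negative correlation should win over a weaker positive one. The sketch below illustrates this on hypothetical toy columns `a`, `b`, `c` (not the real stats), and keeps each pair only once by comparing the two column names.

```python
import pandas as pd

# Hypothetical toy data: b is a perfectly reversed copy of a
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [5, 4, 3, 2, 1],
    "c": [2, 1, 4, 3, 5],
})

# Flatten the correlation matrix into a Series indexed by (column, row)
pairs = toy.corr().unstack()

# Keep each unordered pair once and drop the diagonal (self-correlation)
keep = [left < right for left, right in pairs.index]
pairs = pairs[keep]

best_pair = pairs.abs().idxmax()   # pair with the largest |corr|
best_value = pairs[best_pair]      # original signed value

print(best_pair, best_value)
```

Filtering by name rather than by `!= 1` avoids relying on the diagonal being exactly 1.0 in floating point.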

Q6. 🌟

Within each Generation, sort by Attack in ascending order and take the top 3 rows (18 rows in total). Find the overall mean Attack of those rows.

df.sort_values(['Generation','Attack'],ascending=True).groupby('Generation')['Attack'].head(3).mean()

result =  df.sort_values(['Generation','Attack']).groupby('Generation').head(3).Attack.mean()
print(result)
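The sort-then-`head` pattern can be checked against `nsmallest`, which picks the lowest values per group without sorting the whole frame first. This is a minimal sketch on hypothetical toy data, using 2 rows per group instead of 3.

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({
    "Generation": [1, 1, 1, 2, 2, 2],
    "Attack": [50, 10, 30, 80, 20, 60],
})

# Pattern from the answer: sort, then take the first rows of each group
lowest = toy.sort_values(["Generation", "Attack"]).groupby("Generation").head(2)
mean_head = lowest["Attack"].mean()

# Alternative: ask each group directly for its smallest values
mean_nsmallest = toy.groupby("Generation")["Attack"].nsmallest(2).mean()

print(mean_head, mean_nsmallest)
```

For the descending case, `nlargest` plays the same role.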

Q7.

Within each Generation, sort by Attack in descending order and take the top 5 rows (30 rows in total). Find the overall mean Attack of those rows.

df.sort_values(['Generation','Attack'],ascending=False).groupby('Generation').head(5)['Attack'].mean()

Q8. ✅

What is the most common (Type 1, Type 2) pair?

df.value_counts(['Type 1', 'Type 2'], ascending=False).head(1) # index[0]
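If only the pair itself is needed rather than a one-row Series, `idxmax()` on the counts returns it as a tuple. A minimal sketch on hypothetical toy data; note that `value_counts` drops rows with a missing `Type 2` by default.

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({
    "Type 1": ["Fire", "Fire", "Water", "Grass", "Bug"],
    "Type 2": ["Flying", "Flying", "Ice", "Poison", None],
})

pair_counts = toy.value_counts(["Type 1", "Type 2"])  # the Bug row is dropped (missing Type 2)
top_pair = pair_counts.idxmax()                       # tuple of (Type 1, Type 2)

print(top_pair, pair_counts[top_pair])
```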

Q9. ✅

How many (Type 1, Type 2) pairs occur exactly once?

# Count pairs as a DataFrame, then count the rows where cnt == 1
target = df[['Type 1','Type 2']].value_counts().to_frame('cnt')
target[target['cnt']==1].count().values[0]

# len
target = df[['Type 1','Type 2']].value_counts()
result = len(target[target==1])
print(result)

Q10. 🌟

How many of the (Type 1, Type 2) pairs that occur exactly once does each generation (Generation) have?

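import pandas as pd

# Pairs that occur exactly once in the whole dataset
target = df[['Type 1','Type 2']].value_counts()
target2 = target[target==1]

# Collect the original row for each unique pair
lst = []
for value in target2.reset_index().values:
    t1 = value[0]
    t2 = value[1]

    sp = df[(df['Type 1']==t1) & (df['Type 2']==t2)]
    lst.append(sp)

# Count the collected rows per Generation
result = pd.concat(lst).reset_index(drop=True).Generation.value_counts().sort_index()
print(result)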
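The loop above can also be written without explicit iteration by attaching each pair's size to its rows with `groupby(...).transform("size")`. This is a sketch on hypothetical toy data, not the author's solution; rows with a missing `Type 2` are dropped first to match the default behavior of `value_counts`.

```python
import pandas as pd

# Hypothetical toy data (not the real dataset)
toy = pd.DataFrame({
    "Type 1": ["Fire", "Fire", "Water", "Grass", "Bug"],
    "Type 2": ["Flying", "Flying", "Ice", "Poison", None],
    "Generation": [1, 2, 1, 2, 2],
})

# value_counts ignores rows with a missing Type 2, so drop them here as well
typed = toy.dropna(subset=["Type 2"])

# For every row, the number of rows sharing its (Type 1, Type 2) pair
pair_size = typed.groupby(["Type 1", "Type 2"])["Generation"].transform("size")

# Keep rows whose pair is unique, then count them per Generation
unique_per_gen = typed[pair_size == 1]["Generation"].value_counts().sort_index()

print(unique_per_gen)
```

This avoids building one small DataFrame per pair, which may matter on larger tables.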