[Python] [Pandas]

도도요닝 · September 1, 2022

Tags: &&, .quantile, .reset_index(drop=True), .select_dtypes, .size(), .startswith, .str.contains, .unstack(), .value_counts().sort_index(), drop_duplicates(), isin


Traffic volume data by Jeju weather and population (source: Jeju Data Hub). DataUrl = 'https://raw.githubusercontent.com/Datamanim/pandas/main/Jeju.csv'

import pandas as pd

url = 'https://raw.githubusercontent.com/Datamanim/pandas/main/Jeju.csv'
df = pd.read_csv(url, encoding='euc-kr')

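Print the columns that hold numeric variables (.select_dtypes)

Print the columns that hold categorical variables

# My first attempt was something like df.columns == ...
ans = df.select_dtypes(exclude=object).columns    # numeric columns
ans = df.select_dtypes(include=object).columns    # categorical (object) columns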
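As a small sketch of how select_dtypes splits columns, the toy frame below uses made-up column names and values (it is not the Jeju data). It selects by "number" rather than by object, which is an alternative to the approach above and does not depend on how a given pandas version stores strings.

```python
import pandas as pd

# Hypothetical toy data; column names are made up for illustration.
toy = pd.DataFrame({
    "region": ["A", "B", "C"],
    "speed": [41.5, 38.0, 45.2],
    "lanes": [2, 4, 2],
})

# "number" matches every numeric dtype (int, float, ...).
numeric_cols = toy.select_dtypes(include="number").columns.tolist()
# Everything that is not numeric.
non_numeric_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(non_numeric_cols)
```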

Compute the interquartile range (IQR) of the 평균속도 (average speed) column

ans = df['평균속도'].quantile(0.75) - df['평균속도'].quantile(0.25)
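To make the arithmetic concrete, here is a worked example on a made-up Series (not the Jeju data). With the default linear interpolation, the 25% and 75% quantiles of 1..9 fall exactly on the 3rd and 7th values.

```python
import pandas as pd

# Made-up values 1..9 for illustration.
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9])

q1 = s.quantile(0.25)  # 3.0
q3 = s.quantile(0.75)  # 7.0
iqr = q3 - q1          # 4.0

print(q1, q3, iqr)
```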

Print the number of unique values in the 읍면동명 column (.nunique(), .unique())

ans = df.읍면동명.nunique()

url = 'https://raw.githubusercontent.com/Datamanim/pandas/main/chipo.csv'
df = pd.read_csv(url)

Extract the rows where quantity is 3, reset the index to start from 0, and print the first 5 rows
ans = df.loc[df['quantity']==3].head().reset_index(drop=True)
Remove the dollar sign from the item_price column, convert the values to float, and store them in a new_price column
df['new_price'] = df['item_price'].str[1:].astype('float')
ans = df['new_price'].head()
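Slicing with .str[1:] assumes every value starts with exactly one "$". As a sketch on made-up prices, the same result can also be reached by replacing the character explicitly, which states the intent more directly:

```python
import pandas as pd

# Made-up price strings for illustration.
prices = pd.Series(["$2.39", "$16.98", "$1.69"])

# Approach used above: drop the first character.
by_slice = prices.str[1:].astype(float)

# Alternative: remove the literal "$" (regex=False treats it as plain text).
by_replace = prices.str.replace("$", "", regex=False).astype(float)

print(by_slice.tolist())
print(by_replace.tolist())
```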
Extract the rows where new_price is 5 or less and count them
ans = len(df.loc[df.new_price <=5])
Extract the rows whose item_name is Chicken Salad Bowl and reset the index
ans = df.loc[df.item_name == 'Chicken Salad Bowl'].reset_index(drop=True)
Extract the rows where new_price is 9 or less and item_name is Chicken Salad Bowl
ans = df.loc[(df.new_price <=9) & (df.item_name == 'Chicken Salad Bowl')]
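The parentheses around each condition matter: & binds more tightly than <= and ==, so leaving them out changes how the expression is parsed. A minimal sketch on made-up rows:

```python
import pandas as pd

# Made-up rows for illustration.
toy = pd.DataFrame({
    "item_name": ["Chicken Salad Bowl", "Chicken Salad Bowl", "Chips"],
    "new_price": [8.75, 11.25, 2.15],
})

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (toy.new_price <= 9) & (toy.item_name == "Chicken Salad Bowl")
result = toy.loc[mask]

print(result)
```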
Sort df in ascending order of new_price and reset the index
ans = df.sort_values('new_price').reset_index(drop=True)
Print the rows whose item_name contains Chips
ans = df.loc[df.item_name.str.contains('Chips')]

Build a DataFrame from the rows whose item_name is Steak Salad or Bowl, then drop duplicate rows based on item_name, keeping only the first occurrence (drop_duplicates())

ans = df.loc[(df.item_name == 'Steak Salad') | (df.item_name == 'Bowl')]
ans = ans.drop_duplicates('item_name')  # keep='first' is the default
# To keep the last occurrence instead: ans.drop_duplicates('item_name', keep='last')
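The difference between keeping the first and the last occurrence is easiest to see on a small made-up frame (order_id is a hypothetical column added only to tell the rows apart):

```python
import pandas as pd

# Made-up rows; order_id only marks which duplicate survives.
toy = pd.DataFrame({
    "item_name": ["Bowl", "Steak Salad", "Bowl", "Steak Salad"],
    "order_id": [1, 2, 3, 4],
})

first = toy.drop_duplicates("item_name")               # keep="first" by default
last = toy.drop_duplicates("item_name", keep="last")   # keep the final occurrence

print(first.order_id.tolist())
print(last.order_id.tolist())
```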

Change the item_name value Izze to Fizzy Lizzy in df

df.loc[df.item_name == 'Izze', 'item_name'] = 'Fizzy Lizzy'
ans =df
ans

Replace the NaN values of choice_description in df with NoData (use loc)

df.loc[df.choice_description.isnull(),'choice_description'] = 'NoData'
ans = df

Print the number of rows in df whose choice_description does not contain Vegetables

ans = len(df.loc[~df.choice_description.str.contains('Vegetables')])
ans
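The count above relies on the NaN values having already been replaced with NoData, so str.contains returns a clean boolean Series that ~ can negate. If the missing values were still present, one option (a sketch on made-up data, not the approach used above) is to pass na=False:

```python
import pandas as pd

# Made-up descriptions, including one missing value.
desc = pd.Series(["[Fajita Vegetables, Rice]", None, "[Rice]"])

# na=False treats missing entries as "does not contain".
has_veg = desc.str.contains("Vegetables", na=False)

# ~ flips True/False, so this counts rows without "Vegetables".
count_without = int((~has_veg).sum())

print(count_without)
```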

Extract every row in df whose item_name starts with N (.startswith())

ans = df[df.item_name.str.startswith('N')]
ans

Index the rows of df whose item_name is at least 15 characters long (the prompt says "word count", but .str.len() counts characters)

ans = df[df.item_name.str.len() >=15]
ans.head(3)
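Since .str.len() measures characters, a true word count needs a split first. The sketch below contrasts the two on made-up names:

```python
import pandas as pd

# Made-up item names for illustration.
names = pd.Series(["Chicken Salad Bowl", "Chips", "Nantucket Nectar"])

char_len = names.str.len()                # number of characters, spaces included
word_count = names.str.split().str.len()  # number of whitespace-separated words

print(char_len.tolist())
print(word_count.tolist())
```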

Build the DataFrame of rows in df whose new_price is one of the values in lst, and print how many rows it has. lst = [1.69, 2.39, 3.39, 4.45, 9.25, 10.98, 11.75, 16.98] (isin)

lst = [1.69, 2.39, 3.39, 4.45, 9.25, 10.98, 11.75, 16.98]
ans = df.loc[df.new_price.isin(lst)]

display(ans.head(3))  # display() is available in Jupyter/IPython
print(len(ans))
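isin checks each value for membership in the list, which replaces a long chain of == comparisons joined with |. A minimal sketch on made-up prices and a shortened, hypothetical list:

```python
import pandas as pd

# Made-up prices and a shortened list for illustration.
toy = pd.DataFrame({"new_price": [1.69, 8.75, 2.39, 9.25, 1.69]})
wanted = [1.69, 2.39, 9.25]

matched = toy.loc[toy.new_price.isin(wanted)]

print(matched.index.tolist())
print(len(matched))
```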

Count the frequency of each host_name, sort by host_name, and print the top 5 (.size(), .value_counts().sort_index())

ans = df.groupby('host_name').size()                # option 1
ans = df.host_name.value_counts().sort_index()      # option 2, same counts
ans.head()
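Both approaches give the same counts in the same order, because groupby sorts its keys by default and sort_index() sorts the value_counts result by label. A small check on made-up host names:

```python
import pandas as pd

# Made-up host names for illustration.
hosts = pd.DataFrame({"host_name": ["Mina", "Alex", "Mina", "Zoe", "Alex", "Mina"]})

by_group = hosts.groupby("host_name").size()
by_counts = hosts.host_name.value_counts().sort_index()

print(by_group.index.tolist(), by_group.tolist())
print(by_counts.index.tolist(), by_counts.tolist())
```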

Count the frequency of each host_name and build a DataFrame sorted by frequency in descending order. Name the frequency column counts

ans = df.groupby('host_name').size().\
                to_frame().rename(columns={0:'counts'}).\
                sort_values('counts',ascending=False)
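The to_frame().rename(...) step can be shortened, since to_frame accepts the column name directly. A sketch on made-up host names showing both forms produce the same table:

```python
import pandas as pd

# Made-up host names for illustration.
hosts = pd.DataFrame({"host_name": ["Mina", "Alex", "Mina", "Zoe", "Alex", "Mina"]})

sizes = hosts.groupby("host_name").size()

# Form used above: unnamed Series becomes column 0, then gets renamed.
renamed = sizes.to_frame().rename(columns={0: "counts"}).sort_values("counts", ascending=False)

# Shorter form: name the column while converting.
named = sizes.to_frame("counts").sort_values("counts", ascending=False)

print(named)
```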

Count the neighbourhood values within each neighbourhood_group (.size() for the group size, as_index = False)

ans = df.groupby(['neighbourhood_group','neighbourhood'],as_index = False).size()

For each neighbourhood_group, print the maximum of the per-neighbourhood counts in that group (.size() gives the counts)

ans = df.groupby(['neighbourhood_group','neighbourhood'],as_index=False).size()\
		.groupby(['neighbourhood_group'], as_index=False).max()
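One thing to keep in mind: .max() is taken column by column, so the neighbourhood shown is the alphabetically largest name in the group, not necessarily the one with the largest count. The sketch below (made-up rows) shows this, plus one possible alternative using idxmax if the matching neighbourhood is wanted:

```python
import pandas as pd

# Made-up listings for illustration.
toy = pd.DataFrame({
    "neighbourhood_group": ["Bronx", "Bronx", "Bronx", "Queens", "Queens", "Queens", "Queens"],
    "neighbourhood": ["Fordham", "Fordham", "Allerton", "Astoria", "Astoria", "Astoria", "Flushing"],
})

sizes = toy.groupby(["neighbourhood_group", "neighbourhood"], as_index=False).size()

# Column-wise max: size is correct, but the name is just the largest string.
colwise = sizes.groupby("neighbourhood_group", as_index=False).max()

# Alternative: pick the row holding the largest size in each group.
top_rows = sizes.loc[sizes.groupby("neighbourhood_group")["size"].idxmax()]

print(colwise)
print(top_rows)
```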

Compute the mean price for each combination of neighbourhood and neighbourhood_group without hierarchical indexing (.unstack())

ans = df.groupby(['neighbourhood','neighbourhood_group']).price.mean().unstack()
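unstack() moves the inner index level (neighbourhood_group here) into columns, so the two-level index becomes a plain table; combinations that never occur are filled with NaN. A minimal sketch on made-up listings:

```python
import pandas as pd

# Made-up listings for illustration.
toy = pd.DataFrame({
    "neighbourhood": ["Astoria", "Astoria", "Fordham"],
    "neighbourhood_group": ["Queens", "Queens", "Bronx"],
    "price": [100, 120, 60],
})

# Rows: neighbourhood, columns: neighbourhood_group, values: mean price.
wide = toy.groupby(["neighbourhood", "neighbourhood_group"]).price.mean().unstack()

print(wide)
```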
