Python Library - Pandas(6)

cheonbi · February 14, 2022

This post documents my experience in Codepresso's code.PRESS-UP trial program.

Course covered in this post:

Data Analysis with Python: Pandas

Getting Started with Data Analysis Using the Pandas Library

https://www.codepresso.kr/course/56

In this post, we look at how to index a DataFrame for data lookup.
The Codepresso course has been a great help in learning to use Pandas.

set_index()

By default, a DataFrame is created with an integer index that starts at 0 and increases by 1.
With set_index, however, you can replace the DataFrame's index with one of your choosing.

📌 A function that replaces the row index with the data of a specific column
📌 You specify a column name, and that column's values become the row index
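As a minimal sketch of the idea, the snippet below builds a small DataFrame, checks its default index, and then looks up a row by label after calling set_index. The `name` and `score` columns are hypothetical sample data, not taken from the course.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({'name': ['kim', 'lee', 'park'], 'score': [90, 85, 70]})

# With no index given, the row labels are the integers 0, 1, 2.
default_index = list(df.index)
print(default_index)

# set_index returns a new DataFrame whose row labels come from 'name'.
by_name = df.set_index('name')
print(list(by_name.index))

# Rows can now be looked up by name instead of by position.
lee_score = by_name.loc['lee', 'score']
print(lee_score)
```

Here a single column name is passed directly as a string; the examples in this post pass a list such as ['col1'], which behaves the same way for one column.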

#Syntax
set_index(keys=[k1, k2, ...], inplace=True/False, drop=True/False)

keys = [k1, k2, ...]

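The data to use as the index, given as a list.
An index usually has one level, but it can have two, three, or more.
Each key must be either a column name of the DataFrame, or a list or Series with the same length as the number of rows in the DataFrame.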

inplace = True/False

True : modifies the DataFrame itself (the call returns None)
False : leaves the original DataFrame unchanged and returns a new DataFrame with the index applied, which you assign to another variable (False is the default)

drop = True/False

True : moves the column used as the key into the index and removes it from the columns
False : copies the column used as the key into the index and keeps it among the columns
(True is the default)
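The difference between the two inplace settings can be sketched as follows. The DataFrame here is a shortened, hypothetical version of the data used in the examples in this post.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})

# inplace=False (the default): df is untouched and a new DataFrame is returned.
new_df = df.set_index('col1')
columns_before = list(df.columns)
print(columns_before)          # df still has col1 as a regular column
print(list(new_df.columns))    # new_df has col1 as its index instead

# inplace=True: df itself is modified and the call returns None.
result = df.set_index('col1', inplace=True)
print(result)
print(df.index.name)
```

Because the inplace=True call returns None, writing `df = df.set_index(..., inplace=True)` would overwrite `df` with None. Assigning the returned DataFrame, as the examples in this post do, avoids that mistake.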

Example

import pandas as pd

dict_1 = {
    'col1': [1, 2, 3, 4, 5],
    'col2': [6, 7, 8, 9, 10],
    'col3': [11, 12, 13, 14, 15],
    'col4': [16, 17, 18, 19, 20]
}

df_1 = pd.DataFrame(dict_1)
print(df_1)

df_2 = df_1.set_index(keys=['col1'], inplace=False, drop=False)
print(df_2)

df_3 = df_1.set_index(keys=['col1'], inplace=False, drop=True)
print(df_3)



-- Result
   col1  col2  col3  col4
0     1     6    11    16
1     2     7    12    17
2     3     8    13    18
3     4     9    14    19
4     5    10    15    20

      col1  col2  col3  col4
col1                        
1        1     6    11    16
2        2     7    12    17
3        3     8    13    18
4        4     9    14    19
5        5    10    15    20

      col2  col3  col4
col1                  
1        6    11    16
2        7    12    17
3        8    13    18
4        9    14    19
5       10    15    20

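In the example above,
['col1'] is passed as the keys of set_index.
This means the data in the col1 column of df_1 is moved into the DataFrame's index.

Because inplace=False, df_1 itself is not modified; the result of set_index is assigned to df_2 or df_3.

The only difference between df_2 and df_3 is whether drop is True or False.
df_2 uses drop=False, so col1 still appears among the columns of the result,
while df_3 uses drop=True, so col1, which was used to set the index, has been removed from the columns.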
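The same difference can be checked in code rather than by reading the printed tables. This is a small sketch on a two-column version of df_1; it also shows a label-based lookup with .loc once col1 is the index.

```python
import pandas as pd

df_1 = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5],
    'col2': [6, 7, 8, 9, 10],
})

df_2 = df_1.set_index(keys=['col1'], drop=False)
df_3 = df_1.set_index(keys=['col1'], drop=True)

# drop=False keeps col1 as a column; drop=True removes it.
kept = 'col1' in df_2.columns
dropped = 'col1' not in df_3.columns
print(kept, dropped)

# The original DataFrame still has its default integer index.
print(list(df_1.index))

# .loc looks rows up by index label, so 3 means "the row where col1 is 3",
# not the row at position 3.
value = df_3.loc[3, 'col2']
print(value)
```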

import pandas as pd

dict_1 = {
    'col1': [1, 2, 3, 4, 5],
    'col2': [6, 7, 8, 9, 10],
    'col3': [11, 12, 13, 14, 15],
    'col4': [16, 17, 18, 19, 20]
}

df_1 = pd.DataFrame(dict_1)
print(df_1)

df_4 = df_1.set_index(keys=['col1', 'col2'], inplace=False, drop=True)
print(df_4)



-- Result
   col1  col2  col3  col4
0     1     6    11    16
1     2     7    12    17
2     3     8    13    18
3     4     9    14    19
4     5    10    15    20

           col3  col4
col1 col2            
1    6       11    16
2    7       12    17
3    8       13    18
4    9       14    19
5    10      15    20

The index does not have to consist of a single level.
As in the example above, you can build an index with two or more levels.
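With a two-level index like this, one row is identified by a pair of labels. The sketch below assumes the same df_1 as above and selects a row by passing a tuple to .loc.

```python
import pandas as pd

df_1 = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5],
    'col2': [6, 7, 8, 9, 10],
    'col3': [11, 12, 13, 14, 15],
    'col4': [16, 17, 18, 19, 20],
})

df_4 = df_1.set_index(keys=['col1', 'col2'])

# The index now has two levels, named after the columns that produced it.
n_levels = df_4.index.nlevels
level_names = list(df_4.index.names)
print(n_levels, level_names)

# A tuple selects the row where col1 == 2 and col2 == 7.
row = df_4.loc[(2, 7)]
print(row['col3'], row['col4'])
```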

import pandas as pd

dict_1 = {
    'col1': [1, 2, 3, 4, 5],
    'col2': [6, 7, 8, 9, 10],
    'col3': [11, 12, 13, 14, 15],
    'col4': [16, 17, 18, 19, 20]
}

df_1 = pd.DataFrame(dict_1)
print(df_1)

s = pd.Series(['a', 'b', 'c', 'd', 'e'])
df_5 = df_1.set_index(keys=[s], inplace=False)
print(df_5)

l = ['f', 'g', 'h', 'i', 'j']
df_6 = df_1.set_index(keys=[l], inplace=False)
print(df_6)



-- Result
   col1  col2  col3  col4
0     1     6    11    16
1     2     7    12    17
2     3     8    13    18
3     4     9    14    19
4     5    10    15    20

   col1  col2  col3  col4
a     1     6    11    16
b     2     7    12    17
c     3     8    13    18
d     4     9    14    19
e     5    10    15    20

   col1  col2  col3  col4
f     1     6    11    16
g     2     7    12    17
h     3     8    13    18
i     4     9    14    19
j     5    10    15    20
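In this last example, the Series and the list are each wrapped in an outer list (keys=[s], keys=[l]). The wrapping matters for a plain list of strings: without it, pandas treats each string as a column name. The sketch below illustrates this with a hypothetical three-row DataFrame and a hypothetical list named `labels`.

```python
import pandas as pd

df_1 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
labels = ['x', 'y', 'z']  # hypothetical labels, one per row

# Wrapped in a list: the values of `labels` become the index.
df_ok = df_1.set_index(keys=[labels])
print(list(df_ok.index))

# Not wrapped: 'x', 'y', 'z' are looked up as column names,
# which do not exist in df_1, so pandas raises a KeyError.
try:
    df_1.set_index(keys=labels)
    raised = False
except KeyError:
    raised = True
print(raised)
```

Since the labels do not come from a column, the resulting index has no name, which is why the printed tables above show no header line for the index.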