Python Built-in Types (1)

steadycode · November 21, 2022

Contents

  1. motivation
  2. binary types
  3. example
  4. conclusion

motivation

While analyzing financial data, I observed the following result.

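# pseudo-code
def http_download_data():
    ...  # HTTP request elided
    return output

def main():
    # call the function defined above (the original called an undefined download_data)
    output = http_download_data()
    print("Contents: ", output.content)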
Contents:  b'\xc1\xf6\xbc\xf6\xb8\xed,\xc1\xbe\xb0\xa1,\xb4\xeb\xba\xf1,\xb5\xee\xb6\xf4\xb7\xfc,\xbd\xc3\xb0\xa1,\xb0\xed\xb0\xa1,\xc0\xfa\xb0\xa1,\xb0\xc5\xb7\xa1\xb7\xae,\xb0\xc5\xb7\xa1\xb4\xeb\xb1\xdd,\xbb\xf3\xc0\xe5\xbd\xc3\xb0\xa1\xc3\xd1\xbe\xd7\n"KRX 300","1454.34","6.74","0.47","1453.05","1454.99","1442.31","182231.0","8133165.0","1.712576991E9"\n"KTOP 30","8548.85","-23.87","-0.28","8600.17","8601.08","8485.86","35447.0","3104275.0","9.27943787E8"\n"KRX 100","5040.41","9.10","0.18","5049.82","5053.06","5003.20","81858.0","5616606.0","1.583854868E9"\n"KRX \xc0\xda\xb5\xbf\xc2\xf7","1748.48","-13.43","-0.76","1764.54","1765.11","1737.97","15849.0","454056.0","1.06128327E8"\n"KRX \xb9\xdd\xb5\xb5\xc3\xbc","2706.78","87.35","3.33","2628.03","2707.07","2623.06","22430.0","687298.0","1.02083412E8"\n"KRX \xc7\xef\xbd\xba\xc4\xc9\xbe\xee","2885.03","83.31","2.97","2812.22","2885.03","2810.09","60246.0","1412355.0","1.85000045E8"\n"KRX \xc0\xba\xc7\xe0","632.82","1.03","0.16","635.26","637.05","628.33","14730.0","362641.0","8.651098E7"\n"KRX \xbf\xa1\xb3\xca\xc1\xf6\xc8\xad\xc7\xd0","3234.57","-7.24","-0.22","3257.00","3267.77","3214.83","13254.0","999868.0","1.45313107E8"\n"KRX \xc3\xb6\xb0\xad","1653.70","6.97","0.42","1648.29","1659.24","1635.39","5493.0","238397.0","4.9190838E7"\n"KRX \xb9\xe6\xbc\xdb\xc5\xeb\xbd\xc5","735.56","0.97","0.13","735.89","739.05","734.20","3941.0","62912.0","2.7815278E7"\n"KRX \xb0\xc7\xbc\xb3","632.21","1.30","0.21","634.02","636.04","627.60","15037.0","340061.0","4.9402107E7"\n"KRX \xc1\xf5\xb1\xc7","587.22","3.81","0.65","584.78","588.07","581.33","4624.0","37030.0","2.2306987E7"\n"KRX \xb1\xe2\xb0\xe8\xc0\xe5\xba\xf1","618.09","0.91","0.15","618.99","621.59","613.28","48440.0","833392.0","2.17248906E8"\n"KRX \xba\xb8\xc7\xe8","1276.43","-20.92","-1.61","1298.73","1302.07","1271.85","4507.0","90146.0","3.7796138E7"\n"KRX 
\xbf\xee\xbc\xdb","983.94","12.45","1.28","972.93","984.07","964.12","11877.0","134598.0","4.0566327E7"\n"KRX \xb0\xe6\xb1\xe2\xbc\xd2\xba\xf1\xc0\xe7","1087.38","4.28","0.40","1082.66","1089.81","1080.20","10328.0","305367.0","6.9123474E7"\n"KRX \xc7\xca\xbc\xf6\xbc\xd2\xba\xf1\xc0\xe7","1304.91","23.37","1.82","1288.80","1305.64","1283.46","22622.0","692745.0","7.4782709E7"\n"KRX \xb9\xcc\xb5\xf0\xbe\xee&\xbf\xa3\xc5\xcd\xc5\xd7\xc0\xce\xb8\xd5\xc6\xae","2001.64","33.96","1.73","1979.48","2001.64","1962.11","12260.0","810897.0","1.18398903E8"\n"KRX \xc1\xa4\xba\xb8\xb1\xe2\xbc\xfa","1449.75","-2.64","-0.18","1452.03","1453.67","1434.59","39153.0","2430488.0","5.90694036E8"\n"KRX \xc0\xaf\xc6\xbf\xb8\xae\xc6\xbc","970.41","4.80","0.50","966.46","977.22","961.97","2615.0","61781.0","2.2020845E7"\n"KRX 300 \xc1\xa4\xba\xb8\xb1\xe2\xbc\xfa","2341.02","-8.53","-0.36","2348.37","2351.69","2317.93","33179.0","2312319.0","5.82753225E8"\n"KRX 300 \xb1\xdd\xc0\xb6","743.34","-1.42","-0.19","747.73","749.29","739.43","22172.0","497555.0","1.52122632E8"\n"KRX 300 \xc0\xda\xc0\xaf\xbc\xd2\xba\xf1\xc0\xe7","1355.01","-5.08","-0.37","1361.93","1363.28","1347.68","13563.0","675036.0","1.64295306E8"\n"KRX 300 \xbb\xea\xbe\xf7\xc0\xe7","622.35","2.80","0.45","621.82","624.26","619.05","38961.0","1012363.0","2.10437324E8"\n"KRX 300 \xc7\xef\xbd\xba\xc4\xc9\xbe\xee","2533.76","73.28","2.98","2472.39","2533.76","2470.46","43355.0","1156176.0","1.6620209E8"\n"KRX 300 \xc4\xbf\xb9\xc2\xb4\xcf\xc4\xc9\xc0\xcc\xbc\xc7\xbc\xad\xba\xf1\xbd\xba","1553.64","18.56","1.21","1542.92","1553.64","1529.51","12930.0","829934.0","1.41175436E8"\n"KRX 300 \xbc\xd2\xc0\xe7","1640.78","-2.55","-0.16","1653.65","1658.93","1629.53","10818.0","1158617.0","1.78192402E8"\n"KRX 300 \xc7\xca\xbc\xf6\xbc\xd2\xba\xf1\xc0\xe7","1283.40","20.09","1.59","1271.04","1284.13","1265.35","4861.0","343935.0","6.8816533E7"'

This shows that the body of the HTTP response is represented as data of the form b'...'. This post covers the binary types among Python's built-in types.


binary types

According to the official Python documentation, there are three objects for representing bytes:

  • bytes
  • bytearray
  • memoryview
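
As a quick orientation, the three types can be put side by side. This is a minimal sketch; the variable names are made up for illustration.

```python
# The three built-in binary types, side by side.
data = b"KRX"            # bytes: immutable sequence of single bytes
buf = bytearray(data)    # bytearray: a mutable copy of the same bytes
view = memoryview(buf)   # memoryview: a view onto buf's memory, no copy made

buf[0] = ord("k")        # change the first byte in place

print(data)              # b'KRX'  (the original bytes object is untouched)
print(buf)               # bytearray(b'kRX')
print(bytes(view))       # b'kRX'  (the view reflects the change to buf)
```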

bytes object

definition

A bytes object is an immutable sequence of single bytes. Because many major binary protocols are based on ASCII text encoding, bytes objects provide several methods that are only valid for ASCII-compatible data, and they are closely related to string objects in many ways.

class bytes([source[, encoding[, errors]]])
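
The constructor forms in the signature above can be sketched as follows. The sample strings are illustrative only; "종가" (closing price) is one of the column names from the response shown earlier.

```python
# From a string: an encoding must be given.
from_text = bytes("종가", "euc-kr")
print(from_text)                 # b'\xc1\xbe\xb0\xa1'

# From an integer: that many zero-valued bytes.
zeros = bytes(3)
print(zeros)                     # b'\x00\x00\x00'

# From an iterable of integers in range(256).
from_ints = bytes([75, 82, 88])
print(from_ints)                 # b'KRX'

# errors controls what happens when a character cannot be encoded;
# "replace" substitutes '?' for each such character.
lossy = bytes("KRX 종가", "ascii", "replace")
print(lossy)                     # b'KRX ??'
```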

representation

The literal syntax for bytes is largely the same as for strings, except that a b prefix is added. A bytes literal can be written in the following ways.

  • Single quotes: b'still allows embedded "double" quotes'

  • Double quotes: b"still allows embedded 'single' quotes"

  • Triple quoted: b'''3 single quotes''', b"""3 double quotes"""

Only ASCII characters are permitted in these literals. Binary values above 127, which fall outside the 7-bit ASCII range, must be written with an escape sequence such as \xc1. I plan to write a separate post about ASCII.
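
A small sketch of these rules, using four bytes taken from the response shown earlier. compile() is used here only so that the invalid literal can be tested without breaking the script itself.

```python
header = b"\xc1\xbe\xb0\xa1"   # four bytes above 127, written as \x escapes

print(len(header))             # 4
print(header[0])               # 193 (indexing a bytes object yields an int)
print(header[:2])              # b'\xc1\xbe' (slicing yields bytes)

# bytes is immutable, so item assignment fails.
try:
    header[0] = 0
    immutable_error = None
except TypeError as exc:
    immutable_error = exc
    print("TypeError:", exc)

# A non-ASCII character typed directly into a bytes literal is a syntax error.
try:
    compile("b'종가'", "<example>", "eval")
    syntax_error = None
except SyntaxError as exc:
    syntax_error = exc
    print("SyntaxError:", exc.msg)
```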

bytearray object

definition

A bytearray object is largely the same as a bytes object, except that it is mutable.

class bytearray([source[, encoding[, errors]]])

representation

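Unlike bytes, bytearray has no dedicated literal syntax; a bytearray is always created by calling the constructor. Any of the bytes literal forms can be passed as the source:

  • Single quotes: bytearray(b'still allows embedded "double" quotes')

  • Double quotes: bytearray(b"still allows embedded 'single' quotes")

  • Triple quoted: bytearray(b'''3 single quotes'''), bytearray(b"""3 double quotes""")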
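
The mutability of bytearray can be sketched like this. The row content is a shortened, illustrative piece of the response data.

```python
# A bytearray is built by calling the constructor, here from a bytes literal.
row = bytearray(b'"KRX 300"')

row[1:4] = b"krx"             # slice assignment works because bytearray is mutable
row.extend(b",1454.34")       # it can also grow in place
print(row)                    # bytearray(b'"krx 300",1454.34')

frozen = bytes(row)           # convert back to an immutable bytes object
print(frozen)                 # b'"krx 300",1454.34'
```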


example

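Now let's use this bytes data to analyze the financial data. Since the bytes object appeared to hold ASCII-compatible encoded text, I first tried calling decode() with no arguments, which defaults to UTF-8.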

decode with utf-8 (error)

# pseudo-code
def http_download_data():
    ...  # HTTP request elided
    return output

def main():
    output = http_download_data()
    # decode() with no arguments uses UTF-8
    decoded = output.content.decode()
    print(decoded)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 0: invalid start byte

However, decoding raised the error above. The data is evidently not UTF-8: the byte 0xc1 can never begin a valid UTF-8 sequence, so the response must use a different encoding.
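
The failure can be reproduced without any network access by using the first few bytes of the response shown earlier. This is a minimal sketch; the variable names are made up for illustration.

```python
# First two header fields of the response, copied from the raw output above.
raw = b"\xc1\xf6\xbc\xf6\xb8\xed,\xc1\xbe\xb0\xa1"

try:
    raw.decode()   # same as raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc
    # The exception records which codec failed, where, and why.
    print(exc.encoding, exc.start, hex(exc.object[exc.start]), exc.reason)
    # utf-8 0 0xc1 invalid start byte

# errors="replace" avoids the exception but loses the text: every
# undecodable byte becomes U+FFFD, so it hides the problem rather than fixing it.
print(raw.decode("utf-8", errors="replace"))
```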

decode with euc-kr

pseudo-code

def main():
    output = http_download_data()
    # the response body is EUC-KR encoded, so decode it explicitly
    decoded = output.content.decode(encoding="euc-kr")
    print(decoded)

output

지수명,종가,대비,등락률,시가,고가,저가,거래량,거래대금,상장시가총액
"KRX 300","1454.34","6.74","0.47","1453.05","1454.99","1442.31","182231.0","8133165.0","1.712576991E9"
"KTOP 30","8548.85","-23.87","-0.28","8600.17","8601.08","8485.86","35447.0","3104275.0","9.27943787E8"
"KRX 100","5040.41","9.10","0.18","5049.82","5053.06","5003.20","81858.0","5616606.0","1.583854868E9"
"KRX 자동차","1748.48","-13.43","-0.76","1764.54","1765.11","1737.97","15849.0","454056.0","1.06128327E8"
"KRX 반도체","2706.78","87.35","3.33","2628.03","2707.07","2623.06","22430.0","687298.0","1.02083412E8"
"KRX 헬스케어","2885.03","83.31","2.97","2812.22","2885.03","2810.09","60246.0","1412355.0","1.85000045E8"
"KRX 은행","632.82","1.03","0.16","635.26","637.05","628.33","14730.0","362641.0","8.651098E7"
"KRX 에너지화학","3234.57","-7.24","-0.22","3257.00","3267.77","3214.83","13254.0","999868.0","1.45313107E8"
"KRX 철강","1653.70","6.97","0.42","1648.29","1659.24","1635.39","5493.0","238397.0","4.9190838E7"
"KRX 방송통신","735.56","0.97","0.13","735.89","739.05","734.20","3941.0","62912.0","2.7815278E7"
"KRX 건설","632.21","1.30","0.21","634.02","636.04","627.60","15037.0","340061.0","4.9402107E7"
"KRX 증권","587.22","3.81","0.65","584.78","588.07","581.33","4624.0","37030.0","2.2306987E7"
"KRX 기계장비","618.09","0.91","0.15","618.99","621.59","613.28","48440.0","833392.0","2.17248906E8"
"KRX 보험","1276.43","-20.92","-1.61","1298.73","1302.07","1271.85","4507.0","90146.0","3.7796138E7"
"KRX 운송","983.94","12.45","1.28","972.93","984.07","964.12","11877.0","134598.0","4.0566327E7"
"KRX 경기소비재","1087.38","4.28","0.40","1082.66","1089.81","1080.20","10328.0","305367.0","6.9123474E7"
"KRX 필수소비재","1304.91","23.37","1.82","1288.80","1305.64","1283.46","22622.0","692745.0","7.4782709E7"
"KRX 미디어&엔터테인먼트","2001.64","33.96","1.73","1979.48","2001.64","1962.11","12260.0","810897.0","1.18398903E8"
"KRX 정보기술","1449.75","-2.64","-0.18","1452.03","1453.67","1434.59","39153.0","2430488.0","5.90694036E8"
"KRX 유틸리티","970.41","4.80","0.50","966.46","977.22","961.97","2615.0","61781.0","2.2020845E7"
"KRX 300 정보기술","2341.02","-8.53","-0.36","2348.37","2351.69","2317.93","33179.0","2312319.0","5.82753225E8"
"KRX 300 금융","743.34","-1.42","-0.19","747.73","749.29","739.43","22172.0","497555.0","1.52122632E8"
"KRX 300 자유소비재","1355.01","-5.08","-0.37","1361.93","1363.28","1347.68","13563.0","675036.0","1.64295306E8"
"KRX 300 산업재","622.35","2.80","0.45","621.82","624.26","619.05","38961.0","1012363.0","2.10437324E8"
"KRX 300 헬스케어","2533.76","73.28","2.98","2472.39","2533.76","2470.46","43355.0","1156176.0","1.6620209E8"
"KRX 300 커뮤니케이션서비스","1553.64","18.56","1.21","1542.92","1553.64","1529.51","12930.0","829934.0","1.41175436E8"
"KRX 300 소재","1640.78","-2.55","-0.16","1653.65","1658.93","1629.53","10818.0","1158617.0","1.78192402E8"
"KRX 300 필수소비재","1283.40","20.09","1.59","1271.04","1284.13","1265.35","4861.0","343935.0","6.8816533E7"

raw data

Contents:  b'\xc1\xf6\xbc\xf6\xb8\xed,\xc1\xbe\xb0\xa1,\xb4\xeb\xba\xf1,\xb5\xee\xb6\xf4\xb7\xfc,\xbd\xc3\xb0\xa1,\xb0\xed\xb0\xa1,\xc0\xfa\xb0\xa1,\xb0\xc5\xb7\xa1\xb7\xae,\xb0\xc5\xb7\xa1\xb4\xeb\xb1\xdd,\xbb\xf3\xc0\xe5\xbd\xc3\xb0\xa1\xc3\xd1\xbe\xd7\n"KRX 300","1454.34","6.74","0.47","1453.05","1454.99","1442.31","182231.0","8133165.0","1.712576991E9"\n"KTOP 30","8548.85","-23.87","-0.28","8600.17","8601.08","8485.86","35447.0","3104275.0","9.27943787E8"\n"KRX 100","5040.41","9.10","0.18","5049.82","5053.06","5003.20","81858.0","5616606.0","1.583854868E9"\n"KRX \xc0\xda\xb5\xbf\xc2\xf7","1748.48","-13.43","-0.76","1764.54","1765.11","1737.97","15849.0","454056.0","1.06128327E8"\n"KRX \xb9\xdd\xb5\xb5\xc3\xbc","2706.78","87.35","3.33","2628.03","2707.07","2623.06","22430.0","687298.0","1.02083412E8"\n"KRX \xc7\xef\xbd\xba\xc4\xc9\xbe\xee","2885.03","83.31","2.97","2812.22","2885.03","2810.09","60246.0","1412355.0","1.85000045E8"\n"KRX \xc0\xba\xc7\xe0","632.82","1.03","0.16","635.26","637.05","628.33","14730.0","362641.0","8.651098E7"\n"KRX \xbf\xa1\xb3\xca\xc1\xf6\xc8\xad\xc7\xd0","3234.57","-7.24","-0.22","3257.00","3267.77","3214.83","13254.0","999868.0","1.45313107E8"\n"KRX \xc3\xb6\xb0\xad","1653.70","6.97","0.42","1648.29","1659.24","1635.39","5493.0","238397.0","4.9190838E7"\n"KRX \xb9\xe6\xbc\xdb\xc5\xeb\xbd\xc5","735.56","0.97","0.13","735.89","739.05","734.20","3941.0","62912.0","2.7815278E7"\n"KRX \xb0\xc7\xbc\xb3","632.21","1.30","0.21","634.02","636.04","627.60","15037.0","340061.0","4.9402107E7"\n"KRX \xc1\xf5\xb1\xc7","587.22","3.81","0.65","584.78","588.07","581.33","4624.0","37030.0","2.2306987E7"\n"KRX \xb1\xe2\xb0\xe8\xc0\xe5\xba\xf1","618.09","0.91","0.15","618.99","621.59","613.28","48440.0","833392.0","2.17248906E8"\n"KRX \xba\xb8\xc7\xe8","1276.43","-20.92","-1.61","1298.73","1302.07","1271.85","4507.0","90146.0","3.7796138E7"\n"KRX 
\xbf\xee\xbc\xdb","983.94","12.45","1.28","972.93","984.07","964.12","11877.0","134598.0","4.0566327E7"\n"KRX \xb0\xe6\xb1\xe2\xbc\xd2\xba\xf1\xc0\xe7","1087.38","4.28","0.40","1082.66","1089.81","1080.20","10328.0","305367.0","6.9123474E7"\n"KRX \xc7\xca\xbc\xf6\xbc\xd2\xba\xf1\xc0\xe7","1304.91","23.37","1.82","1288.80","1305.64","1283.46","22622.0","692745.0","7.4782709E7"\n"KRX \xb9\xcc\xb5\xf0\xbe\xee&\xbf\xa3\xc5\xcd\xc5\xd7\xc0\xce\xb8\xd5\xc6\xae","2001.64","33.96","1.73","1979.48","2001.64","1962.11","12260.0","810897.0","1.18398903E8"\n"KRX \xc1\xa4\xba\xb8\xb1\xe2\xbc\xfa","1449.75","-2.64","-0.18","1452.03","1453.67","1434.59","39153.0","2430488.0","5.90694036E8"\n"KRX \xc0\xaf\xc6\xbf\xb8\xae\xc6\xbc","970.41","4.80","0.50","966.46","977.22","961.97","2615.0","61781.0","2.2020845E7"\n"KRX 300 \xc1\xa4\xba\xb8\xb1\xe2\xbc\xfa","2341.02","-8.53","-0.36","2348.37","2351.69","2317.93","33179.0","2312319.0","5.82753225E8"\n"KRX 300 \xb1\xdd\xc0\xb6","743.34","-1.42","-0.19","747.73","749.29","739.43","22172.0","497555.0","1.52122632E8"\n"KRX 300 \xc0\xda\xc0\xaf\xbc\xd2\xba\xf1\xc0\xe7","1355.01","-5.08","-0.37","1361.93","1363.28","1347.68","13563.0","675036.0","1.64295306E8"\n"KRX 300 \xbb\xea\xbe\xf7\xc0\xe7","622.35","2.80","0.45","621.82","624.26","619.05","38961.0","1012363.0","2.10437324E8"\n"KRX 300 \xc7\xef\xbd\xba\xc4\xc9\xbe\xee","2533.76","73.28","2.98","2472.39","2533.76","2470.46","43355.0","1156176.0","1.6620209E8"\n"KRX 300 \xc4\xbf\xb9\xc2\xb4\xcf\xc4\xc9\xc0\xcc\xbc\xc7\xbc\xad\xba\xf1\xbd\xba","1553.64","18.56","1.21","1542.92","1553.64","1529.51","12930.0","829934.0","1.41175436E8"\n"KRX 300 \xbc\xd2\xc0\xe7","1640.78","-2.55","-0.16","1653.65","1658.93","1629.53","10818.0","1158617.0","1.78192402E8"\n"KRX 300 \xc7\xca\xbc\xf6\xbc\xd2\xba\xf1\xc0\xe7","1283.40","20.09","1.59","1271.04","1284.13","1265.35","4861.0","343935.0","6.8816533E7"'

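As shown above, the data was decoded and printed successfully. Comparing it with the raw data, however, I noticed that the column header text in the first row appears as \x escapes, which denote hexadecimal values, while strings such as "KRX 300" appear exactly as they are.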
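
Once decoded, the text is ordinary CSV and can be handed to the standard csv module. The sketch below uses the first two lines of the response, cut down to three columns for brevity; going through csv.DictReader is an assumption for illustration, not necessarily how the data was analyzed in practice.

```python
import csv
import io

# Header row and first data row from the response above, shortened to three columns.
raw = (
    b"\xc1\xf6\xbc\xf6\xb8\xed,\xc1\xbe\xb0\xa1,\xb4\xeb\xba\xf1\n"
    b'"KRX 300","1454.34","6.74"\n'
)

text = raw.decode("euc-kr")                       # bytes -> str
rows = list(csv.DictReader(io.StringIO(text)))    # str -> list of dicts keyed by header

first = rows[0]
print(first["지수명"], float(first["종가"]))        # KRX 300 1454.34
```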

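This comes from how bytes are displayed: the repr of a bytes object shows printable ASCII bytes as characters and every other byte as a \xNN escape. EUC-KR encodes English letters and digits as the same single bytes as ASCII, while each Hangul syllable becomes two bytes above 127, so only the Korean text shows up as hex escapes. I plan to cover encodings in more detail in another post.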
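
The difference between the two kinds of text can be checked directly on one index name from the data. A minimal sketch, with illustrative variable names:

```python
name = '"KRX 자동차"'
encoded = name.encode("euc-kr")

# Printable ASCII bytes are shown as characters, all other bytes as \x escapes.
print(encoded)                   # b'"KRX \xc0\xda\xb5\xbf\xc2\xf7"'

# 9 characters become 12 bytes: 6 ASCII characters take 1 byte each,
# 3 Hangul syllables take 2 bytes each.
print(len(name), len(encoded))   # 9 12

ascii_bytes = [b for b in encoded if b < 128]
high_bytes = [b for b in encoded if b >= 128]
print(len(ascii_bytes), len(high_bytes))   # 6 6
```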

conclusion

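While analyzing financial data, I found that the HTTP response was delivered as a bytes object, so I looked into the type and worked through an example. This raised further questions about ASCII, UTF-8, and other encodings, which I plan to cover in other posts.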
