🖱️[Crawling] Crawling stock information and saving it to Excel

권규리 · May 25, 2023

📢 Opening remarks

While studying crawling, I learned that there is a library that can save crawled data to Excel. So I tried a simple exercise: pulling current stock prices into an Excel file.


01. What is openpyxl?

  • A library that makes it easy to work with Excel files from Python
  • Installation: pip install openpyxl
  • The file must be in the .xlsx format; the legacy .xls format is not supported.

02. Code and syntax used

1. How to create an Excel file

💾 There are two ways to get a Workbook object: create a new Excel file, or load an Excel file that already exists.

  • Creating a new Excel file

    import openpyxl

    wb = openpyxl.Workbook()  # create a new Workbook object

    # Call .save(filename) to write the Workbook object to an actual file
    wb.save("filename")
    # To choose where the file is saved, pass a path instead
    wb.save(r'C:\path')
  • Loading an existing Excel file

    # Store the path in the path variable
    path = r'C:\path'
    # Load an Excel file that was created beforehand
    wb = openpyxl.load_workbook(path)

path = r'C:\pythonStart\주식\data.xlsx'
wb = openpyxl.load_workbook(path)

📝 I stored the path of data.xlsx, an Excel file I had created beforehand, in path. In the code above, openpyxl.load_workbook(path) takes that path as its argument and returns a Workbook object.
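The two approaches above can be tried together in one small round trip. This is only a minimal sketch: it assumes openpyxl is installed, and demo.xlsx, the temporary directory, and the "hello" value are hypothetical names chosen for the demo, not part of the original script.

```python
import os
import tempfile

import openpyxl

# Hypothetical file name inside a temporary directory, used only for this demo.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.xlsx")

    # 1) Create a new workbook in memory and write one cell.
    wb_new = openpyxl.Workbook()
    wb_new.active["A1"] = "hello"
    wb_new.save(demo_path)  # the file only exists on disk after save()

    # 2) Load the file that was just saved and read the cell back.
    wb_loaded = openpyxl.load_workbook(demo_path)
    loaded_value = wb_loaded.active["A1"].value
    wb_loaded.close()

print(loaded_value)
```

Nothing is written to disk until save() is called, which is why the original script ends with wb.save(path).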


2. How to set up a WorkSheet

💾 For a WorkSheet there are also two options: create a new sheet, or use a sheet that already exists.

  • Creating a new sheet

    # Create a new Workbook object
    wb = openpyxl.Workbook()
    # Create a new sheet through the wb object
    ws = wb.create_sheet("sheet name")
  • Using an existing sheet

    # Select the currently active sheet
    ws = wb.active
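The difference between the two options is easier to see side by side. The following is a small sketch, assuming openpyxl is installed; the sheet name "prices" is hypothetical and only used for illustration.

```python
import openpyxl

wb = openpyxl.Workbook()

# A new workbook starts with one sheet, which is also the active one.
ws_default = wb.active

# create_sheet() appends a new sheet; "prices" is a hypothetical name.
ws_prices = wb.create_sheet("prices")

# A sheet can also be looked up by its title.
same_sheet = wb["prices"]

print(wb.sheetnames)
```

Note that create_sheet() does not change which sheet is active, so wb.active still refers to the default sheet afterwards.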

3. List of stock codes

codes = [
    '035720',  # Kakao
    '000660',  # SK hynix
    '005930',  # Samsung Electronics
]

As shown here, each company has a code assigned to it alongside its name.

https://finance.naver.com/item/main.naver?code=035720
https://finance.naver.com/item/main.naver?code=000660

The URL of each stock page also carries the code in its query string, as code=value. So I decided to put the codes in a list and loop over them.
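As a rough illustration of that idea, the sketch below builds one URL per code and then reads the code parameter back out of each URL. It uses only the standard library; BASE_URL, urls, and parsed_codes are names introduced here for the example.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Page address taken from the URLs above, without the query string.
BASE_URL = "https://finance.naver.com/item/main.naver"
codes = ["035720", "000660", "005930"]

# Build one URL per stock code by filling in the "code" query parameter.
urls = [f"{BASE_URL}?{urlencode({'code': code})}" for code in codes]

# Going the other way: read the "code" parameter back out of each URL.
parsed_codes = [parse_qs(urlparse(u).query)["code"][0] for u in urls]

for u in urls:
    print(u)
```

The codes are kept as strings on purpose, since converting them to integers would drop the leading zeros.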


4. The loop

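import requests
from bs4 import BeautifulSoup

row = 2  # start writing at row 2
for i in codes:
    url = f"https://finance.naver.com/item/sise.naver?code={i}"
    response = requests.get(url)
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')
    # Read the current price from the element with id "_nowVal"
    price = soup.select_one("#_nowVal").text
    # Remove the commas from the string
    price = price.replace(',', '')
    print(price)
    # Store the price as a number in column B of the current row
    ws[f'B{row}'] = int(price)
    row = row + 1

# Write the changes back to the Excel file
wb.save(path)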

row = 2 sets row 2 as the starting row, so the first price goes into row 2 and each pass of the for loop moves on to rows 3, 4, and so on. It is declared before the loop so that it is not reset on every iteration.

url = f"https://finance.naver.com/item/sise.naver?code={i}" puts a variable in the value of the code parameter, so each code stored in codes above is substituted in as i.

In price = soup.select_one("#_nowVal").text, #_nowVal selects the element that holds the company's current stock price. The price arrives as a string with ',' separators in it, so I removed them with replace.
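The selector and the comma removal can be tried offline on a small piece of HTML. This is a sketch under assumptions: SAMPLE_HTML is a made-up fragment, not the real page markup, and extract_price is a hypothetical helper that also guards against the element being missing, which the original loop does not do.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment standing in for a downloaded page.
SAMPLE_HTML = '<div><strong id="_nowVal">55,800</strong></div>'


def extract_price(html):
    """Return the price as an int, or None if the element is missing."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.select_one("#_nowVal")
    if tag is None:
        # select_one() returns None when nothing matches the selector.
        return None
    # Strip whitespace and thousands separators before converting.
    return int(tag.text.strip().replace(",", ""))


price = extract_price(SAMPLE_HTML)
missing = extract_price("<div></div>")
print(price, missing)
```

Checking for None first avoids an AttributeError on .text if the page layout changes or the request returns an unexpected page.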

ws[f'B{row}'] = int(price) converts the current price to an integer and stores it in column B of the current row, which is cell B2 on the first iteration.
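The row bookkeeping can also be written with enumerate instead of a manual counter. The sketch below assumes openpyxl is installed and works purely in memory; the prices in sample_prices are hypothetical placeholder values rather than crawled data, and the header row and column A are additions for illustration.

```python
import openpyxl

wb = openpyxl.Workbook()
ws = wb.active

# Hypothetical, already-cleaned prices; real values would come from the crawl.
sample_prices = {"035720": 55800, "000660": 98000, "005930": 68500}

# Header row, so the data starts at row 2 as in the loop above.
ws["A1"] = "code"
ws["B1"] = "price"

# enumerate(..., start=2) yields rows 2, 3, 4 without a separate counter.
for row, (code, price) in enumerate(sample_prices.items(), start=2):
    ws[f"A{row}"] = code   # kept as text to preserve leading zeros
    ws[f"B{row}"] = price  # stored as a number

print(ws["B2"].value, ws.max_row)
```

Writing the code next to each price makes the sheet readable on its own, without relying on the order of the codes list.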


03. Result

In short, the script visits the stock page for each code in the list, crawls the value of the #_nowVal element on that page, and stores it in the Excel file.
