Stock Data Collection 2 - Yahoo Query

강혜성 · March 21, 2023

Series: Distributed Processing (10/18)

Fetching Data with Yahoo Query

Data is fetched using valid_company_list.csv.
valid_company_list.csv stores the companies for which data can be retrieved through a Yahoo Query lookup.
import

import pandas as pd
import numpy as np
import yfinance as yf
import yahooquery as yq
from yahooquery import Ticker
from datetime import datetime, timedelta

symbol_id = pd.read_csv("valid_company_list.csv", encoding='euc-kr')
symbol_id

valid_company_list.csv file format

1. asset_profile

Information related to the company's location, operations, and officers.
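Create a dict named asset_profile and append the values to it.
Some fields do not exist for some companies, so each key is checked before its value is appended.
If there is no information for a company at all, the value comes back as a String.
If only a particular value is missing, that key is simply absent.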

https://yahooquery.dpguthrie.com/guide/ticker/modules/#asset_profile
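The two rules above (a `str` means no information; a missing key means one absent field) can be checked offline with a small function. This is a minimal sketch that assumes `ticker.asset_profile[symbol]` is either a `str` or a `dict`, as the notes describe; `extract_profile` and `FIELDS` are hypothetical names, the sample rows come from the output shown later in this post, and the error text is a placeholder, not yahooquery's actual message.

```python
from typing import Optional, Union

# Fields collected in this post (FIELDS is a hypothetical name)
FIELDS = ["country", "industry", "sector", "phone", "website"]


def extract_profile(profile: Union[dict, str], fields=FIELDS) -> Optional[dict]:
    """Return one row of profile fields, or None when the response is a str.

    A str response means there is no information for the company; a missing
    key means only that field is absent, so it is filled with the string "None".
    """
    if isinstance(profile, str):
        return None
    return {field: profile.get(field, "None") for field in fields}


# Sample responses shaped like ticker.asset_profile[symbol]
full = {"country": "South Korea", "industry": "Chemicals", "sector": "Basic Materials",
        "phone": "82 2 768 2923", "website": "https://www.aekyunggroup.co.kr"}
partial = {"country": "South Korea", "sector": "Industrials"}
missing = "placeholder error message"  # stands in for a str response

print(extract_profile(full)["industry"])     # Chemicals
print(extract_profile(partial)["industry"])  # None (the string "None")
print(extract_profile(missing))              # None (no profile at all)
```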

asset_profile = {
   "country" : list(),
   "industry" : list(),
   "sector" : list(),
   "phone" : list(),
   "website" : list(),
}

non_data_symbols = list()

count = 0
for idx, row in symbol_id.iterrows():
    symbol = row["symbol"]
    ticker = yq.Ticker(symbol, backoff_factor=1)
    print(symbol)

    # Read the property once; every access may trigger a new request
    profile = ticker.asset_profile[symbol]

    if not isinstance(profile, str):
        # A missing key means only that field is absent, so store "None" for it
        for key in asset_profile:
            asset_profile[key].append(profile.get(key, "None"))
    else:
        # A str response means there is no information for this company
        non_data_symbols.append(symbol)

    count += 1

Print the symbols that have no company information

non_data_symbols

# Output
['094800.KS', '446070.KS', '109070.KS', '168490.KS']

Convert to a DataFrame

print(len(asset_profile["country"]), len(asset_profile["industry"]), len(asset_profile["sector"]), len(asset_profile["phone"]), len(asset_profile["website"]))
df_symbol = pd.DataFrame.from_dict(asset_profile)
df_symbol

# Output
country	industry	sector	phone	website
0	South Korea	Rental & Leasing Services	Industrials	82 2 6363 9999	https://www.ajurental.com
1	South Korea	Chemicals	Basic Materials	82 2 768 2923	https://www.aekyunggroup.co.kr
2	South Korea	Grocery Stores	Consumer Defensive	82 1 577 8007	https://www.bgfretail.com
3	South Korea	Department Stores	Consumer Cyclical	82 1 577 3663	https://www.bgf.co.kr
4	South Korea	Banks—Regional	Financial Services	82 5 1620 3000	https://www.bnkfg.com
...	...	...	...	...	...
1146	United States	Utilities—Regulated Electric	Utilities	612 330 5500	https://www.xcelenergy.com
1147	United States	Software—Application	Technology	385 203 4999	https://www.qualtrics.com
1148	United States	Communication Equipment	Technology	847 634 6700	https://www.zebra.com
1149	United States	Software—Application	Technology	888 799 9666	https://www.zoom.us
1150	United States	Software—Infrastructure	Technology	408 533 0288	https://www.zscaler.com

Save as a csv file
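The code for this step is not shown in the post. Below is a minimal sketch using pandas' `DataFrame.to_csv`; the temporary output folder, the file name `asset_profile.csv`, `index=False`, and the `utf-8-sig` encoding (which helps Excel display Korean text) are assumptions, not the author's settings. The sample row is copied from the output above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# One row copied from the output above, standing in for df_symbol
df_symbol = pd.DataFrame.from_dict({
    "country": ["South Korea"],
    "industry": ["Rental & Leasing Services"],
    "sector": ["Industrials"],
    "phone": ["82 2 6363 9999"],
    "website": ["https://www.ajurental.com"],
})

# Assumed output folder; a temporary directory keeps the sketch self-contained
output_dir = Path(tempfile.mkdtemp())
csv_path = output_dir / "asset_profile.csv"

# index=False drops the 0..n row index; utf-8-sig writes a BOM so Excel reads Korean text
df_symbol.to_csv(csv_path, index=False, encoding="utf-8-sig")

# Read the file back to confirm the round trip
restored = pd.read_csv(csv_path, encoding="utf-8-sig")
print(restored.loc[0, "website"])
```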

2. grading_history

Data related to upgrades / downgrades by companies for a given symbol(s)
No data for Korean stocks.
The result is returned as a DataFrame, so the results are concatenated and saved as a csv file.

https://yahooquery.dpguthrie.com/guide/ticker/modules/#grading_history

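# Only concatenate DataFrame results; symbols without data return a different type
grades = ticker.grading_history
if isinstance(grades, pd.DataFrame):
    grading_history = pd.concat([grading_history, grades])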
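Calling `pd.concat` inside the loop copies the accumulated DataFrame on every iteration. A common alternative is to collect the per-symbol results in a list and concatenate once at the end, skipping anything that is not a DataFrame (for example, when a module has no data for a Korean symbol). This is a minimal sketch; `combine_frames` is a hypothetical helper, and the sample frames, their `firm`/`toGrade` columns, and the "no data" dict are made up for illustration.

```python
import pandas as pd


def combine_frames(results):
    """Concatenate only the DataFrame results; other types (dict, str) mean no data."""
    frames = [r for r in results if isinstance(r, pd.DataFrame)]
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames)


# Made-up per-symbol results; in the real loop: results.append(ticker.grading_history)
results = [
    pd.DataFrame({"firm": ["Firm A"], "toGrade": ["Buy"]}, index=["AAPL"]),
    {"005930.KS": "No data found"},
    pd.DataFrame({"firm": ["Firm B"], "toGrade": ["Hold"]}, index=["MSFT"]),
]

grading_history = combine_frames(results)
print(grading_history)
```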

3. index_trend

Trend data related to the given symbol(s)' index, specifically PE and PEG ratios
This module would be used to obtain PE and PEG, but it is skipped because those values are available through get_financial_data.

4. get_financial_data

Obtain specific data from either cash flow, income statement, balance sheet, or valuation measures.
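Using get_financial_data raised an error.
When valuation_measures, balance_sheet, income_statement, and cash_flow are fetched separately, some data is missing.
So all_financial_data is used to fetch everything at once, and the data is classified afterwards.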

# Call once and reuse the result; only a DataFrame result is kept
all_data = ticker.all_financial_data()
if isinstance(all_data, pd.DataFrame):
    financial_data = pd.concat([financial_data, all_data])
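The classification step after `all_financial_data()` is not shown in the post. A minimal sketch of such a split, assuming the combined DataFrame has `asOfDate` and `periodType` columns plus one column per line item; the category-to-column mapping `CATEGORIES` is a hypothetical example, not yahooquery's own grouping, and the sample numbers are made up.

```python
import pandas as pd

# Hypothetical grouping of line items; adjust to the columns actually returned
CATEGORIES = {
    "income_statement": ["TotalRevenue", "NetIncome"],
    "balance_sheet": ["TotalAssets"],
    "cash_flow": ["FreeCashFlow"],
}
KEY_COLUMNS = ["asOfDate", "periodType"]


def split_financial_data(df: pd.DataFrame) -> dict:
    """Split one wide financial DataFrame into per-category DataFrames.

    Columns missing from df are skipped, since not every company reports every item.
    """
    parts = {}
    for name, columns in CATEGORIES.items():
        present = [c for c in columns if c in df.columns]
        parts[name] = df[KEY_COLUMNS + present]
    return parts


# Made-up sample shaped like the combined result (index = symbol)
sample = pd.DataFrame(
    {
        "asOfDate": ["2021-12-31", "2022-12-31"],
        "periodType": ["12M", "12M"],
        "TotalRevenue": [100.0, 120.0],
        "NetIncome": [10.0, 12.0],
        "TotalAssets": [500.0, 550.0],
    },
    index=["AAPL", "AAPL"],
)

parts = split_financial_data(sample)
print(list(parts["income_statement"].columns))  # ['asOfDate', 'periodType', 'TotalRevenue', 'NetIncome']
print(list(parts["cash_flow"].columns))         # ['asOfDate', 'periodType']
```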

index_trend reference: https://yahooquery.dpguthrie.com/guide/ticker/modules/#index_trend

5. Historical Price

Retrieves historical pricing data (OHLC) for given symbol(s)
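The post does not include code for this step. A minimal sketch, assuming yahooquery's `Ticker.history(period=..., interval=...)` returns a DataFrame indexed by `(symbol, date)` with lowercase OHLC columns; `fetch_history` and `flatten_history` are hypothetical helpers. `flatten_history` only moves the index into regular columns so the result can be saved to csv like the other modules, and it is shown on a made-up frame so the sketch runs offline.

```python
import pandas as pd


def fetch_history(symbols, period="1y", interval="1d"):
    """Download OHLC data (needs network access and the yahooquery package)."""
    from yahooquery import Ticker  # imported here so the rest of the sketch runs offline

    return Ticker(symbols).history(period=period, interval=interval)


def flatten_history(history: pd.DataFrame) -> pd.DataFrame:
    """Turn the (symbol, date) MultiIndex into regular columns for csv export."""
    return history.reset_index()


# Made-up frame with the assumed index layout
index = pd.MultiIndex.from_tuples(
    [("AAPL", "2023-03-20"), ("AAPL", "2023-03-21")], names=["symbol", "date"]
)
sample = pd.DataFrame(
    {"open": [1.0, 2.0], "high": [1.5, 2.5], "low": [0.5, 1.5], "close": [1.2, 2.2]},
    index=index,
)

flat = flatten_history(sample)
print(list(flat.columns))  # ['symbol', 'date', 'open', 'high', 'low', 'close']
```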

6. corporate_events

Significant events related to a given symbol(s)
No data for Korean stocks.
The data is provided as a DataFrame.

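# Only concatenate DataFrame results; symbols without data return a different type
events = ticker.corporate_events
if isinstance(events, pd.DataFrame):
    corporate_events = pd.concat([corporate_events, events])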

7. news

Get news headline and summary information for given symbol(s)
As of 2023-03-21, running the example below returns ['error'].

aapl = Ticker('aapl')
aapl.news(5)
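Because the call returned `['error']` at the time of writing, it is worth guarding the result before using it. A minimal sketch; `clean_news` is a hypothetical helper, and the assumption that a working response is a list of dicts is not verified here.

```python
def clean_news(result):
    """Keep only dict items from a news() result; anything else counts as no news.

    Strings such as 'error' are dropped, so ['error'] becomes an empty list.
    """
    if not isinstance(result, list):
        return []
    return [item for item in result if isinstance(item, dict)]


print(clean_news(["error"]))                       # []
print(clean_news([{"title": "Sample headline"}]))  # [{'title': 'Sample headline'}]
print(clean_news("error"))                         # []
```

In the post's setting this would be used as `news = clean_news(aapl.news(5))`.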

8. recommendations

Get recommended symbols (with similarity scores) for given symbol(s)

Works for Korean codes too.
To be used as the list of related, similar stocks.

symbol = "001360.KS"
ticker = Ticker(symbol)

recommendations = {
    "symbol_id" : list(),
    "simillar_symbol_id" : list(),
    "score" : list()
}
for recommend in ticker.recommendations[symbol]["recommendedSymbols"]:
    recommendations["symbol_id"].append(symbol)
    recommendations["simillar_symbol_id"].append(recommend["symbol"])
    recommendations["score"].append(recommend["score"])
recommendations
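The same loop can be wrapped in a function that works for any symbol and also handles a "no data" response. A minimal sketch assuming the response shape used above, `{symbol: {"recommendedSymbols": [{"symbol": ..., "score": ...}, ...]}}`, with a `str` meaning no data; `flatten_recommendations` is a hypothetical name, it keeps the post's column name `simillar_symbol_id`, and the sample symbols and scores are made up.

```python
def flatten_recommendations(symbol, response):
    """Turn one recommendations response into rows of (symbol_id, similar symbol, score)."""
    rows = {"symbol_id": [], "simillar_symbol_id": [], "score": []}
    data = response.get(symbol) if isinstance(response, dict) else None
    if not isinstance(data, dict):
        # str (error message) or missing key: no recommendations for this symbol
        return rows
    for recommend in data.get("recommendedSymbols", []):
        rows["symbol_id"].append(symbol)
        rows["simillar_symbol_id"].append(recommend["symbol"])
        rows["score"].append(recommend["score"])
    return rows


# Made-up response with the shape used above
sample = {"001360.KS": {"recommendedSymbols": [
    {"symbol": "SIM1.KS", "score": 0.5},
    {"symbol": "SIM2.KS", "score": 0.4},
]}}

rows = flatten_recommendations("001360.KS", sample)
print(rows["simillar_symbol_id"])  # ['SIM1.KS', 'SIM2.KS']
print(flatten_recommendations("001360.KS", {"001360.KS": "error"})["score"])  # []
```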

Ticker

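Most of the information above is fetched through Ticker.
To avoid duplicate calls, the code was converted to handle everything in one pass.
When an error occurs, the response comes back as a String, so a check for str was added.
The count variable has no functional role (it was used to interrupt the run and check progress).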
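The idea of calling each module once per symbol and treating a `str` response as missing data can be sketched independently of yahooquery. In this minimal sketch, `collect` and the `fetchers` mapping are hypothetical: each fetcher takes a symbol and returns one module's result (in practice it would wrap a call such as `ticker.asset_profile[symbol]`). Catching exceptions per symbol is an added assumption, so that one failing symbol does not stop a run over the whole list; the progress line plays the role of the post's `count` variable.

```python
def collect(symbols, fetchers):
    """Call every fetcher once per symbol and sort results into data / missing / failed.

    fetchers maps a module name to a function symbol -> result.
    A str result is treated as "no data", following the post's convention.
    """
    symbols = list(symbols)
    data = {name: {} for name in fetchers}
    missing = {name: [] for name in fetchers}
    failed = []
    for count, symbol in enumerate(symbols, start=1):
        try:
            for name, fetch in fetchers.items():
                result = fetch(symbol)  # one call per module per symbol
                if isinstance(result, str):
                    missing[name].append(symbol)
                else:
                    data[name][symbol] = result
        except Exception as exc:  # record the failure and move on to the next symbol
            failed.append((symbol, repr(exc)))
        print(f"{count}/{len(symbols)} {symbol}")  # progress report
    return data, missing, failed


# Made-up fetcher: "BBB" has no data, and "CCC" raises KeyError on purpose
fake_profiles = {"AAA": {"country": "South Korea"}, "BBB": "No data"}
fetchers = {"asset_profile": lambda s: fake_profiles[s]}
data, missing, failed = collect(["AAA", "BBB", "CCC"], fetchers)
```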
Collecting Ticker Data

asset_profile = {
    "symbol_id" : list(),
    "country" : list(),
    "industry" : list(),
    "sector" : list(),
    "phone" : list(),
    "website" : list(),
}

long_business_summary = {
    "symbol_id" : list(),
    "summary" : list(),
}

recommendations = {
    "symbol_id" : list(),
    "simillar_symbol_id" : list(),
    "score" : list()
} 

grading_history = pd.DataFrame()
corporate_events = pd.DataFrame()
financial_data = pd.DataFrame()

non_data_symbols = list()

count = 0
for idx, row in symbol_id.iterrows():
    symbol = row["symbol"]
    ticker = yq.Ticker(symbol, backoff_factor=1)
    print(symbol)

    #### Ticker - Asset Profile
    # Read the property once; every access may trigger a new request
    profile = ticker.asset_profile[symbol]
    if not isinstance(profile, str):
        asset_profile["symbol_id"].append(symbol)
        # A missing key means only that field is absent, so store "None" for it
        for key in ["country", "industry", "sector", "phone", "website"]:
            asset_profile[key].append(profile.get(key, "None"))

        #### Ticker - Long Business Summary
        if "longBusinessSummary" in profile:
            long_business_summary["symbol_id"].append(symbol)
            long_business_summary["summary"].append(profile["longBusinessSummary"])
    else:
        non_data_symbols.append(symbol)

    #### Ticker - Grading History (only DataFrame results are kept)
    grades = ticker.grading_history
    if isinstance(grades, pd.DataFrame):
        grading_history = pd.concat([grading_history, grades])

    #### Ticker - Corporate Events (only DataFrame results are kept)
    events = ticker.corporate_events
    if isinstance(events, pd.DataFrame):
        corporate_events = pd.concat([corporate_events, events])

    #### Ticker - Recommendations
    recommended = ticker.recommendations[symbol]
    if not isinstance(recommended, str):
        for recommend in recommended.get("recommendedSymbols", []):
            recommendations["symbol_id"].append(symbol)
            recommendations["simillar_symbol_id"].append(recommend["symbol"])
            recommendations["score"].append(recommend["score"])

    #### Ticker - Financial Data (called once, reused)
    all_data = ticker.all_financial_data()
    if isinstance(all_data, pd.DataFrame):
        financial_data = pd.concat([financial_data, all_data])

    count += 1

Dict to DataFrame

df_asset_profile = pd.DataFrame.from_dict(asset_profile)
df_long_business_summary = pd.DataFrame.from_dict(long_business_summary)
df_recommendations = pd.DataFrame.from_dict(recommendations)

DataFrame to csv File

OUTPUT_DIR = "C:/Users/SSAFY/Desktop/Project2/API_SCRIPT"  # change to your own output folder

df_asset_profile.to_csv(f"{OUTPUT_DIR}/asset_profile.csv")
df_long_business_summary.to_csv(f"{OUTPUT_DIR}/long_business_summary.csv")
df_recommendations.to_csv(f"{OUTPUT_DIR}/recommendations.csv")
grading_history.to_csv(f"{OUTPUT_DIR}/grading_history.csv")
corporate_events.to_csv(f"{OUTPUT_DIR}/corporate.csv")
financial_data.to_csv(f"{OUTPUT_DIR}/financial_data.csv")