Stock Data Collection 2 - Yahoo Query

강혜성 · March 21, 2023

Fetching Data with Yahoo Query

  • Data is fetched using valid_company_list.csv

  • valid_company_list.csv is a file listing the symbols that can be retrieved through Yahoo Query

  • Imports

import pandas as pd
import numpy as np
import yfinance as yf
import yahooquery as yq
from yahooquery import Ticker
# Import the class directly; a separate "import datetime" would be shadowed by this line
from datetime import datetime, timedelta

symbol_id = pd.read_csv("valid_company_list.csv", encoding='euc-kr')
symbol_id

File format of valid_company_list.csv


1. asset_profile

  • Information related to the company's location, operations, and officers.

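  • Create a dict named asset_profile and append the values to it.

  • Some values do not exist for every company, so check before appending.

  • When no information exists for a company at all, the value comes back as a string.

  • When only a specific value is missing, its key is simply absent.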

https://yahooquery.dpguthrie.com/guide/ticker/modules/#asset_profile
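The two cases described above (a string when the company has no information, an absent key when a single value is missing) can be sketched as a small helper. This is a minimal illustration and not part of yahooquery: the function name extract_profile, the sample symbols, and the message text are all made up for the example.

```python
from typing import Any, Dict, Optional

FIELDS = ("country", "industry", "sector", "phone", "website")


def extract_profile(response: Dict[str, Any], symbol: str) -> Optional[Dict[str, str]]:
    """Return the selected fields for one symbol, or None when no profile exists."""
    profile = response.get(symbol)
    # Case 1: no company information at all, so the value is a message string.
    if not isinstance(profile, dict):
        return None
    # Case 2: a single value is missing, so its key is absent; fall back to "None".
    return {field: profile.get(field, "None") for field in FIELDS}


# Hand-written sample responses (hypothetical, not real API output).
sample = {
    "AAA": {"country": "South Korea", "sector": "Industrials", "phone": "82 2 0000 0000"},
    "BBB": "no data for this symbol",
}

print(extract_profile(sample, "AAA"))
print(extract_profile(sample, "BBB"))
```

Returning None for the string case lets the caller decide whether to skip the symbol or record it in a list such as non_data_symbols.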

asset_profile = {
    "country": [],
    "industry": [],
    "sector": [],
    "phone": [],
    "website": [],
}

# Symbols for which no company information was returned
non_data_symbols = []

count = 0  # progress counter only; handy when checking or cancelling midway
for idx, row in symbol_id.iterrows():
    symbol = row["symbol"]
    ticker = yq.Ticker(symbol, backoff_factor=1)

    print(symbol)

    # Read the profile once and reuse it for every field
    profile = ticker.asset_profile[symbol]

    if not isinstance(profile, str):
        # A missing value has no key, so store the string "None" instead
        for key in asset_profile:
            asset_profile[key].append(profile.get(key, "None"))
    else:
        # A string instead of a dict means there is no data for this symbol
        non_data_symbols.append(symbol)

    count += 1
       
  • Print the symbols that have no company information
non_data_symbols

# Output
['094800.KS', '446070.KS', '109070.KS', '168490.KS']
  • Convert to a DataFrame
print(len(asset_profile["country"]), len(asset_profile["industry"]), len(asset_profile["sector"]), len(asset_profile["phone"]), len(asset_profile["website"]))
df_symbol = pd.DataFrame.from_dict(asset_profile)
df_symbol
# Output
country	industry	sector	phone	website
0	South Korea	Rental & Leasing Services	Industrials	82 2 6363 9999	https://www.ajurental.com
1	South Korea	Chemicals	Basic Materials	82 2 768 2923	https://www.aekyunggroup.co.kr
2	South Korea	Grocery Stores	Consumer Defensive	82 1 577 8007	https://www.bgfretail.com
3	South Korea	Department Stores	Consumer Cyclical	82 1 577 3663	https://www.bgf.co.kr
4	South Korea	Banks—Regional	Financial Services	82 5 1620 3000	https://www.bnkfg.com
...	...	...	...	...	...
1146	United States	Utilities—Regulated Electric	Utilities	612 330 5500	https://www.xcelenergy.com
1147	United States	Software—Application	Technology	385 203 4999	https://www.qualtrics.com
1148	United States	Communication Equipment	Technology	847 634 6700	https://www.zebra.com
1149	United States	Software—Application	Technology	888 799 9666	https://www.zoom.us
1150	United States	Software—Infrastructure	Technology	408 533 0288	https://www.zscaler.com
  • Save as a CSV file
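The save step is not shown at this point, so here is a minimal sketch of it. The two rows are a hand-picked subset of the output above, and the in-memory buffer is used only so the example runs without touching the disk; in practice a file path such as "asset_profile.csv" would be passed instead.

```python
import io

import pandas as pd

# Small stand-in for the asset_profile dict built above.
asset_profile = {
    "country": ["South Korea", "United States"],
    "industry": ["Chemicals", "Communication Equipment"],
    "sector": ["Basic Materials", "Technology"],
    "phone": ["82 2 768 2923", "847 634 6700"],
    "website": ["https://www.aekyunggroup.co.kr", "https://www.zebra.com"],
}
df_symbol = pd.DataFrame.from_dict(asset_profile)

# index=False leaves out the 0..N row index so only the five columns are written.
buffer = io.StringIO()
df_symbol.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Whether to keep the row index is a design choice: without a symbol column the index is the only row identifier, which is one reason the combined script later in the post adds symbol_id to the dict.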

2. grading_history

  • Data related to upgrades / downgrades by companies for a given symbol(s)

  • No data is available for Korean symbols

  • Each result is a DataFrame; concatenate them and save the result as a single CSV file

https://yahooquery.dpguthrie.com/guide/ticker/modules/#grading_history

grading_history = pd.concat([grading_history, ticker.grading_history])
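Calling pd.concat inside a loop copies the accumulated frame on every iteration. A common pandas alternative is to collect the per-symbol results in a list and concatenate once at the end, skipping anything that is not a DataFrame. The sketch below assumes that symbols without data yield a non-DataFrame value; combine_frames and the sample rows are illustrative, not real API output.

```python
import pandas as pd


def combine_frames(results):
    """Concatenate only the DataFrame results; skip dicts or strings returned on errors."""
    frames = [r for r in results if isinstance(r, pd.DataFrame)]
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames)


# Hand-written stand-ins for per-symbol grading_history results (hypothetical values).
results = [
    pd.DataFrame({"firm": ["Firm A"], "toGrade": ["Buy"]}),
    {"000000.KS": "no data"},  # assumed shape for a symbol without data
    pd.DataFrame({"firm": ["Firm B"], "toGrade": ["Hold"]}),
]
combined = combine_frames(results)
print(len(combined))
```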

3. index_trend

  • Trend data related to the given symbol(s) index, specifically PE and PEG ratios

  • Used to obtain PE and PEG, but omitted here because both can be obtained from get_financial_data


4. get_financial_data

  • Obtain specific data from either cash flow, income statement, balance sheet, or valuation measures.

  • Calling get_financial_data raised an error

  • Fetching valuation_measures, balance_sheet, income_statement, and cash_flow separately sometimes returns no data

  • Fetch everything with all_financial_data, then classify it afterwards

all_data = ticker.all_financial_data()  # call once and reuse the result
if isinstance(all_data, pd.DataFrame):
    financial_data = pd.concat([financial_data, all_data])
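The classification step itself is not shown, so the following is only a sketch of one way to do it: select columns per statement and keep whichever are present. The grouping in STATEMENT_COLUMNS and the sample values are assumptions for illustration; the real column set should be checked against the DataFrame that all_financial_data actually returns.

```python
import pandas as pd

# Hypothetical grouping of columns by statement; adjust to the columns actually returned.
STATEMENT_COLUMNS = {
    "income_statement": ["TotalRevenue", "NetIncome"],
    "balance_sheet": ["TotalAssets"],
    "cash_flow": ["OperatingCashFlow"],
}
KEY_COLUMNS = ["asOfDate", "periodType"]


def split_by_statement(financial_data):
    """Return one DataFrame per statement, keeping only the columns that are present."""
    parts = {}
    for name, columns in STATEMENT_COLUMNS.items():
        present = [c for c in KEY_COLUMNS + columns if c in financial_data.columns]
        parts[name] = financial_data[present]
    return parts


# Hand-written sample (not real API output); OperatingCashFlow is deliberately missing.
sample = pd.DataFrame(
    {
        "asOfDate": ["2022-12-31"],
        "periodType": ["12M"],
        "TotalRevenue": [100.0],
        "NetIncome": [10.0],
        "TotalAssets": [500.0],
    },
    index=pd.Index(["AAA"], name="symbol"),
)
parts = split_by_statement(sample)
print(parts["income_statement"])
```

Filtering on the columns that are present avoids a KeyError for companies that lack some items, which matches the observation that data is sometimes missing.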

https://yahooquery.dpguthrie.com/guide/ticker/modules/#index_trend


5. Historical Price

  • Retrieves historical pricing data (OHLC) for given symbol(s)
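No code is given for this step, so here is a hedged sketch built around the history method of Ticker and its period and interval arguments. The wrapper fetch_history and the FakeTicker class are hypothetical; the fake only stands in for a real Ticker so the example runs offline, and its values are made up.

```python
import pandas as pd


def fetch_history(ticker, period="1mo", interval="1d"):
    """Return OHLC price history as a DataFrame, or None if no usable DataFrame came back."""
    result = ticker.history(period=period, interval=interval)
    if isinstance(result, pd.DataFrame) and not result.empty:
        return result
    return None


# Intended usage with yahooquery (requires network access):
#   from yahooquery import Ticker
#   prices = fetch_history(Ticker("aapl"), period="1mo", interval="1d")


class FakeTicker:
    """Offline stand-in that mimics the call shape of Ticker.history."""

    def history(self, period="ytd", interval="1d"):
        return pd.DataFrame(
            {"open": [1.0], "high": [2.0], "low": [0.5], "close": [1.5]}
        )


prices = fetch_history(FakeTicker())
print(prices)
```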

6. corporate_events

  • Significant events related to a given symbol(s)

  • No data is available for Korean symbols

  • The data is provided as a DataFrame

corporate_events = pd.concat([corporate_events, ticker.corporate_events])

7. news

  • Get news headline and summary information for given symbol(s)
  • As of 2023-03-21, running the example returns ['error']
aapl = Ticker('aapl')
aapl.news(5)

8. recommendations

  • Get symbols that are similar to the given symbol(s)
  • Korean symbol codes are supported as well
  • Planned for use as the list of related, similar symbols
recommendations = {
    "symbol_id" : list(),
    "simillar_symbol_id" : list(),
    "score" : list()
}
ticker = Ticker("001360.KS")
# Each entry in recommendedSymbols holds a similar symbol and its score
for recommend in ticker.recommendations["001360.KS"]["recommendedSymbols"]:
    recommendations["symbol_id"].append("001360.KS")
    recommendations["simillar_symbol_id"].append(recommend["symbol"])
    recommendations["score"].append(recommend["score"])
recommendations
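The same flattening can be written as a function that handles several symbols at once and skips symbols whose value is an error string. This is a sketch: flatten_recommendations is a hypothetical helper, and the sample response is hand-written to mirror the recommendedSymbols structure used above, not real API output.

```python
def flatten_recommendations(response):
    """Turn {symbol: {"recommendedSymbols": [...]}} into a flat list of row dicts."""
    rows = []
    for symbol, payload in response.items():
        # Symbols without data carry a string instead of a dict; skip them.
        if not isinstance(payload, dict):
            continue
        for item in payload.get("recommendedSymbols", []):
            rows.append(
                {
                    "symbol_id": symbol,
                    "simillar_symbol_id": item["symbol"],  # key spelling kept from the post
                    "score": item["score"],
                }
            )
    return rows


# Hand-written sample (hypothetical symbols and scores).
sample = {
    "001360.KS": {
        "symbol": "001360.KS",
        "recommendedSymbols": [
            {"symbol": "AAA.KS", "score": 0.12},
            {"symbol": "BBB.KS", "score": 0.08},
        ],
    },
    "ZZZ": "no data for this symbol",
}
rows = flatten_recommendations(sample)
print(rows)
```

A list of row dicts can be passed straight to pd.DataFrame, which avoids keeping three parallel lists in sync.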

Ticker

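  • Most of the information above is fetched through Ticker.

  • To avoid duplicate calls, the code is converted to process everything in a single pass.

  • Errors arrive as a string, so a check against str is added.

  • The count variable has no functional meaning (it is used for checking progress and cancelling midway).

  • Collecting Ticker data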

FIELDS = ["country", "industry", "sector", "phone", "website"]

asset_profile = {key: [] for key in ["symbol_id"] + FIELDS}
long_business_summary = {"symbol_id": [], "summary": []}
recommendations = {"symbol_id": [], "simillar_symbol_id": [], "score": []}

grading_history = pd.DataFrame()
corporate_events = pd.DataFrame()
financial_data = pd.DataFrame()

# Symbols for which no company information was returned
non_data_symbols = []

count = 0  # progress counter only
for idx, row in symbol_id.iterrows():
    symbol = row["symbol"]
    ticker = yq.Ticker(symbol, backoff_factor=1)

    print(symbol)

    #### Ticker - Asset Profile
    # Read each property once and reuse the result
    profile = ticker.asset_profile[symbol]
    if not isinstance(profile, str):
        asset_profile["symbol_id"].append(symbol)
        # A missing value has no key, so store the string "None" instead
        for key in FIELDS:
            asset_profile[key].append(profile.get(key, "None"))

        #### Ticker - Long Business Summary
        if "longBusinessSummary" in profile:
            # Store the symbol itself, not the whole profile dict
            long_business_summary["symbol_id"].append(symbol)
            long_business_summary["summary"].append(profile["longBusinessSummary"])
    else:
        non_data_symbols.append(symbol)

    #### Ticker - Grading History
    # Concatenate only when a DataFrame actually came back
    grades = ticker.grading_history
    if isinstance(grades, pd.DataFrame):
        grading_history = pd.concat([grading_history, grades])

    #### Ticker - Corporate Events
    events = ticker.corporate_events
    if isinstance(events, pd.DataFrame):
        corporate_events = pd.concat([corporate_events, events])

    #### Ticker - Recommendations
    recommended = ticker.recommendations[symbol]
    if not isinstance(recommended, str):
        for recommend in recommended["recommendedSymbols"]:
            recommendations["symbol_id"].append(symbol)
            recommendations["simillar_symbol_id"].append(recommend["symbol"])
            recommendations["score"].append(recommend["score"])

    #### Ticker - Financial Data
    all_data = ticker.all_financial_data()
    if isinstance(all_data, pd.DataFrame):
        financial_data = pd.concat([financial_data, all_data])

    count += 1
        
  • Dict to DataFrame
df_asset_profile = pd.DataFrame.from_dict(asset_profile)
df_long_business_summary = pd.DataFrame.from_dict(long_business_summary)
df_recommendations = pd.DataFrame.from_dict(recommendations)
  • DataFrame to csv File
df_asset_profile.to_csv("C:/Users/SSAFY/Desktop/Project2/API_SCRIPT/asset_profile.csv")
df_long_business_summary.to_csv("C:/Users/SSAFY/Desktop/Project2/API_SCRIPT/long_business_summary.csv")
df_recommendations.to_csv("C:/Users/SSAFY/Desktop/Project2/API_SCRIPT/recommendations.csv")
grading_history.to_csv("C:/Users/SSAFY/Desktop/Project2/API_SCRIPT/grading_history.csv")
corporate_events.to_csv("C:/Users/SSAFY/Desktop/Project2/API_SCRIPT/corporate.csv")
financial_data.to_csv("C:/Users/SSAFY/Desktop/Project2/API_SCRIPT/financial_data.csv")
