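[Contest-winning project review] Building a service that forecasts performance ticket-booking trends per scenario with Reactjs+Nodejs+python+scikit-learn{ PCA (principal component analysis), VAR (multivariate time series analysis)} - Data analysis part (2)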

Design.C · January 6, 2022

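What I learned while doing the data analysis

  • Collecting data from a variety of sources
  • Preprocessing the collected data to fit the purpose
  • Data modeling and cross-validation between models
  • Developing the final multivariate time series model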

Data modeling and cross-validation between models

#Only the Jupyter notebook run for model m9, currently the best performer, remains in the cells below
#Load the required libraries
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm  #needed for the sm.OLS call further down
from statsmodels.stats.outliers_influence import variance_inflation_factor
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
import matplotlib
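#_rebuild() is a private matplotlib helper; it may be missing in newer matplotlib releases, in which case this line can be dropped
matplotlib.font_manager._rebuild()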
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler,Normalizer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

from sklearn.model_selection import train_test_split

from sklearn.metrics import mean_squared_error, r2_score

from sklearn.decomposition import PCA

from statsmodels.tsa.api import VAR
from statsmodels.tsa.stattools import adfuller

sns.set(style='whitegrid')
pd.set_option('display.max_rows',500)
font_path = r'path\\NanumFontSetup_TTF_GOTHIC.NanumGothic.ttf'
fontprop = fm.FontProperties(fname=font_path, size=18)

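Loading the data and basic preprocessing

Feature descriptions

Period: 2019.01.01 ~ 2021.08.31

  • ott_user_count: daily number of OTT app users
  • ott_usage_time: daily OTT app usage time
  • delivery_user_count: daily number of food-delivery app users
  • delivery_usage_time: daily food-delivery app usage time
  • used_user_count: daily number of secondhand-marketplace app users
  • used_usage_time: daily secondhand-marketplace app usage time
  • meeting_user_count: daily number of video-conferencing app users
  • meeting_usage_time: daily video-conferencing app usage time
  • corona_count: daily number of confirmed COVID-19 cases
  • subway_count: daily number of subway riders
  • KOSPI_index: daily KOSPI index
  • KOSPI_trading: daily KOSPI market trading volume
  • KOSDAQ_index: daily KOSDAQ index
  • KOSDAQ_trading: daily KOSDAQ market trading volume
  • coin_trading: daily cryptocurrency trading volume, Bitcoin and Ethereum combined (the loading code below sums the two rather than averaging them)
  • coin_variance: average day-over-day percent change of the cryptocurrencies (Bitcoin + Ethereum)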

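#Load the data whose preprocessing was completed in the previous steps
df = pd.read_csv("path\\201901_202108_종합통계_시계열분석용.csv")
df.drop('Unnamed: 0', axis=1, inplace=True)
#Assign the result back instead of calling fillna(inplace=True) on a column selection
df['corona_count'] = df['corona_count'].fillna(0)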
df['coin_trading'] = df['bitcoin_trading']+df['ethereum_trading']
df['coin_variance'] = (df['bitcoin_variance']+df['ethereum_variance'])/2
df.drop(['bitcoin_trading','ethereum_trading',
        'bitcoin_variance','ethereum_variance'],axis=1,inplace=True)
df.index = df['date']
df_date = df['date']
df.drop(['date'],axis=1, inplace=True)
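#For the log-scaled model variants, the negative values in the cryptocurrency data were preprocessed
# # Add 100 to coin_variance before log scaling (negative values cannot be log-transformed)
# # Use only when log scaling is applied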
# df['coin_variance'] = df['coin_variance']+100
# for i in df.columns:
#     df[i] = np.log1p(df[i])
X = df.iloc[:,1:]
y = df.iloc[:,0]
#Create a StandardScaler object
scaler = StandardScaler()
#Transform the dataset with StandardScaler by calling fit() and transform()
scaler.fit(X)
X_scaled = scaler.transform(X)
X_scaled = pd.DataFrame(data=X_scaled, columns=X.columns)
X_scaled.index = df_date
X_scaled

# #Create a MinMaxScaler object
# scaler = MinMaxScaler()
# #Transform the dataset with MinMaxScaler by calling fit() and transform()
# scaler.fit(X)
# X_scaled = scaler.transform(X)
# X_scaled = pd.DataFrame(data=X_scaled, columns=X.columns)
# X_scaled.index = df_date
# X_scaled

# #Create a RobustScaler object
# scaler = RobustScaler()
# #Transform the dataset with RobustScaler by calling fit() and transform()
# scaler.fit(X)
# X_scaled = scaler.transform(X)
# X_scaled = pd.DataFrame(data=X_scaled, columns=X.columns)
# X_scaled.index = df_date
# X_scaled

X=X_scaled
#Join the target variable (number of ticket bookings) with the features
df = pd.merge(y, X,left_index=True, right_index=True,how='inner')
df
ticketing_count ott_user_count ott_usage_time delivery_user_count delivery_usage_time used_user_count used_usage_time meeting_user_count meeting_usage_time corona_count subway_count KOSPI_index KOSPI_trading KOSDAQ_index KOSDAQ_trading coin_trading coin_variance
date
2019/01/01 7401 -1.301964 -1.126337 -1.138730 -0.879145 -1.332576 -1.373703 -1.053405 -0.843484 -0.607122 -1.546825 -0.857302 -1.152997 -0.801456 -1.333287 -0.760446 0.894595
2019/01/02 5069 -1.411287 -1.360638 -1.542213 -1.360824 -1.294828 -1.363849 -0.816423 -0.804730 -0.607122 0.758960 -0.857302 -1.152997 -0.801456 -1.333287 -0.473632 1.209898
2019/01/03 6498 -1.512255 -1.380048 -1.536694 -1.377820 -1.179579 -1.346215 -0.813926 -0.802878 -0.607122 0.908731 -0.892203 -0.903009 -0.886813 -1.155502 -0.653263 -0.830298
2019/01/04 7088 -1.343318 -1.318085 -1.434606 -1.294248 -1.309262 -1.362754 -0.819952 -0.804077 -0.607122 1.091537 -0.856767 -0.947702 -0.835184 -1.310455 -0.547525 0.442506
2019/01/05 18755 -1.010367 -0.963209 -1.174507 -0.999621 -1.319738 -1.330514 -1.046767 -0.836480 -0.607122 -0.153104 -0.856767 -0.947702 -0.835184 -1.310455 -0.557694 -0.100001
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2021/08/27 19582 1.656060 1.539082 2.271937 2.151560 1.174091 0.778091 1.484776 1.743173 3.594865 1.533985 1.549169 -0.705340 1.646165 -0.178085 -1.055345 1.133390
2021/08/28 45456 1.780755 1.632663 2.442208 2.699594 1.484143 1.128198 -0.075720 0.000089 3.187087 -0.002058 1.549169 -0.705340 1.646165 -0.178085 -1.100312 -0.239105
2021/08/29 31871 1.692069 2.029458 2.325626 2.782625 1.308575 1.197393 0.060223 0.137999 2.877739 -0.921063 1.549169 -0.705340 1.646165 -0.178085 -1.079082 -0.199692
2021/08/30 3652 1.312763 1.266002 1.502044 1.133586 1.316112 0.847673 1.702709 1.957982 2.608230 1.558606 1.571202 -0.509248 1.703738 -0.361816 -1.041119 -0.505722
2021/08/31 8582 1.434144 1.282830 1.685858 1.606643 1.279019 0.879251 1.712669 1.997506 4.138569 1.394636 1.689138 -0.386843 1.748593 -0.262413 -0.977654 0.676665

974 rows × 17 columns
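
The three scaler cells above differ only in which scaler class is created, so switching between them by commenting blocks in and out can be replaced with one small helper. The following is a minimal sketch under assumptions: the `SCALERS` table, the `scale_features` function and the toy data are hypothetical names and values for illustration, not part of the original notebook.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

# Hypothetical lookup table; the keys are illustrative names, not from the post.
SCALERS = {
    "standard": StandardScaler,
    "minmax": MinMaxScaler,
    "robust": RobustScaler,
}

def scale_features(frame, target_col, method="standard"):
    """Scale every column except target_col, keeping index and column names."""
    features = frame.drop(columns=[target_col])
    scaled = SCALERS[method]().fit_transform(features)
    scaled = pd.DataFrame(scaled, index=frame.index, columns=features.columns)
    # Put the unscaled target back in front, as in the merged frame above.
    return pd.concat([frame[[target_col]], scaled], axis=1)

# Toy data standing in for the real CSV (values are made up).
toy = pd.DataFrame(
    {"ticketing_count": [10, 20, 30, 40],
     "subway_count": [1.0, 2.0, 3.0, 4.0],
     "corona_count": [0.0, 0.0, 5.0, 15.0]},
    index=["2019/01/01", "2019/01/02", "2019/01/03", "2019/01/04"],
)
scaled_toy = scale_features(toy, "ticketing_count", method="standard")
```

Passing `method="minmax"` or `method="robust"` would reproduce the two commented-out variants without editing the cell.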

Checking multicollinearity (VIF)

# X = df.iloc[:,1:]
# y = df.iloc[:,0]

vif = [variance_inflation_factor(X.values, i)for i in range(X.shape[1])]

result = sm.OLS(y,X).fit()
print(result.summary())
                                 OLS Regression Results                                
=======================================================================================
Dep. Variable:        ticketing_count   R-squared (uncentered):                   0.268
Model:                            OLS   Adj. R-squared (uncentered):              0.256
Method:                 Least Squares   F-statistic:                              21.90
Date:                Thu, 09 Sep 2021   Prob (F-statistic):                    1.79e-54
Time:                        03:47:25   Log-Likelihood:                         -11046.
No. Observations:                 974   AIC:                                  2.212e+04
Df Residuals:                     958   BIC:                                  2.220e+04
Df Model:                          16                                                  
Covariance Type:            nonrobust                                                  
=======================================================================================
                          coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------------
ott_user_count       1.607e+04   5420.509      2.965      0.003    5437.047    2.67e+04
ott_usage_time      -7883.4350   5969.037     -1.321      0.187   -1.96e+04    3830.461
delivery_user_count  9444.5231   9008.048      1.048      0.295   -8233.260    2.71e+04
delivery_usage_time  9982.7230   7790.915      1.281      0.200   -5306.505    2.53e+04
used_user_count      1636.4530   5825.886      0.281      0.779   -9796.519    1.31e+04
used_usage_time     -1.474e+04   6177.052     -2.386      0.017   -2.69e+04   -2617.060
meeting_user_count  -2.108e+04   5093.297     -4.139      0.000   -3.11e+04   -1.11e+04
meeting_usage_time   1.768e+04   4625.185      3.822      0.000    8603.069    2.68e+04
corona_count        -9564.6525   1633.899     -5.854      0.000   -1.28e+04   -6358.219
subway_count         3435.3489   1604.084      2.142      0.032     287.426    6583.272
KOSPI_index          6993.8696   3293.135      2.124      0.034     531.279    1.35e+04
KOSPI_trading       -2268.0617   1209.873     -1.875      0.061   -4642.370     106.246
KOSDAQ_index        -1.295e+04   2879.781     -4.496      0.000   -1.86e+04   -7297.072
KOSDAQ_trading       1672.7817   1409.311      1.187      0.236   -1092.912    4438.476
coin_trading         -872.6250    881.917     -0.989      0.323   -2603.338     858.088
coin_variance        -173.5325    661.886     -0.262      0.793   -1472.446    1125.380
==============================================================================
Omnibus:                      474.550   Durbin-Watson:                   0.356
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             4993.538
Skew:                           1.969   Prob(JB):                         0.00
Kurtosis:                      13.370   Cond. No.                         58.3
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Dropping columns one at a time while rechecking multicollinearity

df.drop(['delivery_user_count'],
       axis=1, inplace=True)
df.drop(['coin_variance'],
       axis=1, inplace=True)
# ,'used_user_count'
df.drop(['ott_usage_time'],
       axis=1, inplace=True)
df.drop(['KOSPI_index'],
       axis=1, inplace=True)
df.drop(['KOSDAQ_trading'],
       axis=1, inplace=True)
df.drop(['KOSPI_trading'],
       axis=1, inplace=True)
df.drop(['meeting_user_count'],
       axis=1, inplace=True)
df.drop(['ott_user_count'],
       axis=1, inplace=True)
df.drop(['KOSDAQ_index'],
       axis=1, inplace=True)
df.drop(['meeting_usage_time'],
       axis=1, inplace=True)
df.drop(['delivery_usage_time'],
       axis=1, inplace=True)
# X = df.iloc[:,1:]
# y = df.iloc[:,0]
vif = pd.DataFrame()
vif['VIF Factor'] = [variance_inflation_factor(X.values, i)for i in range(X.shape[1])]
vif['features'] = X.columns
vif.round(1)
VIF Factor features
0 1.0 p1
1 1.0 p2
2 1.0 p3
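
The drop-and-recheck routine above was done by hand, one column per cell. As a rough sketch of how that loop could be automated, the helper below repeatedly removes the column with the largest VIF until every remaining VIF falls under a threshold. The function name `drop_high_vif`, the threshold of 10 (a common rule of thumb) and the toy data are assumptions for illustration; the post does not state which cutoff or order was actually used.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def drop_high_vif(features, threshold=10.0):
    """Drop the highest-VIF column repeatedly until all VIFs are <= threshold.

    threshold=10 is a common rule of thumb, not a value taken from the post.
    """
    kept = features.copy()
    dropped = []
    while kept.shape[1] > 1:
        values = kept.to_numpy(dtype=float)
        vifs = [variance_inflation_factor(values, i) for i in range(values.shape[1])]
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        dropped.append(kept.columns[worst])
        kept = kept.drop(columns=[kept.columns[worst]])
    return kept, dropped

# Synthetic, standardized toy features: "a_copy" is almost identical to "a".
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    "a": a,
    "b": rng.normal(size=200),
    "a_copy": a + 0.01 * rng.normal(size=200),
})
toy = (toy - toy.mean()) / toy.std()
kept, dropped = drop_high_vif(toy)
print(dropped, list(kept.columns))
```

Principal components are uncorrelated by construction, which is consistent with the VIF of 1.0 shown for `p1` to `p3` above.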

Multivariate time series analysis

#Get a feel for the basic shape of the data
df.plot(figsize=(20,20))
<matplotlib.axes._subplots.AxesSubplot at 0x22bb9c10f88>

X = df.iloc[:,1:]
y = df.iloc[:,0]
y
date
2019/01/01     7401
2019/01/02     5069
2019/01/03     6498
2019/01/04     7088
2019/01/05    18755
              ...  
2021/08/27    19582
2021/08/28    45456
2021/08/29    31871
2021/08/30     3652
2021/08/31     8582
Name: ticketing_count, Length: 974, dtype: int64

PCA with N components (searching for the best value)

#Try different values of n_components
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(X)
principalDf = pd.DataFrame(data=principalComponents, columns = ['p1',
                                                                 'p2'])
principalDf.head()
p1 p2
0 -3.472938 0.264457
1 -4.064765 -1.172083
2 -4.014694 -1.128189
3 -3.982567 -1.288346
4 -3.529418 -0.322702
#Check the explained variance ratio
pca.explained_variance_ratio_
array([0.63059992, 0.10080491])
sum(pca.explained_variance_ratio_)
0.7314048320656545
principalDf.index = df_date
principalDf
p1 p2
date
2019/01/01 -3.472938 0.264457
2019/01/02 -4.064765 -1.172083
2019/01/03 -4.014694 -1.128189
2019/01/04 -3.982567 -1.288346
2019/01/05 -3.529418 -0.322702
... ... ...
2021/08/27 5.117077 -3.194164
2021/08/28 4.866254 -1.415963
2021/08/29 5.011647 -0.716045
2021/08/30 4.346513 -3.065882
2021/08/31 5.077023 -3.389693

974 rows × 2 columns
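
Instead of changing `n_components` by hand and rerunning the cell, the cumulative explained variance can be computed once from a full PCA and the smallest sufficient number of components read off. This is a minimal sketch: `components_for_variance`, the 0.9 target and the synthetic matrix `X_toy` are assumptions for illustration, not values used in the post.

```python
import numpy as np
from sklearn.decomposition import PCA

def components_for_variance(X, target=0.8):
    """Return the smallest component count whose cumulative explained
    variance ratio reaches target, plus the full cumulative curve."""
    pca = PCA().fit(X)
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    # searchsorted gives the first position where cumulative >= target.
    n = int(np.searchsorted(cumulative, target)) + 1
    return min(n, len(cumulative)), cumulative

# Synthetic data: six observed columns driven by two independent latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
mixing = np.array([[1, 1, 1, 0, 0, 0],
                   [0, 0, 0, 1, 1, 1]], dtype=float)
X_toy = latent @ mixing + 0.05 * rng.normal(size=(300, 6))

n_needed, cumulative = components_for_variance(X_toy, target=0.9)
print(n_needed, cumulative.round(3))
```

On the real features, the two-component result above (about 0.73 cumulative) would be one point on this curve.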

PCA with 2 components

pca = PCA(n_components=2)
principalComponents = pca.fit_transform(X)
principalDf = pd.DataFrame(data=principalComponents, columns = ['p1','p2'])
print(principalDf.head())
print(pca.explained_variance_ratio_)
print(sum(pca.explained_variance_ratio_))
         p1        p2
0 -3.472938  0.264457
1 -4.064765 -1.172083
2 -4.014694 -1.128189
3 -3.982567 -1.288346
4 -3.529418 -0.322702
[0.63059992 0.10080491]
0.7314048320656547
principalDf.index = df_date
principalDf
p1 p2
date
2019/01/01 -3.472938 0.264457
2019/01/02 -4.064765 -1.172083
2019/01/03 -4.014694 -1.128189
2019/01/04 -3.982567 -1.288346
2019/01/05 -3.529418 -0.322702
... ... ...
2021/08/27 5.117077 -3.194164
2021/08/28 4.866254 -1.415963
2021/08/29 5.011647 -0.716045
2021/08/30 4.346513 -3.065882
2021/08/31 5.077023 -3.389693

974 rows × 2 columns

df=principalDf

PCA with 1 component

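# Note: the output kept under this commented-out cell ends on 2021/06/30 with 912 rows, so it appears to come from an earlier run on a shorter dataset
# pca = PCA(n_components=1)
# principalComponents = pca.fit_transform(X)
# principalDf = pd.DataFrame(data=principalComponents, columns = ['p1'])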
# print(principalDf.head())
# print(pca.explained_variance_ratio_)
# print(sum(pca.explained_variance_ratio_))
         p1
0 -4.204660
1 -4.982673
2 -4.683025
3 -4.707338
4 -4.228135
[0.69251059]
0.6925105934278076
# principalDf.index = df_date
# principalDf
p1
date
2019/01/01 -4.204660
2019/01/02 -4.982673
2019/01/03 -4.683025
2019/01/04 -4.707338
2019/01/05 -4.228135
... ...
2021/06/26 4.773862
2021/06/27 5.004279
2021/06/28 4.413136
2021/06/29 4.775690
2021/06/30 4.673145

912 rows × 1 columns

#Merge the principal components with the target data
principalDf.index = df_date
df = pd.merge(y, principalDf,left_index=True, right_index=True,how='inner')
df
ticketing_count p1 p2
date
2019/01/01 7401 -3.472938 0.264457
2019/01/02 5069 -4.064765 -1.172083
2019/01/03 6498 -4.014694 -1.128189
2019/01/04 7088 -3.982567 -1.288346
2019/01/05 18755 -3.529418 -0.322702
... ... ... ...
2021/08/27 19582 5.117077 -3.194164
2021/08/28 45456 4.866254 -1.415963
2021/08/29 31871 5.011647 -0.716045
2021/08/30 3652 4.346513 -3.065882
2021/08/31 8582 5.077023 -3.389693

974 rows × 3 columns

y.hist()
<matplotlib.axes._subplots.AxesSubplot at 0x22bb9c1fa48>

df.plot(figsize=(20,20))
<matplotlib.axes._subplots.AxesSubplot at 0x22bbbabc788>

Checking stationarity

#Check stationarity column by column with the ADF test
for i in df.columns:
    adfuller_test = adfuller(df[i],autolag='AIC')
    print(i)
    print("ADF test statistic: {}".format(adfuller_test[0]))
    print("p-value: {}".format(adfuller_test[1]))
ticketing_count
ADF test statistic: -2.099553733500803
p-value: 0.24469408536639126
p1
ADF test statistic: -0.2126700871560426
p-value: 0.9369944290579174
p2
ADF test statistic: -0.9660338252729967
p-value: 0.76545027597895
#Take the first difference
df_diff = df.diff().dropna()
#Recheck stationarity after differencing
for i in df.columns:
    adfuller_test = adfuller(df_diff[i],autolag='AIC')
    print(i)
    print("ADF test statistic: {}".format(adfuller_test[0]))
    print("p-value: {}".format(adfuller_test[1]))
ticketing_count
ADF test statistic: -8.90366524755091
p-value: 1.1532103667357817e-14
p1
ADF test statistic: -7.316333641274674
p-value: 1.2265309592955346e-10
p2
ADF test statistic: -8.654863163034381
p-value: 5.0006129989254966e-14
#Plot the number of bookings, p1, and p2
df_diff.plot(figsize=(20,20))
<matplotlib.axes._subplots.AxesSubplot at 0x22bb5836f48>

df_diff
ticketing_count p1 p2
date
2019/01/02 -2332.0 -0.591827 -1.436540
2019/01/03 1429.0 0.050071 0.043894
2019/01/04 590.0 0.032127 -0.160157
2019/01/05 11667.0 0.453149 0.965645
2019/01/06 -5564.0 0.215044 0.535168
... ... ... ...
2021/08/27 3970.0 0.335959 0.163582
2021/08/28 25874.0 -0.250823 1.778201
2021/08/29 -13585.0 0.145392 0.699918
2021/08/30 -28219.0 -0.665134 -2.349837
2021/08/31 4930.0 0.730510 -0.323811

973 rows × 3 columns
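
Because the model is trained on `df_diff`, anything it predicts is a day-over-day change rather than a booking count. To compare forecasts with actual bookings, the differences have to be accumulated on top of the last observed level. The sketch below shows that inversion on made-up numbers; `undo_diff` and the toy frame are hypothetical, and this section of the post does not show how the author handled the step.

```python
import pandas as pd

def undo_diff(last_levels, diff_forecast):
    """Turn forecast first differences back into levels.

    last_levels: Series holding the last observed level of each column.
    diff_forecast: DataFrame of predicted differences, oldest row first.
    """
    # Cumulative sum rebuilds the path; adding the Series aligns on column names.
    return diff_forecast.cumsum() + last_levels

# Made-up levels; the last two rows play the role of the forecast horizon.
levels = pd.DataFrame({"ticketing_count": [100.0, 130.0, 120.0, 150.0, 170.0]})
diffs = levels.diff().dropna()

restored = undo_diff(levels.iloc[2], diffs.iloc[2:])
print(restored)
```

Here the restored values match the last two original levels, which is a quick sanity check that the inversion is wired correctly.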

#Split off the most recent 30 days for forecasting and testing
train = df_diff.iloc[:-30,:]
test = df_diff.iloc[-30:,:]
train, test
(                  p1        p2
 date                          
 2019/01/02 -0.591827 -1.436540
 2019/01/03  0.050071  0.043894
 2019/01/04  0.032127 -0.160157
 2019/01/05  0.453149  0.965645
 2019/01/06  0.215044  0.535168
 ...              ...       ...
 2021/07/28  0.271976 -0.397341
 2021/07/29  0.121974  0.071939
 2021/07/30 -0.323097 -0.009642
 2021/07/31  0.806326  1.693124
 2021/08/01 -0.083921  0.568040
 
 [943 rows x 2 columns],
                   p1        p2
 date                          
 2021/08/02 -1.197897 -1.823121
 2021/08/03  0.382762 -0.307690
 2021/08/04  0.224371 -0.159779
 2021/08/05  0.213700  0.045172
 2021/08/06  0.210503  0.053557
 2021/08/07  0.433313  1.324466
 2021/08/08  0.227528  0.915187
 2021/08/09 -1.476812 -2.419552
 2021/08/10  0.864029 -0.313854
 2021/08/11 -0.361891  0.011769
 2021/08/12 -0.178059 -0.032996
 2021/08/13  0.349272  0.207221
 2021/08/14  0.313612  1.626783
 2021/08/15 -0.030712  0.621387
 2021/08/16 -0.492023 -0.229886
 2021/08/17 -0.320392 -2.102970
 2021/08/18  0.231022 -0.418104
 2021/08/19  0.186924  0.148053
 2021/08/20  0.140931  0.255090
 2021/08/21  0.576043  2.264528
 2021/08/22 -0.445505  0.305457
 2021/08/23 -0.563361 -2.491265
 2021/08/24  0.635171 -0.388304
 2021/08/25 -0.326547 -0.162770
 2021/08/26  0.101249  0.099862
 2021/08/27  0.335959  0.163582
 2021/08/28 -0.250823  1.778201
 2021/08/29  0.145392  0.699918
 2021/08/30 -0.665134 -2.349837
 2021/08/31  0.730510 -0.323811)
#Declare the VAR model and check the AIC to find the best lag order
forecasting_model = VAR(train)
results_aic = []
for p in range(1,30):
  results = forecasting_model.fit(p)
  results_aic.append(results.aic)
C:\Users\USER\anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:162: ValueWarning: No frequency information was provided, so inferred frequency D will be used.
  % freq, ValueWarning)
sns.set()
plt.plot(list(np.arange(1,30,1)), results_aic)
plt.xlabel("Order")
plt.ylabel("AIC")
plt.show()

results_aic
[15.874141457425885,
 15.556327191300296,
 15.426246152772501,
 15.31875092655693,
 14.48301589548955,
 13.814739595729133,
 13.673927482850303,
 13.665937797275639,
 13.663878430769731,
 13.67665193875356,
 13.692038993850531,
 13.67756467694419,
 13.524341978518088,
 13.490367613323487,
 13.499927177492392,
 13.512585793645396,
 13.529358364796668,
 13.542355917289484,
 13.53924779983039,
 13.496007803988674,
 13.470538806278913,
 13.488222722280227,
 13.49994838355099,
 13.511730950112858,
 13.521836125116637,
 13.538180084745054,
 13.52127881831538,
 13.519623176155903,
 13.531074508641476]
#Extract the list index with the best (lowest) AIC value
np.argsort(results_aic)[0]
20
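
One detail worth checking here: `results_aic` was filled for orders 1 through 29, so list position k holds the AIC of order k + 1. Position 20 (AIC 13.4705) therefore belongs to order 21, not order 20. A minimal sketch of an offset-aware lookup follows; `best_order` and the short toy list are illustrative assumptions.

```python
import numpy as np

def best_order(aic_values, first_order=1):
    """Map the position of the lowest AIC back to the lag order it was fitted with.

    first_order is the lag used for the first list entry (1 for range(1, 30)).
    """
    return int(np.argmin(aic_values)) + first_order

# Toy AIC values for orders 1..4; the minimum sits at position 2, i.e. order 3.
aic_toy = [15.87, 15.56, 13.47, 13.50]
order = best_order(aic_toy)
print(order)
```

statsmodels also offers `VAR.select_order`, which reports the preferred order per information criterion and avoids the manual index bookkeeping.
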
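#Fit the model
#Note: results_aic covers p = 1..29, so index 20 corresponds to order p = 21 (AIC 13.4705).
#Passing the index itself fits order 20, whose AIC (13.4960) is the one reported in the summary below.
results = forecasting_model.fit(np.argsort(results_aic)[0])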
results.summary()
  Summary of Regression Results   
==================================
Model:                         VAR
Method:                        OLS
Date:           Thu, 09, Sep, 2021
Time:                     03:49:21
--------------------------------------------------------------------
No. of Equations:         3.00000    BIC:                    14.4532
Nobs:                     923.000    HQIC:                   13.8612
Log likelihood:          -9974.45    FPE:                    726931.
AIC:                      13.4960    Det(Omega_mle):         599947.
--------------------------------------------------------------------
Results for equation ticketing_count
======================================================================================
                         coefficient       std. error           t-stat            prob
--------------------------------------------------------------------------------------
const                     145.510494       220.968015            0.659           0.510
L1.ticketing_count         -0.512800         0.034341          -14.933           0.000
L1.p1                   -1765.455677       822.840326           -2.146           0.032
L1.p2                   -2821.848574       496.649003           -5.682           0.000
L2.ticketing_count         -0.594128         0.038482          -15.439           0.000
L2.p1                   -1293.122466       849.927486           -1.521           0.128
L2.p2                    -608.933479       528.806515           -1.152           0.250
L3.ticketing_count         -0.398470         0.043336           -9.195           0.000
L3.p1                   -1339.687342       866.271487           -1.546           0.122
L3.p2                   -1050.299489       561.929041           -1.869           0.062
L4.ticketing_count         -0.261282         0.045265           -5.772           0.000
L4.p1                    -377.335131       885.985159           -0.426           0.670
L4.p2                   -1217.585151       583.640112           -2.086           0.037
L5.ticketing_count         -0.338885         0.046258           -7.326           0.000
L5.p1                   -1258.123212       891.089378           -1.412           0.158
L5.p2                    -396.262710       606.095860           -0.654           0.513
L6.ticketing_count         -0.104637         0.047671           -2.195           0.028
L6.p1                    -713.921745       904.037539           -0.790           0.430
L6.p2                    -946.466873       623.165218           -1.519           0.129
L7.ticketing_count          0.162614         0.047820            3.401           0.001
L7.p1                    -443.007072       907.350566           -0.488           0.625
L7.p2                    -931.608368       634.962091           -1.467           0.142
L8.ticketing_count          0.019806         0.047552            0.417           0.677
L8.p1                    -421.580373       916.131695           -0.460           0.645
L8.p2                     348.177749       632.506959            0.550           0.582
L9.ticketing_count         -0.042382         0.047322           -0.896           0.370
L9.p1                    -530.404782       916.362104           -0.579           0.563
L9.p2                    -340.607229       630.058520           -0.541           0.589
L10.ticketing_count        -0.102133         0.047123           -2.167           0.030
L10.p1                   -307.854688       916.870516           -0.336           0.737
L10.p2                   -395.121147       631.500453           -0.626           0.532
L11.ticketing_count        -0.173351         0.047131           -3.678           0.000
L11.p1                   -528.739883       916.463562           -0.577           0.564
L11.p2                    -80.394641       632.223254           -0.127           0.899
L12.ticketing_count        -0.160090         0.047484           -3.371           0.001
L12.p1                    285.141307       914.960571            0.312           0.755
L12.p2                   -915.651657       630.201786           -1.453           0.146
L13.ticketing_count        -0.244466         0.047791           -5.115           0.000
L13.p1                   -120.630017       915.508472           -0.132           0.895
L13.p2                    -94.051125       633.295991           -0.149           0.882
L14.ticketing_count         0.058981         0.048166            1.225           0.221
L14.p1                   -525.286961       911.156244           -0.577           0.564
L14.p2                   -203.047011       634.849183           -0.320           0.749
L15.ticketing_count        -0.028254         0.048063           -0.588           0.557
L15.p1                  -1117.715539       909.297358           -1.229           0.219
L15.p2                    381.515824       621.906509            0.613           0.540
L16.ticketing_count        -0.021476         0.046575           -0.461           0.645
L16.p1                    464.248680       902.623296            0.514           0.607
L16.p2                   -826.054787       603.737121           -1.368           0.171
L17.ticketing_count        -0.068757         0.045738           -1.503           0.133
L17.p1                  -1841.725987       890.483300           -2.068           0.039
L17.p2                   -324.034853       588.439473           -0.551           0.582
L18.ticketing_count        -0.133988         0.043636           -3.071           0.002
L18.p1                   -346.740276       873.988699           -0.397           0.692
L18.p2                   -567.606480       566.136161           -1.003           0.316
L19.ticketing_count        -0.129596         0.038708           -3.348           0.001
L19.p1                  -1022.793529       860.793492           -1.188           0.235
L19.p2                  -1064.689090       528.789370           -2.013           0.044
L20.ticketing_count        -0.155668         0.033569           -4.637           0.000
L20.p1                   -566.108373       830.916946           -0.681           0.496
L20.p2                   -518.094139       508.043627           -1.020           0.308
======================================================================================

Results for equation p1
======================================================================================
                         coefficient       std. error           t-stat            prob
--------------------------------------------------------------------------------------
const                       0.020579         0.009614            2.140           0.032
L1.ticketing_count          0.000000         0.000001            0.020           0.984
L1.p1                      -0.294970         0.035801           -8.239           0.000
L1.p2                      -0.121053         0.021609           -5.602           0.000
L2.ticketing_count         -0.000005         0.000002           -2.874           0.004
L2.p1                      -0.210368         0.036980           -5.689           0.000
L2.p2                      -0.031579         0.023008           -1.372           0.170
L3.ticketing_count         -0.000003         0.000002           -1.787           0.074
L3.p1                      -0.234719         0.037691           -6.227           0.000
L3.p2                      -0.052615         0.024449           -2.152           0.031
L4.ticketing_count         -0.000005         0.000002           -2.658           0.008
L4.p1                      -0.131175         0.038549           -3.403           0.001
L4.p2                      -0.072328         0.025394           -2.848           0.004
L5.ticketing_count         -0.000003         0.000002           -1.353           0.176
L5.p1                      -0.204783         0.038771           -5.282           0.000
L5.p2                      -0.023572         0.026371           -0.894           0.371
L6.ticketing_count         -0.000004         0.000002           -2.155           0.031
L6.p1                      -0.103657         0.039334           -2.635           0.008
L6.p2                      -0.052575         0.027114           -1.939           0.052
L7.ticketing_count         -0.000003         0.000002           -1.259           0.208
L7.p1                       0.158566         0.039478            4.017           0.000
L7.p2                       0.004891         0.027627            0.177           0.859
L8.ticketing_count         -0.000002         0.000002           -1.202           0.230
L8.p1                       0.003931         0.039860            0.099           0.921
L8.p2                      -0.003271         0.027520           -0.119           0.905
L9.ticketing_count         -0.000004         0.000002           -1.779           0.075
L9.p1                      -0.049405         0.039870           -1.239           0.215
L9.p2                      -0.019076         0.027414           -0.696           0.487
L10.ticketing_count        -0.000000         0.000002           -0.081           0.935
L10.p1                      0.020715         0.039893            0.519           0.604
L10.p2                     -0.047023         0.027476           -1.711           0.087
L11.ticketing_count        -0.000001         0.000002           -0.725           0.468
L11.p1                      0.029460         0.039875            0.739           0.460
L11.p2                     -0.025082         0.027508           -0.912           0.362
L12.ticketing_count        -0.000001         0.000002           -0.618           0.537
L12.p1                      0.040515         0.039810            1.018           0.309
L12.p2                     -0.042454         0.027420           -1.548           0.122
L13.ticketing_count        -0.000000         0.000002           -0.217           0.828
L13.p1                      0.012603         0.039833            0.316           0.752
L13.p2                     -0.047503         0.027554           -1.724           0.085
L14.ticketing_count        -0.000001         0.000002           -0.319           0.749
L14.p1                      0.109450         0.039644            2.761           0.006
L14.p2                      0.017832         0.027622            0.646           0.519
L15.ticketing_count         0.000002         0.000002            0.819           0.413
L15.p1                     -0.036218         0.039563           -0.915           0.360
L15.p2                     -0.008406         0.027059           -0.311           0.756
L16.ticketing_count         0.000002         0.000002            0.942           0.346
L16.p1                     -0.037590         0.039273           -0.957           0.338
L16.p2                     -0.045645         0.026268           -1.738           0.082
L17.ticketing_count         0.000000         0.000002            0.243           0.808
L17.p1                     -0.082901         0.038745           -2.140           0.032
L17.p2                     -0.009364         0.025603           -0.366           0.715
L18.ticketing_count        -0.000000         0.000002           -0.101           0.920
L18.p1                     -0.110970         0.038027           -2.918           0.004
L18.p2                     -0.031279         0.024632           -1.270           0.204
L19.ticketing_count        -0.000001         0.000002           -0.520           0.603
L19.p1                     -0.129667         0.037453           -3.462           0.001
L19.p2                     -0.012663         0.023007           -0.550           0.582
L20.ticketing_count        -0.000002         0.000001           -1.206           0.228
L20.p1                     -0.132579         0.036153           -3.667           0.000
L20.p2                     -0.018261         0.022105           -0.826           0.409
======================================================================================

Results for equation p2
======================================================================================
                         coefficient       std. error           t-stat            prob
--------------------------------------------------------------------------------------
const                      -0.000051         0.016017           -0.003           0.997
L1.ticketing_count          0.000002         0.000002            0.798           0.425
L1.p1                      -0.170507         0.059646           -2.859           0.004
L1.p2                      -0.391149         0.036001          -10.865           0.000
L2.ticketing_count         -0.000003         0.000003           -1.250           0.211
L2.p1                       0.061796         0.061609            1.003           0.316
L2.p2                      -0.424547         0.038332          -11.076           0.000
L3.ticketing_count         -0.000003         0.000003           -0.801           0.423
L3.p1                      -0.049023         0.062794           -0.781           0.435
L3.p2                      -0.361065         0.040733           -8.864           0.000
L4.ticketing_count         -0.000001         0.000003           -0.257           0.797
L4.p1                       0.012950         0.064223            0.202           0.840
L4.p2                      -0.375514         0.042307           -8.876           0.000
L5.ticketing_count         -0.000001         0.000003           -0.282           0.778
L5.p1                      -0.111906         0.064593           -1.732           0.083
L5.p2                      -0.304719         0.043934           -6.936           0.000
L6.ticketing_count         -0.000001         0.000003           -0.301           0.763
L6.p1                      -0.054229         0.065531           -0.828           0.408
L6.p2                      -0.264942         0.045172           -5.865           0.000
L7.ticketing_count          0.000000         0.000003            0.061           0.952
L7.p1                       0.146371         0.065772            2.225           0.026
L7.p2                       0.002069         0.046027            0.045           0.964
L8.ticketing_count         -0.000000         0.000003           -0.022           0.983
L8.p1                      -0.032427         0.066408           -0.488           0.625
L8.p2                      -0.075254         0.045849           -1.641           0.101
L9.ticketing_count          0.000002         0.000003            0.528           0.597
L9.p1                      -0.006389         0.066425           -0.096           0.923
L9.p2                      -0.144795         0.045671           -3.170           0.002
L10.ticketing_count         0.000002         0.000003            0.498           0.618
L10.p1                     -0.020082         0.066462           -0.302           0.763
L10.p2                     -0.085537         0.045776           -1.869           0.062
L11.ticketing_count        -0.000001         0.000003           -0.393           0.694
L11.p1                      0.103744         0.066432            1.562           0.118
L11.p2                     -0.112537         0.045828           -2.456           0.014
L12.ticketing_count        -0.000001         0.000003           -0.358           0.720
L12.p1                      0.057965         0.066323            0.874           0.382
L12.p2                     -0.162660         0.045682           -3.561           0.000
L13.ticketing_count        -0.000001         0.000003           -0.376           0.707
L13.p1                     -0.121140         0.066363           -1.825           0.068
L13.p2                     -0.096656         0.045906           -2.106           0.035
L14.ticketing_count        -0.000000         0.000003           -0.060           0.952
L14.p1                      0.115956         0.066047            1.756           0.079
L14.p2                      0.053468         0.046019            1.162           0.245
L15.ticketing_count        -0.000001         0.000003           -0.350           0.726
L15.p1                     -0.062024         0.065913           -0.941           0.347
L15.p2                     -0.018219         0.045080           -0.404           0.686
L16.ticketing_count         0.000000         0.000003            0.007           0.995
L16.p1                      0.001574         0.065429            0.024           0.981
L16.p2                     -0.152048         0.043763           -3.474           0.001
L17.ticketing_count        -0.000000         0.000003           -0.064           0.949
L17.p1                     -0.131831         0.064549           -2.042           0.041
L17.p2                     -0.096239         0.042655           -2.256           0.024
L18.ticketing_count        -0.000001         0.000003           -0.302           0.763
L18.p1                     -0.076352         0.063353           -1.205           0.228
L18.p2                     -0.141783         0.041038           -3.455           0.001
L19.ticketing_count        -0.000002         0.000003           -0.620           0.536
L19.p1                     -0.083997         0.062397           -1.346           0.178
L19.p2                     -0.153884         0.038331           -4.015           0.000
L20.ticketing_count         0.000000         0.000002            0.094           0.925
L20.p1                     -0.153089         0.060231           -2.542           0.011
L20.p2                     -0.098449         0.036827           -2.673           0.008
======================================================================================

Correlation matrix of residuals
                   ticketing_count        p1        p2
ticketing_count           1.000000  0.143784  0.180547
p1                        0.143784  1.000000  0.313019
p2                        0.180547  0.313019  1.000000
# Run the multivariate time-series (VAR) forecast on the differenced values
lagged_values = train.values
forecast = pd.DataFrame(results.forecast(y=lagged_values, steps=30), index=test.index,
                        columns=df.columns)
forecast
ticketing_count p1 p2
date
2021/08/02 -21642.828443 -1.033055 -1.920281
2021/08/03 4108.724583 0.139701 -0.231206
2021/08/04 6880.360834 0.272476 -0.069083
2021/08/05 -2316.669338 0.105379 -0.046469
2021/08/06 3700.776681 0.056802 0.183212
2021/08/07 15584.686434 0.523810 1.435338
2021/08/08 -7120.885442 0.155960 0.662741
2021/08/09 -21598.560374 -1.056706 -1.925830
2021/08/10 6374.681066 0.093848 -0.235443
2021/08/11 4835.465686 0.167524 0.017505
2021/08/12 -1061.610716 0.108480 -0.038615
2021/08/13 3194.224751 0.057892 0.030914
2021/08/14 15751.719904 0.590367 1.493483
2021/08/15 -8752.597219 0.148820 0.582362
2021/08/16 -19415.434905 -0.958010 -1.771422
2021/08/17 4613.948348 0.059555 -0.313527
2021/08/18 6870.373231 0.149024 0.107671
2021/08/19 -2616.504109 0.089648 -0.079262
2021/08/20 3316.261843 0.029082 -0.012249
2021/08/21 15465.099089 0.534741 1.439276
2021/08/22 -8338.881634 0.123938 0.542337
2021/08/23 -19732.722397 -0.867167 -1.697932
2021/08/24 5293.242279 -0.005365 -0.337871
2021/08/25 5897.342281 0.144592 0.139256
2021/08/26 -2098.080570 0.064798 -0.095532
2021/08/27 3423.032635 0.060817 0.020882
2021/08/28 15996.419593 0.515684 1.370608
2021/08/29 -8996.299133 0.147419 0.538153
2021/08/30 -19438.048277 -0.827387 -1.619591
2021/08/31 5025.855072 -0.024818 -0.358857
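# Add the cumulative sum of the forecast differences to the last observed level to get the actual forecast (ticketing_count_forecasted is the forecast value)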
for i in df.columns:
    forecast[f'{i}_forecasted']= df[i].iloc[-30-1]+forecast[i].cumsum()
print(forecast)
            ticketing_count        p1        p2  ticketing_count_forecasted  \
date                                                                          
2021/08/02    -21642.828443 -1.033055 -1.920281                 6203.171557   
2021/08/03      4108.724583  0.139701 -0.231206                10311.896141   
2021/08/04      6880.360834  0.272476 -0.069083                17192.256975   
2021/08/05     -2316.669338  0.105379 -0.046469                14875.587637   
2021/08/06      3700.776681  0.056802  0.183212                18576.364319   
2021/08/07     15584.686434  0.523810  1.435338                34161.050753   
2021/08/08     -7120.885442  0.155960  0.662741                27040.165311   
2021/08/09    -21598.560374 -1.056706 -1.925830                 5441.604936   
2021/08/10      6374.681066  0.093848 -0.235443                11816.286002   
2021/08/11      4835.465686  0.167524  0.017505                16651.751688   
2021/08/12     -1061.610716  0.108480 -0.038615                15590.140972   
2021/08/13      3194.224751  0.057892  0.030914                18784.365723   
2021/08/14     15751.719904  0.590367  1.493483                34536.085627   
2021/08/15     -8752.597219  0.148820  0.582362                25783.488407   
2021/08/16    -19415.434905 -0.958010 -1.771422                 6368.053503   
2021/08/17      4613.948348  0.059555 -0.313527                10982.001851   
2021/08/18      6870.373231  0.149024  0.107671                17852.375082   
2021/08/19     -2616.504109  0.089648 -0.079262                15235.870973   
2021/08/20      3316.261843  0.029082 -0.012249                18552.132817   
2021/08/21     15465.099089  0.534741  1.439276                34017.231906   
2021/08/22     -8338.881634  0.123938  0.542337                25678.350272   
2021/08/23    -19732.722397 -0.867167 -1.697932                 5945.627875   
2021/08/24      5293.242279 -0.005365 -0.337871                11238.870154   
2021/08/25      5897.342281  0.144592  0.139256                17136.212435   
2021/08/26     -2098.080570  0.064798 -0.095532                15038.131865   
2021/08/27      3423.032635  0.060817  0.020882                18461.164500   
2021/08/28     15996.419593  0.515684  1.370608                34457.584093   
2021/08/29     -8996.299133  0.147419  0.538153                25461.284960   
2021/08/30    -19438.048277 -0.827387 -1.619591                 6023.236683   
2021/08/31      5025.855072 -0.024818 -0.358857                11049.091755   

            p1_forecasted  p2_forecasted  
date                                      
2021/08/02       4.050834      -2.306268  
2021/08/03       4.190535      -2.537474  
2021/08/04       4.463011      -2.606557  
2021/08/05       4.568389      -2.653026  
2021/08/06       4.625191      -2.469813  
2021/08/07       5.149000      -1.034475  
2021/08/08       5.304960      -0.371734  
2021/08/09       4.248254      -2.297564  
2021/08/10       4.342102      -2.533007  
2021/08/11       4.509626      -2.515502  
2021/08/12       4.618106      -2.554117  
2021/08/13       4.675998      -2.523203  
2021/08/14       5.266365      -1.029720  
2021/08/15       5.415185      -0.447358  
2021/08/16       4.457175      -2.218780  
2021/08/17       4.516731      -2.532307  
2021/08/18       4.665754      -2.424636  
2021/08/19       4.755402      -2.503899  
2021/08/20       4.784485      -2.516147  
2021/08/21       5.319226      -1.076871  
2021/08/22       5.443164      -0.534534  
2021/08/23       4.575996      -2.232466  
2021/08/24       4.570632      -2.570337  
2021/08/25       4.715224      -2.431080  
2021/08/26       4.780021      -2.526612  
2021/08/27       4.840838      -2.505730  
2021/08/28       5.356523      -1.135123  
2021/08/29       5.503941      -0.596969  
2021/08/30       4.676555      -2.216561  
2021/08/31       4.651737      -2.575418  
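The cumulative-sum step in the cell above undoes the differencing: the model was fitted on differenced series, so each forecast row is a predicted day-over-day change, and adding the running sum of those changes to the last level observed before the forecast window gives back a level. A minimal sketch on toy numbers, assuming plain first differencing (the series values and the helper name `invert_first_difference` are made up for illustration and are not part of the notebook):

```python
import pandas as pd

def invert_first_difference(last_level, diffs):
    """Rebuild levels from first differences, given the level just before them."""
    return last_level + diffs.cumsum()

# Toy level series standing in for ticketing_count (hypothetical values).
levels = pd.Series([100.0, 120.0, 90.0, 130.0, 125.0])
diffs = levels.diff().dropna()      # the kind of series the model is trained on

# Treat the last 2 differences as if they were the forecast horizon.
horizon = 2
base = levels.iloc[-horizon - 1]    # plays the role of df[i].iloc[-30-1]
rebuilt = invert_first_difference(base, diffs.iloc[-horizon:])
print(rebuilt.tolist())             # [130.0, 125.0]
```

Because the sum is cumulative, an error in an early forecast step is carried into every later level, which is worth keeping in mind when reading a 30-step forecast.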
df
ticketing_count p1 p2
date
2019/01/01 7401 -3.472938 0.264457
2019/01/02 5069 -4.064765 -1.172083
2019/01/03 6498 -4.014694 -1.128189
2019/01/04 7088 -3.982567 -1.288346
2019/01/05 18755 -3.529418 -0.322702
... ... ... ...
2021/08/27 19582 5.117077 -3.194164
2021/08/28 45456 4.866254 -1.415963
2021/08/29 31871 5.011647 -0.716045
2021/08/30 3652 4.346513 -3.065882
2021/08/31 8582 5.077023 -3.389693

974 rows × 3 columns

# Compare forecast and actual values (2021/08/02 to 2021/08/31)
test = df.iloc[-30:, :1].copy()  # copy so that adding a column does not raise SettingWithCopyWarning
for i in test.columns:
    test[f'{i}_forecasted'] = forecast[f'{i}_forecasted']
test.plot(figsize=(20,20))
[Plot: ticketing_count vs. ticketing_count_forecasted, 2021/08/02 to 2021/08/31]

# num01
# num02
# num04_1
# num04_2
# num04_3
# num05_1_Standard
# num05_1_MinMax
# num05_1_Robust
# num05_2_standard
# num05_2_MinMax
# num05_2_Robust
# num05_3_standard
# num05_3_MinMax
# num05_3_Robust
# num06_1_Standard
# num06_1_MinMax
# num06_1_Robust
# num06_2_standard
# num06_2_MinMax
# num06_2_Robust
# num06_3_standard
# num06_3_MinMax
# num06_3_Robust

# test = num05_2_standard
mse = mean_squared_error(test['ticketing_count'], test['ticketing_count_forecasted'])
rmse = np.sqrt(mse)

print(f'MSE: {mse}')
print(f'RMSE: {rmse}')
print('Variance score: {0:.3f}'.format(r2_score(test['ticketing_count'],
                                                test['ticketing_count_forecasted'])))
MSE: 13358737.129938863
RMSE: 3654.960619478528
Variance score: 0.890
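For reference, the three numbers printed above can be reproduced by hand. The line labelled `Variance score` is the value returned by `r2_score`, that is, the coefficient of determination R². A minimal NumPy sketch with made-up arrays (the function name `forecast_metrics` and the sample values are hypothetical, not from the notebook):

```python
import numpy as np

def forecast_metrics(actual, predicted):
    """Return (MSE, RMSE, R^2) for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residual = actual - predicted
    mse = np.mean(residual ** 2)
    rmse = np.sqrt(mse)
    ss_res = np.sum(residual ** 2)                   # unexplained variation
    ss_tot = np.sum((actual - actual.mean()) ** 2)   # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot
    return mse, rmse, r2

actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
mse, rmse, r2 = forecast_metrics(actual, predicted)
print(f'MSE: {mse}')
print(f'RMSE: {rmse}')
print(f'R^2: {r2:.3f}')
```

RMSE is in the same unit as `ticketing_count`, so it is usually the easiest of the three to read as "typical error in tickets per day".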

Performance test of candidate models
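Each candidate in this section is scored with the same metrics on the same test window, and candidates trained on a log scale are mapped back to ticket counts with `np.expm1` before scoring. The original loop decides whether to invert the log by looking at the magnitude of the first value; the sketch below instead carries an explicit flag per candidate, which is an alternative chosen here for illustration and not the author's method. All names and numbers are made up:

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error of two equal-length arrays."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

truth = np.array([100.0, 200.0, 300.0])

# name -> (actual, predicted, is_log_scaled); hypothetical candidates
candidates = {
    "raw": (truth, np.array([110.0, 190.0, 310.0]), False),
    "log": (np.log1p(truth), np.log1p(np.array([90.0, 210.0, 330.0])), True),
}

scores = {}
for name, (actual, predicted, is_log_scaled) in candidates.items():
    if is_log_scaled:
        # Undo log1p so every candidate is compared in ticket counts.
        actual, predicted = np.expm1(actual), np.expm1(predicted)
    scores[name] = rmse(actual, predicted)

best = min(scores, key=scores.get)   # lowest RMSE wins
print(scores, best)
```

Comparing on the original scale matters: an RMSE computed on log values is not comparable with one computed on raw counts.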

# metric_list=[num01,
# num02,
# num04_1,
# num04_2,
# num04_3,
# num05_1_Standard,
# num05_1_MinMax,
# num05_1_Robust,
# num05_2_standard,
# num05_2_MinMax,
# num05_2_Robust,
# num05_3_standard,
# num05_3_MinMax,
# num05_3_Robust,
# num06_1_Standard,
# num06_1_MinMax,
# num06_1_Robust,
# num06_2_standard,
# num06_2_MinMax,
# num06_2_Robust,
# num06_3_standard,
# num06_3_MinMax,
# num06_3_Robust]

# mse_list=[]
# rmse_list = []
# r2_score_list=[]

# for index,i in enumerate(metric_list):
#     print(f'Model {index+1}')
#     if i['ticketing_count'][0]>=10:
#         test = i
        
#         print(test)
#         mse = mean_squared_error(test['ticketing_count'], 
#                                  test['ticketing_count_forecasted'])
#         rmse = np.sqrt(mse)

#         r2score = r2_score(test['ticketing_count'],test['ticketing_count_forecasted'])
#         print(f'MSE: {mse}')
#         print(f'RMSE: {rmse}')
#         print('Variance score: {0:.3f}'.format(r2score))
        
#     elif i['ticketing_count'][0]>1:
#         test = np.expm1(i)
#         print("inverse_log_scaled")
#         print(test)
#         mse = mean_squared_error(test['ticketing_count'], 
#                                  test['ticketing_count_forecasted'])
#         rmse = np.sqrt(mse)
#         r2score = r2_score(test['ticketing_count'],test['ticketing_count_forecasted'])
#         print(f'MSE: {mse}')
#         print(f'RMSE: {rmse}')
#         print('Variance score: {0:.3f}'.format(r2score))
    
#     mse_list.append(mse)
#     rmse_list.append(rmse)
#     r2_score_list.append(r2score)
# # for i in metric_list:
# #     print(i)
Model 1
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16            15020                18464.274083
2021/06/17            17573                16945.966268
2021/06/18            21608                19220.695620
2021/06/19            51055                38581.320241
2021/06/20            35137                23195.149423
2021/06/21             3331                 2031.774542
2021/06/22            13000                10046.775567
2021/06/23            17698                20392.380196
2021/06/24            18357                17531.651535
2021/06/25            21268                21064.888283
2021/06/26            51912                34809.554088
2021/06/27            37135                21983.634169
2021/06/28             3911                 2672.880970
2021/06/29            10714                11837.351398
2021/06/30            19878                20764.617707
MSE: 57345736.70610123
RMSE: 7572.696792167321
Variance score: 0.725
Model 2
inverse_log_scaled
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16          15020.0                13017.112639
2021/06/17          17573.0                11004.720728
2021/06/18          21608.0                16379.812459
2021/06/19          51055.0                27871.782599
2021/06/20          35137.0                20758.582156
2021/06/21           3331.0                 2308.044145
2021/06/22          13000.0                 9587.316723
2021/06/23          17698.0                12807.508619
2021/06/24          18357.0                10798.466674
2021/06/25          21268.0                15176.342051
2021/06/26          51912.0                25149.102386
2021/06/27          37135.0                20274.764900
2021/06/28           3911.0                 2545.187791
2021/06/29          10714.0                 9960.375378
2021/06/30          19878.0                12697.393160
MSE: 133603497.49971189
RMSE: 11558.697915410363
Variance score: 0.359
Model 3
inverse_log_scaled
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16          15020.0                13292.238242
2021/06/17          17573.0                12494.135916
2021/06/18          21608.0                15631.354020
2021/06/19          51055.0                32684.420908
2021/06/20          35137.0                25629.448875
2021/06/21           3331.0                 2554.030814
2021/06/22          13000.0                10086.269064
2021/06/23          17698.0                12932.492305
2021/06/24          18357.0                12731.343143
2021/06/25          21268.0                16621.914562
2021/06/26          51912.0                33444.484249
2021/06/27          37135.0                25577.851681
2021/06/28           3911.0                 2532.232030
2021/06/29          10714.0                10279.283991
2021/06/30          19878.0                13411.301420
MSE: 73062313.6941993
RMSE: 8547.649600574376
Variance score: 0.649
Model 4
inverse_log_scaled
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16          15020.0                16563.855678
2021/06/17          17573.0                11378.623517
2021/06/18          21608.0                18548.865687
2021/06/19          51055.0                35064.824591
2021/06/20          35137.0                23637.875640
2021/06/21           3331.0                 2749.690738
2021/06/22          13000.0                10099.466368
2021/06/23          17698.0                16242.414752
2021/06/24          18357.0                12169.400684
2021/06/25          21268.0                19020.100038
2021/06/26          51912.0                36203.586826
2021/06/27          37135.0                24920.772487
2021/06/28           3911.0                 2737.480839
2021/06/29          10714.0                 9916.291704
2021/06/30          19878.0                16811.251546
MSE: 59973125.16755498
RMSE: 7744.231735140354
Variance score: 0.712
Model 5
inverse_log_scaled
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16          15020.0                16213.055215
2021/06/17          17573.0                10423.908786
2021/06/18          21608.0                19011.957765
2021/06/19          51055.0                35218.478770
2021/06/20          35137.0                21990.772804
2021/06/21           3331.0                 2677.276444
2021/06/22          13000.0                 9614.251002
2021/06/23          17698.0                14525.099908
2021/06/24          18357.0                11303.266231
2021/06/25          21268.0                18049.368090
2021/06/26          51912.0                35067.767963
2021/06/27          37135.0                23876.203606
2021/06/28           3911.0                 2581.887557
2021/06/29          10714.0                 9316.814739
2021/06/30          19878.0                14699.764609
MSE: 70334689.04451495
RMSE: 8386.57791023937
Variance score: 0.663
Model 6
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16            15020                15224.609205
2021/06/17            17573                13806.418769
2021/06/18            21608                18131.488195
2021/06/19            51055                38375.911545
2021/06/20            35137                26591.700747
2021/06/21             3331                 4002.860504
2021/06/22            13000                10122.489030
2021/06/23            17698                15770.068258
2021/06/24            18357                13296.757206
2021/06/25            21268                18430.915817
2021/06/26            51912                37757.105260
2021/06/27            37135                26699.958404
2021/06/28             3911                 4511.393517
2021/06/29            10714                10947.512550
2021/06/30            19878                16094.247197
MSE: 42012198.087339
RMSE: 6481.68173295627
Variance score: 0.798
Model 7
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16            15020                14840.682078
2021/06/17            17573                14073.468938
2021/06/18            21608                18016.853737
2021/06/19            51055                37959.047571
2021/06/20            35137                26634.563312
2021/06/21             3331                 4412.203000
2021/06/22            13000                10304.743301
2021/06/23            17698                15782.065770
2021/06/24            18357                13707.057674
2021/06/25            21268                18415.540609
2021/06/26            51912                37302.548955
2021/06/27            37135                26795.748701
2021/06/28             3911                 5104.656012
2021/06/29            10714                11349.829869
2021/06/30            19878                16120.161754
MSE: 43141328.29019242
RMSE: 6568.2058653937165
Variance score: 0.793
Model 8
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16            15020                15145.473807
2021/06/17            17573                13412.517589
2021/06/18            21608                17894.891688
2021/06/19            51055                38254.831850
2021/06/20            35137                27139.436164
2021/06/21             3331                 3729.568660
2021/06/22            13000                 9717.499092
2021/06/23            17698                15212.989246
2021/06/24            18357                12855.183032
2021/06/25            21268                18595.247588
2021/06/26            51912                37632.568025
2021/06/27            37135                27502.800811
2021/06/28             3911                 4412.760699
2021/06/29            10714                10391.157028
2021/06/30            19878                15575.899080
MSE: 41932393.891585976
RMSE: 6475.522673235419
Variance score: 0.799
Model 9
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16            15020                14070.808490
2021/06/17            17573                13047.439457
2021/06/18            21608                18018.433064
2021/06/19            51055                38981.207813
2021/06/20            35137                27247.462467
2021/06/21             3331                 3536.860409
2021/06/22            13000                 8927.900897
2021/06/23            17698                14604.147833
2021/06/24            18357                12448.377153
2021/06/25            21268                18623.693374
2021/06/26            51912                39121.181984
2021/06/27            37135                27431.680640
2021/06/28             3911                 3848.546292
2021/06/29            10714                 9721.664107
2021/06/30            19878                14756.151006
MSE: 39691319.08865397
RMSE: 6300.104688705892
Variance score: 0.810
Model 10
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16            15020                14132.662477
2021/06/17            17573                13291.384048
2021/06/18            21608                17882.641611
2021/06/19            51055                38602.406088
2021/06/20            35137                27129.027589
2021/06/21             3331                 3961.158028
2021/06/22            13000                 9532.207351
2021/06/23            17698                14858.455654
2021/06/24            18357                12678.268413
2021/06/25            21268                18243.706646
2021/06/26            51912                38427.110563
2021/06/27            37135                27436.956769
2021/06/28             3911                 4505.308736
2021/06/29            10714                10335.661768
2021/06/30            19878                14990.670449
MSE: 40956618.24725466
RMSE: 6399.735795113315
Variance score: 0.803
Model 11
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16            15020                15156.937449
2021/06/17            17573                13129.151004
2021/06/18            21608                18524.487245
2021/06/19            51055                37951.810664
2021/06/20            35137                26012.328397
2021/06/21             3331                 4006.885533
2021/06/22            13000                 9566.416674
2021/06/23            17698                14122.646466
2021/06/24            18357                12217.204477
2021/06/25            21268                18745.004728
2021/06/26            51912                37374.470372
2021/06/27            37135                26362.823361
2021/06/28             3911                 4892.938809
2021/06/29            10714                10321.786332
2021/06/30            19878                14557.458165
MSE: 47341708.009495184
RMSE: 6880.531084843319
Variance score: 0.773
Model 12
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16            15020                13837.414435
2021/06/17            17573                13097.047142
2021/06/18            21608                18108.588813
2021/06/19            51055                38847.197236
2021/06/20            35137                27252.862353
2021/06/21             3331                 3356.940128
2021/06/22            13000                 8924.207741
2021/06/23            17698                14430.378944
2021/06/24            18357                12215.724432
2021/06/25            21268                18636.967015
2021/06/26            51912                38701.309126
2021/06/27            37135                27275.729405
2021/06/28             3911                 3712.518305
2021/06/29            10714                 9690.498727
2021/06/30            19878                14386.090379
MSE: 41318127.34276407
RMSE: 6427.917807716903
Variance score: 0.802
Model 13
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16            15020                14341.971583
2021/06/17            17573                13730.493228
2021/06/18            21608                18228.750853
2021/06/19            51055                38647.857441
2021/06/20            35137                27125.907914
2021/06/21             3331                 4048.029937
2021/06/22            13000                 9616.272501
2021/06/23            17698                15306.952529
2021/06/24            18357                13239.510207
2021/06/25            21268                18887.276451
2021/06/26            51912                38058.852545
2021/06/27            37135                27312.959319
2021/06/28             3911                 4659.627158
2021/06/29            10714                10404.226059
2021/06/30            19878                15207.876143
MSE: 40342981.57154153
RMSE: 6351.612517427487
Variance score: 0.806
Model 14
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16            15020                15113.939525
2021/06/17            17573                12923.164633
2021/06/18            21608                18441.080032
2021/06/19            51055                37866.669512
2021/06/20            35137                26188.709835
2021/06/21             3331                 3891.896020
2021/06/22            13000                 9579.287297
2021/06/23            17698                14369.289490
2021/06/24            18357                11945.508132
2021/06/25            21268                18440.935467
2021/06/26            51912                37610.225183
2021/06/27            37135                26282.371602
2021/06/28             3911                 4773.395837
2021/06/29            10714                10345.017250
2021/06/30            19878                14861.752131
MSE: 47081451.49804732
RMSE: 6861.592489943375
Variance score: 0.774
Model 15
inverse_log_scaled
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16          15020.0                13292.238242
2021/06/17          17573.0                12494.135916
2021/06/18          21608.0                15631.354020
2021/06/19          51055.0                32684.420908
2021/06/20          35137.0                25629.448875
2021/06/21           3331.0                 2554.030814
2021/06/22          13000.0                10086.269064
2021/06/23          17698.0                12932.492305
2021/06/24          18357.0                12731.343143
2021/06/25          21268.0                16621.914562
2021/06/26          51912.0                33444.484249
2021/06/27          37135.0                25577.851681
2021/06/28           3911.0                 2532.232030
2021/06/29          10714.0                10279.283991
2021/06/30          19878.0                13411.301420
MSE: 73062313.69419986
RMSE: 8547.64960057441
Variance score: 0.649
Model 16
inverse_log_scaled
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16          15020.0                13923.371405
2021/06/17          17573.0                12426.725437
2021/06/18          21608.0                16660.553159
2021/06/19          51055.0                34706.691283
2021/06/20          35137.0                26340.550596
2021/06/21           3331.0                 2711.352017
2021/06/22          13000.0                10005.255827
2021/06/23          17698.0                14088.134138
2021/06/24          18357.0                12846.068144
2021/06/25          21268.0                17364.449942
2021/06/26          51912.0                35629.915494
2021/06/27          37135.0                26312.935898
2021/06/28           3911.0                 2646.077998
2021/06/29          10714.0                10247.510935
2021/06/30          19878.0                14330.568623
MSE: 58641021.94179193
RMSE: 7657.742613968684
Variance score: 0.719
No. 17
inverse_log_scaled
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16          15020.0                13295.727861
2021/06/17          17573.0                12380.622902
2021/06/18          21608.0                14980.833398
2021/06/19          51055.0                32703.665979
2021/06/20          35137.0                25243.435275
2021/06/21           3331.0                 2439.995226
2021/06/22          13000.0                10300.857872
2021/06/23          17698.0                12212.274941
2021/06/24          18357.0                12770.963761
2021/06/25          21268.0                16389.417815
2021/06/26          51912.0                32757.100473
2021/06/27          37135.0                25643.822895
2021/06/28           3911.0                 2432.327586
2021/06/29          10714.0                10226.380424
2021/06/30          19878.0                13203.012640
MSE: 76508076.35466559
RMSE: 8746.889524549031
Variance score: 0.633
No. 18
inverse_log_scaled
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16          15020.0                12611.111640
2021/06/17          17573.0                12147.767887
2021/06/18          21608.0                15367.093878
2021/06/19          51055.0                32760.078836
2021/06/20          35137.0                24020.673126
2021/06/21           3331.0                 2519.830383
2021/06/22          13000.0                 9411.148568
2021/06/23          17698.0                11737.864976
2021/06/24          18357.0                12703.938195
2021/06/25          21268.0                15575.557371
2021/06/26          51912.0                32438.485622
2021/06/27          37135.0                24442.027805
2021/06/28           3911.0                 2415.786131
2021/06/29          10714.0                 9735.612207
2021/06/30          19878.0                12293.344823
MSE: 83128862.47926863
RMSE: 9117.50308358975
Variance score: 0.601
No. 19
inverse_log_scaled
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16          15020.0                14022.225294
2021/06/17          17573.0                12239.907977
2021/06/18          21608.0                16660.636038
2021/06/19          51055.0                33922.605460
2021/06/20          35137.0                25846.782896
2021/06/21           3331.0                 2635.220186
2021/06/22          13000.0                 9890.133296
2021/06/23          17698.0                14195.277179
2021/06/24          18357.0                12577.053179
2021/06/25          21268.0                17076.179759
2021/06/26          51912.0                34479.595336
2021/06/27          37135.0                25674.610746
2021/06/28           3911.0                 2565.021312
2021/06/29          10714.0                10172.953917
2021/06/30          19878.0                14420.751112
MSE: 64950647.36962161
RMSE: 8059.196446893549
Variance score: 0.688
No. 20
inverse_log_scaled
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16          15020.0                12883.252338
2021/06/17          17573.0                11998.962440
2021/06/18          21608.0                15389.200371
2021/06/19          51055.0                32856.953343
2021/06/20          35137.0                24667.688732
2021/06/21           3331.0                 2445.459782
2021/06/22          13000.0                11050.949144
2021/06/23          17698.0                11738.149050
2021/06/24          18357.0                12400.749436
2021/06/25          21268.0                16745.671546
2021/06/26          51912.0                34496.075774
2021/06/27          37135.0                25540.563466
2021/06/28           3911.0                 2420.769299
2021/06/29          10714.0                10514.491048
2021/06/30          19878.0                12397.479305
MSE: 73805318.03967038
RMSE: 8591.00215572493
Variance score: 0.646
No. 21
inverse_log_scaled
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16          15020.0                14293.280092
2021/06/17          17573.0                11649.234506
2021/06/18          21608.0                15273.401915
2021/06/19          51055.0                36204.247431
2021/06/20          35137.0                21446.364670
2021/06/21           3331.0                 2694.504691
2021/06/22          13000.0                 9542.466815
2021/06/23          17698.0                12333.689540
2021/06/24          18357.0                12966.972880
2021/06/25          21268.0                15556.824946
2021/06/26          51912.0                33925.172621
2021/06/27          37135.0                23043.865029
2021/06/28           3911.0                 2456.393876
2021/06/29          10714.0                 9639.149278
2021/06/30          19878.0                13342.604126
MSE: 76973241.84661
RMSE: 8773.439567615998
Variance score: 0.631
No. 22
inverse_log_scaled
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16          15020.0                13903.201980
2021/06/17          17573.0                12327.409530
2021/06/18          21608.0                16295.831595
2021/06/19          51055.0                35402.026565
2021/06/20          35137.0                23407.719156
2021/06/21           3331.0                 2823.028497
2021/06/22          13000.0                 9555.845179
2021/06/23          17698.0                13320.597696
2021/06/24          18357.0                13482.095165
2021/06/25          21268.0                16453.489664
2021/06/26          51912.0                34508.576301
2021/06/27          37135.0                24257.608517
2021/06/28           3911.0                 2631.018477
2021/06/29          10714.0                 9782.005924
2021/06/30          19878.0                13735.221561
MSE: 68449823.50867456
RMSE: 8273.440850617992
Variance score: 0.672
No. 23
inverse_log_scaled
            ticketing_count  ticketing_count_forecasted
date                                                   
2021/06/16          15020.0                12526.087432
2021/06/17          17573.0                10726.710788
2021/06/18          21608.0                15422.778112
2021/06/19          51055.0                34202.537897
2021/06/20          35137.0                20639.571463
2021/06/21           3331.0                 2464.012863
2021/06/22          13000.0                10097.839720
2021/06/23          17698.0                10615.429118
2021/06/24          18357.0                11789.251687
2021/06/25          21268.0                15658.359920
2021/06/26          51912.0                33911.297993
2021/06/27          37135.0                23017.540695
2021/06/28           3911.0                 2351.660718
2021/06/29          10714.0                 9820.905112
2021/06/30          19878.0                11527.253991
MSE: 87717439.74389933
RMSE: 9365.758898450213
Variance score: 0.579
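# Collect each model's evaluation metrics into one table (one row per model)
dict_data = {'mse': mse_list,
            'rmse': rmse_list,
            'r2_score': r2_score_list}
scores_df = pd.DataFrame(dict_data)
# Round R^2 to three decimals, matching the "Variance score" printed above
scores_df['r2_score'] = scores_df['r2_score'].round(3)
# Save the scores (the file name means "multivariate time-series forecasting model evaluation scores")
scores_df.to_csv('F:\\drive\\WebWorkPlace2021\\jupyter\\code\\다변량시계열예측모델평가점수.csv')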
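
The MSE, RMSE, and variance score printed for each run can be checked directly against the table printed with it. Below is a minimal sketch in plain Python that uses the values printed for the run labeled 15. The helper name `evaluate` is hypothetical, and the formulas are assumed to be the standard ones behind scikit-learn's `mean_squared_error` and `r2_score`, which the notebook imports.

```python
import math

# Actual and forecasted ticketing counts, copied from the run labeled 15 above
actual = [15020, 17573, 21608, 51055, 35137, 3331, 13000, 17698,
          18357, 21268, 51912, 37135, 3911, 10714, 19878]
forecast = [13292.238242, 12494.135916, 15631.354020, 32684.420908,
            25629.448875, 2554.030814, 10086.269064, 12932.492305,
            12731.343143, 16621.914562, 33444.484249, 25577.851681,
            2532.232030, 10279.283991, 13411.301420]


def evaluate(y_true, y_pred):
    """Return (MSE, RMSE, R^2) for two equal-length sequences (hypothetical helper)."""
    n = len(y_true)
    # Residual sum of squares: squared gap between actual and forecast
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared gap between actual and its own mean
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mse = ss_res / n
    return mse, math.sqrt(mse), 1 - ss_res / ss_tot


mse, rmse, r2 = evaluate(actual, forecast)
print(f"MSE: {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"Variance score: {r2:.3f}")
```

Because the forecasts are copied at six-decimal precision, the results should agree with the metrics printed for that run only up to rounding. RMSE is in the same unit as `ticketing_count`, which makes it the easiest of the three to read as "typical daily error in tickets".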

Conclusion:

Model m9 (no log transform, PCA with 2 components, standard scaling) performed best of all the models compared.
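
Picking a winner comes down to ranking the runs by their error metrics. The sketch below ranks only the runs whose scores appear in the output directly above, so it illustrates the comparison step and does not reproduce the m9 result. The helper name `rank_by_rmse` is hypothetical, and the label 14 is an assumption, since that block's label is not shown with its scores here.

```python
# (RMSE, variance score) per run, copied from the printed output above.
# The key 14 is an assumption: that block's label is not visible with its scores.
scores = {
    14: (6861.592489943375, 0.774),
    15: (8547.64960057441, 0.649),
    16: (7657.742613968684, 0.719),
    17: (8746.889524549031, 0.633),
    18: (9117.50308358975, 0.601),
    19: (8059.196446893549, 0.688),
    20: (8591.00215572493, 0.646),
    21: (8773.439567615998, 0.631),
    22: (8273.440850617992, 0.672),
    23: (9365.758898450213, 0.579),
}


def rank_by_rmse(score_table):
    """Return run labels sorted from lowest to highest RMSE (hypothetical helper)."""
    return sorted(score_table, key=lambda label: score_table[label][0])


ranking = rank_by_rmse(scores)
best = ranking[0]
print("Lowest RMSE among the runs listed:", best, scores[best])
print("Full ranking:", ranking)
```

Ranking by RMSE and ranking by variance score give the same order here, because every run is evaluated on the same 15 days of actual values.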

Next time, we'll walk through building the multivariate time-series forecasting model (VAR) with the finalized model (m9).
