DeepAR

Sungchul Kim · January 11, 2022

DeepAR Time Series

Summary

Probabilistic forecasting predicts the probability distribution of future values from the distribution of past data. It is widely used in demand forecasting.

This paper proposes DeepAR, a probabilistic model based on an auto-regressive recurrent network.

- A single global model is learned from the time series data
- It produces good results on data from several domains

Contribution

- Uses an LSTM architecture as the probabilistic forecasting model
- Runs experiments and validates the proposed model on several datasets

Goal

$$Q_{\Theta}(\mathcal{z}_{i,t_{0}:T} \mid \mathcal{z}_{i,1:t_{0}-1}, \mathcal{x}_{i,1:T}) = \prod_{t=t_{0}}^{T} \ell(\mathcal{z}_{i,t} \mid \theta(h_{i,t}, \Theta))$$

The goal is to predict the future values $\mathcal{z}_{i,t_{0}:T}$ from the past values $\mathcal{z}_{i,1:t_{0}-1}$, that is, to estimate the probability distribution of $\mathcal{z}_{i,t_{0}:T}$ given $\mathcal{z}_{i,1:t_{0}-1}$.
- Example: take 1 hour as input and predict the next 30 minutes (one-minute steps, so $t_{0} = 61$ and $T = 90$)
- Conditioning range : $[1, t_{0}-1]$ → $[1, 60]$
- Prediction range : $[t_{0}, T]$ → $[61, 90]$
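The split into conditioning and prediction ranges is plain index arithmetic, and the 1-based time steps are easy to get off by one in code. Here is a minimal sketch, assuming the series is stored in a Python list; `split_ranges` and the 90-step toy series are made up for illustration.

```python
def split_ranges(series, t0):
    """Split a series indexed 1..T into conditioning and prediction parts.

    Time steps in the text are 1-based and Python lists are 0-based,
    so the value for time step t lives at series[t - 1].
    """
    T = len(series)
    if not 1 < t0 <= T:
        raise ValueError("t0 must satisfy 1 < t0 <= T")
    conditioning = series[: t0 - 1]   # steps 1 .. t0-1
    prediction = series[t0 - 1:]      # steps t0 .. T
    return conditioning, prediction


# Hypothetical data: 90 one-minute readings, where the value at step t is t itself.
series = list(range(1, 91))
cond, pred = split_ranges(series, t0=61)
print(len(cond), len(pred))
print(cond[0], cond[-1], pred[0], pred[-1])
```

With $t_{0} = 61$ and $T = 90$ the conditioning part holds 60 steps (one hour) and the prediction part holds 30 steps.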

Method

Notation

Time step (a specific point in time) : $t$

Covariates (a matrix of the z-scored index, hour, weekday, and month values) : $\mathcal{x}_{i,t}$

Target (the observed value of series $i$ at a specific time step; the network's $\mu$ is its predicted mean) : $\mathcal{z}_{i,t}$

Output (the network's output at a specific time step) : $h_{i,t}$

Past / conditioning range : $[1, t_{0}-1]$

Future / prediction range : $[t_{0}, \ T]$
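To make the covariate matrix concrete, here is a minimal sketch that builds one row of `[index, hour, weekday, month]` per time step and z-scores each column. The helper names (`zscore`, `build_covariates`) and the hourly timestamps are assumptions for illustration, not the post's actual preprocessing code.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def zscore(values):
    """Standardize a column to zero mean and unit variance."""
    m, s = mean(values), pstdev(values)
    if s == 0:  # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - m) / s for v in values]


def build_covariates(timestamps):
    """Return one row [index, hour, weekday, month] per time step, z-scored per column."""
    columns = [
        list(range(len(timestamps))),        # index: position within the series
        [ts.hour for ts in timestamps],
        [ts.weekday() for ts in timestamps],
        [ts.month for ts in timestamps],
    ]
    scaled = [zscore(col) for col in columns]
    return [list(row) for row in zip(*scaled)]  # transpose to shape (T, 4)


# Hypothetical series: 48 hourly time steps.
start = datetime(2022, 1, 11)
timestamps = [start + timedelta(hours=k) for k in range(48)]
x = build_covariates(timestamps)
print(len(x), len(x[0]))
print(x[0])
```

In this toy window the month never changes, so that column is all zeros after scaling. On a longer series it would vary like the others.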

Train

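Let $t$ denote a time step, $\mathcal{x}_{i,t}$ the input covariates, and $\mathcal{z}_{i,t}$ the target.

The previous step's target, the current step's covariates, and the previous step's output are fed into the network, which produces the current step's output.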

Output : $h_{i, t} = h(h_{i, t-1}, z_{i, t-1}, \mathcal{x}_{i, t}, \Theta)$ → gives $\mu$ and $\sigma$ at time step $t$
$\ell(z_{i, t}| \theta_{i,t}) = \ell(z_{i, t}| \theta(h_{i,t}))$ : the likelihood of the target ( $\mathcal{z}_{i,t}$ ) under the distribution parameterized by the output
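For the Gaussian case, the step from $h_{i,t}$ to a likelihood value can be sketched as follows: $\mu$ is an affine function of the output, $\sigma$ is an affine function passed through a softplus so it stays positive, and the target is scored under $\mathcal{N}(\mu, \sigma^2)$. The weights and the hidden vector below are made-up numbers, and the function names are hypothetical.

```python
import math


def softplus(a):
    """log(1 + exp(a)), computed in a numerically stable way; always positive."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def gaussian_params(h, w_mu, b_mu, w_sigma, b_sigma):
    """Map the output vector h to theta = (mu, sigma)."""
    mu = sum(w * hj for w, hj in zip(w_mu, h)) + b_mu
    sigma = softplus(sum(w * hj for w, hj in zip(w_sigma, h)) + b_sigma)
    return mu, sigma


def gaussian_log_likelihood(z, mu, sigma):
    """log N(z | mu, sigma^2)."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (z - mu) ** 2 / (2.0 * sigma ** 2)


# Made-up output vector and weights, for illustration only.
h = [0.2, -0.1, 0.4]
mu, sigma = gaussian_params(
    h, w_mu=[1.0, 0.5, -0.3], b_mu=0.1, w_sigma=[0.2, 0.2, 0.2], b_sigma=0.0
)
print(mu, sigma, gaussian_log_likelihood(0.3, mu, sigma))
```

Training then maximizes the sum of these log-likelihood terms over all time steps and series (equivalently, minimizes its negative).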

The state $h_{i, t_{0}-1}$, obtained after passing through $h_{i, 0}, h_{i, 1}, \dots, h_{i, t_{0}-2}$, is used as the decoder's initial state.

→ The encoder's initial state $h_{i, 0}$ and initial input $z_{i, 0}$ are both 0.
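The unrolled recurrence over the conditioning range can be sketched as below. This is a toy, not the paper's network: a single tanh layer (`toy_cell`) stands in for the LSTM, the weights are random, and all names are hypothetical. What it does show is the zero initialization and the way each step consumes the previous observed target.

```python
import math
import random


def toy_cell(h_prev, z_prev, x_t, params):
    """Stand-in for h(h_{t-1}, z_{t-1}, x_t, Theta): one tanh layer, not a real LSTM."""
    W_h, w_z, W_x, b = params
    h_new = []
    for j in range(len(h_prev)):
        a = b[j] + w_z[j] * z_prev
        a += sum(W_h[j][k] * h_prev[k] for k in range(len(h_prev)))
        a += sum(W_x[j][k] * x_t[k] for k in range(len(x_t)))
        h_new.append(math.tanh(a))
    return h_new


def encode(z, x, params, hidden_size):
    """Run over the conditioning range, feeding observed targets as inputs.

    z[t-1] and x[t-1] hold the values for time step t (1-based in the text).
    Returns h_1 .. h_{t0-1}; the last entry is the decoder's initial state.
    """
    h = [0.0] * hidden_size   # h_{i,0} = 0
    z_prev = 0.0              # z_{i,0} = 0
    states = []
    for z_t, x_t in zip(z, x):
        h = toy_cell(h, z_prev, x_t, params)
        states.append(h)
        z_prev = z_t          # the next step sees the true observed value
    return states


def random_params(hidden_size, n_cov, rng):
    """Random weights (W_h, w_z, W_x, b) for the toy cell."""
    def u():
        return rng.uniform(-0.5, 0.5)
    W_h = [[u() for _ in range(hidden_size)] for _ in range(hidden_size)]
    w_z = [u() for _ in range(hidden_size)]
    W_x = [[u() for _ in range(n_cov)] for _ in range(hidden_size)]
    b = [u() for _ in range(hidden_size)]
    return W_h, w_z, W_x, b


rng = random.Random(0)
params = random_params(hidden_size=3, n_cov=2, rng=rng)
z_cond = [0.1, 0.3, 0.2, 0.4]                          # z_1 .. z_{t0-1}
x_cond = [[0.0, 1.0], [0.5, 1.0], [1.0, 1.0], [1.5, 1.0]]
states = encode(z_cond, x_cond, params, hidden_size=3)
h_init_decoder = states[-1]                             # h_{t0-1}
print(len(states), h_init_decoder)
```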

Predict

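Prediction proceeds as follows.

Let $t$ be the current time step. When $t < t_{0}$ (conditioning range), the observed $\mathcal{z}_{i,t}$ is used as the input.

When $t \geq t_{0}$ (prediction range), a sample $\tilde{z}_{i,t} \sim \ell(\cdot|\theta_{i, t})$ is drawn and fed back as the input for the next step.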

Prediction

The final $h_{i, t_{0}-1}$, computed over the conditioning range as in training, is used as the decoder's initial state, and $z_{i, t_{0}-1}$ is carried over in the same way.

In other words, $\tilde z_{i, t_0:T}$ (the targets in the prediction range) is sampled from $Q_{\Theta}(\mathcal{z}_{i,t_{0}:T}|\mathcal{z}_{i,1:t_{0}-1}, \mathcal{x}_{i,1:T})$.
(Because the data is real-valued, samples are drawn from a Gaussian likelihood.)
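The sampling procedure can be sketched end to end with a deliberately tiny model. Everything here is an assumption for illustration: the hidden state is a single scalar, `step` and `theta` use fixed made-up weights instead of a trained LSTM, and `quantile` is a simple index-based empirical quantile. The point is the loop structure: observed targets are fed in the conditioning range, and sampled values are fed back in the prediction range.

```python
import math
import random


def step(h_prev, z_prev, x_t):
    """Toy scalar recurrent step with fixed, made-up weights (stand-in for the LSTM)."""
    return math.tanh(0.6 * h_prev + 0.8 * z_prev + 0.3 * x_t)


def theta(h):
    """Map the hidden state to Gaussian parameters (mu, sigma), again with made-up weights."""
    mu = 1.2 * h
    sigma = math.log1p(math.exp(0.5 * h - 2.0))   # softplus keeps sigma > 0
    return mu, sigma


def sample_paths(z_cond, x_all, horizon, n_samples, rng):
    """Draw n_samples trajectories over the prediction range by ancestral sampling."""
    # Conditioning range: feed the observed targets, starting from h_0 = z_0 = 0.
    h, z_prev = 0.0, 0.0
    for z_t, x_t in zip(z_cond, x_all):
        h = step(h, z_prev, x_t)
        z_prev = z_t
    h_start, z_start = h, z_prev                  # h_{t0-1} and z_{t0-1}

    x_future = x_all[len(z_cond): len(z_cond) + horizon]
    paths = []
    for _ in range(n_samples):
        h, z_prev, path = h_start, z_start, []
        for x_t in x_future:
            h = step(h, z_prev, x_t)
            mu, sigma = theta(h)
            z_prev = rng.gauss(mu, sigma)         # the sample is fed back at the next step
            path.append(z_prev)
        paths.append(path)
    return paths


def quantile(values, q):
    """Simple empirical quantile: index into the sorted sample."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


rng = random.Random(0)
z_cond = [0.1, 0.3, 0.2, 0.4, 0.5]                # observed targets, steps 1 .. t0-1
x_all = [0.1 * t for t in range(8)]               # covariates are known for steps 1 .. T
paths = sample_paths(z_cond, x_all, horizon=3, n_samples=200, rng=rng)
for k in range(3):
    column = [p[k] for p in paths]
    print(k, quantile(column, 0.1), quantile(column, 0.5), quantile(column, 0.9))
```

Because each trajectory is a joint sample over the whole prediction range, quantiles at each step (or of sums over several steps) can be read directly from the set of sampled paths.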