Explaining Anomalies Detected by Autoencoders Using SHAP

Jay Han · April 19, 2021

Paper Review


Antwarg, Liat, et al. "Explaining anomalies detected by autoencoders using SHAP." arXiv preprint arXiv:1903.02407 (2019).

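Proposes a method for explaining the difference between the input values and the reconstructed output values produced by an autoencoder (the reconstruction error)
- If an anomaly is present, the reconstruction error is high, and an explanation model is needed to explain why it is high
- The proposed method therefore computes SHAP values for the reconstructed features and relates them to the actual input values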
Notations
- Given input instance $X$ with a set of features $x_1, x_2, \cdots, x_n$
- Corresponding output $X'$ and reconstructed values $x_1^\prime, x_2^\prime, \cdots, x_n^\prime$
- Autoencoder model $f$
- Reconstruction Error of the instance $L(X, X^\prime) = \sum^n_{i=1} (x_i - x_i^\prime)^2$
- A reordering of the features, $x_{(1)}, x_{(2)}, \cdots, x_{(n)}$, stored in ErrorList such that $|x_{(1)} - x^\prime_{(1)}| \geq \cdots \geq |x_{(n)} - x^\prime_{(n)}|$
- topMfeatures $= x_{(1)}, \cdots, x_{(m)}$ is the set of features whose corresponding errors, topMerrors $|x_{(1)} - x^\prime_{(1)}| \geq \cdots \geq |x_{(m)} - x^\prime_{(m)}|$, together account for an adjustable percentage of $L(X, X^\prime)$.
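The notation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code: the parameter name `fraction` is a hypothetical stand-in for the "adjustable percent", and measuring each selected feature's share of $L(X, X^\prime)$ by its squared error is an assumption of this sketch (ordering by $|x_i - x_i^\prime|$ and by $(x_i - x_i^\prime)^2$ gives the same ranking).

```python
import numpy as np


def reconstruction_error(x, x_rec):
    """L(X, X') = sum_i (x_i - x'_i)^2."""
    return float(np.sum((np.asarray(x, dtype=float) - np.asarray(x_rec, dtype=float)) ** 2))


def top_m_features(x, x_rec, fraction=0.5):
    """Return the indices of x_(1), ..., x_(m).

    Features are ordered by descending |x_i - x'_i| (the ErrorList), and the
    shortest prefix whose squared errors reach `fraction` of L(X, X') is kept.
    `fraction` is a hypothetical name for the paper's adjustable percent.
    """
    x = np.asarray(x, dtype=float)
    x_rec = np.asarray(x_rec, dtype=float)
    abs_err = np.abs(x - x_rec)
    # ErrorList: feature indices sorted by descending absolute error
    order = np.argsort(-abs_err, kind="stable")
    # running total of squared errors along the ErrorList
    cum = np.cumsum(abs_err[order] ** 2)
    # first prefix length whose cumulative error reaches the target share of L
    m = int(np.searchsorted(cum, fraction * reconstruction_error(x, x_rec))) + 1
    return order[: min(m, len(order))]
```

For example, with $X = (1, 0, 0, 0)$ and $X^\prime = (0, 0, 0, 0.5)$, $L(X, X^\prime) = 1.25$; `fraction=0.5` keeps only feature 0, while `fraction=0.9` keeps features 0 and 3.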
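The method proposed in the paper uses SHAP values to explain the high reconstruction errors of the features in topMfeatures: for each such feature, which input features drove its reconstructed value.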

Input
- $X$ : An instance we want to explain
- $X_{1..j}$ : Instances that Kernel SHAP uses as background examples
- ErrorList: An ordered list of errors per feature
- $f$ : Autoencoder model
Output
shaptopMfeatures : SHAP values for each feature within topMfeatures
Algorithm

topMfeatures <- top values from ErrorList

for each i in topMfeatures do

    explainer <- shap.KernelExplainer(f, X1..j)

    shaptopMfeatures[i] <- explainer.shap_values(X, i)

return shaptopMfeatures
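The loop above can be sketched as follows. Rather than calling `shap.KernelExplainer`, this sketch swaps in an exact Shapley computation by brute-force enumeration of coalitions, using the same interventional value function that Kernel SHAP estimates (features outside a coalition are filled in from the background rows). That keeps it self-contained but limits it to a handful of features. Explaining "output $i$" is done by wrapping the autoencoder so it returns only reconstructed feature $i$; with the real library, a similar wrapper could be passed to `shap.KernelExplainer(g, background)`. The function names are hypothetical.

```python
import itertools
import math

import numpy as np


def exact_shap(g, x, background):
    """Exact Shapley values of the scalar-valued function g at instance x.

    Value of a coalition S: the mean of g over the background rows after the
    features in S are set to x's values (the quantity Kernel SHAP approximates).
    Enumerates every coalition, so it is only practical for small n.
    """
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    n = x.shape[0]

    def value(coalition):
        z = background.copy()
        idx = list(coalition)
        z[:, idx] = x[idx]  # features in the coalition take the instance's values
        return float(np.mean(g(z)))

    phi = np.zeros(n)
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for s in itertools.combinations(others, size):
                phi[j] += weight * (value(s + (j,)) - value(s))
    return phi


def shap_top_m_features(f, x, background, top_m):
    """Sketch of the algorithm above: explain each reconstructed feature x'_i,
    i in top_m, as a function of all input features.

    f maps a batch of shape (N, n) to reconstructions of shape (N, n).
    """
    result = {}
    for i in top_m:
        # wrap the autoencoder so it returns only reconstructed output i
        g = lambda batch, i=i: f(batch)[:, i]
        result[int(i)] = exact_shap(g, x, background)
    return result
```

As a sanity check, the SHAP values for each output $i$ sum to $f(X)_i$ minus the mean of $f_i$ over the background, which is the efficiency property Kernel SHAP also enforces.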

Input
- shaptopMfeatures : SHAP values for each feature
- $X$ : An instance we want to explain
- $X^\prime$ : The autoencoder's reconstruction of $X$
Output
- shapContribute, shapOffset
Algorithm

for each i in shaptopMfeatures do

    if $x_i > x_i^\prime$ then

        shapContribute[i] <- shaptopMfeatures[i] < 0

        shapOffset[i] <- shaptopMfeatures[i] > 0

    else

        shapContribute[i] <- shaptopMfeatures[i] > 0

        shapOffset[i] <- shaptopMfeatures[i] < 0

return shapContribute, shapOffset
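The sign rule above can be sketched as below. If $x_i > x_i^\prime$, the reconstruction is too low, so input features with negative SHAP values push $x_i^\prime$ further below $x_i$ and contribute to the anomaly, while positive ones offset it; the opposite holds when $x_i \leq x_i^\prime$. Returning index arrays instead of boolean masks is a representation choice of this sketch, and features with a SHAP value of exactly zero land in neither set. The function name is hypothetical.

```python
import numpy as np


def split_contributions(shap_top_m, x, x_rec):
    """For each explained output i, split the input features into those whose
    SHAP values widen the gap between x'_i and x_i (shapContribute) and those
    that narrow it (shapOffset). Returns two dicts of index arrays."""
    contribute, offset = {}, {}
    for i, phi in shap_top_m.items():
        phi = np.asarray(phi, dtype=float)
        if x[i] > x_rec[i]:
            # reconstruction too low: negative SHAP values lower x'_i further
            contribute[i] = np.flatnonzero(phi < 0)
            offset[i] = np.flatnonzero(phi > 0)
        else:
            # reconstruction too high (or exact): positive SHAP values raise x'_i further
            contribute[i] = np.flatnonzero(phi > 0)
            offset[i] = np.flatnonzero(phi < 0)
    return contribute, offset
```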

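Even after reading the paper carefully, I could not find a discussion of whether explanations restricted to the top-m features are also meaningful with respect to the full set of features.
Since the paper is about explaining anomalies, inlier values are excluded entirely.
There is a related repository:
- It is not a library; it walks through the method in notebook form, as a set of examples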
- https://github.com/liuyilin950623/SHAP_on_Autoencoder

Updates (April 26, 2021)

Using DeepExplainer from the SHAP library seems like a better option
- It can compute SHAP values for all features without having to select the top-m features
