[Paper Review] EPIC-SURVIVAL: End-to-end part inferred clustering for survival analysis, featuring prognostic stratification boosting

JaeHeon Lee (이재헌) · April 29, 2022

Digital Pathology paper-review ssl survival analysis


Introduction

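Existing approaches to survival prediction use a two-stage framework, split into an encoding stage and an aggregation stage.
Because this setup makes it hard to integrate the information from tile encoding efficiently at the slide level, the authors develop an end-to-end training approach.
In addition to the loss used in DeepSurv, they add a stratification boosting loss that widens the gap between the high-risk and low-risk groups, which improves performance.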

Intrahepatic cholangiocarcinoma (ICC) is a cancer that arises in the bile ducts inside the liver.
Previous studies on ICC reported performance that was either inconsistent from one study to another or unsuccessful.

Method

Architecture

EPIC-Survival combines the DeepSurv loss with the previously introduced EPL framework.

The EPL paper is available at https://openreview.net/forum?id=aqOfnZx4-N
Apart from the loss, most of it is similar to EPIC-Survival.

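The framework is described step by step below.
1) Randomly sample multiple tiles from a single slide (shown at the far left of the architecture figure).
2) Pass each tile through a pretrained ResNet to obtain tile features.
3) For every tile sampled from the slide, compute the distance to each existing centroid and assign the tile to the group of the nearest centroid.

The MSE between each tile feature and its assigned centroid is computed at this point.
Each band in the figure is one tile group within the slide.
The randomly initialized centroids are updated to the mean vector of their group.

4) Compute the NLPL (negative log partial likelihood) loss from the resulting stack of bands for the slide (it looks like a slice of layered rainbow rice cake) together with the prognosis data.
5) Finally, split the slides' risk scores into high and low groups at the median and compute the stratification boosting loss.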
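Step 3 can be sketched in a few lines of plain Python. This is only an illustration of nearest-centroid assignment, the squared-distance term, and the mean-vector update, not the authors' implementation. The function names, the 2-dimensional toy features, and the choice to keep an empty group's centroid unchanged are all assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_centroids(tile_features, centroids):
    """Assign each tile to its nearest centroid.

    Returns the group index per tile and the mean squared distance
    between tiles and their assigned centroids (the clustering term).
    """
    assignments = []
    total = 0.0
    for z in tile_features:
        dists = [squared_distance(z, c) for c in centroids]
        k = min(range(len(centroids)), key=dists.__getitem__)
        assignments.append(k)
        total += dists[k]
    return assignments, total / len(tile_features)


def update_centroids(tile_features, assignments, centroids):
    """Move each centroid to the mean vector of the tiles assigned to it."""
    updated = []
    for k, c in enumerate(centroids):
        members = [z for z, a in zip(tile_features, assignments) if a == k]
        if not members:
            # Assumption: a centroid with no tiles is left where it is.
            updated.append(list(c))
            continue
        updated.append(
            [sum(m[d] for m in members) / len(members) for d in range(len(c))]
        )
    return updated


# Toy example: four 2-D "tile features" and two centroids.
tiles = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
centroids = [[1.0, 1.0], [9.0, 9.0]]
assignments, mse = assign_to_centroids(tiles, centroids)
new_centroids = update_centroids(tiles, assignments, centroids)
```

In the real model the features would come from the ResNet encoder and have far more dimensions; the toy vectors only make the arithmetic easy to follow.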

Base feature extractor: ImageNet-pretrained ResNet-34
Metric: C-index and log-rank test between groups
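For readers unfamiliar with the C-index, here is a minimal sketch of Harrell's concordance index in plain Python. The function name and the toy data are made up for illustration, and pairs with tied times are simply skipped, which is a simplification.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index.

    A pair (i, j) is comparable when sample i had an observed event and
    its time is strictly earlier than sample j's time. The pair is
    concordant when the earlier event also has the higher predicted risk.
    Ties in predicted risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored samples cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third sample is censored (event flag 0).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risks)  # 4 of 5 pairs concordant
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of risk against time-to-event.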

Loss Function

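A loss that combines this EPL model with the DeepSurv loss is used, so that survival prediction information updates the neural network.

In the right-hand term, z is a randomly sampled tile feature and c is the centroid of each histology part.
In other words, the loss includes how far each tile feature lies from the computed centroid.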
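The DeepSurv part of the loss is the negative log partial likelihood of the Cox model. A minimal sketch in plain Python follows. The function name is hypothetical, ties in event times are not treated specially, and averaging over the number of observed events is one common convention rather than something confirmed by the paper.

```python
import math


def negative_log_partial_likelihood(times, events, risks):
    """Cox negative log partial likelihood, averaged over observed events.

    For each sample with an observed event, compare its risk score with
    the log-sum-exp of the risk scores of everyone still at risk at that
    time (all samples whose time is >= the event time).
    """
    total = 0.0
    n_events = 0
    for i in range(len(times)):
        if not events[i]:
            continue
        risk_set = [risks[j] for j in range(len(times)) if times[j] >= times[i]]
        m = max(risk_set)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(r - m) for r in risk_set))
        total += risks[i] - log_sum_exp
        n_events += 1
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    return -total / n_events


times = [1, 2, 3]
events = [1, 1, 1]
flat = negative_log_partial_likelihood(times, events, [0.0, 0.0, 0.0])
good = negative_log_partial_likelihood(times, events, [2.0, 1.0, 0.0])
bad = negative_log_partial_likelihood(times, events, [0.0, 1.0, 2.0])
```

Here `good` assigns the highest risk to the earliest event and yields a lower loss than `bad`, which ranks the samples in the wrong order.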

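The stratification boosting loss uses the Huber loss and is set up to push the gap between the mean risk score of the high group and the mean risk score of the low group toward infinity.
The final loss is the sum of the NLPL, MSE, and smooth L1 terms mentioned above.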
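The stratification boosting term and the final sum can be sketched as below. This is a guess at the general shape, not the paper's exact formulation: the `target_gap` parameter is a hypothetical finite stand-in for "as large as possible", the unweighted sum assumes all three terms have weight 1, and the function names are invented for the example.

```python
import statistics


def smooth_l1(x, y, beta=1.0):
    """Smooth L1 (Huber-style) loss between two scalars."""
    diff = abs(x - y)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def stratification_boosting_loss(risks, target_gap=1.0):
    """Penalize a small gap between high-risk and low-risk group means.

    Risk scores are split at the median. `target_gap` is an assumed
    hyperparameter: a larger value pushes the two groups further apart.
    """
    median = statistics.median(risks)
    high = [r for r in risks if r > median]
    low = [r for r in risks if r <= median]
    if not high or not low:
        return 0.0  # all scores identical: no split is possible
    gap = sum(high) / len(high) - sum(low) / len(low)
    return smooth_l1(gap, target_gap)


def total_loss(nlpl, clustering_mse, boosting):
    """Final loss as the plain sum of the three terms (weights assumed to be 1)."""
    return nlpl + clustering_mse + boosting


narrow = stratification_boosting_loss([0.1, 0.2, 0.8, 0.9])  # gap of about 0.7
wide = stratification_boosting_loss([0.0, 0.0, 1.0, 1.0])    # gap of 1.0
combined = total_loss(1.0, 2.0, 0.5)
```

With the default `target_gap`, a batch whose groups are already well separated (`wide`) contributes less loss than one whose groups sit closer together (`narrow`).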

Dataset

246 slides from MSKCC and EMC: training data, 5-fold cross-validation
19 slides from UC: external held-out test set
224 x 224 pixel tiles sampled at 10x resolution from tumor regions of the tissue
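As a rough illustration of the 5-fold setup on 246 training slides, the sketch below splits slide indices into folds. The shuffling seed, the function name, and splitting at the slide level (rather than, say, the patient level) are assumptions; the paper's actual splitting procedure is not described in this review.

```python
import random


def k_fold_splits(n_items, k=5, seed=0):
    """Return k (train_indices, val_indices) pairs covering all items once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, val))
    return splits


splits = k_fold_splits(246, k=5)
fold_sizes = [len(val) for _, val in splits]  # one fold of 50, four of 49
```

Each slide appears in exactly one validation fold, and the 19 external slides stay out of this loop entirely.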

Results

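The EPIC-Survival model with stratification boosting reaches a C-index of 0.880 on the test set.
To check whether this is a batch effect, the authors also ran the model on a large-cohort external set and confirmed that the numbers are similar to those of other studies.
With a small dataset the problem set tends to be easy, so the C-index tends to come out high.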

Kaplan-Meier (KM) estimation analysis:

With stratification boosting: log-rank test (LRT) p < 0.05
Without stratification boosting: failed on the held-out test set.
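The log-rank test used here compares the survival curves of the high-risk and low-risk groups. Below is a minimal two-group sketch in plain Python, using the standard observed-versus-expected event counts and a chi-square distribution with one degree of freedom. The function name and the toy data are made up for illustration.

```python
import math


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)  # at risk in A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)  # at risk in B
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("variance is zero; the groups cannot be compared")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Toy data: group A (high risk) fails early, group B (low risk) fails late.
chi2_sep, p_sep = log_rank_test(
    [1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
    [6, 7, 8, 9, 10], [1, 1, 1, 1, 1],
)
# Identical groups should show no difference at all.
chi2_same, p_same = log_rank_test([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
```

A p-value below 0.05, as reported for the model with stratification boosting, indicates that the two risk groups have significantly different survival curves.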

Looking at the distribution of predicted risk relative to the time-to-event distribution, the model predicted early recurrence well.

Conclusion

Selecting image patches: by including the distance between each cluster centroid and the image patches in the loss, the model performs representation learning. At the same time, by adding to the DeepSurv loss a new loss function that widens the gap between the high-risk and low-risk groups, it achieves a high C-index (0.88) even on a small dataset.

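Even for a small dataset, 0.88 is a high number. (It is the first time I have seen one this high.)
Performance improved simply by adding a loss that widens the difference between the high group and the low group.
The unusual architecture is likely to be computationally expensive. (Even a single epoch …)
It was a pity that the architecture is not explained in more detail.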

JaeHeon Lee, 이재헌

https://jaeheon-lee486.github.io/
