Image-based 3D Object Reconstruction: State-of-the-Art and Trends in the Deep Learning Era
2. Problem statement and Taxonomy
Terms and notation used in this paper:
- I={Ik,k=1,...,n}, n≥1
: a set of n RGB images of one or more objects X
- fθ
: a predictor that infers a shape X′
: the function that minimizes the reconstruction objective L(I)=d(fθ(I),X)
- θ
: the set of parameters of f
- d(•,•)
: a function that measures the difference between the target shape X and the reconstructed shape f(I)
- L
: the reconstruction objective, i.e. the loss function in the deep learning literature
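The notation above can be made concrete with a small sketch. It is only an illustration, not the survey's method: it assumes a shape is a flat list of occupancy values, takes d to be a mean squared difference, and uses a hypothetical one-parameter predictor (`make_predictor`) that averages the input images and scales them by θ. All function and variable names are made up for this example.

```python
from typing import Callable, Sequence

Shape = Sequence[float]  # assumption: a shape is a flat list of occupancy values
Image = Sequence[float]  # assumption: an image is a flat list of pixel values


def d(pred: Shape, target: Shape) -> float:
    """Mean squared difference between two shapes (one possible choice of d)."""
    if len(pred) != len(target):
        raise ValueError("shapes must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def make_predictor(theta: float) -> Callable[[Sequence[Image]], Shape]:
    """Hypothetical f_theta: average the n images pixel-wise, then scale by theta."""
    def f_theta(images: Sequence[Image]) -> Shape:
        n = len(images)
        return [theta * sum(pixels) / n for pixels in zip(*images)]
    return f_theta


def reconstruction_objective(f_theta, images, target) -> float:
    """L(I) = d(f_theta(I), X)."""
    return d(f_theta(images), target)


# Two toy 4-pixel "images" of the same object, and the target shape X.
I = [[0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
X = [0.0, 1.0, 1.0, 0.0]

# Crude search over theta: keep the candidate with the smallest loss.
candidates = [0.0, 0.5, 1.0, 1.5]
losses = {t: reconstruction_objective(make_predictor(t), I, X) for t in candidates}
best_theta = min(losses, key=losses.get)
```

In practice θ holds the weights of a deep network and is found by gradient-based training rather than by enumerating candidates; the grid search here only makes the role of L as a loss visible.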
This survey groups recent methods by the following five criteria and discusses each:
- the nature of the input I
- the representation of the output
- the deep learning network architecture used during training and testing to approximate the predictor f
- the training procedure
- the degree of supervision
- 1. Input
- a single image
- multiple images
- captured using RGB cameras whose intrinsic and extrinsic parameters can be known or unknown
- a video stream
- i.e., a sequence of images with temporal correlation
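The three input types listed above can be captured in one small data structure. The following is a hypothetical sketch: the class names `View` and `ReconstructionInput`, their fields, and the nested-list image layout are assumptions made for illustration, with `None` standing for unknown camera parameters.

```python
from dataclasses import dataclass
from typing import List, Optional

Matrix = List[List[float]]


@dataclass
class View:
    """One RGB image plus optional camera calibration (None means unknown)."""
    pixels: List[List[List[float]]]      # H x W x 3 RGB values
    intrinsics: Optional[Matrix] = None  # 3x3 camera matrix, if known
    extrinsics: Optional[Matrix] = None  # 3x4 pose matrix [R|t], if known

    @property
    def calibrated(self) -> bool:
        return self.intrinsics is not None and self.extrinsics is not None


@dataclass
class ReconstructionInput:
    views: List[View]       # n >= 1 images
    is_video: bool = False  # True if the views are temporally ordered frames

    def __post_init__(self) -> None:
        if not self.views:
            raise ValueError("at least one image is required (n >= 1)")

    @property
    def kind(self) -> str:
        if self.is_video:
            return "video stream"
        return "single image" if len(self.views) == 1 else "multiple images"


# A 1x1 black image used as a placeholder view.
black = [[[0.0, 0.0, 0.0]]]
single = ReconstructionInput([View(black)])
multi = ReconstructionInput([View(black), View(black)])
video = ReconstructionInput([View(black), View(black)], is_video=True)
```

Keeping calibration optional per view reflects the note that intrinsic and extrinsic parameters may be known or unknown.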
3. The encoding stage
3.1 Discrete latent spaces
3.2 Continuous latent spaces
3.3 Hierarchical latent spaces
3.4 Disentangled representation
4. Volumetric Decoding
4.1 Volumetric representations of 3D shapes
4.2 Low resolution 3D volume reconstruction
4.3 High resolution 3D volume reconstruction
4.4 Deep marching cubes
5. 3D Surface Decoding
5.1 Parameterization-based 3D reconstruction
5.2 Point-based techniques
6. Leveraging Other Cues
6.2 Exploiting spatio-temporal correlations
7. Training
7.1 Degree of supervision
7.2 Training with video supervision
7.3 Training procedure
8. Applications and Special cases
8.1 3D human body reconstruction
8.2 3D face reconstruction
8.3 3D scene parsing
9. Datasets
11. Future research directions
12. Summary and Concluding remarks