Review: PointNet - 3D point clouds bounding box detection and tracking (PointNet, PointNet++, LaserNet, Point Pillars and Complex Yolo) - Series 5 (Part 2)

Sam Kim · September 7, 2022

[Translation of the original article]

(This is my own translation; it may contain mistranslations caused by my limited background knowledge.)

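In Part 1, we saw how effective 3D point cloud bounding box detection is and what the main research areas on the topic are. This post covers PointNet (by Charles R. Qi et al.), the pioneering work that made it possible to use 3D point clouds in deep learning without first converting the data into another representation.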

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

[Paper link]

Input

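PointNet feeds the x, y, z coordinates of the scattered points in a point cloud directly into end-to-end learning, which lets it be used for object classification and segmentation. For this reason, each input point to PointNet is a 3-dimensional vector, and a cloud of n points is a set of n such vectors.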
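To make the input format concrete, here is a minimal NumPy sketch of a point cloud stored as an n × 3 array. The coordinates and the variable names (`points`, `batch`) are made up for illustration and do not come from the original article or the paper's code.

```python
import numpy as np

# Hypothetical toy cloud: 5 points, each row is one (x, y, z) coordinate.
points = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
], dtype=np.float32)

n_points, n_coords = points.shape   # n points, 3 coordinates per point
batch = points[np.newaxis, ...]     # add a batch axis: (1, n, 3)
```

No voxel grid or image projection is built here; the raw coordinates are the input.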

Properties of point clouds

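When designing the PointNet network, certain properties of point clouds must be considered carefully so that the network can be applied directly to point cloud datasets without preprocessing.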

The term "invariance" used below is best understood as meaning that the result does not change.

Permutation Invariance

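The result should not depend on the order of the points in the dataset. For example, if the dataset contains 3 points, the order in which they are fed in makes no difference: 3 points can be ordered in 3! = 6 ways, and all six orderings should give the same output. To respect this permutation invariance, PointNet uses max-pooling as a symmetric function, that is, a function whose value is unchanged under every permutation of its arguments.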

The basic structure of PointNet. The output of the MLP (Multi-Layer Perceptron) is fed to a max-pool layer to achieve permutation invariance. [Image source]
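The effect of max-pooling can be checked with a small NumPy sketch. It assumes a single shared layer with random, untrained weights (`W`, `b`) and an 8-dimensional feature size, all chosen only for illustration; the real network uses a deeper shared MLP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shared per-point layer: 3 input coordinates -> 8 features.
W = rng.standard_normal((3, 8))
b = rng.standard_normal(8)

def global_feature(points):
    """Apply the same layer to every point, then max-pool over the points."""
    per_point = np.maximum(points @ W + b, 0.0)  # ReLU, shape (n, 8)
    return per_point.max(axis=0)                 # shape (8,)

points = rng.standard_normal((3, 3))   # a cloud of 3 points
shuffled = points[[2, 0, 1]]           # the same points in a different order

same = np.allclose(global_feature(points), global_feature(shuffled))
print(same)
```

Because the maximum is taken over the set of per-point features, reordering the rows cannot change the pooled vector. An order-sensitive operation, such as flattening the rows into one long vector, would not have this property.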

Geometric Transformation Invariance

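The result for a point cloud should be invariant to geometric transformations such as rotation, translation, and other affine transformations.
To achieve this, PointNet rotates or translates the input points along particular axes so that they are aligned to a high-dimensional canonical parameter space.
For example, the figure below shows how images transformed by a Spatial Transformer (ST-FCN) network end up producing the same result.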

Rotation Invariance example. [Image source]

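For rotation invariance, PointNet uses T-Net as a regression network that predicts a transformation matrix. T-Net produces this matrix by combining data-dependent features with learned, data-independent features.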

Geometric Transformation Invariance in PointNet. [Image source]
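The idea of a regression network that predicts a transformation matrix can be sketched roughly as follows. This is not the T-Net from the paper: the layer sizes, the random untrained weights (`W_feat`, `W_out`), and the choice to start the bias at the identity matrix are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, untrained weights; a real T-Net is deeper and is trained.
W_feat = rng.standard_normal((3, 16)) * 0.1   # shared per-point layer
W_out = rng.standard_normal((16, 9)) * 0.01   # regression head: 9 outputs
b_out = np.eye(3).ravel()                     # data-independent term (identity)

def t_net(points):
    """Predict a 3x3 matrix from a data-dependent part (pooled features)
    plus a data-independent part (the bias)."""
    per_point = np.maximum(points @ W_feat, 0.0)  # (n, 16)
    pooled = per_point.max(axis=0)                # (16,), order-independent
    return (pooled @ W_out + b_out).reshape(3, 3)

points = rng.standard_normal((100, 3))
T = t_net(points)        # (3, 3) predicted transformation
aligned = points @ T     # (100, 3) points after the input transform
```

With the bias set to the identity and small output weights, the predicted matrix starts close to "no transformation", so in this sketch the aligned points begin near the original ones and training would move them toward a canonical pose.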

Final Architecture combining permutation and geometric transformation invariance schemes

PointNet architecture. Combinations of Max-Pool (Permutation Invariance) and Input Transformations (Geometric Transformation Invariance)

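By combining permutation invariance and geometric transformation invariance, PointNet can process point cloud data directly, with no separate preprocessing.
Segmentation requires predicting class scores for each individual point. For this reason, in the segmentation network the per-point local embeddings are concatenated with the global feature vector.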
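The concatenation step can be illustrated with array shapes alone. In this sketch the local embedding size of 64, the number of points, the 4 classes, and the single linear layer standing in for the segmentation head are all assumptions; the 1024-dimensional global feature matches the size PointNet uses for class labels.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 6                                      # hypothetical number of points
local = rng.standard_normal((n, 64))       # per-point local embeddings (size assumed)
global_vec = rng.standard_normal(1024)     # one global feature vector per cloud

# Repeat the global vector for every point and append it to the local features.
tiled = np.broadcast_to(global_vec, (n, 1024))
combined = np.concatenate([local, tiled], axis=1)   # (n, 64 + 1024)

# Hypothetical linear head producing scores for 4 classes per point.
W_seg = rng.standard_normal((combined.shape[1], 4)) * 0.01
scores = combined @ W_seg        # (n, 4) class scores per point
labels = scores.argmax(axis=1)   # (n,) predicted class per point
```

Each row of `combined` carries both what is specific to that point and a summary of the whole cloud, which is what allows a per-point prediction to take global context into account.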

The accuracy of PointNet depends on the size chosen for the global feature vector: as the figure below shows, accuracy rises as the number of features grows.

PointNet uses 1024 features to generate the class label.

Accuracy of PointNet vs Global Feature Vector Size. [Image source]
