[DL] MGFN Paper Review

YJ · August 28, 2023

Why I changed models

I originally planned to use Deep MIL, but it appears to be a C3D-based model, so I switched to an I3D-based SOTA model on the UCF dataset.

📖 MGFN

Magnitude-Contrastive Glance-and-Focus Network for Weakly-Supervised Video Anomaly Detection

reference: https://arxiv.org/pdf/2211.15098.pdf

(Note) What is weak supervision?

It is characterized by using a combination of a small amount of human-labeled data (exclusively used in more expensive and time-consuming supervised learning paradigm), followed by a large amount of unlabeled data (used exclusively in unsupervised learning paradigm).

Source: Wikipedia

📖 Model architecture

Magnitude-Contrastive Glance-and-Focus Network(MGFN)

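The model first glances over the entire sequence of a long video to extract context information about the overall situation, then further processes each specific part for anomaly detection.

Feature extractor (input: untrimmed videos with video-level annotation)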

Splits the video into clips and represents them as a feature map.
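The clip-splitting step can be sketched as follows. This is only an illustration of dividing frame indices into nearly equal contiguous clips; the function name `split_into_clips` and its parameters are hypothetical, and the actual feature extraction (for example with I3D) is not shown.

```python
def split_into_clips(num_frames, num_clips):
    """Split frame indices 0..num_frames-1 into num_clips contiguous,
    nearly equal clips (hypothetical helper, not from the paper)."""
    base, extra = divmod(num_frames, num_clips)
    clips, start = [], 0
    for i in range(num_clips):
        # The first `extra` clips take one additional frame each.
        length = base + (1 if i < extra else 0)
        clips.append(range(start, start + length))
        start += length
    return clips


clips = split_into_clips(num_frames=10, num_clips=3)
print([list(c) for c in clips])
```

Each clip would then be passed through the feature extractor, giving one feature vector per clip and a feature map of shape (clips x channels).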

Feature Amplification Mechanism(FAM)

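input: feature map
Calculates the magnitude of the features (the feature norm)
Extends the feature map by incorporating the feature norm, which indicates abnormality
Strengthens feature learning so that normal and abnormal features can be told apart, and reinforces the MC loss to improve the separation between features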
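A minimal sketch of the amplification idea, under stated assumptions: the per-clip L2 norm is computed, passed through a 1-D convolution over time, and added back to the feature map with a small weight. Here a fixed smoothing kernel stands in for what would be a learned convolution, and `alpha`, `smooth`, and `amplify` are hypothetical names chosen for illustration.

```python
import math


def feature_norms(feature_map):
    """L2 norm of each clip's feature vector (feature_map: clips x channels)."""
    return [math.sqrt(sum(x * x for x in clip)) for clip in feature_map]


def smooth(values, kernel=(0.25, 0.5, 0.25)):
    """Fixed 1-D convolution with zero padding.
    Assumption: stands in for a learned Conv1D over the temporal axis."""
    half = len(kernel) // 2
    out = []
    for t in range(len(values)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + k - half
            if 0 <= idx < len(values):
                acc += w * values[idx]
        out.append(acc)
    return out


def amplify(feature_map, alpha=0.1):
    """Add the (smoothed) norm of each clip to every channel of that clip."""
    norms = smooth(feature_norms(feature_map))
    return [[x + alpha * n for x in clip] for clip, n in zip(feature_map, norms)]


feature_map = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
print(feature_norms(feature_map))  # [5.0, 0.0, 10.0]
print(amplify(feature_map))
```

The amplified map keeps the original shape, so it can be fed to the following blocks unchanged.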

Glance Block

Helps detect anomalous behavior by letting the model learn what normal cases look like.
- Uses a short-cut convolution to reduce the dimension of the feature map
- VCT (Video Clip-level Transformer)
  Captures global correlations across the video clips
  - attention map: correlates the different temporal clips
  - softmax normalization
- VCT output: weighted average of all clips
- Feed-Forward Network (FFN): two fully-connected layers with a GeLU non-linearity
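The attention step in the list above can be sketched as below. This is a simplified illustration, not the paper's implementation: the raw clip features serve as queries, keys, and values with no learned projections, and the scaling by the square root of the feature dimension is borrowed from standard scaled dot-product attention as an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def clip_attention(features):
    """For each clip, correlate it with all clips (attention map),
    normalize with softmax, and return the weighted average of all clips."""
    dim = len(features[0])
    scale = math.sqrt(dim)
    outputs, attn_map = [], []
    for query in features:
        scores = [sum(a * b for a, b in zip(query, key)) / scale for key in features]
        weights = softmax(scores)
        attn_map.append(weights)
        outputs.append(
            [sum(w * value[d] for w, value in zip(weights, features)) for d in range(dim)]
        )
    return outputs, attn_map


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, attn_map = clip_attention(features)
print(attn_map[0])
print(outputs[0])
```

Each row of the attention map sums to 1, so every output clip is a convex combination of all clips in the video, which is how the block gets a global view.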

Focus Block: integrate the global and local features

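Components: short-cut convolution (SCC), self-attentional convolution (SAC), Feed-Forward Network (FFN)
- The SCC produces a feature map with an increased number of channels
- SAC: strengthens feature learning within each video clip
  By accessing neighboring channels, it can learn correlations even without learnable weights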

Magnitude Contrastive Loss

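Detects differences between video frames
Learns a scene-adaptive cross-video magnitude distribution.
Separates normal and abnormal features
Instead of forcing abnormal features to have larger magnitudes than normal ones in every scene,
it lets them separate into an appropriate distribution.
Based on the top-k-largest-feature-magnitude clips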
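To make the idea concrete, here is a simplified margin-based contrastive objective on top-k feature magnitudes. It is not the paper's exact formulation: the absolute-difference distance, the `margin` value, and the function names are assumptions chosen only to show how same-category videos are pulled together and different-category videos pushed apart.

```python
def topk_mean_magnitude(magnitudes, k):
    """Mean of the k largest clip magnitudes in one video."""
    top = sorted(magnitudes, reverse=True)[:k]
    return sum(top) / len(top)


def magnitude_contrastive_loss(normal_videos, abnormal_videos, k=2, margin=1.0):
    """Simplified sketch: pull same-category magnitudes together,
    push normal and abnormal magnitudes at least `margin` apart."""
    normal = [topk_mean_magnitude(v, k) for v in normal_videos]
    abnormal = [topk_mean_magnitude(v, k) for v in abnormal_videos]

    pull = 0.0
    for group in (normal, abnormal):
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                pull += abs(group[i] - group[j])

    push = sum(max(0.0, margin - abs(n - a)) for n in normal for a in abnormal)
    return pull + push


normal_videos = [[1.0, 1.2, 0.8], [0.9, 1.1, 1.0]]
separated = [[1.0, 3.0, 2.8], [0.9, 3.2, 3.0]]
overlapping = [[1.0, 1.3, 1.1], [0.9, 1.2, 1.0]]
print(magnitude_contrastive_loss(normal_videos, separated))
print(magnitude_contrastive_loss(normal_videos, overlapping))
```

Because only the distance between categories is penalized, the loss does not require abnormal magnitudes to be larger than normal ones in absolute terms, which matches the scene-adaptive intent described above.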

Overall loss functions

Temporal smoothness loss and sparsity loss
Used as regularization terms to smooth the predicted scores of adjacent video clips
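The two regularizers can be sketched as follows, using the formulation common in weakly supervised video anomaly detection: a squared difference between adjacent clip scores and a sum of scores. The weights `lambda_smooth` and `lambda_sparse` are hypothetical values for illustration.

```python
def temporal_smoothness(scores):
    """Sum of squared differences between scores of adjacent clips."""
    return sum((a - b) ** 2 for a, b in zip(scores, scores[1:]))


def sparsity(scores):
    """Sum of clip scores; encourages only a few clips to score high."""
    return sum(scores)


# Predicted anomaly scores for consecutive clips of one video.
scores = [0.1, 0.1, 0.9, 0.8, 0.1]

# Hypothetical weights, chosen only for this example.
lambda_smooth, lambda_sparse = 1.0, 1.0
regularizer = lambda_smooth * temporal_smoothness(scores) + lambda_sparse * sparsity(scores)
print(regularizer)
```

In training, this regularizer would be added to the classification loss and the MC loss to form the overall objective.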

Summary

untrimmed videos with video-level annotation
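feature extractor -> split the video evenly into clips -> feature map
Feature Amplification: calculate the feature magnitudes and incorporate them into the feature map
Glance Block, Focus Block: extract global context information (identify key frames) and strengthen local features (focus on important information)
MC loss: reduce feature magnitude distances within the same category,
and use the top-k feature magnitudes to enlarge the differences between categories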
