Data Augmentation for Scene Text Recognition

sshinohs · December 11, 2022

Abstract

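Scene Text Recognition (STR) is a challenging task because text appearance can vary widely.
Due to a lack of real data, training relies mostly on synthetic data.
The proposed augmentations mimic phenomena that can appear in real images.
Depending on the model, accuracy improves by 0.89% to 2.10%.
Source code is available: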
- https://github.com/roatienza/straug

Introduction

Previous work has focused on improving model architectures and training algorithms.
Recognition quality can also be improved through the data itself.
Synthetic data
- MJSynth
- Synth90k
- SynthText
- Verisimilar
- UnrealText
Real data
- IIIT5K
- Street View Text
- ICDAR
- SVT Perspective
- CUTE80
Because these datasets are fragmented, models suffer from distribution shift.
As a result, the test data ends up containing long-tail samples.

Data Augmentation for STR

Typical Augmentation
- Gaussian Noise
- Motion Blur
- Resizing, Padding
- ...
Augmentations proposed in the paper
- Warp
- Geometry
- Noise
- Blur
- Weather
- Camera
- Pattern
- Process
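To make the typical augmentations listed above concrete, here is a minimal sketch of additive Gaussian noise. It is not the paper's implementation: the function name `add_gaussian_noise`, the `sigma` parameter, and the nested-list grayscale image are assumptions chosen so the example runs with only the standard library.

```python
import random


def add_gaussian_noise(image, sigma=10.0, seed=None):
    """Return a copy of a grayscale image with zero-mean Gaussian noise added.

    `image` is a list of rows; each row is a list of ints in [0, 255].
    `sigma` (hypothetical parameter) is the noise standard deviation.
    """
    rng = random.Random(seed)
    noisy = []
    for row in image:
        new_row = []
        for pixel in row:
            value = pixel + rng.gauss(0.0, sigma)
            # Clamp back into the valid 8-bit range.
            new_row.append(min(255, max(0, int(round(value)))))
        noisy.append(new_row)
    return noisy


# A tiny 2x4 "image" standing in for a cropped word image.
image = [[0, 64, 128, 255], [255, 128, 64, 0]]
noisy = add_gaussian_noise(image, sigma=10.0, seed=0)
```

In practice the same idea would be applied to a real image array, with `sigma` tied to an augmentation magnitude.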

Warp

Geometry

Pattern

Noise

Blur

Weather

Camera

Process

Experimental Results and Discussion

Only synthetic images are used for the train set, and real images are used for the test set.
Train set
- MJSynth
  - 8.9M images
  - Varied attributes
    - Font
    - Background
    - Shadow
    - Border
- SynthText
  - 5.5M images
  - Varied attributes
  - Blends real images with synthetic text.
  - geometry
  - texture
  - distort
  - crop
Test set
- Regular Set
  - Minimal amount of rotation or perspective distortion
- Irregular Set
  - Curve
  - Vertical
  - Under perspective transformation
  - Low-resolution or distorted
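The regular/irregular split above suggests reporting results per subset. The sketch below assumes the metric is exact-match word accuracy in percent, which the post does not state explicitly; `word_accuracy` and the toy predictions are hypothetical and are not results from the paper.

```python
def word_accuracy(predictions, labels):
    """Percentage of predictions that match their label exactly."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Made-up (prediction, label) lists for illustration only.
results = {
    "regular": (["hello", "world", "text"], ["hello", "world", "test"]),
    "irregular": (["curve", "sign"], ["curve", "sigh"]),
}
accuracy = {name: word_accuracy(p, t) for name, (p, t) in results.items()}
```

Keeping the two subsets separate makes it easier to see whether an augmentation helps mostly on curved or distorted text.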
On ICDAR15, accuracy improves with every augmentation.
Stacking 2 to 3 augmentations tends to work best.
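Stacking augmentations can be sketched as follows. This is one plausible reading and not the paper's exact policy: it assumes that `num_ops` distinct groups are sampled and one operation is drawn from each. The three toy operations (`invert`, `brighten`, `translate_x`) are hypothetical stand-ins for the real augmentation functions, and `mag` is an assumed magnitude parameter.

```python
import random


def invert(image, mag):
    """Process-style stand-in: invert pixel intensities (mag is unused)."""
    return [[255 - p for p in row] for row in image]


def brighten(image, mag):
    """Camera-style stand-in: raise brightness by an amount tied to mag."""
    shift = 20 * (mag + 1)
    return [[min(255, p + shift) for p in row] for row in image]


def translate_x(image, mag):
    """Geometry-style stand-in: shift right by mag + 1 pixels, zero padded."""
    out = []
    for row in image:
        k = min(mag + 1, len(row))
        out.append([0] * k + row[: len(row) - k])
    return out


# Hypothetical grouping; the real groups contain many more functions.
GROUPS = {"process": [invert], "camera": [brighten], "geometry": [translate_x]}


def stack_augmentations(image, groups, num_ops=2, mag=1, rng=None):
    """Apply one random op from each of `num_ops` randomly chosen groups."""
    rng = rng or random.Random()
    # Raises ValueError if num_ops exceeds the number of groups.
    chosen = rng.sample(sorted(groups), k=num_ops)
    applied = []
    for name in chosen:
        op = rng.choice(groups[name])
        image = op(image, mag)
        applied.append(op.__name__)
    return image, applied


image = [[0, 64, 128, 255], [255, 128, 64, 0]]
augmented, applied = stack_augmentations(
    image, GROUPS, num_ops=2, mag=1, rng=random.Random(0)
)
```

Setting `num_ops` to 2 or 3 corresponds to the stacking depth that the notes above report as working well.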
