[Paper Review] (ImgCls) Prototypical Networks for Few-shot Learning
0. Background
- Bregman divergence (reference)
- The gap between a function $F(x)$ and its first-order Taylor approximation $\hat{F}$, evaluated at the actual point
- $D_F(p, q) = F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle$
- $\hat{F}(p) = F(q) + \langle \nabla F(q),\, p - q \rangle$ (Taylor approximation around $q$), so $D_F(p, q) = F(p) - \hat{F}(p)$
- what is a Hermitian inner product? (for real vectors, $\langle \cdot,\cdot \rangle$ is just the ordinary dot product)
- e.g. squared Euclidean norm -> squared Euclidean distance, negative entropy -> KL divergence
- in short, a measure of the discrepancy between $p$ and $q$
- (important) if $p$ is a random vector, the $q$ that minimizes the expected divergence is the mean of the $p$'s (see the numerical sketch after this list)
- K-Nearest-Neighbor (KNN)
- classify a query by majority vote among its k nearest points
- Generative model vs. Discriminative model
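
A quick numerical check of the mean-minimizer property above; a minimal sketch (not from the paper), using the squared Euclidean norm as the generating function $F$ and made-up random points:

```python
import numpy as np

# Bregman divergence generated by F(x) = ||x||^2 is the squared Euclidean distance.
def bregman_sq_euclidean(p, q):
    return np.sum((p - q) ** 2)

rng = np.random.default_rng(0)
points = rng.normal(size=(100, 3))  # random vectors p_1, ..., p_100

# Compare the average divergence for a few candidate q's: the mean should win.
candidates = {
    "mean": points.mean(axis=0),
    "median": np.median(points, axis=0),
    "zero": np.zeros(3),
}
for name, q in candidates.items():
    avg = np.mean([bregman_sq_euclidean(p, q) for p in points])
    print(f"{name:>6}: {avg:.4f}")  # "mean" attains the smallest average divergence
```

The same property holds for any regular Bregman divergence, which is why taking the class mean as the prototype is the natural choice once such a divergence is used as the distance.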
1. Introduction
- Matching Networks (Vinyals et al.)
- embedding is learned (via a neural network)
- weighted nearest-neighbor classifier
- meta-learning via LSTM (Ravi & Larochelle)
- learns to train a custom model for each episode
- very little data -> prone to overfitting
- this paper takes advantage of that fact and assumes there exists a single embedding that represents each class (a "prototype")
- this prototype is defined as an average of its support set embeddings.
- embeddings are generated by neural networks
- the importance of choosing a good metric
2. Prototypical Networks
- Prototype $c_k \in \mathbb{R}^M$
- Embedding function $f_\phi : \mathbb{R}^D \to \mathbb{R}^M$ ($\phi$: learnable parameters)
- $c_k$ is the mean of the embedded support points in $S_k$ (the support set for class $k$)
- Distance function $d : \mathbb{R}^M \times \mathbb{R}^M \to [0, +\infty)$
- Probability that $x$ belongs to class $k$: $p_\phi(y = k \mid x) = \mathrm{softmax}(-d(f_\phi(x), c_k)) = \dfrac{\exp(-d(f_\phi(x), c_k))}{\sum_{k'} \exp(-d(f_\phi(x), c_{k'}))}$
- Objective: minimize $J(\phi) = -\log p_\phi(y = k \mid x)$ for the true class $k$ (a minimal sketch of one episode is given after this list)
- Connections to Mixture Density Estimation
- If $d$ is a regular Bregman divergence (e.g. squared Euclidean distance), the prototypical network is equivalent to performing mixture density estimation on the support set with an exponential-family density per class.
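
A minimal sketch of one training episode under the definitions above (PyTorch). The tiny MLP standing in for $f_\phi$, the dimensions, and the episode sizes are illustrative assumptions, not the paper's conv-net setup:

```python
import torch
import torch.nn.functional as F

# Stand-in embedding f_phi: R^D -> R^M (the paper uses a conv net; a tiny MLP here).
D, M = 64, 32
embed = torch.nn.Sequential(torch.nn.Linear(D, 128), torch.nn.ReLU(), torch.nn.Linear(128, M))

def episode_loss(support_x, support_y, query_x, query_y, n_way):
    """Compute J(phi) = -log p_phi(y = k | x) averaged over the query points of one episode."""
    z_support = embed(support_x)                  # (n_way * k_shot, M)
    z_query = embed(query_x)                      # (n_query_total, M)
    # c_k = mean of the embedded support points of class k
    prototypes = torch.stack([z_support[support_y == k].mean(dim=0) for k in range(n_way)])
    # d = squared Euclidean distance between each query embedding and each prototype
    dists = torch.cdist(z_query, prototypes) ** 2  # (n_query_total, n_way)
    log_p = F.log_softmax(-dists, dim=1)           # p_phi(y = k | x) = softmax(-d)
    return F.nll_loss(log_p, query_y)

# Toy 5-way 5-shot episode with 15 queries per class; random data for illustration.
n_way, k_shot, n_query = 5, 5, 15
support_x = torch.randn(n_way * k_shot, D)
support_y = torch.arange(n_way).repeat_interleave(k_shot)
query_x = torch.randn(n_way * n_query, D)
query_y = torch.arange(n_way).repeat_interleave(n_query)

loss = episode_loss(support_x, support_y, query_x, query_y, n_way)
loss.backward()  # an optimizer step on phi would follow in episodic training
```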
3. Experiments
- Training Setting
- shot -> better to match the number of shots between training and test episodes
- this seems to depend on the model; other papers report better results with mismatched shot counts, and it is unclear what causes the difference
- way -> a larger way during training gives better results
- with a larger way, the embedding must capture finer differences between classes; training on the harder task generalizes better (see the episode-sampling sketch after this list)
- Metric
- (Squared) Euclidean distance is much better suited to this model: it is a regular Bregman divergence, so the class mean minimizes the average divergence to the support embeddings, which is exactly how prototypes are defined. Results are far superior to cosine distance.
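
A sketch of how training/test episodes could be sampled to reflect the shot/way discussion above; the toy dataset layout, sizes, and the `sample_episode` helper are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy dataset: 50 classes with 20 examples each, 64-dim features (assumed layout).
data_by_class = [rng.normal(size=(20, 64)) for _ in range(50)]

def sample_episode(n_way, k_shot, n_query):
    """Pick n_way classes, then k_shot support and n_query query examples per class."""
    classes = rng.choice(len(data_by_class), size=n_way, replace=False)
    support, query = [], []
    for episode_label, c in enumerate(classes):
        idx = rng.permutation(len(data_by_class[c]))
        support.append((data_by_class[c][idx[:k_shot]], episode_label))
        query.append((data_by_class[c][idx[k_shot:k_shot + n_query]], episode_label))
    return support, query

# Train with a higher way than at test time while keeping the shot matched,
# e.g. 20-way 5-shot training episodes for a 5-way 5-shot test setting.
train_episode = sample_episode(n_way=20, k_shot=5, n_query=15)
test_episode = sample_episode(n_way=5, k_shot=5, n_query=15)
```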

4. Opinion
- The part I find most interesting is that matching the train/test episode conditions is itself the "meta" aspect of the setup.