=> not enough to achieve a rich representation that includes the mutual information between different modalities.
Proposed more effective learning scheme => Two-phase training, selective updating algorithm
Main goal: improve performance on the multi-label classification task (especially image-text hashtag prediction) over using a single predictor, including the case where one of the modalities is not related to the ground truth.
Pretrained VGG16(ImageNet) - image feature
Pretrained Word2Vec(Google News corpus) - text feature
Previous work has shown that a weighted average of word embeddings can represent sentences well. ~ Question: what does the sentence that follows this one mean?
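A minimal sketch of the weighted-average idea, to make the sentence representation concrete. The function name, the toy 2-d vectors, and the per-word weights are all assumptions for illustration; the notes do not say which weighting scheme the paper uses, and real Word2Vec vectors would replace the toy dictionary.

```python
def weighted_average_embedding(tokens, embeddings, weights):
    """Return the weighted mean of the vectors of the known tokens."""
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in embeddings:
            continue  # skip out-of-vocabulary tokens
        w = weights.get(tok, 1.0)  # assumed default weight of 1.0
        for i, x in enumerate(embeddings[tok]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total  # no known tokens: fall back to the zero vector
    return [x / weight_sum for x in total]


# Toy 2-d vectors standing in for Word2Vec embeddings (hypothetical values).
toy_embeddings = {"sunny": [1.0, 0.0], "beach": [0.0, 1.0]}
toy_weights = {"sunny": 1.0, "beach": 3.0}
sentence_vec = weighted_average_embedding(
    ["sunny", "beach", "zzz"], toy_embeddings, toy_weights
)
```

Here "beach" carries three times the weight of "sunny", so the sentence vector leans toward the "beach" direction, and the unknown token "zzz" is ignored.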
First phase: when the cross-active connections are deactivated, the loss function cannot be computed, so a virtual sigmoid output is created for temporary use.
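One way to picture the virtual sigmoid output is as a temporary per-label sigmoid head attached to a single modality's features, so that a multi-label loss exists even without the cross-active connections. The sketch below assumes binary cross-entropy as the loss and uses hypothetical names and toy numbers; the notes do not state the paper's actual head or loss.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def virtual_output(features, head_weights, head_bias):
    """Temporary sigmoid head: one probability per label from one modality."""
    return [
        sigmoid(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(head_weights, head_bias)
    ]


def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy over labels (assumed loss, not from the notes)."""
    return -sum(
        t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
        for p, t in zip(probs, targets)
    ) / len(probs)


# Toy image-branch features and a 2-label head (hypothetical values).
image_features = [0.5, -1.0]
head_weights = [[0.0, 0.0], [2.0, 0.0]]
head_bias = [0.0, 0.0]

probs = virtual_output(image_features, head_weights, head_bias)
loss = binary_cross_entropy(probs, [1, 0])
```

Under this reading, the head is only needed while the branches are trained separately and would be discarded once the cross-active connections are switched on.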
Passive subsection: the part that is kept frozen during an update.
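The freezing behaviour can be sketched as an update step that touches only the active subsection and copies the passive one unchanged. The function name, the plain SGD rule, and the "image"/"text" parameter groups are assumptions for illustration; how the paper chooses which subsection is active is not recorded in these notes.

```python
def selective_update(params, grads, active, lr=0.5):
    """Apply one SGD step to the active subsection; passive ones stay frozen."""
    updated = {}
    for name, values in params.items():
        if name == active:
            updated[name] = [v - lr * g for v, g in zip(values, grads[name])]
        else:
            updated[name] = list(values)  # passive: copied unchanged
    return updated


# Hypothetical parameter groups for the two modality branches.
params = {"image": [1.0, 2.0], "text": [3.0, 4.0]}
grads = {"image": [2.0, 2.0], "text": [2.0, 2.0]}

new_params = selective_update(params, grads, active="image")
```

Even though the text branch has non-zero gradients, its parameters do not move while it is the passive subsection.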
Experiments on the CAC dataset we built are in Table 4.