Organize the details later...
Internet of emotional people: Towards continual affective computing cross cultures via audiovisual signals
To cope with these issues, the most popular strategies are Transfer Learning (TL) and Multi-Task Learning (MTL). With TL, a model pre-trained on a task with rich resources can be reused for another recognition task with low resources. TL can thus address the data-scarcity problem well and relax the task-dependent assumption to a certain extent. Nevertheless, it is still designed specifically to improve the target task, and it suffers from severe catastrophic forgetting: a phenomenon in which the model undergoes an abrupt performance drop on the original task or, in the worst case, is completely overwritten by the new one.
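A minimal sketch of the TL setup described above, assuming PyTorch; the layer sizes, the two-output arousal/valence head, and the freezing strategy are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

# Stand-in for a backbone pre-trained on a high-resource source task.
backbone = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
)

# Freeze the pre-trained parameters so only the new head is updated.
# If the whole backbone were fine-tuned instead, its weights would drift
# away from the source task: the catastrophic forgetting noted above.
for p in backbone.parameters():
    p.requires_grad = False

# New task-specific head for the low-resource target task
# (hypothetically, 2 outputs for arousal/valence regression).
head = nn.Linear(256, 2)
model = nn.Sequential(backbone, head)

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(8, 128)   # dummy batch of target-task features
y = torch.randn(8, 2)     # dummy continuous affect labels

loss = loss_fn(model(x), y)  # one training step on the target task
loss.backward()
optimizer.step()
```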
In comparison with TL, MTL learns several tasks with one model simultaneously. It can therefore not only greatly reduce the number of models, but also efficiently exploit the information shared among tasks. However, the training stage requires a large amount of training data, and consequently substantial storage and heavy computation. Even worse, when a new task arrives, the model has to be re-trained from scratch, which significantly reduces its flexibility.
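A minimal sketch of hard-parameter-sharing MTL, again assuming PyTorch; the shared trunk, the arousal/valence head names, and all dimensions are hypothetical, a generic illustration rather than any cited paper's architecture:

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    def __init__(self, in_dim=128, hidden=256):
        super().__init__()
        # Shared trunk: exploits information common to all tasks.
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # One lightweight head per task.
        self.heads = nn.ModuleDict({
            "arousal": nn.Linear(hidden, 1),
            "valence": nn.Linear(hidden, 1),
        })

    def forward(self, x):
        h = self.shared(x)
        return {name: head(h) for name, head in self.heads.items()}

model = MultiTaskModel()
loss_fn = nn.MSELoss()
x = torch.randn(8, 128)                       # dummy batch
targets = {"arousal": torch.randn(8, 1),
           "valence": torch.randn(8, 1)}

out = model(x)
# Joint training sums the per-task losses, so labels for all tasks are
# needed at once; adding a new task/head later means re-training from
# scratch, which is the flexibility problem noted above.
loss = sum(loss_fn(out[k], targets[k]) for k in targets)
loss.backward()
```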
An efficient model-level fusion approach for continuous affect recognition from audiovisual signals
Due to the natural complementarity and redundancy of multimodal information, as well as the power of implicit ensembling [26], multimodal continuous affect recognition methods have been widely considered in the literature.
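A minimal sketch of the model-level (feature-level) audiovisual fusion idea that the title above refers to, assuming PyTorch; the encoder shapes, feature dimensions, and concatenation-based fusion are assumptions for illustration, not the cited paper's exact model:

```python
import torch
import torch.nn as nn

class ModelLevelFusion(nn.Module):
    def __init__(self, audio_dim=88, video_dim=512, hidden=128):
        super().__init__()
        # Modality-specific encoders learn separate representations first.
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.video_enc = nn.Sequential(nn.Linear(video_dim, hidden), nn.ReLU())
        # Fusion layer operates on the concatenated hidden representations,
        # letting complementary (and redundant) cues interact.
        self.fusion = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU())
        self.out = nn.Linear(hidden, 2)  # e.g., arousal/valence

    def forward(self, audio, video):
        h = torch.cat([self.audio_enc(audio), self.video_enc(video)], dim=-1)
        return self.out(self.fusion(h))

model = ModelLevelFusion()
pred = model(torch.randn(8, 88), torch.randn(8, 512))  # dummy batch
```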
Continuous audiovisual emotion recognition using feature selection and LSTM
Mehrabian argued that, in communicating feelings and attitudes, vocal/audio cues convey 38% of the message and visual expression 55% (the remaining 7% being the verbal content itself). Combining both audio and visual information should therefore enhance the performance of recognizing and communicating emotion.
Multitask learning and multistage fusion for dimensional audiovisual emotion recognition
A paper whose own experiments demonstrate that multimodal approaches outperform unimodal ones.
Branch-Fusion-Net for Multi-Modal Continuous Dimensional Emotion Recognition
The emotional information provided by different modalities can complement each other, and the research of Han et al. also shows that information from different modalities can be converted into one another. Therefore, multimodal signals can provide emotional information more stably, and multimodal systems perform better than unimodal systems.
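For contrast with the model-level fusion sketch earlier, a minimal sketch of decision-level (late) fusion, where each unimodal system predicts on its own and the predictions are then combined; the stand-in predictors, dimensions, and the simple weighted-averaging rule are all assumptions for illustration:

```python
import torch
import torch.nn as nn

audio_model = nn.Linear(88, 2)    # stand-in unimodal audio predictor
video_model = nn.Linear(512, 2)   # stand-in unimodal video predictor

audio_feat = torch.randn(8, 88)   # dummy audio features
video_feat = torch.randn(8, 512)  # dummy video features

# Weighted average of per-modality predictions: when one modality is
# noisy, the other can still carry the emotional information, which is
# one reason multimodal systems behave more stably than unimodal ones.
w_audio, w_video = 0.4, 0.6
pred = w_audio * audio_model(audio_feat) + w_video * video_model(video_feat)
```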