Silence Handling in Affective Computing

꼼댕이 · October 10, 2023

Affective Computing

An efficient model-level fusion approach for continuous affect recognition from audiovisual signals
When dealing with natural face scenes, illumination and large head poses affect the facial appearance of a person in an image. Silence in the audio data and face-detection failures also severely affect the quality of the features extracted for emotion recognition. Hence, in our framework we use the extracted features as side information to adapt the weights of the networks used. With the help of side information and adaptive weights, we hope to disentangle the variations related to the side information and extract discriminative features from the current data stream.
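The excerpt does not spell out how the side information enters the network. Here is a rough illustration only. It assumes the side information is a per-frame scalar in [0, 1], for example a hypothetical speech-presence or face-detection confidence. That scalar interpolates between two weight matrices of a linear layer, so a silent or face-less frame is processed with different weights than a clean one. The names `adaptive_linear`, `w_clean`, and `w_degraded` are hypothetical and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def adaptive_linear(x: Vector, side: float, w_clean: Matrix, w_degraded: Matrix) -> Vector:
    """Apply a linear layer whose weights depend on a side-information scalar.

    Hypothetical sketch: `side` is assumed to be a reliability score in [0, 1]
    (1.0 = clean frame, 0.0 = silent audio / failed face detection). The
    effective weights are a convex mix of two learned weight matrices.
    """
    if not 0.0 <= side <= 1.0:
        raise ValueError("side information is assumed to lie in [0, 1]")
    out = []
    for row_c, row_d in zip(w_clean, w_degraded):
        # Mix the two weight rows according to the side information.
        row = [side * c + (1.0 - side) * d for c, d in zip(row_c, row_d)]
        out.append(sum(w * xi for w, xi in zip(row, x)))
    return out


# Toy example: the same input is mapped differently for a clean vs. a silent frame.
w_clean = [[1.0, 0.0], [0.0, 1.0]]
w_degraded = [[0.0, 0.0], [0.0, 0.0]]
x = [2.0, 3.0]
print(adaptive_linear(x, 1.0, w_clean, w_degraded))  # [2.0, 3.0]
print(adaptive_linear(x, 0.0, w_clean, w_degraded))  # [0.0, 0.0]
```

In a real model both weight sets would be learned, and the side information could be a vector rather than a single scalar. The convex mix is just the simplest way to make the weights depend on it.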

Audio–Visual Fusion for Emotion Recognition in the Valence–Arousal Space Using Joint Cross-Attention
Finally, mean and variance normalization is performed on the spectrogram.
Apart from mean and variance normalization, no other voice-specific processing, such as silence removal or noise filtering, is performed.
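Mean and variance normalization standardizes the spectrogram to zero mean and unit variance. The sketch below assumes the statistics are computed per frequency bin over the time frames of a single utterance. The excerpt does not say whether they are per-utterance, per-bin, or global, and `normalize_spectrogram` and `eps` are hypothetical names.

```python
import math
from typing import List

Spectrogram = List[List[float]]  # indexed as [time][freq]


def normalize_spectrogram(spec: Spectrogram, eps: float = 1e-8) -> Spectrogram:
    """Standardize each frequency bin to zero mean and unit variance over time."""
    n_frames = len(spec)
    n_bins = len(spec[0])
    means = [sum(frame[f] for frame in spec) / n_frames for f in range(n_bins)]
    stds = [
        math.sqrt(sum((frame[f] - means[f]) ** 2 for frame in spec) / n_frames)
        for f in range(n_bins)
    ]
    # eps guards against division by zero for constant (e.g. all-silent) bins.
    return [
        [(frame[f] - means[f]) / (stds[f] + eps) for f in range(n_bins)]
        for frame in spec
    ]


spec = [[1.0, 10.0], [3.0, 10.0]]
print(normalize_spectrogram(spec))
```

Silent frames are not removed here. They stay in the input and are only rescaled along with everything else, which matches the excerpt's statement that no silence removal is applied.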
A Visual–Audio-Based Emotion Recognition System Integrating Dimensional Analysis
The lower part of the graph is the audio path. Here, the extracted audio stream is first preprocessed (e.g., noise reduction and silence removal), and the preprocessed audio stream is then sent to the feature extraction module.
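The excerpt does not say which silence-removal method the system uses. A common simple choice, assumed here, is a short-time energy threshold:

- Split the waveform into fixed-length frames.
- Compute each frame's RMS level in dB.
- Drop frames below a threshold.

The frame length and the -40 dB threshold are illustrative values, and `remove_silence` and `frame_rms_db` are hypothetical helpers.

```python
import math
from typing import List


def frame_rms_db(frame: List[float], eps: float = 1e-12) -> float:
    """Root-mean-square level of a frame in dB (relative to full scale 1.0)."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return 20.0 * math.log10(rms + eps)


def remove_silence(signal: List[float], frame_len: int = 400,
                   threshold_db: float = -40.0) -> List[float]:
    """Drop frames whose RMS level is below `threshold_db` and concatenate the rest.

    Assumes samples are floats in [-1, 1]; a trailing partial frame is kept
    or dropped by the same rule.
    """
    kept: List[float] = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        if frame_rms_db(frame) >= threshold_db:
            kept.extend(frame)
    return kept


# Toy signal: 400 samples of near-silence followed by 400 samples of a 440 Hz tone.
quiet = [0.0001] * 400
loud = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(400)]
cleaned = remove_silence(quiet + loud)
print(len(cleaned))  # 400
```

Dropping frames shortens the audio stream. In an audiovisual system the remaining audio may therefore need to be re-aligned with the video stream before fusion.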

Robust Audiovisual Emotion Recognition: Aligning Modalities, Capturing Temporal Information, and Handling Missing Features
To handle this situation, we want to explore the use of voice activity detectors, signal-to-noise ratio filters, and music detection filters to identify inputs that can be regarded as noise, silence, or music. We could then analyze whether better results can be achieved by removing these problematic inputs from the model's pipeline at inference and replacing these segments with zeros.
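The excerpt proposes future work rather than a finished method. One way to prototype the idea, assumed here, works per audio segment:

- Run a crude voice activity detector on the segment.
- Where it reports no speech, replace that segment's feature vector with zeros instead of deleting it.

This keeps the audio sequence the same length as the video sequence. The energy-based detector below is only a stand-in for a real VAD, SNR filter, or music detector. `is_speech` and `mask_non_speech` are hypothetical names, and the -40 dB threshold is illustrative.

```python
import math
from typing import List


def is_speech(segment: List[float], threshold_db: float = -40.0) -> bool:
    """Crude energy-based VAD stand-in: True if the segment's RMS level exceeds the threshold."""
    if not segment:
        return False
    rms = math.sqrt(sum(s * s for s in segment) / len(segment))
    return 20.0 * math.log10(rms + 1e-12) >= threshold_db


def mask_non_speech(segments: List[List[float]],
                    features: List[List[float]]) -> List[List[float]]:
    """Zero out the feature vector of every segment the VAD flags as non-speech.

    `segments[i]` holds the raw audio for time step i and `features[i]` the
    features extracted from it; the output has the same length as `features`.
    """
    if len(segments) != len(features):
        raise ValueError("need one feature vector per audio segment")
    return [
        feat if is_speech(seg) else [0.0] * len(feat)
        for seg, feat in zip(segments, features)
    ]


segments = [[0.3] * 160, [0.0] * 160, [0.2] * 160]
features = [[1.0, 2.0], [5.0, 6.0], [3.0, 4.0]]
print(mask_non_speech(segments, features))  # [[1.0, 2.0], [0.0, 0.0], [3.0, 4.0]]
```

Deleting silent segments would shorten the sequence, whereas zero-filling keeps the audio time axis aligned with the video. That is presumably why the excerpt suggests replacing the segments rather than dropping them.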

An engineer who studies people