2022, End-to-End Audio-Visual Neural Speaker Diarization [2022, Interspeech]

DongKeon Park · May 17, 2023

Abstract

  • multimodal inputs
    • uses audio features, lip regions of interest, and i-vector embeddings
  • I-vectors are the key to resolving the alignment problem caused by visual modality errors
    • e.g., occlusions, off-screen speakers, or unreliable detection
  • The audio-visual model stays robust when the visual modality is absent, whereas the diarization performance of the visual-only model degrades significantly
  • It is robust to visual modality errors and outperforms both the audio-only and the video-only systems

Introduction

  • explores the effects of lip motion and speech on speaker diarization using high-definition lip ROIs and single-channel audio
  • By manually removing lip ROI fragments, we can compare the impact of different degrees of lip misalignment on speaker diarization.
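
The lip-removal experiment described above could be simulated roughly as follows. This is only a minimal sketch and not the authors' procedure: the function names `drop_lip_fragments` and `apply_mask`, the fragment length of 25 frames, and the use of `None` as a placeholder for a missing lip ROI are all assumptions made for illustration.

```python
import random


def drop_lip_fragments(num_frames, drop_ratio, fragment_len=25, seed=0):
    """Return a per-frame availability mask (True = lip ROI present).

    Hypothetical helper: whole fragments of `fragment_len` frames are
    removed at random until about `drop_ratio` of all frames are missing.
    """
    if not 0.0 <= drop_ratio <= 1.0:
        raise ValueError("drop_ratio must be in [0, 1]")

    rng = random.Random(seed)  # fixed seed so the same mask can be reproduced
    mask = [True] * num_frames

    # Split the timeline into consecutive fragments and visit them in random order.
    starts = list(range(0, num_frames, fragment_len))
    rng.shuffle(starts)

    target = round(drop_ratio * num_frames)
    dropped = 0
    for start in starts:
        if dropped >= target:
            break
        end = min(start + fragment_len, num_frames)
        for t in range(start, end):
            mask[t] = False
        dropped += end - start
    return mask


def apply_mask(lip_frames, mask, fill=None):
    """Replace removed lip ROI frames with a placeholder (assumed: None)."""
    return [frame if keep else fill for frame, keep in zip(lip_frames, mask)]


if __name__ == "__main__":
    # Sweep over several degrees of missing lip ROIs.
    for ratio in (0.0, 0.25, 0.5, 1.0):
        mask = drop_lip_fragments(num_frames=1000, drop_ratio=ratio)
        missing = mask.count(False) / len(mask)
        print(f"requested {ratio:.2f} -> missing {missing:.2f}")
```

Running a diarization system on the same recording with masks of increasing `drop_ratio` would then give one error rate per degree of missing lip information, which is the kind of comparison the bullet above refers to.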