lee23l@interspeech_2023@ISCA

#1 Video Multimodal Emotion Recognition System for Real World Applications

Authors: Sun-Kyung Lee ; Jong-Hwan Kim

This paper proposes a system that recognizes a speaker's utterance-level emotion from multimodal cues in a video. The system integrates multiple AI models to first extract and pre-process multimodal information from the raw video input. Next, an end-to-end multimodal emotion recognition (MER) model sequentially predicts the speaker's emotions at the utterance level. Additionally, users can try the system interactively through the implemented interface.
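
The two-stage flow described above (pre-processing into utterance-level inputs, then sequential prediction) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the `Utterance` schema, the `recognize_emotions` function, the `keyword_stub` predictor, and the emotion labels are all hypothetical names introduced here. The abstract does not say which modalities or models are used, nor whether earlier predictions are fed to later ones; passing a prediction history is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Utterance:
    """One utterance with its pre-processed cues (hypothetical schema)."""
    start_s: float            # utterance start time in the video, seconds
    end_s: float              # utterance end time, seconds
    text: str                 # transcript of the utterance
    audio: Sequence[float]    # placeholder for audio features
    frames: Sequence[str]     # placeholder for video frame references


# A predictor maps the current utterance plus earlier labels to one label.
Predictor = Callable[[Utterance, List[str]], str]


def recognize_emotions(
    utterances: Sequence[Utterance], predict: Predictor
) -> List[Tuple[Utterance, str]]:
    """Predict one emotion label per utterance, in temporal order."""
    ordered = sorted(utterances, key=lambda u: u.start_s)
    history: List[str] = []   # labels predicted so far (assumed context)
    results: List[Tuple[Utterance, str]] = []
    for utt in ordered:
        label = predict(utt, history)
        history.append(label)
        results.append((utt, label))
    return results


def keyword_stub(utt: Utterance, history: List[str]) -> str:
    """Toy stand-in for a trained MER model; it only inspects the text."""
    text = utt.text.lower()
    if "great" in text:
        return "happy"
    if "terrible" in text:
        return "angry"
    # With no keyword match, carry the previous label forward.
    return history[-1] if history else "neutral"
```

In a real system the stub would be replaced by a trained model that consumes the audio, visual, and text inputs jointly; the loop structure only shows where utterance segmentation ends and per-utterance prediction begins.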