An Entanglement-driven Fusion Neural Network for Video Sentiment Analysis


Authors: Dimitris Gkoumas, Qiuchi Li, Yijun Yu, Dawei Song

Video data is multimodal by nature: an utterance can involve linguistic, visual and acoustic information. Therefore, a key challenge for video sentiment analysis is how to combine different modalities effectively for sentiment recognition. The latest neural network approaches achieve state-of-the-art performance, but they largely neglect how humans understand and reason about sentiment states. By contrast, recent quantum probabilistic neural models have achieved performance comparable to the state of the art, with better transparency and an increased level of interpretability. However, the existing quantum-inspired models treat quantum states as either a classical mixture or a separable tensor product across modalities, without triggering their interactions in a way that makes them correlated or non-separable (i.e., entangled). This means that the current models have not fully exploited the expressive power of quantum probabilities. To fill this gap, we propose a transparent quantum probabilistic neural model. The model induces different modalities to interact in such a way that they may not be separable, encoding crossmodal information in the form of non-classical correlations. Comprehensive evaluation on two benchmarking datasets for video sentiment analysis shows that the model achieves a significant performance improvement. We also show that the degree of non-separability between modalities optimizes post-hoc interpretability.
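
The distinction between a separable tensor product and a non-separable (entangled) state can be illustrated with a small NumPy sketch. This is not the model proposed in the paper; it is a textbook check on a two-subsystem pure state, and the names `schmidt_coefficients`, `entanglement_entropy`, `text` and `visual` are hypothetical, chosen only for illustration. The two subsystems stand in for two modalities, each assumed to be a two-dimensional unit vector.

```python
import numpy as np


def schmidt_coefficients(state, dim_a, dim_b):
    """Schmidt coefficients of a bipartite pure state of size dim_a * dim_b."""
    state = np.asarray(state, dtype=complex)
    state = state / np.linalg.norm(state)
    # Reshaping the joint vector into a dim_a x dim_b matrix and taking its
    # singular values gives the Schmidt coefficients.
    return np.linalg.svd(state.reshape(dim_a, dim_b), compute_uv=False)


def entanglement_entropy(state, dim_a, dim_b):
    """Von Neumann entropy (in bits) of either reduced state.

    Zero for a separable (product) state, positive for an entangled one.
    """
    probs = schmidt_coefficients(state, dim_a, dim_b) ** 2
    probs = probs[probs > 1e-12]  # drop numerically zero terms
    return float(-np.sum(probs * np.log2(probs)))


# Hypothetical per-modality states (illustrative only).
text = np.array([1.0, 0.0])
visual = np.array([1.0, 1.0]) / np.sqrt(2)

# Separable case: the joint state is a plain tensor (Kronecker) product.
product_state = np.kron(text, visual)

# Non-separable case: a Bell state, which cannot be factored into two vectors.
bell_state = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)

print(entanglement_entropy(product_state, 2, 2))
print(entanglement_entropy(bell_state, 2, 2))
```

The product state has a single non-zero Schmidt coefficient, so its entropy is zero up to rounding, while the Bell state has two equal coefficients and an entropy of one bit. A scalar of this kind is one common way to quantify a degree of non-separability; whether the paper uses this particular measure is not stated in the abstract.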

Subject: IJCAI.2021 - Humans and AI

239@2021@IJCAI
