
Research Article
LAMB: Label-Induced Mixed-Level Blending for Multimodal Multi-label Emotion Detection
@INPROCEEDINGS{10.1007/978-3-031-54528-3_2,
  author    = {Shuwei Qian and Ming Guo and Zhicheng Fan and Mingcai Chen and Chongjun Wang},
  title     = {LAMB: Label-Induced Mixed-Level Blending for Multimodal Multi-label Emotion Detection},
  booktitle = {Collaborative Computing: Networking, Applications and Worksharing. 19th EAI International Conference, CollaborateCom 2023, Corfu Island, Greece, October 4--6, 2023, Proceedings, Part II},
  publisher = {Springer},
  year      = {2024},
  month     = {2},
  keywords  = {multimodal fusion; multi-label classification; emotion detection},
  doi       = {10.1007/978-3-031-54528-3_2}
}
Shuwei Qian
Ming Guo
Zhicheng Fan
Mingcai Chen
Chongjun Wang
Year: 2024
LAMB: Label-Induced Mixed-Level Blending for Multimodal Multi-label Emotion Detection
COLLABORATECOM PART 2
Springer
DOI: 10.1007/978-3-031-54528-3_2
Abstract
To better understand complex human emotions, there is growing interest in utilizing heterogeneous sensory data to detect multiple co-occurring emotions. However, existing studies have focused on extracting static information from each modality, while overlooking various interactions within and between modalities. Additionally, label-to-modality and label-to-label dependencies still lack exploration. In this paper, we propose LAbel-induced Mixed-level Blending (LAMB) to address these challenges. Mixed-level blending leverages shallow but manifold self-attention and cross-attention encoders in parallel to model unimodal context dependency and cross-modal interaction simultaneously. This contrasts with previous works that either use only one of them or cascade them successively, ignoring the diversity of interactions in multimodal data. LAMB also employs label-induced aggregation, which allows different labels to attend adaptively to the most relevant blended tokens via a transformer-based decoder, facilitating the exploration of label-to-modality dependency. Unlike common low-order strategies in multi-label learning, correlations among multiple labels can be learned by self-attention in the label embedding space before the labels are treated as queries. Comprehensive experiments demonstrate the effectiveness of our method for multimodal multi-label emotion detection.
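The two ideas in the abstract, parallel self-/cross-attention blending and label-induced aggregation, can be sketched in a few lines. The following is a minimal NumPy illustration, not the authors' implementation: the token counts, feature size, number of labels, and the single-head attention without learned projections are all simplifying assumptions made here for exposition.

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention (single head, no learned projections)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
d = 8                                  # hypothetical feature dimension
text = rng.normal(size=(5, d))         # 5 text tokens (placeholder features)
audio = rng.normal(size=(7, d))        # 7 audio tokens (placeholder features)

# Mixed-level blending: unimodal self-attention and cross-modal attention
# run in parallel (not cascaded), and their outputs are pooled together.
text_self   = attention(text, text, text)      # unimodal context dependency
text_cross  = attention(text, audio, audio)    # cross-modal interaction
audio_self  = attention(audio, audio, audio)
audio_cross = attention(audio, text, text)
blended = np.concatenate([text_self, text_cross, audio_self, audio_cross])

# Label-induced aggregation: label embeddings first self-attend
# (label-to-label correlation), then act as queries over the blended
# tokens (label-to-modality dependency) in a decoder-style step.
num_labels = 6                         # e.g. six emotion categories
labels = rng.normal(size=(num_labels, d))
labels = attention(labels, labels, labels)        # label correlations
label_repr = attention(labels, blended, blended)  # one vector per label

# Per-label sigmoid logits: multi-label, so no softmax over labels.
logits = (label_repr * rng.normal(size=(num_labels, d))).sum(axis=-1)
probs = 1 / (1 + np.exp(-logits))
```

Each of the six labels ends up with its own probability, so co-occurring emotions can all be predicted above threshold independently, which is the point of the label-wise query design.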