
		about \| r&d \| publications \| courses \| people \| links

G. Vonitsanos, Ph. Mylonas, N. Antonopoulos, and A. Kanavos

A Human-AI Multimodal Framework for Emotion Recognition via Visual-Linguistic Alignment

International Conference on Human-AI Collaboration & Augmented Intelligence (HAICAI 2026), 23-24 April 2026, Athens, Greece

ABSTRACT

The interpretation of human emotional states remains a challenge for intelligent systems designed to support Human–AI interaction. Many existing approaches analyse visual and linguistic emotional signals independently, limiting their ability to capture the multidimensional nature of affective expression. This paper presents a multimodal framework that integrates facial emotion recognition and linguistic emotion representation using visual features extracted with ResNet-50 and linguistic embeddings generated by the CLIP text encoder. Linguistic representations derived from the GoEmotions dataset are mapped to the FER+ emotion taxonomy to ensure cross-modal compatibility, while semantic relationships between visual and textual signals are modeled within the CLIP embedding space.
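The label mapping and embedding comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The mapping table below is a hypothetical many-to-one assignment of a few GoEmotions labels to the eight FER+ classes (the abstract does not give the actual mapping), cosine similarity is assumed as the comparison measure, and the short vectors stand in for real ResNet-50 and CLIP embeddings. The names `map_label`, `cosine_similarity`, and `best_matching_label` are illustrative only.

```python
import math

# The eight FER+ emotion classes.
FER_PLUS = ["neutral", "happiness", "surprise", "sadness",
            "anger", "disgust", "fear", "contempt"]

# Hypothetical mapping from a subset of GoEmotions labels to FER+ classes.
# This is an assumption for illustration, not the mapping used in the paper.
GOEMOTIONS_TO_FERPLUS = {
    "joy": "happiness",
    "amusement": "happiness",
    "surprise": "surprise",
    "sadness": "sadness",
    "grief": "sadness",
    "anger": "anger",
    "annoyance": "anger",
    "disgust": "disgust",
    "fear": "fear",
    "nervousness": "fear",
    "disapproval": "contempt",
    "neutral": "neutral",
}


def map_label(goemotions_label):
    """Return the FER+ class for a GoEmotions label, or None if unmapped."""
    return GOEMOTIONS_TO_FERPLUS.get(goemotions_label)


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_matching_label(image_vec, text_vecs):
    """Pick the label whose text embedding is most similar to the image embedding."""
    return max(text_vecs, key=lambda label: cosine_similarity(image_vec, text_vecs[label]))


# Toy 2-dimensional stand-ins for embeddings in a shared space.
toy_text_vecs = {"happiness": [0.9, 0.2], "sadness": [-1.0, 0.0]}
print(map_label("grief"))
print(best_matching_label([1.0, 0.1], toy_text_vecs))
```

In a real pipeline the toy vectors would be replaced by embeddings projected into a common space, since ResNet-50 features and CLIP text embeddings do not share dimensions by default.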

24 April 2026

G. Vonitsanos, Ph. Mylonas, N. Antonopoulos, and A. Kanavos, "A Human-AI Multimodal Framework for Emotion Recognition via Visual-Linguistic Alignment", International Conference on Human-AI Collaboration & Augmented Intelligence (HAICAI 2026), 23-24 April 2026, Athens, Greece
