Multimodal Human Communication Dynamics

Research Area(s): Multimodal Computing and Interaction
This research focuses on building the computational foundations that enable computers to analyze, recognize, and predict subtle human communicative behaviors during social interactions.

Louis-Philippe Morency

This research combines the fields of multimodal interaction, social psychology, computer vision, machine learning and artificial intelligence, and has many applications in areas as diverse as medicine, robotics and education.

• Human Communication Dynamics
  – Analyzing, recognizing, and predicting subtle human communicative behaviors during social interactions
• Multimodal Machine Learning
  – Probabilistic modeling of acoustic, visual, and verbal modalities
  – Learning the temporal contingency between modalities
• Health Behavior Informatics
  – Technologies to support clinical practice during the diagnosis and treatment of mental health disorders

Case Study: Multimodal Sentiment Intensity Analysis

Authors: Amir Zadeh, Louis-Philippe Morency

People share their opinions, stories, and reviews through online video-sharing websites every day. The automatic analysis of these online opinion videos brings new or understudied research challenges to the fields of computational linguistics and multimodal analysis. Among these challenges is the fundamental question of how to exploit the dynamics between visual gestures and verbal messages to better model sentiment. This article addresses this question in four ways: introducing the first multimodal dataset with opinion-level sentiment intensity annotations; studying the prototypical interaction patterns between facial gestures and spoken words when inferring sentiment intensity; proposing a new computational representation, called a multimodal dictionary, based on a language-gesture study; and evaluating the authors' proposed approach in a speaker-independent paradigm for sentiment intensity prediction. The authors' study identifies four interaction types between facial gestures and verbal content: neutral, emphasizer, positive, and negative interactions. Experiments show a statistically significant improvement when using the multimodal dictionary representation over the conventional early fusion representation (that is, feature concatenation).
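To make the baseline concrete: early fusion simply concatenates the per-segment feature vectors from each modality before classification. The sketch below contrasts concatenation with a generic interaction-style representation (an outer product of verbal and visual features). This is only an illustration of the contrast, not the authors' multimodal dictionary, which is built from their language-gesture study; the feature values and dimensions are invented for the example.

```python
import numpy as np

def early_fusion(verbal, visual):
    """Conventional early fusion: concatenate per-segment feature vectors."""
    return np.concatenate([verbal, visual])

def bimodal_interaction(verbal, visual):
    """Illustrative interaction features: the outer product pairs every
    verbal feature with every visual (facial gesture) feature, so a
    classifier can weight word-gesture co-occurrences directly."""
    return np.outer(verbal, visual).ravel()

verbal = np.array([0.2, 0.8, 0.0])   # hypothetical word-level sentiment cues
visual = np.array([1.0, 0.5])        # hypothetical smile / frown intensities

print(early_fusion(verbal, visual).shape)        # 3 + 2 = 5 features
print(bimodal_interaction(verbal, visual).shape) # 3 x 2 = 6 features
```

The interaction representation grows multiplicatively with feature dimensions, which is one reason compact, study-driven representations such as a multimodal dictionary are attractive.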

For More Information:
Zadeh, Amir, et al. "MOSI: Multimodal Corpus of Sentiment Intensity and Subjectivity Analysis in Online Opinion Videos." arXiv preprint arXiv:1606.06259 (2016).

Case Study: Facial Behavior Analysis

Authors: Tadas Baltrušaitis, Louis-Philippe Morency 

Over the past few years, there has been increased interest in automatic facial behavior analysis and understanding. We present OpenFace – an open-source tool intended for computer vision and machine learning researchers, the affective computing community, and anyone interested in building interactive applications based on facial behavior analysis.

OpenFace is the first open-source tool capable of facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation. The computer vision algorithms at the core of OpenFace achieve state-of-the-art results on all of these tasks. Furthermore, the tool runs in real time and can operate from a simple webcam without any specialist hardware.



For More Information:
Baltrušaitis, Tadas, Peter Robinson, and Louis-Philippe Morency. "Openface: an open source facial behavior analysis toolkit." Applications of Computer Vision (WACV), 2016 IEEE Winter Conference on. IEEE, 2016.
