Our expertise in multimedia signal processing covers digital image and video processing, speech and audiovisual signal processing, human-computer interaction, and machine learning. We apply signal processing and statistical machine learning techniques to study correlations, dependencies, and independencies across modalities that are either loosely synchronous, such as speech and gestures or speech and emotion, or strongly synchronous, such as acoustic and throat-microphone recordings or music and dance.
Multi-modality, a key component of this work, refers to the joint processing of signals from multiple modalities such as speech, still images, video, and other sensory sources. It plays a central role in the design of future human-computer interfaces and intelligent systems. Jointly processing multiple biometric signals, in particular those from audiovisual sensors, such as voice, face, fingerprint, iris, gestures, body and head motion, gait, and lip movements, reinforces recognition, improves robustness, and yields more natural interaction with computers.
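As a simple illustration of how joint processing can reinforce recognition, the sketch below shows weighted late fusion of per-class scores from two hypothetical unimodal classifiers (e.g., a voice model and a face model). The scores, weights, and function name are illustrative assumptions, not the group's actual method.

```python
import numpy as np

# Hypothetical per-class scores from two unimodal classifiers
# (e.g., a voice model and a face model); values are illustrative.
audio_scores = np.array([0.6, 0.3, 0.1])   # identities A, B, C from voice
visual_scores = np.array([0.4, 0.5, 0.1])  # identities A, B, C from face

def late_fusion(scores_a, scores_b, weight_a=0.5):
    """Weighted-sum (late) fusion of two modality score vectors."""
    fused = weight_a * scores_a + (1.0 - weight_a) * scores_b
    return fused / fused.sum()  # renormalize to a probability distribution

fused = late_fusion(audio_scores, visual_scores, weight_a=0.7)
print(fused.argmax())  # index of the recognized identity
```

Here the visual modality alone would favor identity B, but fusing it with the more confident audio scores recovers identity A, which is the kind of cross-modal reinforcement described above.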
Research Areas: Video analysis, video compression and filtering, speech recognition, speech enhancement, human-computer interaction, body motion analysis, speech-driven facial and body animation, machine learning.