Video Super-Resolution Project

Video Super-Resolution aims to estimate a high-resolution image from multiple similar low-resolution images.


Music-Driven Dance Choreography

We learn music-to-dance mappings to generate plausible music-driven dance choreographies.

Unsupervised Learning of Multi-Frame Optical Flow with Occlusions

We reason over three frames to strengthen the photometric loss and explicitly model occlusions in an unsupervised framework.
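
As a rough illustration of the idea (not the paper's actual formulation), the sketch below computes an occlusion-aware photometric loss over three frames. The nearest-neighbour warp and the forward-backward consistency check are simplifying assumptions made here for brevity:

```python
import numpy as np

def warp(img, flow):
    """Backward-warp img by a (H, W, 2) flow field via nearest-neighbour lookup."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys + flow[..., 1].round().astype(int), 0, h - 1)
    src_x = np.clip(xs + flow[..., 0].round().astype(int), 0, w - 1)
    return img[src_y, src_x]

def visible_mask(fwd, bwd, tol=0.5):
    """Forward-backward consistency: a pixel counts as visible (not occluded)
    when the forward flow and the back-warped backward flow roughly cancel."""
    bwd_at_fwd = warp(bwd, fwd)
    diff = np.linalg.norm(fwd + bwd_at_fwd, axis=-1)
    return diff < tol

def three_frame_photometric_loss(prev, cur, nxt, flow_fwd, flow_bwd):
    """Average photometric error of cur against both warped neighbours,
    counting only the pixels visible in each direction."""
    losses = []
    for neigh, flow, inv in ((nxt, flow_fwd, flow_bwd), (prev, flow_bwd, flow_fwd)):
        mask = visible_mask(flow, inv)
        err = np.abs(cur - warp(neigh, flow))
        losses.append(err[mask].mean() if mask.any() else 0.0)
    return sum(losses) / len(losses)
```

With identical frames and zero flow the loss vanishes, as one would expect; occluded pixels are simply excluded from the average rather than penalized.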

Sketch Recognition with Few Examples

Our systems perform self-learning by automatically extending a very small set of labeled examples with new examples extracted from unlabeled sketches.
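
The self-learning loop can be sketched roughly as follows; the nearest-centroid classifier and the distance-gap confidence below are toy stand-ins for whatever recognizer and confidence measure the actual system uses:

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Fit a toy classifier: one centroid per class."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict_with_confidence(X, classes, centroids):
    """Predict the closest centroid; confidence is the gap between the
    best and second-best centroid distances."""
    d = np.linalg.norm(X[:, None] - centroids[None], axis=-1)
    idx = d.argmin(axis=1)
    sorted_d = np.sort(d, axis=1)
    conf = sorted_d[:, 1] - sorted_d[:, 0]
    return classes[idx], conf

def self_train(X_lab, y_lab, X_unlab, rounds=3, threshold=0.5):
    """Iteratively extend the labeled set with confidently predicted
    unlabeled examples, then refit on the grown set."""
    X, y = X_lab.copy(), y_lab.copy()
    pool = X_unlab.copy()
    for _ in range(rounds):
        if len(pool) == 0:
            break
        classes, cent = nearest_centroid_fit(X, y)
        pred, conf = predict_with_confidence(pool, classes, cent)
        keep = conf >= threshold
        if not keep.any():
            break
        X = np.vstack([X, pool[keep]])
        y = np.concatenate([y, pred[keep]])
        pool = pool[~keep]
    return X, y
```

The confidence threshold controls the usual self-training trade-off: a low threshold grows the labeled set quickly but risks reinforcing early mistakes.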

Identifying Visual Attributes for Object Recognition from Text and Taxonomy

Our evaluations demonstrate that both taxonomy and distributional similarity serve as useful sources of information for attribute nomination, and our methods can effectively exploit them.

Audio-Facial Laughter Detection in Naturalistic Dyadic Conversations

Our experiments show that our multimodal approach, supported by bagging, compares favorably to the state of the art in the presence of detrimental factors such as cross-talk, environmental noise, and data imbalance.
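
Bagging itself is easy to sketch: train on bootstrap resamples and take a majority vote, which damps the effect of noisy or imbalanced training data. The 1-NN base learner here is a toy stand-in for the actual audio-facial classifiers:

```python
import numpy as np

def bagging_predict(X_train, y_train, X_test, fit, predict, n_bags=10, seed=0):
    """Majority vote over classifiers trained on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_bags):
        idx = rng.integers(0, len(X_train), len(X_train))  # bootstrap sample
        model = fit(X_train[idx], y_train[idx])
        votes.append(predict(model, X_test))
    votes = np.stack(votes)                                # (n_bags, n_test)
    # per test point, return the most frequent predicted label
    return np.array([np.bincount(col).argmax() for col in votes.T])

def fit_1nn(X, y):
    """Toy base learner: 1-nearest-neighbour just memorises its bag."""
    return X, y

def predict_1nn(model, X_test):
    Xm, ym = model
    d = np.linalg.norm(X_test[:, None] - Xm[None], axis=-1)
    return ym[d.argmin(axis=1)]
```

Because each bag sees a different resample, examples from a rare class still dominate some bags, which is one reason bagging helps under class imbalance.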

Slow Flow: Exploiting High-Speed Cameras for Accurate and Diverse Optical Flow Reference Data

We propose a challenging real-world dataset with reference flow data by tracking pixels through densely sampled space-time volumes recorded with a high-speed video camera.

Exploiting Object Similarity in 3D Reconstruction

We take advantage of the similarity in 3D shape of commonly observed objects in outdoor scenes by locating objects using detectors and jointly reconstructing them while learning a volumetric model of their shape.

Displets: Resolving Stereo Ambiguities using Object Knowledge

Stereo techniques using traditional local regularizers cannot easily recover reflective and textureless surfaces. We propose to regularize over larger distances using object-category-specific disparity proposals (displets), which we sample using inverse-graphics techniques based on a sparse disparity estimate and a semantic segmentation of the image.

Engaging Conversational Robots

We train a social robot using Reinforcement Learning to enhance user engagement by generating appropriate smiles, laughs, and nods during a conversation.

Deep Stroke-based Sketched Symbol Reconstruction and Segmentation

We propose a neural network model that segments symbols into stroke-level components. Our segmentation framework has two main elements: a fixed feature extractor and a Multilayer Perceptron (MLP) network that identifies a component based on the extracted features.
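
A minimal sketch of the two elements, with toy stroke summary statistics standing in for the fixed feature extractor and an untrained forward pass standing in for the learned MLP:

```python
import numpy as np

def stroke_features(points):
    """Toy fixed feature extractor: summary statistics of a stroke's
    (x, y) points — arc length, bounding box, point count."""
    pts = np.asarray(points, dtype=float)
    deltas = np.diff(pts, axis=0)
    length = np.linalg.norm(deltas, axis=1).sum()
    bbox = pts.max(axis=0) - pts.min(axis=0)
    return np.array([length, bbox[0], bbox[1], len(pts)])

class MLP:
    """One-hidden-layer perceptron mapping features to component scores."""
    def __init__(self, d_in, d_hidden, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (d_in, d_hidden))
        self.b1 = np.zeros(d_hidden)
        self.W2 = rng.normal(0, 0.1, (d_hidden, n_classes))
        self.b2 = np.zeros(n_classes)

    def forward(self, x):
        h = np.maximum(0.0, x @ self.W1 + self.b1)   # ReLU hidden layer
        return h @ self.W2 + self.b2                  # class scores

stroke = [(0, 0), (1, 0), (2, 1)]
mlp = MLP(d_in=4, d_hidden=8, n_classes=3)
scores = mlp.forward(stroke_features(stroke))
label = int(scores.argmax())
```

Keeping the feature extractor fixed and training only the small MLP on top is a common recipe when labeled stroke data is scarce.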

Active Learning for Sketch Recognition

Our results indicate that the margin-based informativeness measure consistently outperforms the other measures. We also show that active learning brings definitive advantages on challenging databases when accompanied by powerful feature representations.
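
The margin measure is simple to state: an example is informative when the classifier's two most probable labels are nearly tied. A minimal sketch:

```python
import numpy as np

def margin_informativeness(probs):
    """Margin measure: the gap between the two most probable classes.
    A small margin means the model is least certain, so the example
    is most informative to label next."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def select_queries(probs, k):
    """Pick the k examples with the smallest margin."""
    return np.argsort(margin_informativeness(probs))[:k]

probs = np.array([
    [0.90, 0.05, 0.05],   # confident
    [0.40, 0.38, 0.22],   # ambiguous top-2
    [0.55, 0.30, 0.15],
])
queries = select_queries(probs, k=1)   # → picks index 1, the ambiguous one
```

Unlike entropy-based measures, the margin looks only at the top two classes, which tends to match what matters for the decision boundary.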

Semantic Sketch-Based Video Retrieval with Autocompletion

The system indexes collection data with over 30 visual features describing color, edge, motion, and semantic information. The resulting feature data is stored in ADAM, an efficient database system optimized for fast retrieval.

IMOTION – Searching for Video Sequences Using Multi-Shot Sketch Queries

This paper presents the second version of the IMOTION system, a sketch-based video retrieval engine supporting multiple query paradigms. This version improves both the functionality and the usability of the system.

iAutoMotion – an Autonomous Content-based Video Retrieval Engine

This paper introduces iAutoMotion, an autonomous video retrieval system that requires only minimal user input. It is based on the IMOTION video retrieval engine.

The ASC-Inclusion Perceptual Serious Gaming Platform for Autistic Children

Often, individuals with an Autism Spectrum Condition (ASC) have difficulties interpreting verbal and non-verbal communication cues during social interactions. We develop a platform for children who have an ASC to learn emotion expression and recognition through play in a virtual world.

DPFrag: Trainable Stroke Fragmentation Based on Dynamic Programming

DPFrag is an efficient, globally optimal fragmentation method that learns segmentation parameters from data and produces fragmentations by combining primitive recognizers in a dynamic-programming framework. The fragmentation is fast and does not require laborious parameter tuning. In experiments, it beats state-of-the-art methods on standard databases using only a handful of labeled examples.
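
The dynamic-programming recurrence behind such a fragmentation can be sketched as follows; the pluggable cost function is a stand-in for DPFrag's learned primitive recognizers:

```python
def dp_fragment(stroke_len, cost):
    """Globally optimal fragmentation of a stroke with stroke_len points.
    cost(i, j) scores treating points i..j as one primitive (lower is
    better); the DP minimises the total cost over all cut placements."""
    INF = float("inf")
    best = [INF] * (stroke_len + 1)   # best[j]: optimal cost of points 0..j
    back = [0] * (stroke_len + 1)     # back[j]: start of the last segment
    best[0] = 0.0
    for j in range(1, stroke_len + 1):
        for i in range(j):
            c = best[i] + cost(i, j)
            if c < best[j]:
                best[j], back[j] = c, i
    # recover the optimal segments by walking the backpointers
    segments, j = [], stroke_len
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    return best[stroke_len], segments[::-1]

# Toy cost preferring segments of exactly two points:
total, segs = dp_fragment(4, lambda i, j: abs((j - i) - 2))
# → total 0.0 with segments [(0, 2), (2, 4)]
```

The double loop makes this O(n²) cost evaluations per stroke; since strokes have at most a few hundred points, the fragmentation stays fast, as the abstract claims.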