• About
    • Mission & History
    • Board of Directors
  • People
    • AI Committee
    • Affiliated Faculty
    • Students
    • Alumni
  • Research
    • Computer Vision
    • Computational Biology and Medicine
    • Human Computer Interaction
    • Machine Learning
    • Multimedia Signal Processing
    • Natural Language Processing
    • Robotics
    • Systems and AI
  • Projects
    • Blog
    • Sponsored Projects
    • Publications
  • Education
    • Courses and Educational Resources
    • Programs
  • Industry
    • Industrial Affiliates
    • How to engage
  • Resources
    • Data
    • Software
    • Hardware
  • Join Us
    • Graduate Admissions
    • Postdoc Positions
    • Faculty Positions
    • FAQ
Latest Multimodal Signal Processing Publications
  • Publications
    • Computer Vision
    • Computational Biology and Medicine
    • Human Computer Interaction
    • Machine Learning
    • Multimedia Signal Processing
    • Natural Language Processing
    • Robotics
    • Systems and AI
  • Affective synthesis and animation of arm gestures from speech prosody, Speech Communication
    E Bozkurt, Y Yemez, E Erzin
    2020

    Abstract

    In human-to-human communication, speech signals carry rich emotional cues that are further emphasized by affect-expressive gestures. In this regard, automatic synthesis and animation of gestures accompanying affective verbal communication can help to create more naturalistic virtual agents in human-computer interaction systems. Speech-driven gesture synthesis can map emotional cues of the speech signal to affect-expressive gestures by modeling complex variability and timing relationships of speech and gesture. In ...

    View details for https://www.sciencedirect.com/science/article/pii/S0167639319301980

  • Automatic Vocal Tract Landmark Tracking in rtMRI Using Fully Convolutional Networks and Kalman Filter, ICASSP 2020 - IEEE International Conference on Acoustics, Speech and Signal Processing
    S Asadiabadi, E Erzin
    2020

    Abstract

    Vocal tract (VT) contour detection in real-time MRI is a pre-stage to many speech production related applications such as articulatory analysis and synthesis. In this work, we present an algorithm for robust detection of keypoints on the vocal tract in rtMRI sequences using fully convolutional networks (FCN) via a heatmap regression approach. We also introduce a spatio-temporal stabilization scheme based on a combination of Principal Component Analysis (PCA) and Kalman filter (KF) to extract stable landmarks in space and time. The ...

    View details for https://ieeexplore.ieee.org/abstract/document/9054332/
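
    A minimal sketch of the stabilization idea, assuming per-frame landmark coordinates have already been decoded from the FCN heatmaps (the synthetic raw_landmarks array below stands in for them). A constant-velocity Kalman filter smooths each coordinate track; the paper's PCA step is omitted:

    ```python
    import numpy as np

    def kalman_smooth_1d(z, q=1e-3, r=1e-2):
        """Smooth one landmark coordinate track z (shape (T,)) over time."""
        F = np.array([[1.0, 1.0], [0.0, 1.0]])  # constant-velocity transition
        H = np.array([[1.0, 0.0]])              # we observe position only
        Q, R = q * np.eye(2), np.array([[r]])   # process / measurement noise
        x, P = np.array([z[0], 0.0]), np.eye(2)
        out = np.empty_like(z)
        for t, zt in enumerate(z):
            x, P = F @ x, F @ P @ F.T + Q                 # predict
            K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
            x = x + K @ (np.array([zt]) - H @ x)          # correct with zt
            P = (np.eye(2) - K @ H) @ P
            out[t] = x[0]
        return out

    # raw_landmarks: (T frames, L landmarks, 2 coords) of noisy heatmap peaks
    raw_landmarks = np.cumsum(0.5 * np.random.randn(100, 5, 2), axis=0)
    smoothed = np.empty_like(raw_landmarks)
    for l in range(raw_landmarks.shape[1]):
        for d in range(2):
            smoothed[:, l, d] = kalman_smooth_1d(raw_landmarks[:, l, d])
    ```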

  • Speech Driven Backchannel Generation using Deep Q-Network for Enhancing Engagement in Human-Robot Interaction, INTERSPEECH: Annual Conference of the International Speech Communication Association
    N Hussain, E Erzin, TM Sezgin, Y Yemez
    2019

    Abstract

    We present a novel method for training a social robot to generate backchannels during human-robot interaction. We address the problem within an off-policy reinforcement learning framework, and show how a robot may learn to produce non-verbal backchannels like laughs, when trained to maximize the engagement and attention of the user. A major contribution of this work is the formulation of the problem as a Markov decision process (MDP) with states defined by the speech activity of the user and rewards generated by ...

    View details for https://arxiv.org/abs/1908.01618
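
    To make the MDP formulation concrete, here is a tabular Q-learning loop with quantized speech-activity states, a wait/laugh action set, and a placeholder engagement reward. The paper trains a Deep Q-Network; the states, actions, and reward below are illustrative stand-ins, not the authors' design:

    ```python
    import numpy as np

    n_states, n_actions = 4, 2  # quantized speech activity; {wait, laugh}
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, eps = 0.1, 0.9, 0.1
    rng = np.random.default_rng(0)

    def engagement_reward(state, action):
        # placeholder: laughing pays off only when the user is highly active
        return 1.0 if (action == 1 and state == 3) else 0.0

    state = rng.integers(n_states)
    for _ in range(10_000):
        action = (rng.integers(n_actions) if rng.random() < eps
                  else int(Q[state].argmax()))        # epsilon-greedy policy
        reward = engagement_reward(state, action)
        next_state = rng.integers(n_states)           # stand-in user dynamics
        # off-policy temporal-difference update (Q-learning)
        Q[state, action] += alpha * (
            reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

    print(Q)  # learned action preferences per speech-activity state
    ```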

  • Batch Recurrent Q-Learning for Backchannel Generation Towards Engaging Agents, 8th International Conference on Affective Computing and Intelligent Interaction
    N Hussain, E Erzin, TM Sezgin
    2019

    Abstract

    The ability to generate appropriate verbal and nonverbal backchannels by an agent during human-robot interaction greatly enhances the interaction experience. Backchannels are particularly important in applications like tutoring and counseling, which require constant attention and engagement of the user. We present here a method for training a robot for backchannel generation during a human-robot interaction within the reinforcement learning (RL) framework, with the goal of maintaining a high engagement level. Since online learning ...

    View details for https://ieeexplore.ieee.org/abstract/document/8925443/

  • A New Interface for Affective State Estimation and Annotation from Speech, 27th European Signal Processing Conference
    U Fidan, D Tomar, PG Özdil
    2019

    Abstract

    Abstract — Emotion recognition from speech has recently become an important research area ... the recording, the original affective-state annotations corresponding to the recording, and the affective-state estimates ... the user ... Taking into account that affective-state changes are not abrupt, larger filters ...

    View details for https://ieeexplore.ieee.org/abstract/document/8806402/

  • Analyzing the Body Languages of Pre-Service Teachers During Teaching Practice Course
    Ö Sadioğlu
    2018

    Abstract

    ... (Yeşil, 2005). By their nature, humans use body language more intensively than verbal communication when communicating (Borg, 2009) ... they need to attach importance to communication (Miller, 1988). The teacher's behaviors in classroom communication ... Baltaş and Baltaş (2005) ... face-to-face dyadic ...

    View details for https://www.researchgate.net/profile/Oemuer_Sadioglu/publication/327237763_ANALYZING_THE_BODY_LANGUAGES_OF_PRE-SERVICE_TEACHERS_DURING_TEACHING_PRACTICE_COURSE/links/5b83a6d9299bf1d5a72a6b12/ANALYZING-THE-BODY-LANGUAGES-OF-PRE-SERVICE-TEACHERS-DURING-TEACHING-PRACTICE-COURSE.pdf

  • A deep learning approach for data driven vocal tract area function estimation, IEEE Workshop on Spoken Language Technology (SLT)
    S Asadiabadi, E Erzin
    2018

    Abstract

    In this paper we present a data-driven vocal tract area function (VTAF) estimation using Deep Neural Networks (DNN). We approach the VTAF estimation problem based on sequence-to-sequence learning neural networks, where regression over a sliding window is used to learn an arbitrary non-linear one-to-many mapping from the input feature sequence to the target articulatory sequence. We propose two schemes for efficient estimation of the VTAF: (1) a direct estimation of the area function values and (2) an indirect estimation via ...

    View details for https://ieeexplore.ieee.org/abstract/document/8639582/
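
    A toy sketch of the sliding-window regression scheme: a window of acoustic feature frames is flattened and mapped to the area-function values of the center frame. scikit-learn's MLPRegressor stands in for the paper's DNN, and all arrays are synthetic placeholders:

    ```python
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    T, n_feat, n_sections, win = 500, 13, 16, 5  # frames, MFCCs, VT sections, window
    feats = np.random.randn(T, n_feat)           # input acoustic features
    areas = np.random.rand(T, n_sections)        # target area function per frame

    half = win // 2
    X = np.stack([feats[t - half:t + half + 1].ravel()
                  for t in range(half, T - half)])
    y = areas[half:T - half]

    model = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=300)
    model.fit(X, y)                              # direct estimation scheme (1)
    pred = model.predict(X[:10])                 # (10, n_sections) area estimates
    ```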

  • Multimodal speech driven facial shape animation using deep neural networks, 10th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference
    S Asadiabadi, R Sadiq, E Erzin
    2018

    Abstract

    In this paper we present a deep learning multimodal approach for speech driven generation of face animations. Training a speaker independent model, capable of generating different emotions of the speaker, is crucial for realistic animations. Unlike the previous approaches which either use acoustic features or phoneme label features to estimate the facial movements, we utilize both modalities to generate natural looking speaker independent lip animations synchronized with affective speech. A phoneme-based model qualifies ...

    View details for https://ieeexplore.ieee.org/abstract/document/8659713/

  • Monitoring Infant's Emotional Cry in Domestic Environments Using the Capsule Network Architecture. Interspeech
    MAT Turan, E Erzin
    2018

    Abstract

    Automated recognition of an infant's cry from audio can be considered a preliminary step for applications like remote baby monitoring. In this paper, we implemented a recently introduced deep learning topology called capsule network (CapsNet) for the cry recognition problem. A capsule in the CapsNet, which is defined as a new representation, is a group of neurons whose activity vector represents the probability that the entity exists. Active capsules at one level make predictions, via transformation matrices, for the parameters of ...

    View details for https://www.isca-speech.org/archive/Interspeech_2018/pdfs/2187.pdf

  • Audio-Visual Prediction of Head-Nod and Turn-Taking Events in Dyadic Interactions. Interspeech
    BB Türker, E Erzin, Y Yemez, TM Sezgin
    2018

    Abstract

    Head-nods and turn-taking both significantly contribute to conversational dynamics in dyadic interactions. Timely prediction and use of these events is quite valuable for dialog management systems in human-robot interaction. In this study, we present an audio-visual prediction framework for the head-nod and turn-taking events that can also be utilized in real-time systems. Prediction systems based on Support Vector Machines (SVM) and Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) are trained on human-human ...

    View details for https://iui.ku.edu.tr/wp-content/uploads/2018/06/is2018_cameraReady.pdf
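
    A minimal PyTorch sketch of the LSTM-RNN side of such a predictor: windows of audio-visual feature frames in, an event probability out. The feature dimension, window length, and labels are invented placeholders, not the study's setup:

    ```python
    import torch
    import torch.nn as nn

    class EventPredictor(nn.Module):
        def __init__(self, n_feat=40, hidden=64):
            super().__init__()
            self.lstm = nn.LSTM(n_feat, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):                    # x: (batch, time, n_feat)
            _, (h, _) = self.lstm(x)             # h: (layers, batch, hidden)
            return torch.sigmoid(self.head(h[-1]))  # event probability

    model = EventPredictor()
    x = torch.randn(8, 100, 40)                  # 8 windows of 100 feature frames
    y = (torch.rand(8, 1) > 0.5).float()         # placeholder event labels
    loss_fn = nn.BCELoss()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(5):                           # a few illustrative updates
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    ```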

  • Detection of food intake events from throat microphone recordings using convolutional neural networks, ICME Workshop on Multimedia Services and Technologies for Smart-Health
    MAT Turan, E Erzin
    2018

    Abstract

    Food intake analysis is a crucial step to develop an automated dietary monitoring system. Processing of eating sounds delivers important cues for food intake monitoring. Recent studies on detection of eating activity generally utilize multimodal data from multiple sensors with conventional feature engineering techniques. In this study, we aim to develop a methodology for detection of ingestion sounds, namely swallowing and chewing, from the recorded food intake sounds during a meal. Our methodology relies on feature learning in ...

    View details for https://ieeexplore.ieee.org/abstract/document/8551492/

  • Multifaceted engagement in social interaction with a machine: The JOKER project, 13th IEEE International Conference on Automatic Face and Gesture Recognition
    L Devillers, S Rosset, GD Duplessis
    2018

    Abstract

    This paper addresses the problem of evaluating engagement of the human participant by combining verbal and nonverbal behaviour along with contextual information. This study will be carried out through four different corpora. Four different systems designed to explore essential and complementary aspects of the JOKER system in terms of paralinguistic/linguistic inputs were used for the data collection. An annotation scheme dedicated to the labeling of verbal and non-verbal behavior has been designed. From our ...

    View details for https://ieeexplore.ieee.org/abstract/document/8373903/

  • On the importance of hidden bias and hidden entropy in representational efficiency of the Gaussian-Bipolar Restricted Boltzmann Machines, Neural Networks
    A Isabekov, E Erzin
    2018

    Abstract

    In this paper, we analyze the role of hidden bias in representational efficiency of the Gaussian-Bipolar Restricted Boltzmann Machines (GBPRBMs), which are similar to the widely used Gaussian-Bernoulli RBMs. Our experiments show that hidden bias plays an important role in shaping the probability density function of the visible units. We define hidden entropy and propose it as a measure of representational efficiency of the model. By using this measure, we investigate the effect of hidden bias on the hidden entropy and ...

    View details for https://www.sciencedirect.com/science/article/pii/S0893608018301849

  • Audio-facial laughter detection in naturalistic dyadic conversations, IEEE Transactions on Affective Computing
    BB Turker, Y Yemez, TM Sezgin
    2017

    Abstract

    We address the problem of continuous laughter detection over audio-facial input streams obtained from naturalistic dyadic conversations. We first present meticulous annotation of laughters, cross-talks and environmental noise in an audio-facial database with explicit 3D facial mocap data. Using this annotated database, we rigorously investigate the utility of facial information, head movement and audio features for laughter detection. We identify a set of discriminative features using mutual information-based criteria, and show how they ...

    View details for https://ieeexplore.ieee.org/abstract/document/8046102/
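
    The mutual-information screening step maps directly onto scikit-learn. This sketch ranks synthetic stand-in features by mutual information with a binary laughter label and keeps the top 15 (dimensions and the cut-off are arbitrary here):

    ```python
    import numpy as np
    from sklearn.feature_selection import mutual_info_classif

    X = np.random.randn(1000, 60)      # placeholder audio + facial features
    y = np.random.randint(0, 2, 1000)  # laughter (1) vs. other (0) labels

    mi = mutual_info_classif(X, y, random_state=0)
    top = np.argsort(mi)[::-1][:15]    # 15 most informative feature indices
    X_selected = X[:, top]             # reduced feature set for the classifier
    ```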

  • The JESTKOD database: an affective multimodal database of dyadic interactions, Language Resources and Evaluation
    E Bozkurt, H Khaki, S Keçeci, BB Türker
    2017

    Abstract

    In human-to-human communication, gesture and speech co-exist in time with a tight synchrony, and gestures are often utilized to complement or to emphasize speech. In human–computer interaction systems, natural, affective and believable use of gestures would be a valuable key component in adopting and emphasizing human-centered aspects. However, natural and affective multimodal data, for studying computational models of gesture and speech, is limited. In this study, we introduce the JESTKOD database, which consists of ...

    View details for https://link.springer.com/article/10.1007/s10579-016-9377-0

  • Empirical mode decomposition of throat microphone recordings for intake classification, Second International Workshop on Multimedia for Personal Health and Health Care
    MAT Turan, E Erzin
    2017

    Abstract

    Wearable sensor systems can deliver promising solutions to automatic monitoring of ingestive behavior. This study presents an on-body sensor system and related signal processing techniques to classify different types of food intake sounds. A piezoelectric throat microphone is used to capture food consumption sounds from the neck. The recorded signals are firstly segmented and decomposed using the empirical mode decomposition (EMD) analysis. EMD has been a widely implemented tool to analyze non-stationary and ...

    View details for https://dl.acm.org/doi/abs/10.1145/3132635.3132640
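
    A small sketch of the decomposition step using the open-source PyEMD package (installed as EMD-signal on PyPI) on a synthetic signal; the per-IMF energy shown is one simple downstream feature, not the paper's actual feature set:

    ```python
    import numpy as np
    from PyEMD import EMD

    fs = 16000                              # assumed sampling rate
    t = np.arange(0, 0.5, 1 / fs)
    # synthetic stand-in for a throat-microphone intake segment
    signal = np.sin(2 * np.pi * 80 * t) + 0.3 * np.random.randn(t.size)

    imfs = EMD().emd(signal)                # (n_imfs, n_samples) mode functions
    energies = (imfs ** 2).sum(axis=1)      # per-IMF energy as a simple feature
    ```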

  • Vocal Tract Airway Tissue Boundary Tracking for rtMRI Using Shape and Appearance Priors. Interspeech
    S Asadiabadi, E Erzin
    2017

    Abstract

    Knowledge about the dynamic shape of the vocal tract is the basis of many speech production applications such as articulatory analysis, modeling and synthesis. Vocal tract airway tissue boundary segmentation in the mid-sagittal plane is necessary as an initial step for extraction of the cross-sectional area function. This segmentation problem is however challenging due to the poor resolution of real-time speech MRI, grainy noise and the rapidly varying vocal tract shape. We present a novel approach to vocal tract airway tissue ...

    View details for https://www.isca-speech.org/archive/Interspeech_2017/pdfs/1016.PDF

  • Analysis of Engagement and User Experience with a Laughter Responsive Social Robot. Interspeech
    BB Türker, Z Buçinca, E Erzin, Y Yemez, TM Sezgin
    2017

    Abstract

    We explore the effect of laughter perception and response in terms of engagement in human-robot interaction. We designed two distinct experiments in which the robot has two modes: laughter responsive and laughter non-responsive. In responsive mode, the robot detects laughter using a multimodal real-time laughter detection module and invokes laughter as a backchannel to users accordingly. In non-responsive mode, the robot makes no use of detection and thus provides no feedback. In the experimental design, we use a straightforward ...

    View details for https://188.166.204.102/archive/Interspeech_2017/pdfs/1395.PDF

  • Cross-Subject Continuous Emotion Recognition Using Speech and Body Motion in Dyadic Interactions. Interspeech
    SN Fatima, E Erzin
    2017

    Abstract

    Dyadic interactions encapsulate rich emotional exchange between interlocutors, suggesting a multimodal, cross-speaker and cross-dimensional continuous emotion dependency. This study explores the dynamic inter-attribute emotional dependency at the cross-subject level with implications for continuous emotion recognition based on speech and body motion cues. We propose a novel two-stage Gaussian Mixture Model mapping framework for the continuous emotion recognition problem. In the first stage, we perform continuous emotion ...

    View details for https://pdfs.semanticscholar.org/bc87/50893faa247bbdf5a0ce752cee8cf7a45b8e.pdf
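
    Gaussian mixture regression, the building block behind such a two-stage mapping, can be sketched with scikit-learn: fit a joint GMM over the input cues and the target emotion attribute, then take the conditional expectation of the target given the input. Dimensions and data are synthetic, and this shows a single stage only:

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    dx = 10                                  # speech + body-motion feature dim
    X = np.random.randn(2000, dx)
    y = 0.5 * X[:, :1] + 0.1 * np.random.randn(2000, 1)  # toy emotion attribute

    gmm = GaussianMixture(n_components=4, covariance_type="full",
                          random_state=0).fit(np.hstack([X, y]))

    def gmr_predict(x):
        """E[y | x] under the joint GMM, for one query vector x of length dx."""
        w, mus, covs = gmm.weights_, gmm.means_, gmm.covariances_
        logr, cond = np.empty(len(w)), np.empty(len(w))
        for k in range(len(w)):
            mx, my = mus[k, :dx], mus[k, dx]
            Sxx, Sxy = covs[k][:dx, :dx], covs[k][:dx, dx]
            d = x - mx
            cond[k] = my + Sxy @ np.linalg.solve(Sxx, d)  # component regressor
            # responsibility ~ weight * N(x; mx, Sxx), up to a shared constant
            logr[k] = (np.log(w[k]) - 0.5 * d @ np.linalg.solve(Sxx, d)
                       - 0.5 * np.linalg.slogdet(Sxx)[1])
        r = np.exp(logr - logr.max())
        return (r / r.sum()) @ cond

    print(gmr_predict(X[0]), y[0, 0])        # estimate vs. toy ground truth
    ```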

  • Speech features for telemonitoring of Parkinson's disease symptoms, 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society
    H Ramezani, H Khaki, E Erzin
    2017

    Abstract

    The aim of this paper is to track Parkinson's disease (PD) progression based on its symptoms in the vocal system using the Unified Parkinson's Disease Rating Scale (UPDRS). We utilize a standard speech signal feature set, which contains 6373 static features as functionals of low-level descriptor (LLD) contours, and select the most informative ones using the maximal relevance and minimal redundancy based on correlations (mRMR-C) criteria. Then, we evaluate the performance of Gaussian mixture regression (GMR) and support ...

    View details for https://ieeexplore.ieee.org/abstract/document/8037685/
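
    The overall pipeline shape (select a compact feature subset, then regress UPDRS) is easy to sketch in scikit-learn. The univariate screening below is only a stand-in for the paper's correlation-based mRMR criterion, and the data is synthetic:

    ```python
    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_regression
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVR

    X = np.random.randn(200, 6373)     # static functionals of LLD contours
    y = np.random.rand(200) * 50       # placeholder UPDRS scores

    model = make_pipeline(
        SelectKBest(f_regression, k=50),  # keep 50 most relevant features
        SVR(kernel="rbf", C=10.0))        # support vector regression stage
    model.fit(X, y)
    updrs_hat = model.predict(X[:5])      # predicted UPDRS for new sessions
    ```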

  • Use of affect based interaction classification for continuous emotion tracking, IEEE International Conference on Acoustics, Speech and Signal Processing
    H Khaki, E Erzin
    2017

    Abstract

    Natural and affective handshakes of two participants define the course of dyadic interaction. Affective states of the participants are expected to be correlated with the nature of the dyadic interaction. In this paper, we extract two classes of the dyadic interaction based on temporal clustering of affective states. We use k-means temporal clustering to define the interaction classes, and utilize a support vector machine based classifier to estimate the interaction class types from multimodal, speech and motion, features. Then, we investigate ...

    View details for https://ieeexplore.ieee.org/abstract/document/7952683/
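
    A compact sketch of the two steps the abstract describes, with synthetic stand-ins for the affective-state tracks and the multimodal features (plain k-means is used here rather than the paper's temporal clustering):

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.svm import SVC

    T = 2000
    affect = np.random.randn(T, 3)      # activation/valence/dominance tracks
    features = np.random.randn(T, 30)   # speech + motion features per frame

    # step 1: derive interaction classes by clustering affective states
    interaction_class = KMeans(n_clusters=2, n_init=10,
                               random_state=0).fit_predict(affect)
    # step 2: estimate the interaction class from multimodal features
    clf = SVC(kernel="rbf").fit(features, interaction_class)
    pred = clf.predict(features[:10])
    ```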

  • Affect recognition from lip articulations, IEEE International Conference on Acoustics, Speech and Signal Processing
    R Sadiq, E Erzin
    2017

    Abstract

    Lips deliver visually active clues for speech articulation. Affective states define how humans articulate speech; hence, they also change the articulation of lip motion. In this paper, we investigate the effect of phonetic classes for affect recognition from lip articulations. The affect recognition problem is formalized in discrete activation, valence and dominance attributes. We use the symmetric Kullback-Leibler divergence (KLD) to rate phonetic classes with larger discrimination across different affective states. We perform experimental evaluations using ...

    View details for https://ieeexplore.ieee.org/abstract/document/7952593/
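
    The symmetric KLD rating is easy to make concrete under a Gaussian model of a phonetic class's lip features within each affective state; the closed-form Gaussian KLD does the rest. Dimensions and data below are illustrative placeholders:

    ```python
    import numpy as np

    def gauss_kld(mu0, cov0, mu1, cov1):
        """KL(N0 || N1) between two d-dimensional Gaussians, closed form."""
        d = mu0.size
        inv1 = np.linalg.inv(cov1)
        diff = mu1 - mu0
        return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                      + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))

    def symmetric_kld(x0, x1):
        """Symmetric KLD between two sets of lip-feature vectors."""
        mu0, cov0 = x0.mean(0), np.cov(x0.T)
        mu1, cov1 = x1.mean(0), np.cov(x1.T)
        return gauss_kld(mu0, cov0, mu1, cov1) + gauss_kld(mu1, cov1, mu0, cov0)

    # lip features of one phonetic class under two affective conditions
    low_arousal = np.random.randn(300, 4)
    high_arousal = np.random.randn(300, 4) + 0.5
    print(symmetric_kld(low_arousal, high_arousal))  # larger = more discriminative
    ```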

  • Multimodal analysis of speech and arm motion for prosody-driven synthesis of beat gestures, Speech Communication
    E Bozkurt, Y Yemez, E Erzin
    2016

    Abstract

    We propose a framework for joint analysis of speech prosody and arm motion towards automatic synthesis and realistic animation of beat gestures from speech prosody and rhythm. In the analysis stage, we first segment motion capture data and speech audio into gesture phrases and prosodic units via temporal clustering, and assign a class label to each resulting gesture phrase and prosodic unit. We then train a discrete hidden semi-Markov model (HSMM) over the segmented data, where gesture labels are hidden states with ...

    View details for https://www.sciencedirect.com/science/article/pii/S0167639315300170

  • Use of Agreement/Disagreement Classification in Dyadic Interactions for Continuous Emotion Recognition. Interspeech
    H Khaki, E Erzin
    2016

    Abstract

    Natural and affective handshakes of two participants define the course of dyadic interaction. Affective states of the participants are expected to be correlated with the nature or type of the dyadic interaction. In this study, we investigate the relationship between affective attributes and the nature of dyadic interaction. In this investigation we use the JESTKOD database, which consists of speech and full-body motion capture data recordings for dyadic interactions under agreement and disagreement scenarios. The dataset also has affective annotations in ...

    View details for https://www.isca-speech.org/archive/Interspeech_2016/pdfs/0407.PDF

  • Agreement and disagreement classification of dyadic interactions using vocal and gestural cues, IEEE International Conference on Acoustics, Speech and Signal Processing
    H Khaki, E Bozkurt, E Erzin
    2016

    Abstract

    In human-to-human communication, gesture and speech co-exist in time with a tight synchrony, where we tend to use gestures to complement or to emphasize speech. In this study, we investigate the roles of vocal and gestural cues to identify a dyadic interaction as agreement or disagreement. In this investigation we use the JESTKOD database, which consists of speech and full-body motion capture data recordings for dyadic interactions under agreement and disagreement scenarios. Spectral features of vocal channel and ...

    View details for https://ieeexplore.ieee.org/abstract/document/7472180/

  • A subjective listening test of six different artificial bandwidth extension approaches in English, Chinese, German, and Korean, IEEE International Conference on Acoustics, Speech and Signal Processing
    J Abel, M Kaniewska, C Guillaume
    2016

    Abstract

    In studies on artificial bandwidth extension (ABE), there is a lack of international coordination in subjective tests between multiple methods and languages. Here we present the design of absolute category rating listening tests evaluating 12 ABE variants of six approaches in multiple languages, namely in American English, Chinese, German, and Korean. Since the number of ABE variants caused a higher-than-recommended length of the listening test, ABE variants were distributed into two separate listening tests per language ...

    View details for https://ieeexplore.ieee.org/abstract/document/7472812/

Contact

Rumelifeneri Yolu 34450
Sarıyer, İstanbul / Türkiye

ai-admissions@ku.edu.tr
Phone (central): +90 212 338 1000
Fax: +90 212 338 1205
© 2021 Koç University