2020
A Recurrent Variational Autoencoder for Speech Enhancement Proceedings Article
In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2020.
GMM-UNIT: Unsupervised Multi-Domain and Multi-Modal Image-to-Image Translation via Attribute Gaussian Mixture Modeling Unpublished
2020.
Describe What to Change: A Text-guided Unsupervised Image-to-image Translation Approach Proceedings Article
In: ACM International Conference on Multimedia, 2020.
Robust Unsupervised Audio-visual Speech Enhancement Using a Mixture of Variational Autoencoders Proceedings Article
In: IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, 2020.
Audio-visual Speech Enhancement Using Conditional Variational Auto-Encoders Journal Article
In: IEEE/ACM Transactions on Audio, Speech and Language Processing, 2020.
Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement Journal Article
In: IEEE Transactions on Signal Processing, vol. 69, pp. 1899–1909, 2020.
Learning How to Smile: Expression Video Generation with Conditional Adversarial Recurrent Nets Journal Article
In: IEEE Transactions on Multimedia, vol. 22, no. 11, pp. 2808–2819, 2020.
How to Train Your Deep Multi-Object Tracker Proceedings Article
In: IEEE International Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020.
Probabilistic Graph Attention Network with Conditional Kernels for Pixel-Wise Prediction Journal Article
In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
Towards Probabilistic Generative Models for Socially Intelligent Robot Thesis
2020.
2019
Audio-Visual Variational Fusion for Multi-Person Tracking with Robots Proceedings Article
In: ACM Multimedia, Nice, France, 2019.
FAT/MM'19: 1st International Workshop on Fairness, Accountability, and Transparency in MultiMedia Proceedings Article
In: ACM International Conference on Multimedia, Nice, France, 2019.
Tracking Multiple Audio Sources with the Von Mises Distribution and Variational EM Journal Article
In: IEEE Signal Processing Letters, vol. 26, no. 6, pp. 798–802, 2019.
Predicting Media Memorability Task at MediaEval 2019 Proceedings Article
In: MediaEval 2019 Workshop, 2019.
A Comprehensive Analysis of Deep Regression Journal Article
In: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environment Journal Article
In: IEEE Journal of Selected Topics in Signal Processing, no. 1, pp. 88–103, 2019.
Increasing Image Memorability with Neural Style Transfer Journal Article
In: ACM Transactions on Multimedia Computing Communications and Applications, 2019.
2018
ACM MM'18 Workshop on Understanding Subjective Attributes of Data, Multimodal Recognition of Evoked Emotions Proceedings Article
In: ACM International Conference on Multimedia, Seoul, Korea, 2018.
Multimodal behavior analysis in the wild: an introduction Book Section
In: Alameda-Pineda, Xavier; Ricci, Elisa; Sebe, Nicu (Ed.): Multimodal behavior analysis in the wild, pp. 1–10, Elsevier, 2018.
Multimodal Behavior Analysis in the Wild: Advances and Challenges Book
Elsevier, 2018.
Accounting for Room Acoustics in Audio-Visual Multi-Speaker Tracking Proceedings Article
In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2018.
DeepGUM: Learning Deep Robust Regression with a Gaussian-Uniform Mixture Model Proceedings Article
In: European Conference on Computer Vision, Munich, Germany, 2018.
A cascaded multiple-speaker localization and tracking system Proceedings Article
In: International Workshop on Acoustic Signal Enhancement (IWAENC), LOCATA Satellite Workshop, Tokyo, Japan, 2018.
Every Smile is Unique: Landmark-Guided Diverse Smile Generation Proceedings Article
In: IEEE International Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018.
Cross-Paced Representation Learning with Partial Curricula for Sketch-based Image Retrieval Journal Article
In: IEEE Transactions on Image Processing, 2018.
2017
Multimodal analysis of free-standing conversational groups Book Section
In: Chang, Shih-Fu (Ed.): Frontiers of Multimedia Research, pp. 51–74, Morgan and Claypool, 2017.
Viraliency: Pooling local virality Proceedings Article
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6080–6088, 2017.
MUSA2 – First ACM Workshop on Multimodal Understanding of Social, Affective and Subjective Attributes Proceedings Article
In: ACM Multimedia, Mountain View, USA, 2017.
Exploiting the Complementarity of Audio-Visual Data for Probabilistic Multi-Speaker Tracking Proceedings Article
In: IEEE ICCV Workshop on Computer Vision for Audio-Visual Media, Venice, Italy, 2017.
Tracking a Varying Number of People with a Visually-Controlled Robotic Head Proceedings Article
In: Intelligent Robots and Systems, Vancouver, Canada, 2017.
Automatic Animation of an Articulatory Tongue Model from Ultrasound Images of the Vocal Tract Journal Article
In: Speech Communication, vol. 93, pp. 63–75, 2017.
Adaptation of a Gaussian Mixture Regressor to a New Input Distribution: Extending the C-GMR Framework Proceedings Article
In: International Conference on Latent Variable Analysis and Signal Separation, Grenoble, France, 2017.
Extending the Cascaded Gaussian Mixture Regression Framework for Cross-Speaker Acoustic-Articulatory Mapping Journal Article
In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017.
Exploiting the Intermittency of Speech for Joint Separation and Diarization Proceedings Article
In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, USA, 2017.
Self-adaptive matrix completion for heart rate estimation from face videos under realistic conditions Patent
US 15631346, 2017.
How to Make an Image More Memorable? A Deep Style Transfer Approach Proceedings Article
In: ACM International Conference on Multimedia Retrieval, Bucharest, Romania, 2017.
Learning Deep Structured Multi-Scale Features using Attention-Gated CRFs for Contour Prediction Proceedings Article
In: Advances in Neural Information Processing Systems, Long Beach, USA, 2017.
An EM algorithm for joint source separation and diarisation of multichannel convolutive mixtures Proceedings Article
In: IEEE International Conference on Acoustics, Speech and Signal Processing, New Orleans, USA, 2017.
2016
SALSA: A multimodal dataset for the automated analysis of free-standing social interactions Book Section
In: Murino, Vittorio; Cristani, Marco; Shah, Shishir; Savarese, Silvio (Ed.): Group and Crowd Behavior for Computer Vision, Elsevier, 2016.
Recognizing Emotions from Abstract Paintings using Non-Linear Matrix Completion Proceedings Article
In: IEEE International Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016.
SALSA: A Novel Dataset for Multimodal Group Behavior Analysis Journal Article
In: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 8, pp. 1707–1720, 2016.
An On-line Variational Bayesian Model for Multi-Person Tracking from Cluttered Scenes Journal Article
In: Computer Vision and Image Understanding, vol. 153, pp. 64–76, 2016.
Tracking Multiple Persons Based on a Variational Bayesian Model Proceedings Article
In: European Conference on Computer Vision Workshops, pp. 52–67, Amsterdam, 2016.
EM algorithms for weighted-data clustering with application to audio-visual scene analysis Journal Article
In: IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 12, pp. 2402–2415, 2016.
An inverse-gamma source variance prior with factorized parametrization for audio source separation Proceedings Article
In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 136–140, Shanghai, China, 2016.
A Variational EM Algorithm for the Separation of Time-Varying Convolutive Audio Mixtures Journal Article
In: IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 24, no. 8, pp. 1408–1423, 2016.
Self-Adaptive Matrix Completion for Heart Rate Estimation from Face Videos under Realistic Conditions Proceedings Article
In: IEEE International Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016.
Projective Unsupervised Flexible Embedding with Optimal Graph Proceedings Article
In: British Machine Vision Conference, York, United Kingdom, 2016.
Academic Coupled Dictionary Learning for Sketch-based Image Retrieval Proceedings Article
In: ACM International Conference on Multimedia, Amsterdam, The Netherlands, 2016.
Multi-Paced Dictionary Learning for Cross-Domain Retrieval and Recognition Proceedings Article
In: IEEE International Conference on Pattern Recognition, Cancun, Mexico, 2016.