The Integrated-Cascaded Gaussian Mixture Regressor @ TASLP and LVA-ICA’2017



Laurent Girin, Thomas Hueber and Xavier Alameda-Pineda


We’ve got two papers accepted, one in TASLP and one at LVA-ICA 2017, on the Integrated-Cascaded Gaussian Mixture Regression [1, 2].

Abstract: This article addresses the adaptation of an acoustic-articulatory inversion model of a reference speaker to the voice of another source speaker, using a limited amount of audio-only data. In this study, the articulatory-acoustic relationship of the reference speaker is modeled by a Gaussian mixture model and inference of articulatory data from acoustic data is made by the associated Gaussian mixture regression (GMR). To address speaker adaptation, we previously proposed a general framework called Cascaded-GMR (C-GMR) which decomposes the adaptation process into two consecutive steps: spectral conversion between source and reference speaker and acoustic-articulatory inversion of converted spectral trajectories. In particular, we proposed the Integrated C-GMR technique (IC-GMR) in which both steps are tied together in the same probabilistic model. In this article, we extend the C-GMR framework with another model called Joint-GMR (J-GMR). Contrary to the IC-GMR, this model aims at exploiting all potential acoustic-articulatory relationships, including those between the source speaker’s acoustics and the reference speaker’s articulation. We present the full derivation of the exact Expectation-Maximization (EM) training algorithm for the J-GMR. It exploits the missing data methodology of machine learning to deal with limited adaptation data. We provide an extensive evaluation of the J-GMR on both synthetic acoustic-articulatory data and on the multi-speaker MOCHA EMA database. We compare the J-GMR performance to other models of the C-GMR framework, notably the IC-GMR, and discuss their respective merits.
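
To make the GMR inference step mentioned in the abstract more concrete, here is a minimal NumPy sketch of standard Gaussian mixture regression. It assumes a joint GMM with full covariances has already been fitted on stacked vectors z = [x; y], where x stands for the acoustic features and y for the articulatory features. The function name `gmr_predict`, the argument layout and the toy parameters are hypothetical and chosen for illustration only; this is the plain GMR, not the IC-GMR or J-GMR models of the papers.

```python
import numpy as np


def gmr_predict(x, weights, means, covs, dx):
    """Return E[y | x] under a joint GMM over z = [x; y].

    weights: (K,) mixture weights
    means:   (K, dx + dy) component means
    covs:    (K, dx + dy, dx + dy) full component covariances
    dx:      dimension of the input part x
    """
    x = np.asarray(x, dtype=float)
    n_comp = len(weights)
    log_resp = np.empty(n_comp)
    cond_means = np.empty((n_comp, means.shape[1] - dx))
    for k in range(n_comp):
        mu_x, mu_y = means[k, :dx], means[k, dx:]
        s_xx = covs[k, :dx, :dx]
        s_yx = covs[k, dx:, :dx]
        diff = x - mu_x
        sol = np.linalg.solve(s_xx, diff)  # S_xx^{-1} (x - mu_x)
        # log of weight_k * N(x; mu_x, S_xx), dropping the (2*pi) constant
        # that is identical for every component
        _, logdet = np.linalg.slogdet(s_xx)
        log_resp[k] = np.log(weights[k]) - 0.5 * (logdet + diff @ sol)
        # Conditional mean of y given x for component k
        cond_means[k] = mu_y + s_yx @ sol
    # Normalise responsibilities in a numerically stable way
    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()
    # Responsibility-weighted sum of the per-component conditional means
    return resp @ cond_means


if __name__ == "__main__":
    # Toy 2-component model with 1-D x and 1-D y (illustrative values only)
    weights = np.array([0.5, 0.5])
    means = np.array([[-1.0, -1.0], [1.0, 1.0]])
    covs = np.stack([np.eye(2), np.eye(2)])
    print(gmr_predict(np.array([0.3]), weights, means, covs, dx=1))
```

The estimate is a sum of per-component linear regressions, each weighted by the posterior probability of its component given x. In a cascaded setting such as the one described above, the input distribution changes with the source speaker, which is what motivates adapting the model rather than reusing it unchanged.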

References:

  1. L. Girin, T. Hueber, and X. Alameda-Pineda, “Extending the Cascaded Gaussian Mixture Regression Framework for Cross-Speaker Acoustic-Articulatory Mapping,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017.
    @article{Girin-TASLP-2017,
      author={L. Girin and T. Hueber and X. Alameda-Pineda},
      journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
      title={Extending the Cascaded Gaussian Mixture Regression Framework for Cross-Speaker Acoustic-Articulatory Mapping},
      year={2017},
      doi={10.1109/TASLP.2017.2651398},
      pdf={http://xavirema.eu/wp-content/papercite-data/pdf/Girin-TASLP-2017.pdf}
    }
  2. L. Girin, T. Hueber, and X. Alameda-Pineda, “Adaptation of a Gaussian Mixture Regressor to a New Input Distribution: Extending the C-GMR Framework,” in International Conference on Latent Variable Analysis and Signal Separation, Grenoble, France, 2017.
    @inproceedings{Girin-LVA-2017,
      title={Adaptation of a {G}aussian Mixture Regressor to a New Input Distribution: Extending the {C-GMR} Framework},
      author={Laurent Girin and Thomas Hueber and Xavier Alameda-Pineda},
      year={2017},
      booktitle={International Conference on Latent Variable Analysis and Signal Separation},
      address={Grenoble, France},
      pdf={http://xavirema.eu/wp-content/papercite-data/pdf/Girin-LVA-2017.pdf}
    }

Category: Research
