Friday, 22 February 2019

Audio-Linguistic Embeddings for Spoken Sentences. (arXiv:1902.07817v1 [cs.SD])

We propose spoken sentence embeddings which capture both acoustic and linguistic content. While existing works operate at the character, phoneme, or word level, our method learns long-term dependencies by modeling speech at the sentence level. Formulated as an audio-linguistic multitask learning problem, our encoder-decoder model simultaneously reconstructs acoustic and natural language features from audio. Our results show that spoken sentence embeddings outperform phoneme- and word-level baselines on speech recognition and emotion recognition tasks. Ablation studies show that our embeddings can better model high-level acoustic concepts while retaining linguistic content. Overall, our work illustrates the viability of generic, multi-modal sentence embeddings for spoken language understanding.
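The abstract only outlines the architecture at a high level. The sketch below is one plausible reading of that multitask encoder-decoder setup in PyTorch, not the authors' code: a shared sentence-level encoder produces a single embedding per utterance, which feeds two heads, one reconstructing the acoustic feature sequence and one predicting a text-derived linguistic target. The LSTM encoder, all dimensions, and the MSE losses are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpokenSentenceEmbedder(nn.Module):
    # Hypothetical model; feature sizes are placeholders, not from the paper.
    def __init__(self, n_acoustic=80, n_linguistic=300, d_embed=512):
        super().__init__()
        # Shared encoder: consumes a whole utterance of acoustic frames
        # (e.g., log-mel filterbanks) and summarizes it into one vector.
        self.encoder = nn.LSTM(n_acoustic, d_embed, batch_first=True)
        # Head 1: reconstruct the acoustic features at every frame.
        self.acoustic_head = nn.Linear(d_embed, n_acoustic)
        # Head 2: predict a sentence-level linguistic target, e.g. an
        # averaged text embedding of the transcript (an assumption here).
        self.linguistic_head = nn.Linear(d_embed, n_linguistic)

    def forward(self, frames):  # frames: (batch, time, n_acoustic)
        states, (h_n, _) = self.encoder(frames)
        embedding = h_n[-1]  # (batch, d_embed) sentence embedding
        acoustic_recon = self.acoustic_head(states)      # per-frame output
        linguistic_pred = self.linguistic_head(embedding)  # per-sentence output
        return embedding, acoustic_recon, linguistic_pred

# Multitask objective: a sum of the two reconstruction losses, so the
# embedding is pushed to retain both acoustic and linguistic content.
model = SpokenSentenceEmbedder()
frames = torch.randn(4, 200, 80)       # 4 utterances, 200 frames each
text_targets = torch.randn(4, 300)     # assumed text-derived targets
emb, ac_out, li_out = model(frames)
loss = nn.functional.mse_loss(ac_out, frames) \
     + nn.functional.mse_loss(li_out, text_targets)
loss.backward()
```

After training, only the shared encoder would be kept: its sentence embedding is what gets evaluated on downstream tasks such as speech recognition and emotion recognition.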



from cs updates on arXiv.org https://ift.tt/2VczYuq
