Speech Recognition and Synthesis
Speech recognition and synthesis research focuses on enabling machines to convert spoken language into text accurately and, conversely, to generate natural-sounding speech from written input. Over the past decade the field has shifted from systems built around Hidden Markov Models and hand-engineered acoustic features toward end-to-end deep neural network architectures, including convolutional and sequence-to-sequence models, that learn representations directly from raw audio and have substantially improved accuracy across languages and acoustic conditions. Handling noisy environments, accented speech, and low-resource languages reliably without massive labeled datasets remains an active challenge. So does speaker verification: determining not just what was said but who said it, robustly enough for real-world authentication. Researchers are also working toward unified frameworks that integrate acoustic modeling, language modeling, and speaker diarization, reducing the cascading errors that arise when these components are trained and deployed in isolation.
- Works: 96,247
- Total citations: 992,622
- Keywords: Deep Neural Networks, Acoustic Modeling, Speaker Verification, Convolutional Neural Networks, End-to-End Speech Recognition, Hidden Markov Models
Top papers in Speech Recognition and Synthesis
Ordered by total citation count.
- AI-Assisted Pipeline for Dynamic Generation of Trustworthy Health Supplement Content at Scale (45,524 citations, open access)
- Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation (24,143 citations, open access)
- A tutorial on hidden Markov models and selected applications in speech recognition (22,702 citations)
- Efficient Estimation of Word Representations in Vector Space (18,109 citations, open access)
- Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data (12,995 citations, open access)
- Efficient Estimation of Word Representations in Vector Space (11,736 citations, open access)
- Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling (10,770 citations, open access)
- Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups (10,258 citations)
- Bidirectional recurrent neural networks (9,882 citations)
- Speech recognition with deep recurrent neural networks (8,795 citations)
- Fundamentals of speech recognition (7,699 citations)
- LSTM: A Search Space Odyssey (6,725 citations, open access)
Active researchers
Top authors in this area, ranked by h-index.