Speech Recognition and Synthesis
Speech recognition and synthesis research concerns how machines learn to convert spoken language into text and, conversely, generate intelligible audio from written input, a problem that sits at the intersection of signal processing, linguistics, and machine learning. Progress accelerated sharply when deep neural networks replaced Gaussian mixture models as the acoustic models inside hidden Markov model (HMM) systems. This let systems learn acoustic patterns from large volumes of transcribed audio with far less reliance on hand-crafted modeling. End-to-end architectures and sequence-to-sequence models have since made it possible to train a single network for the entire recognition pipeline. Researchers are still working to close the gap between laboratory accuracy and real-world performance across diverse speakers, languages, accents, and noisy environments. Active directions include robust speaker verification (confirming a claimed identity from voice alone) and speaker diarization (determining who spoke when in a multi-speaker recording). Both remain difficult when audio conditions are unpredictable.
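Speaker verification is often framed as a comparison between a fixed-length speaker embedding from a test utterance and one computed at enrollment. The sketch below assumes those embeddings already exist, produced by some speaker-embedding network that is not shown. It scores them with cosine similarity against a fixed threshold. The function names and the 0.7 threshold are hypothetical choices for illustration, and many real systems use other scoring back-ends such as PLDA.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def verify_speaker(
    enrolled: Sequence[float],
    test: Sequence[float],
    threshold: float = 0.7,  # hypothetical value; tuned on held-out data in practice
) -> bool:
    """Accept the identity claim if the test embedding is close enough to the enrolled one."""
    return cosine_similarity(enrolled, test) >= threshold


if __name__ == "__main__":
    # Toy 3-dimensional embeddings; real speaker embeddings have hundreds of dimensions.
    enrolled = [0.9, 0.1, 0.3]
    same_speaker = [0.85, 0.15, 0.25]
    other_speaker = [-0.2, 0.9, 0.1]
    print(verify_speaker(enrolled, same_speaker))   # True
    print(verify_speaker(enrolled, other_speaker))  # False
```

Raising the threshold trades more false rejections for fewer false acceptances. Systems are therefore usually tuned toward a target operating point, such as the equal error rate.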
- Works: 97,460
- Total citations: 997,703
- Keywords: Deep Neural Networks, Acoustic Modeling, Speaker Verification, Convolutional Neural Networks, End-to-End Speech Recognition, Hidden Markov Models
Top papers in Speech Recognition and Synthesis
Ordered by total citation count.
- AI-Assisted Pipeline for Dynamic Generation of Trustworthy Health Supplement Content at Scale: 45,676 citations (open access)
- Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation: 24,415 citations (open access)
- A tutorial on hidden Markov models and selected applications in speech recognition: 22,780 citations
- Efficient Estimation of Word Representations in Vector Space: 18,145 citations (open access)
- Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data: 13,011 citations (open access)
- Efficient Estimation of Word Representations in Vector Space: 11,737 citations (open access)
- Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling: 10,801 citations (open access)
- Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups: 10,309 citations
- Bidirectional recurrent neural networks: 10,014 citations
- Speech recognition with deep recurrent neural networks: 8,834 citations
- Fundamentals of speech recognition: 7,704 citations
- LSTM: A Search Space Odyssey: 6,821 citations (open access)
Active researchers
Top authors in this area, ranked by h-index.