Speech Recognition and Synthesis

Speech recognition and synthesis research concerns how machines learn to convert spoken language into text and, conversely, to generate intelligible audio from written input. The problem sits at the intersection of signal processing, linguistics, and machine learning. Progress accelerated sharply when deep neural networks displaced earlier statistical approaches such as hidden Markov models, enabling systems to learn acoustic patterns directly from large volumes of raw audio data rather than relying on hand-crafted features. End-to-end architectures and sequence-to-sequence models now make it possible to train a single network for the entire recognition pipeline, but a gap remains between laboratory accuracy and real-world performance across diverse speakers, languages, accents, and noisy environments. Active directions include robust speaker verification (confirming identity from voice alone) and speaker diarization (determining who spoke when in a multi-speaker recording), both of which remain difficult when audio conditions are unpredictable.

Works: 97,460
Total citations: 997,703
Keywords: Deep Neural Networks, Acoustic Modeling, Speaker Verification, Convolutional Neural Networks, End-to-End Speech Recognition, Hidden Markov Models
