Speech Recognition and Synthesis

Speech recognition and synthesis research focuses on enabling machines to accurately convert spoken language into text and, conversely, to generate natural-sounding speech from written input. The field has shifted substantially over the past decade from systems built around Hidden Markov Models and hand-engineered acoustic features toward end-to-end deep neural network architectures—including convolutional and sequence-to-sequence models—that learn representations directly from raw audio, dramatically improving accuracy across languages and acoustic conditions. Reliably handling noisy environments, accented speech, and low-resource languages without massive labeled datasets remains an active challenge, as does speaker verification: determining not just what was said, but who said it, with sufficient robustness for real-world authentication. Researchers are also pushing toward systems that integrate acoustic modeling, language modeling, and speaker diarization into unified frameworks, reducing the cascading errors that arise when these components are trained and deployed in isolation.

Works: 96,247
Total citations: 992,622
Keywords: Deep Neural Networks, Acoustic Modeling, Speaker Verification, Convolutional Neural Networks, End-to-End Speech Recognition, Hidden Markov Models
