Speech Generation
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers, arXiv 2023
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality, arXiv 2022
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech, ICLR 2021
FastSpeech: Fast, Robust and Controllable Text to Speech, NeurIPS 2019
PriorGrad: Acoustic Model
PriorGrad: Vocoder