Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions J Shen, R Pang, RJ Weiss, M Schuster, N Jaitly, Z Yang, Z Chen, Y Zhang, ... 2018 IEEE international conference on acoustics, speech and signal …, 2018 | 3341 | 2018 |
Tacotron: Towards end-to-end speech synthesis Y Wang, RJ Skerry-Ryan, D Stanton, Y Wu, RJ Weiss, N Jaitly, Z Yang, ... arXiv preprint arXiv:1703.10135, 2017 | 2523* | 2017 |
Style tokens: Unsupervised style modeling, control and transfer in end-to-end speech synthesis Y Wang, D Stanton, Y Zhang, RJS Ryan, E Battenberg, J Shor, Y Xiao, ... International conference on machine learning, 5180-5189, 2018 | 990 | 2018 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Gemini Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... arXiv preprint arXiv:2403.05530, 2024 | 702 | 2024 |
Towards end-to-end prosody transfer for expressive speech synthesis with Tacotron RJ Skerry-Ryan, E Battenberg, Y Xiao, Y Wang, D Stanton, J Shor, ... International conference on machine learning, 4693-4702, 2018 | 702 | 2018 |
Learning to speak fluently in a foreign language: Multilingual speech synthesis and cross-language voice cloning Y Zhang, RJ Weiss, H Zen, Y Wu, Z Chen, RJ Skerry-Ryan, Y Jia, ... arXiv preprint arXiv:1907.04448, 2019 | 193 | 2019 |
Predicting expressive speaking style from text in end-to-end speech synthesis D Stanton, Y Wang, RJ Skerry-Ryan 2018 IEEE Spoken Language Technology Workshop (SLT), 595-602, 2018 | 148 | 2018 |
Semi-supervised training for improving data efficiency in end-to-end speech synthesis YA Chung, Y Wang, WN Hsu, Y Zhang, RJ Skerry-Ryan ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 142 | 2019 |
Location-relative attention mechanisms for robust long-form speech synthesis E Battenberg, RJ Skerry-Ryan, S Mariooryad, D Stanton, D Kao, ... ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 132 | 2020 |
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis RJ Weiss, RJ Skerry-Ryan, E Battenberg, S Mariooryad, DP Kingma ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 124 | 2021 |
Uncovering latent style factors for expressive speech synthesis Y Wang, RJ Skerry-Ryan, Y Xiao, D Stanton, J Shor, E Battenberg, ... arXiv preprint arXiv:1711.00520, 2017 | 89 | 2017 |
Parallel Tacotron 2: A non-autoregressive neural TTS model with differentiable duration modeling I Elias, H Zen, J Shen, Y Zhang, Y Jia, RJ Skerry-Ryan, Y Wu arXiv preprint arXiv:2103.14574, 2021 | 73 | 2021 |
Synthesizing speech from text using neural networks Y Wu, J Shen, R Pang, RJ Weiss, M Schuster, N Jaitly, Z Yang, Z Chen, ... US Patent 10,971,170, 2021 | 64 | 2021 |
Semi-supervised generative modeling for controllable speech synthesis R Habib, S Mariooryad, M Shannon, E Battenberg, RJ Skerry-Ryan, ... arXiv preprint arXiv:1910.01709, 2019 | 61 | 2019 |
Effective use of variational embedding capacity in expressive end-to-end speech synthesis E Battenberg, S Mariooryad, D Stanton, RJ Skerry-Ryan, M Shannon, ... arXiv preprint arXiv:1906.03402, 2019 | 58 | 2019 |
Speaker generation D Stanton, M Shannon, S Mariooryad, RJ Skerry-Ryan, E Battenberg, ... ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 34 | 2022 |
Organic indoor location discovery S Teller, J Battat, B Charrow, D Curtis, R Ryan, J Ledlie, J Hicks Computer Science and Artificial Intelligence Laboratory Technical Report 75, 16, 2008 | 29 | 2008 |
Variational embedding capacity in expressive end-to-end speech synthesis ED Battenberg, D Stanton, RJW Skerry-Ryan, S Mariooryad, DT Kao, ... US Patent 11,222,621, 2022 | 21 | 2022 |
Non-saturating GAN training as divergence minimization M Shannon, B Poole, S Mariooryad, T Bagby, E Battenberg, D Kao, ... arXiv preprint arXiv:2010.08029, 2020 | 21 | 2020 |
Identifying entities using search results TA Lasko, A Tomkins, M Angelo, MK Gray, R Ryan, NU Godbole, ... US Patent 8,856,099, 2014 | 21 | 2014 |