NaturalSpeech: End-to-End Text-to-Speech Synthesis With Human-Level Quality X Tan, J Chen, H Liu, J Cong, C Zhang, Y Liu, X Wang, Y Leng, Y Yi, L He, ... IEEE Transactions on Pattern Analysis and Machine Intelligence 46 (6), 4234-4245, 2024 | 252 | 2024 |
NaturalSpeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin, S Zhao, J Bian arXiv preprint arXiv:2304.09116, 2023 | 250 | 2023 |
NaturalSpeech 3: Zero-shot speech synthesis with factorized codec and diffusion models Z Ju, Y Wang, K Shen, X Tan, D Xin, D Yang, Y Liu, Y Leng, K Song, ... arXiv preprint arXiv:2403.03100, 2024 | 162 | 2024 |
Qwen2-Audio technical report Y Chu, J Xu, Q Yang, H Wei, X Wei, Z Guo, Y Leng, Y Lv, J He, J Lin, ... arXiv preprint arXiv:2407.10759, 2024 | 121 | 2024 |
MBNET: MOS Prediction for Synthesized Speech with Mean-Bias Network Y Leng, X Tan, S Zhao, F Soong, XY Li, T Qin ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 114 | 2021 |
PromptTTS: Controllable text-to-speech with text descriptions Z Guo, Y Leng, Y Wu, S Zhao, X Tan ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 108 | 2023 |
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition Y Leng, X Tan, L Zhu, J Xu, R Luo, L Liu, T Qin, XY Li, E Lin, TY Liu Advances in Neural Information Processing Systems 34, 2021 | 99 | 2021 |
BinauralGrad: A two-stage conditional diffusion probabilistic model for binaural audio synthesis Y Leng, Z Chen, J Guo, H Liu, J Chen, X Tan, D Mandic, L He, X Li, T Qin, ... Advances in Neural Information Processing Systems 35, 23689-23700, 2022 | 59 | 2022 |
AIR-Bench: Benchmarking large audio-language models via generative comprehension Q Yang, J Xu, W Liu, Y Chu, Z Jiang, X Zhou, Y Leng, Y Lv, Z Zhao, ... arXiv preprint arXiv:2402.07729, 2024 | 49 | 2024 |
PromptTTS 2: Describing and generating voices with text prompt Y Leng, Z Guo, K Shen, X Tan, Z Ju, Y Liu, Y Liu, D Yang, L Zhang, ... arXiv preprint arXiv:2309.02285, 2023 | 42 | 2023 |
Unsupervised pivot translation for distant languages Y Leng, X Tan, T Qin, XY Li, TY Liu ACL 2019, 2019 | 33 | 2019 |
Analyzing and mitigating interference in neural architecture search J Xu, X Tan, K Song, R Luo, Y Leng, T Qin, TY Liu, J Li International Conference on Machine Learning, 24646-24662, 2022 | 32 | 2022 |
Microsoft Research Asia's systems for WMT19 Y Xia, X Tan, F Tian, F Gao, W Chen, Y Fan, L Gong, Y Leng, R Luo, ... arXiv preprint arXiv:1911.06191, 2019 | 28 | 2019 |
ResGrad: Residual denoising diffusion probabilistic models for text to speech Z Chen, Y Wu, Y Leng, J Chen, H Liu, X Tan, Y Cui, K Wang, L He, S Zhao, ... arXiv preprint arXiv:2212.14518, 2022 | 23 | 2022 |
SoftCorrect: Error correction with soft detection for automatic speech recognition Y Leng, X Tan, W Liu, K Song, R Wang, XY Li, T Qin, E Lin, TY Liu Proceedings of the AAAI Conference on Artificial Intelligence 37 (11), 13034 …, 2023 | 22 | 2023 |
Speech-T: Transducer for text to speech and beyond J Chen, X Tan, Y Leng, J Xu, G Wen, T Qin, TY Liu Advances in Neural Information Processing Systems 34, 6621-6633, 2021 | 21 | 2021 |
Mask the correct tokens: An embarrassingly simple approach for error correction K Shen, Y Leng, X Tan, S Tang, Y Zhang, W Liu, E Lin arXiv preprint arXiv:2211.13252, 2022 | 15 | 2022 |
Transcormer: Transformer for sentence scoring with sliding language modeling K Song, Y Leng, X Tan, Y Zou, T Qin, D Li Advances in Neural Information Processing Systems 35, 11160-11174, 2022 | 13 | 2022 |
A study of multilingual neural machine translation X Tan, Y Leng, J Chen, Y Ren, T Qin, TY Liu arXiv preprint arXiv:1912.11625, 2019 | 11 | 2019 |
Extract and attend: Improving entity translation in neural machine translation Z Zeng, R Wang, Y Leng, J Guo, X Tan, T Qin, T Liu arXiv preprint arXiv:2306.02242, 2023 | 7 | 2023 |