Rongjie Huang

Cited by

	All	Since 2019
Citations	1010	1008
h-index	15	15
i10-index	20	20

[Chart: citations per year, 2021 to 2024; bar values as extracted: 5, 100, 594, 305]

Public access

7 articles available
0 articles not available

Based on funding mandates

Co-authors

Zhou Zhao, Zhejiang University (verified email at zju.edu.cn)
Yi Ren (任意), Research Scientist, TikTok (verified email at bytedance.com)
Jinglin Liu (刘静林), Research Scientist, ByteDance (verified email at bytedance.com)
Zhenhui Ye (叶振辉), Zhejiang University (verified email at zju.edu.cn)
Dongchao Yang, The Chinese University of Hong Kong (verified email at se.cuhk.edu.hk)
Dong Yu (俞栋), Distinguished Scientist @ Tencent AI Lab, ACM/IEEE/ISCA Fellow (verified email at global.tencent.com)
Ziyue Jiang, Zhejiang University (verified email at zju.edu.cn)
Xize Cheng (成曦泽), Zhejiang University (verified email at zju.edu.cn)
Huadai Liu, Zhejiang University (verified email at zju.edu.cn)
Jiatong Shi (史嘉彤), Carnegie Mellon University (verified email at andrew.cmu.edu)
Xuankai Chang, Carnegie Mellon University, Student (verified email at andrew.cmu.edu)
Songxiang Liu, miHoYo (verified email at mihoyo.com)
Shinji Watanabe, Carnegie Mellon University (verified email at cmu.edu)
Chunlei Zhang, Tencent AI Lab, Bellevue (verified email at global.tencent.com)
Max W. Y. Lam, Independent Researcher

Affiliation: Zhejiang University (verified email at zju.edu.cn)

Research interests: Speech, Multimedia Computing, Natural Language Processing


Title	Cited by	Year
Make-An-Audio: Text-to-audio generation with prompt-enhanced diffusion models. R Huang, J Huang, D Yang, Y Ren, L Liu, M Li, Z Ye, J Liu, X Yin, Z Zhao (ICML 2023)	122	2023
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis. R Huang, MWY Lam, J Wang, D Su, D Yu, Y Ren, Z Zhao (IJCAI 2022)	116	2022
Bilateral denoising diffusion models. MWY Lam, J Wang, R Huang, D Su, D Yu (arXiv preprint arXiv:2108.11514)	107*	2021
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech. R Huang, Z Zhao, H Liu, J Liu, C Cui, Y Ren (ACM MM 2022)	102	2022
AudioGPT: Understanding and generating speech, music, sound, and talking head. R Huang, M Li, D Yang, J Shi, X Chang, Z Ye, Y Wu, Z Hong, J Huang, ... (Proceedings of the AAAI Conference on Artificial Intelligence 38 (21), 23802 …)	87	2024
Multi-Singer: Fast multi-singer singing voice vocoder with a large-scale corpus. R Huang, F Chen, Y Ren, J Liu, C Cui, Z Zhao (ACM MM 2021, 3945-3954)	68	2021
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech. R Huang, Y Ren, J Liu, C Cui, Z Zhao (NeurIPS 2022)	59	2022
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation. R Huang, C Cui, F Chen, Y Ren, J Liu, Z Zhao, B Huai, Z Wang (ACM MM 2022)	47	2022
M4Singer: a Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus. L Zhang, R Li, S Wang, L Deng, J Liu, Y Ren, J He, R Huang, J Zhu, ... (NeurIPS 2022)	37	2022
InstructTTS: Modelling expressive TTS in discrete latent space with natural language style prompt. D Yang, S Liu, R Huang, C Weng, H Meng (arXiv preprint arXiv:2301.13662)	35	2023
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation. R Huang, Z Zhao, J Liu, H Liu, Y Ren, L Zhang, J He (ICLR 2023)	28	2022
HiFi-Codec: Group-residual vector quantization for high fidelity audio codec. D Yang, S Liu, R Huang, J Tian, C Weng, Y Zou (arXiv preprint arXiv:2305.02765)	26	2023
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model. C Cui, Y Ren, J Liu, F Chen, R Huang, M Lei, Z Zhao (Interspeech 2021)	22	2021
Mega-TTS: Zero-shot text-to-speech at scale with intrinsic inductive bias. Z Jiang, Y Ren, Z Ye, J Liu, C Zhang, Q Yang, S Ji, R Huang, C Wang, ... (arXiv preprint arXiv:2306.03509)	20	2023
UniAudio: An audio foundation model toward universal audio generation. D Yang, J Tian, X Tan, R Huang, S Liu, X Chang, J Shi, S Zhao, J Bian, ... (arXiv preprint arXiv:2310.00704)	17	2023
Make-A-Voice: Unified voice synthesis with discrete representation. R Huang, C Zhang, Y Wang, D Yang, L Liu, Z Ye, Z Jiang, C Weng, ... (arXiv preprint arXiv:2305.19269)	14	2023
MixSpeech: Cross-modality self-learning with audio-visual stream mixup for visual speech translation and recognition. X Cheng, T Jin, R Huang, L Li, W Lin, Z Wang, Y Wang, H Liu, A Yin, ... (Proceedings of the IEEE/CVF International Conference on Computer Vision …)	12	2023
Make-An-Audio 2: Temporal-enhanced text-to-audio generation. J Huang, Y Ren, R Huang, D Yang, Z Ye, C Zhang, J Liu, X Yin, Z Ma, ... (arXiv preprint arXiv:2305.18474)	11	2023
GeneFace++: Generalized and stable real-time audio-driven 3D talking face generation. Z Ye, J He, Z Jiang, R Huang, J Huang, J Liu, Y Ren, X Yin, Z Ma, Z Zhao (arXiv preprint arXiv:2305.00787)	11	2023
CLAPSpeech: Learning prosody from text context with contrastive language-audio pre-training. Z Ye, R Huang, Y Ren, Z Jiang, J Liu, J He, X Yin, Z Zhao (arXiv preprint arXiv:2305.10763)	10	2023

Articles 1–20
