emotion2vec: Self-supervised pre-training for speech emotion representation Z Ma, Z Zheng, J Ye, J Li, Z Gao, S Zhang, X Chen arXiv preprint arXiv:2312.15185, 2023 | 47 | 2023 |
MT4SSL: Boosting self-supervised speech representation learning by integrating multiple targets Z Ma, Z Zheng, C Tang, Y Wang, X Chen arXiv preprint arXiv:2211.07321, 2022 | 21 | 2022 |
EAT: Self-supervised pre-training with efficient audio transformer W Chen, Y Liang, Z Ma, Z Zheng, X Chen arXiv preprint arXiv:2401.03497, 2024 | 15 | 2024 |
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition Z Ma, W Wu, Z Zheng, Y Guo, Q Chen, S Zhang, X Chen ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 12 | 2024 |
BAT: Learning to Reason about Spatial Sounds with Large Language Models Z Zheng, P Peng, Z Ma, X Chen, E Choi, D Harwath ICML 2024, 2024 | 9 | 2024 |
Pushing the limits of unsupervised unit discovery for SSL speech representation Z Ma, Z Zheng, G Yang, Y Wang, C Zhang, X Chen arXiv preprint arXiv:2306.08920, 2023 | 9 | 2023 |
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark Z Ma, M Chen, H Zhang, Z Zheng, W Chen, X Li, J Ye, X Chen, T Hain arXiv preprint arXiv:2406.07162, 2024 | 8 | 2024 |
Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning G Yang, Z Ma, Z Zheng, Y Song, Z Niu, X Chen 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-7, 2023 | 7 | 2023 |
Front-end adapter: Adapting front-end input of speech based self-supervised learning for speech recognition X Chen, Z Ma, C Tang, Y Wang, Z Zheng ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 5 | 2023 |
Unsupervised Active Learning: Optimizing Labeling Cost-Effectiveness for Automatic Speech Recognition Z Zheng, Z Ma, Y Wang, X Chen INTERSPEECH 2023, 2023 | 4 | 2023 |
SJTU-THU Automated Audio Captioning System for DCASE 2024 W Chen, X Li, Z Ma, Y Liang, A Jiang, Z Zheng, Y Qian, P Fan, WQ Zhang, ... DCASE Challenge, Tech. Rep., 2024 | 3 | 2024 |
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs W Chen, Z Ma, X Li, X Xu, Y Liang, Z Zheng, K Yu, X Chen arXiv preprint arXiv:2410.09503, 2024 | 1 | 2024 |
Exploring effective distillation of self-supervised speech models for automatic speech recognition Y Wang, C Tang, Z Ma, Z Zheng, X Chen, WQ Zhang 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-6, 2023 | 1 | 2023 |
DRCap: Decoding CLAP Latents with Retrieval-augmented Generation for Zero-shot Audio Captioning X Li, W Chen, Z Ma, X Xu, Y Liang, Z Zheng, Q Kong, X Chen arXiv preprint arXiv:2410.09472, 2024 | | 2024 |