Xize Cheng(成曦泽)
Verified email at zju.edu.cn
Title
Cited by
Year
MixSpeech: Cross-modality self-learning with audio-visual stream mixup for visual speech translation and recognition
X Cheng, T Jin, R Huang, L Li, W Lin, Z Wang, Y Wang, H Liu, A Yin, ...
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by: 12, Year: 2023
Connecting multi-modal contrastive representations
Z Wang, Y Zhao, H Huang, J Liu, A Yin, L Tang, L Li, Y Wang, Z Zhang, ...
Advances in Neural Information Processing Systems 36, 22099-22114, 2023
Cited by: 7, Year: 2023
OpenSR: Open-modality speech recognition via maintaining multi-modality alignment
X Cheng, T Jin, L Li, W Lin, X Duan, Z Zhao
arXiv preprint arXiv:2306.06410, 2023
Cited by: 6, Year: 2023
AV-TranSpeech: Audio-visual robust speech-to-speech translation
R Huang, H Liu, X Cheng, Y Ren, L Li, Z Ye, J He, L Zhang, J Liu, X Yin, ...
arXiv preprint arXiv:2305.15403, 2023
Cited by: 6, Year: 2023
Diffusion denoising process for perceptron bias in out-of-distribution detection
L Liu, Y Ren, X Cheng, R Huang, C Li, Z Zhao
arXiv preprint arXiv:2211.11255, 2022
Cited by: 6, Year: 2022
TAVT: Towards Transferable Audio-Visual Text Generation
W Lin, T Jin, W Pan, L Li, X Cheng, Y Wang, Z Zhao
Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023
Cited by: 5, Year: 2023
Distilling coarse-to-fine semantic matching knowledge for weakly supervised 3D visual grounding
Z Wang, H Huang, Y Zhao, L Li, X Cheng, Y Zhu, A Yin, Z Zhao
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by: 5, Year: 2023
3DRP-Net: 3D relative position-aware network for 3D visual grounding
Z Wang, H Huang, Y Zhao, L Li, X Cheng, Y Zhu, A Yin, Z Zhao
arXiv preprint arXiv:2307.13363, 2023
Cited by: 4, Year: 2023
Contrastive token-wise meta-learning for unseen performer visual temporal-aligned translation
L Li, T Jin, X Cheng, Y Wang, W Lin, R Huang, Z Zhao
Findings of the Association for Computational Linguistics: ACL 2023, 10993-11007, 2023
Cited by: 4, Year: 2023
Weakly-supervised spoken video grounding via semantic interaction learning
Y Wang, W Lin, S Zhang, T Jin, L Li, X Cheng, Z Zhao
Proceedings of the 61st Annual Meeting of the Association for Computational …, 2023
Cited by: 3, Year: 2023
Semantic-conditioned dual adaptation for cross-domain query-based visual segmentation
Y Wang, T Jin, W Lin, X Cheng, L Li, Z Zhao
Findings of the Association for Computational Linguistics: ACL 2023, 9797-9815, 2023
Cited by: 2, Year: 2023
Wav2SQL: Direct generalizable speech-to-SQL parsing
H Liu, R Huang, J He, G Sun, R Shen, X Cheng, Z Zhao
arXiv preprint arXiv:2305.12552, 2023
Cited by: 2, Year: 2023
Exploring group video captioning with efficient relational approximation
W Lin, T Jin, Y Wang, W Pan, L Li, X Cheng, Z Zhao
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023
Cited by: 2, Year: 2023
Rethinking missing modality learning from a decoding perspective
T Jin, X Cheng, L Li, W Lin, Y Wang, Z Zhao
Proceedings of the 31st ACM International Conference on Multimedia, 4431-4439, 2023
Cited by: 1, Year: 2023
TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation
X Cheng, R Huang, L Li, T Jin, Z Wang, A Yin, M Li, X Duan, Z Zhao
arXiv preprint arXiv:2312.15197, 2023
2023
Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers
H Huang, Z Wang, R Huang, L Liu, X Cheng, Y Zhao, T Jin, Z Zhao
arXiv preprint arXiv:2312.08168, 2023
2023
Out-of-distribution Detection with Diffusion-based Neighborhood
L Liu, Y Ren, X Cheng, Z Zhao
2022
NaturalSigner: Diffusion Models are Natural Sign Language Generator
A Yin, J Xun, X Cheng, T Jin, S Zhang, Z Zhao, S Tang, F Wu
Listen to Motion: Robustly Learning Correlated Audio-Visual Representations
Z Wang, X Cheng, L Tang, L Liu, Y Zhao, T Jin, C Cai, W HongFa, W Liu, ...