Shuai Zheng
Amazon Web Services
GluonCV and GluonNLP: Deep Learning in Computer Vision and Natural Language Processing.
J Guo, H He, T He, L Lausen, M Li, H Lin, X Shi, C Wang, J Xie, S Zha, ...
J. Mach. Learn. Res. 21 (23), 1-7, 2020
Communication-efficient distributed blockwise momentum SGD with error-feedback
S Zheng, Z Huang, J Kwok
Advances in Neural Information Processing Systems 32, 2019
Fast-and-Light Stochastic ADMM.
S Zheng, JT Kwok
IJCAI, 2407-2413, 2016
Asynchronous Distributed Semi-Stochastic Gradient Optimization
R Zhang, S Zheng, JT Kwok
AAAI, 2323-2329, 2016
CSER: Communication-efficient SGD with error reset
C Xie, S Zheng, S Koyejo, I Gupta, M Li, H Lin
Advances in Neural Information Processing Systems 33, 12593-12603, 2020
Accelerated large batch optimization of BERT pretraining in 54 minutes
S Zheng, H Lin, S Zha, M Li
arXiv preprint arXiv:2006.13484, 2020
Follow the moving leader in deep learning
S Zheng, JT Kwok
International Conference on Machine Learning, 4110-4119, 2017
Stochastic variance-reduced ADMM
S Zheng, JT Kwok
arXiv preprint arXiv:1604.07070, 2016
Lightweight Stochastic Optimization for Minimizing Finite Sums with Infinite Data
S Zheng, JT Kwok
International Conference on Machine Learning, 5932-5940, 2018
Compressed communication for distributed training: Adaptive methods and system
Y Zhong, C Xie, S Zheng, H Lin
arXiv preprint arXiv:2105.07829, 2021
Partial and asymmetric contrastive learning for out-of-distribution detection in long-tailed recognition
H Wang, A Zhang, Y Zhu, S Zheng, M Li, AJ Smola, Z Wang
International Conference on Machine Learning, 23446-23458, 2022
Alexa teacher model: Pretraining and distilling multi-billion-parameter encoders for natural language understanding systems
J FitzGerald, S Ananthakrishnan, K Arkoudas, D Bernardi, A Bhagia, ...
Removing batch normalization boosts adversarial training
H Wang, A Zhang, S Zheng, X Shi, M Li, Z Wang
International Conference on Machine Learning, 23433-23445, 2022
MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud
Z Zhang, S Zheng, Y Wang, J Chiu, G Karypis, T Chilimbi, M Li, X Jin
arXiv preprint arXiv:2205.00119, 2022
Context, Language Modeling, and Multimodal Data in Finance
S Das, C Goggins, J He, G Karypis, S Krishnamurthy, M Mahajan, ...
The Journal of Financial Data Science 3 (3), 52-66, 2021
Blockwise Adaptivity: Faster Training and Better Generalization in Deep Learning
S Zheng, JT Kwok
arXiv preprint arXiv:1905.09899, 2019
Stochastic Optimization for Machine Learning
S Zheng, 2017
Fast nonsmooth regularized risk minimization with continuation
S Zheng, R Zhang, JT Kwok
AAAI, 2393-2399, 2016
DCAF-BERT: A Distilled Cachable Adaptable Factorized Model For Improved Ads CTR Prediction
A Muhamed, J Singh, S Zheng, I Keivanloo, S Perera, J Mracek, Y Xu, ...
Contractive error feedback for gradient compression
B Li, S Zheng, P Raman, A Shrivastava, GB Giannakis