Gradient descent on two-layer nets: Margin maximization and simplicity bias. K. Lyu, Z. Li, R. Wang, S. Arora. Advances in Neural Information Processing Systems 34, 12978-12991, 2021. Cited by 84.
Mildly overparametrized neural nets can memorize training data efficiently. R. Ge, R. Wang, H. Zhao. arXiv preprint arXiv:1909.11837, 2019. Cited by 21.
Optimal gradient-based algorithms for non-concave bandit optimization. B. Huang, K. Huang, S. Kakade, J. D. Lee, Q. Lei, R. Wang, J. Yang. Advances in Neural Information Processing Systems 34, 29101-29115, 2021. Cited by 17.
Going beyond linear RL: Sample efficient neural function approximation. B. Huang, K. Huang, S. Kakade, J. D. Lee, Q. Lei, R. Wang, J. Yang. Advances in Neural Information Processing Systems 34, 8968-8983, 2021. Cited by 10.
The marginal value of momentum for small learning rate SGD. R. Wang, S. Malladi, T. Wang, K. Lyu, Z. Li. arXiv preprint arXiv:2307.15196, 2023. Cited by 7.