Finite-sample analysis for SARSA with linear function approximation. S Zou, T Xu, Y Liang. Advances in Neural Information Processing Systems 32, 2019. Cited by 207.
CRPO: A new approach for safe reinforcement learning with convergence guarantee. T Xu, Y Liang, G Lan. International Conference on Machine Learning, 11480-11491, 2021. Cited by 168*.
Improving sample complexity bounds for (natural) actor-critic algorithms. T Xu, Z Wang, Y Liang. Advances in Neural Information Processing Systems 33, 4358-4369, 2020. Cited by 149*.
Two time-scale off-policy TD learning: Non-asymptotic analysis over Markovian samples. T Xu, S Zou, Y Liang. Advances in Neural Information Processing Systems 32, 2019. Cited by 89.
Reanalysis of variance reduced temporal difference learning. T Xu, Z Wang, Y Zhou, Y Liang. arXiv preprint arXiv:2001.01898, 2020. Cited by 52.
Enhanced first and zeroth order variance reduced algorithms for min-max optimization. T Xu, Z Wang, Y Liang, HV Poor. 2020. Cited by 49*.
Algorithms for the estimation of transient surface heat flux during ultra-fast surface cooling. ZF Zhou, TY Xu, B Chen. International Journal of Heat and Mass Transfer 100, 1-10, 2016. Cited by 46.
Non-asymptotic convergence of Adam-type reinforcement learning algorithms under Markovian sampling. H Xiong, T Xu, Y Liang, W Zhang. Proceedings of the AAAI Conference on Artificial Intelligence 35 (12), 10460 …, 2021. Cited by 38.
Faster algorithm and sharper analysis for constrained Markov decision process. T Li, Z Guan, S Zou, T Xu, Y Liang, G Lan. Operations Research Letters 54, 107107, 2024. Cited by 36.
Sample complexity bounds for two timescale value-based reinforcement learning algorithms. T Xu, Y Liang. International Conference on Artificial Intelligence and Statistics, 811-819, 2021. Cited by 35.
Proximal gradient descent-ascent: Variable convergence under KŁ geometry. Z Chen, Y Zhou, T Xu, Y Liang. arXiv preprint arXiv:2102.04653, 2021. Cited by 35.
Doubly robust off-policy actor-critic: Convergence and optimality. T Xu, Z Yang, Z Wang, Y Liang. International Conference on Machine Learning, 11581-11591, 2021. Cited by 33.
When will generative adversarial imitation learning algorithms attain global convergence. Z Guan, T Xu, Y Liang. International Conference on Artificial Intelligence and Statistics, 1117-1125, 2021. Cited by 24.
Model-based offline meta-reinforcement learning with regularization. S Lin, J Wan, T Xu, Y Liang, J Zhang. arXiv preprint arXiv:2202.02929, 2022. Cited by 23.
When will gradient methods converge to max-margin classifier under ReLU models? T Xu, Y Zhou, K Ji, Y Liang. arXiv preprint arXiv:1806.04339, 2018. Cited by 23*.
Provably efficient offline reinforcement learning with trajectory-wise reward. T Xu, Y Wang, S Zou, Y Liang. IEEE Transactions on Information Theory, 2024. Cited by 16.
Deterministic policy gradient: Convergence analysis. H Xiong, T Xu, L Zhao, Y Liang, W Zhang. Uncertainty in Artificial Intelligence, 2159-2169, 2022. Cited by 15.
PER-ETD: A polynomially efficient emphatic temporal difference learning method. Z Guan, T Xu, Y Liang. arXiv preprint arXiv:2110.06906, 2021. Cited by 9.
A unifying framework of off-policy general value function evaluation. T Xu, Z Yang, Z Wang, Y Liang. Advances in Neural Information Processing Systems 35, 13570-13583, 2022. Cited by 6*.
The perfect blend: Redefining RLHF with mixture of judges. T Xu, E Helenowski, KA Sankararaman, D Jin, K Peng, E Han, S Nie, ... arXiv preprint arXiv:2409.20370, 2024. Cited by 4.