Dylan Hadfield-Menell

Cited by

	All	Since 2019
Citations	3333	3028
h-index	25	25
i10-index	41	40

Citations per year
Year	2015	2016	2017	2018	2019	2020	2021	2022	2023	2024
Citations	9	19	78	167	197	333	408	442	815	819

Public access (based on funding mandates): 16 articles available, 0 articles not available

Co-authors

Anca D Dragan	Assistant Professor at UC Berkeley // Director, AI Safety and Alignment, Google DeepMind	Verified email at berkeley.edu
Stuart Russell	Professor of Computer Science, University of California, Berkeley	Verified email at cs.berkeley.edu
Pieter Abbeel	UC Berkeley | Covariant	Verified email at cs.berkeley.edu
Stephen Casper	PhD student, MIT	Verified email at mit.edu
Gillian Hadfield	Professor of Law and Professor of Strategic Management, University of Toronto; Faculty Affiliate	Verified email at utoronto.ca
Smitha Milli	Cornell Tech	Verified email at berkeley.edu
Thomas L. Griffiths	Professor of Psychology and Computer Science, Princeton University	Verified email at princeton.edu
Rohan Chitnis	Meta AI, MIT, UC Berkeley	Verified email at fb.com
Andreas Haupt	Massachusetts Institute of Technology	Verified email at mit.edu
Jaime Fernández Fisac	Assistant Professor of Electrical and Computer Engineering, Princeton University	Verified email at princeton.edu
Marc Khoury	University of California, Berkeley	Verified email at eecs.berkeley.edu
McKane Andrus	UW HCDE	Verified email at uw.edu
Sandy H Huang	Research Scientist, DeepMind	Verified email at berkeley.edu
Siddharth Srivastava	Arizona State University	Verified email at asu.edu
Simon Zhuang		Verified email at berkeley.edu
Robert D. Hawkins	University of Wisconsin-Madison	Verified email at wisc.edu
Mark Ho	Assistant Professor, New York University	Verified email at nyu.edu
Gokul Swamy	PhD Candidate, Carnegie Mellon University	Verified email at andrew.cmu.edu
Micah Carroll	PhD student, UC Berkeley	Verified email at berkeley.edu
Gabriel Kreiman	Professor, Harvard Medical School and Children's Hospital	Verified email at tch.harvard.edu

Dylan Hadfield-Menell

Massachusetts Institute of Technology

Verified email at csail.mit.edu

Artificial Intelligence


Title	Authors	Venue	Cited by	Year
Cooperative Inverse Reinforcement Learning	D Hadfield-Menell, SJ Russell, P Abbeel, A Dragan	Advances in Neural Information Processing Systems 29, 2016	759	2016
Inverse Reward Design	D Hadfield-Menell, S Milli, P Abbeel, SJ Russell, A Dragan	Advances in Neural Information Processing Systems 30, 2017	442	2017
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback	S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...	Transactions on Machine Learning Research, 2023	259	2023
The off-switch game	D Hadfield-Menell, A Dragan, P Abbeel, S Russell	Proceedings of the Twenty-Sixth International Joint Conference on Artificial …, 2017	165	2017
Toward Transparent AI: A survey on interpreting the inner structures of deep neural networks	T Räuker, A Ho, S Casper, D Hadfield-Menell	2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 464-483, 2023	127	2023
On the geometry of adversarial examples	M Khoury, D Hadfield-Menell	arXiv preprint arXiv:1811.00525, 2018	105*	2018
Pragmatic-pedagogic value alignment	JF Fisac, MA Gates, JB Hamrick, C Liu, D Hadfield-Menell, ...	Robotics research: the 18th international symposium ISRR, 49-57, 2020	98	2020
Guided search for task and motion plans using learned heuristics	R Chitnis, D Hadfield-Menell, A Gupta, S Srivastava, E Groshev, C Lin, ...	2016 IEEE International Conference on Robotics and Automation (ICRA), 447-454, 2016	84	2016
Incomplete contracting and AI alignment	D Hadfield-Menell, GK Hadfield	Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 417-422, 2019	81	2019
Should robots be obedient?	S Milli, D Hadfield-Menell, A Dragan, S Russell	Proceedings of the 26th International Joint Conference on Artificial …, 2017	76	2017
What are you optimizing for? Aligning recommender systems with human values	J Stray, I Vendrov, J Nixon, S Adler, D Hadfield-Menell	arXiv preprint arXiv:2107.10939, 2021	71	2021
Consequences of Misaligned AI	S Zhuang, D Hadfield-Menell	Advances in Neural Information Processing Systems 33, 15763-15773, 2020	66	2020
Conservative Agency via Attainable Utility Preservation	AM Turner, D Hadfield-Menell, P Tadepalli	Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 385-391, 2020	65	2020
On the utility of model learning in HRI	R Choudhury, G Swamy, D Hadfield-Menell, AD Dragan	2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019	61	2019
Expressive robot motion timing	A Zhou, D Hadfield-Menell, A Nagabandi, AD Dragan	Proceedings of the 2017 ACM/IEEE international conference on human-robot …, 2017	59	2017
Modular task and motion planning in belief space	D Hadfield-Menell, E Groshev, R Chitnis, P Abbeel	2015 IEEE/RSJ International Conference on Intelligent Robots and Systems …, 2015	56	2015
Explore, establish, exploit: Red teaming language models from scratch	S Casper, J Lin, J Kwon, G Culp, D Hadfield-Menell	arXiv preprint arXiv:2306.09442, 2023	54	2023
Spurious normativity enhances learning of compliance and enforcement behavior in artificial agents	R Köster, D Hadfield-Menell, R Everett, L Weidinger, GK Hadfield, ...	Proceedings of the National Academy of Sciences 119 (3), e2106028118, 2022	49*	2022
The assistive multi-armed bandit	L Chan, D Hadfield-Menell, S Srinivasa, A Dragan	2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019	48	2019
Building Human Values into Recommender Systems: An Interdisciplinary Synthesis and Open Problems	J Stray, A Halevy, P Assar, D Hadfield-Menell, C Boutilier, A Ashar, ...	ACM Transactions on Recommender Systems, 2023	43*	2023
