Jonathan N. Lee

I am a Research Scientist at Google DeepMind working on Gemini Thinking models.

I received my PhD in Computer Science from Stanford, where I worked on reinforcement learning.


> Google DeepMind, Research Scientist
    ├── Gemini Thinking, reasoning
    └── Reinforcement Learning

> Stanford Computer Science, PhD
    ├── Reinforcement Learning
    ├── Stanford AI Lab
    ├── advised by Emma Brunskill
    └── NSF Graduate Research Fellowship

> Google Research
    ├── Learning Theory Team
    └── advised by Chris Dann, Alekh Agarwal, and Tong Zhang

> Google Brain
    └── advised by George Tucker, Ofir Nachum, and Bo Dai

> UC Berkeley Electrical Engineering & Computer Science, BS
    ├── Robot Learning
    ├── Berkeley AI Research, AUTOLab
    └── advised by Ken Goldberg

Selected Projects

Supervised Pretraining Can Learn In-Context Reinforcement Learning.
Jonathan Lee*, Annie Xie*, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill
Neural Information Processing Systems (NeurIPS), 2023. (Spotlight)
Foundation Models for Decision-Making

Dueling RL: Reinforcement Learning with Trajectory Preferences.
Aldo Pacchiano, Aadirupa Saha, Jonathan Lee
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023.
Alignment / RLHF

All Publications

2023

Supervised Pretraining Can Learn In-Context Reinforcement Learning.
Jonathan Lee*, Annie Xie*, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill
Neural Information Processing Systems (NeurIPS), 2023. (Spotlight)

Experiment Planning with Function Approximation.
Aldo Pacchiano, Jonathan Lee, Emma Brunskill
Neural Information Processing Systems (NeurIPS), 2023.

Learning in POMDPs is Sample-Efficient with Hindsight Observability.
Jonathan Lee, Alekh Agarwal, Christoph Dann, Tong Zhang
International Conference on Machine Learning (ICML), 2023.

Dueling RL: Reinforcement Learning with Trajectory Preferences.
Aldo Pacchiano, Aadirupa Saha, Jonathan Lee
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023.

Estimating Optimal Policy Value in General Linear Contextual Bandits.
Jonathan Lee, Weihao Kong, Aldo Pacchiano, Vidya Muthukumar, Emma Brunskill
arXiv, 2023.

2022

Oracle Inequalities for Model Selection in Offline Reinforcement Learning.
Jonathan Lee, George Tucker, Ofir Nachum, Bo Dai, Emma Brunskill
Neural Information Processing Systems (NeurIPS), 2022.

Model Selection in Batch Policy Optimization.
Jonathan Lee, George Tucker, Ofir Nachum, Bo Dai
International Conference on Machine Learning (ICML), 2022.

2021

Design of Experiments for Stochastic Contextual Linear Bandits.
Andrea Zanette*, Kefan Dong*, Jonathan Lee*, Emma Brunskill
Neural Information Processing Systems (NeurIPS), 2021.

Near Optimal Policy Optimization via REPS.
Aldo Pacchiano, Jonathan Lee, Peter Bartlett, Ofir Nachum
Neural Information Processing Systems (NeurIPS), 2021.

Online Model Selection for Reinforcement Learning with Function Approximation.
Jonathan Lee, Aldo Pacchiano, Vidya Muthukumar, Weihao Kong, Emma Brunskill
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021.

Dynamic Regret Convergence Analysis and an Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning.
Jonathan Lee, Michael Laskey, Ajay Kumar Tanwani, Anil Aswani, Ken Goldberg
International Journal of Robotics Research (IJRR), 2021. (Invited paper)

2020

Accelerated Message Passing for Entropy-Regularized MAP Inference.
Jonathan Lee, Aldo Pacchiano, Peter Bartlett, Michael I. Jordan
International Conference on Machine Learning (ICML), 2020.

Convergence Rates of Smooth Message Passing with Rounding in Entropy-Regularized MAP Inference.
Jonathan Lee*, Aldo Pacchiano*, Michael I. Jordan
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.

Online Learning with Continuous Variations: Dynamic Regret and Reductions.
Ching-An Cheng*, Jonathan Lee*, Ken Goldberg, Byron Boots
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020.

2019

On-Policy Robot Imitation Learning from a Converging Supervisor.
Ashwin Balakrishna*, Brijen Thananjeyan*, Jonathan Lee, Arsh Zahed, Felix Li, Joseph E. Gonzalez, Ken Goldberg
Conference on Robot Learning (CoRL), 2019. (Oral)

A Dynamic Regret Analysis and Adaptive Regularization Algorithm for On-Policy Robot Imitation Learning.
Jonathan Lee, Michael Laskey, Ajay Kumar Tanwani, Anil Aswani, Ken Goldberg
Springer Proceedings in Advanced Robotics: Algorithmic Foundations of Robotics, 2019.
International Workshop on the Algorithmic Foundations of Robotics (WAFR), 2018. (Invited to IJRR)

Generalizing Robot Imitation Learning with Invariant Hidden Semi-Markov Models.
Ajay Kumar Tanwani, Jonathan Lee, Brijen Thananjeyan, Michael Laskey, Sanjay Krishnan, Roy Fox, Ken Goldberg, Sylvain Calinon
Springer Proceedings in Advanced Robotics: Algorithmic Foundations of Robotics, 2019.
International Workshop on the Algorithmic Foundations of Robotics (WAFR), 2018. (Invited to IJRR)

2018

Constraint Estimation and Derivative-Free Recovery for Robot Learning from Demonstrations.
Jonathan Lee, Michael Laskey, Roy Fox, Ken Goldberg
IEEE International Conference on Automation Science and Engineering (CASE), 2018.

2017

DART: Noise Injection for Robust Imitation Learning.
Michael Laskey, Jonathan Lee, Roy Fox, Anca Dragan, Ken Goldberg
Conference on Robot Learning (CoRL), 2017.
[BAIR Blog]

Comparing Human-Centric and Robot-Centric Sample Efficiency for Robot Deep Learning from Demonstrations.
Michael Laskey, Caleb Chuck, Jonathan Lee, Jeffrey Mahler, Sanjay Krishnan, Kevin Jamieson, Anca Dragan, Ken Goldberg
IEEE International Conference on Robotics and Automation (ICRA), 2017.

2016

Robot Grasping in Clutter: Using a Hierarchy of Supervisors for Learning from Demonstrations.
Michael Laskey, Jonathan Lee, Caleb Chuck, David Gealy, Wesley Hsieh, Florian T. Pokorny, Anca D. Dragan, Ken Goldberg
IEEE International Conference on Automation Science and Engineering (CASE), 2016.

Short papers, workshop papers, etc.

Improved Estimator Selection for Off-Policy Evaluation.
George Tucker, Jonathan Lee
ICML Workshop on Reinforcement Learning Theory, 2021.

Continuous Online Learning and New Insights into Online Imitation Learning.
Jonathan Lee*, Ching-An Cheng*, Ken Goldberg, Byron Boots
NeurIPS Optimization Foundations for Reinforcement Learning Workshop, 2019. (Best Paper Award)

Stability Analysis of On-Policy Imitation Learning Algorithms Using Dynamic Regret.
Jonathan Lee, Michael Laskey, Ajay Kumar Tanwani, Ken Goldberg
RSS Workshop on Imitation and Causality, 2018. (Spotlight)

Iterative Noise Injection for Scalable Imitation Learning.
Michael Laskey, Jonathan Lee, Wesley Hsieh, Richard Liaw, Jeffrey Mahler, Roy Fox, Ken Goldberg
arXiv, 2017.