Taylor W. Killian

Principal Scientist, Lila Sciences

Reinforcement Learning | Machine Learning | Decision Making Under Uncertainty

Download CV

Currently, I am a Principal Scientist at Lila Sciences within the AI Research organization. We are enthusiastically focused on developing Scientific Superintelligence, bringing advanced AI into automated laboratory settings to accelerate scientific discovery. There is a lot of exciting work on the horizon!

I work in the fields of reinforcement learning, machine learning, and causal inference. I have long been interested in decision making and the mechanisms by which humans summarize and reason about the world. In my work, I aim to develop models and algorithms that enable actors (whether human or not) to efficiently make decisions in the face of various forms of uncertainty. Ultimately, my goal is to develop algorithmic techniques that extend beyond the domain in which they are trained, adapting to their end uses and any unique aspects/preferences therein.

I'm always keen to hear about interesting ideas and love collaborating with others on a variety of problems, applied and foundational, as long as there is alignment with my areas of focus. Don't hesitate to reach out!

News

  • 2 Mar 2026 I've joined Lila Sciences as a Principal Scientist in their AI Research organization.
  • 26 Jan 2026 Another accepted publication with Matt Landers: SPIN will be presented @ ICLR 2026 in Rio de Janeiro!
  • 30 Dec 2025 I'm honored to be serving as a Workshop Co-chair for RLC 2026, the best AI/ML conference!
  • 2 Dec 2025 I'll be at NeurIPS in San Diego, presenting BraVE with Matt Landers.

Research

Current research projects and interests.

Ongoing

Reinforcement Learning for Foundation Models

Overview: Leading research efforts at MBZUAI’s Institute of Foundation Models to develop practical RL techniques for improving language model reasoning and alignment. Our work spans multiple reasoning domains through the...

2023-2024

Safety-Critical Offline Reinforcement Learning

Overview: Developing risk-sensitive methods for identifying dangerous states and treatments in healthcare settings. Focus on dead-end identification using distributional RL and conservative value estimation for improved patient safety. Motivation: In...


Teaching

Courses, mentorship, and educational activities.

Planned

Introduction to Machine Learning

Course Overview: A comprehensive introduction to machine learning for undergraduate students covering supervised learning, unsupervised learning, and practical ML skills. The course emphasizes both theoretical understanding and hands-on implementation of...

Planned

Introduction to Reinforcement Learning

Course Overview: An introductory course designed for advanced undergraduates and beginning graduate students covering the fundamentals of reinforcement learning. Students will learn core concepts, implement basic algorithms, and understand when...

Planned

Advanced Topics in Reinforcement Learning

Course Overview: A graduate-level course covering modern RL techniques including offline RL, safe RL, and applications to real-world problems. Emphasis on bridging theory and practice with hands-on projects. Course Description: This...

Ongoing

PhD Student Mentorship

Overview: Actively mentoring PhD students at MBZUAI on projects spanning RL for LLMs, safe decision-making, and practical applications of machine learning. Focus on developing both technical skills and research independence....


Selected Publications

2026

IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL
Zhoujun Cheng, Yutao Xie, Yuxiao Qu, Amrith Setlur, Shibo Hao, Varad Pimpalkhute, Tongtong Liang, Feng Yao, Zhengzhong Liu, Eric Xing, Virginia Smith, Ruslan Salakhutdinov, Zhiting Hu, Taylor W. Killian, Aviral Kumar
arXiv pre-print
We establish scaling laws for LLM reinforcement learning by identifying how to optimally allocate compute across parallel rollouts, problem batch size, and update steps. Our analysis reveals that increasing parallel rollouts per problem is the primary driver of performance, improving solution quality on easy tasks and coverage on hard ones, while providing practical rules for compute-efficient post-training.
Improving and Accelerating Offline RL in Large Discrete Action Spaces with Structured Policy Initialization
Matt Landers, Taylor W. Killian, Tom Hartvigsen, Afsaneh Doryab
ICLR 2026
SPIN is a two-stage framework that optimizes reinforcement learning in complex combinatorial spaces by first pre-training an Action Structure Model to learn valid action patterns and then training lightweight heads for control. This approach significantly improves performance and stability, outperforming current methods by up to 39% in rewards while achieving convergence up to 12.8× faster.

2025

SAINT: Attention-Based Policies for Discrete Combinatorial Action Spaces
Matt Landers, Taylor W. Killian, Tom Hartvigsen, Afsaneh Doryab
arXiv pre-print
SAINT is a novel policy architecture that uses Transformers to model combinatorial action spaces as unordered sets, capturing complex sub-action dependencies via self-attention. This permutation-invariant approach significantly outperforms traditional baselines in environments with up to 1.35 × 10^18 possible actions by improving sample efficiency and joint behavior modeling.
BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces
Matt Landers, Taylor W. Killian, Hugo Barnes, Tom Hartvigsen, Afsaneh Doryab
NeurIPS 2025
BraVE addresses the computational challenges of high-dimensional, discrete action spaces in offline RL by using tree-structured traversal to capture sub-action dependencies efficiently. This approach enables the evaluation of a linear number of joint actions, outperforming existing methods by up to 20× in complex environments.
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
Zhoujun Cheng, Shibo Hao, Tianyang Liu, Fan Zhou, Yutao Xie, Feng Yao, Yuexin Bian, Yonghao Zhuang, Nilabjo Dey, Yuheng Zha, Yi Gu, Kun Zhou, Yuqi Wang, Yuan Li, Richard Fan, Jianshu She, Chengqian Gao, Abulhair Saparov, Haonan Li, Taylor W. Killian, Mikhail Yurochkin, Zhengzhong Liu, Eric P. Xing, Zhiting Hu
NeurIPS 2025
Reinforcement learning has emerged as a promising approach to improve large language model reasoning, yet most open efforts focus narrowly on math and code, limiting our understanding of its broader applicability to general reasoning. We introduce Guru, a curated RL reasoning corpus spanning six reasoning domains.
K2-Think: A Parameter-Efficient Reasoning System
Zhoujun Cheng, Richard Fan, Shibo Hao, Taylor W. Killian, Haonan Li, Suqi Sun, Hector Ren, Alexander Moreno, Daqian Zhang, Tianjun Zhong, Yuxin Xiong, Yuanzhe Hu, Yutao Xie, Xudong Han, Yuqi Wang, Varad Pimpalkhute, Yonghao Zhuang, Aaryamonvikram Singh, Xuezhi Liang, Anze Xie, Jianshu She, Desai Fan, Chengqian Gao, Liqun Ma, Mikhail Yurochkin, John Maggs, Xuezhe Ma, Guowei He, Zhiting Hu, Zhengzhong Liu, Eric P. Xing
MBZUAI IFM Technical Report
K2-Think is a parameter-efficient 32B model that rivals much larger systems by combining long chain-of-thought training with advanced test-time computation techniques. It achieves state-of-the-art reasoning performance in math, code, and science while delivering ultra-fast inference speeds of over 2,000 tokens per second.
Robust Autonomy Emerges from Self-Play
Marco Cusumano-Towner, David Hafner, Alex Hertzberg, Brody Huval, Aleksei Petrenko, Eugene Vinitsky, Erik Wijmans, Taylor W. Killian, Stuart Bowers, Ozan Sener, Philipp Krahenbuhl, Vladlen Koltun
ICML 2025
We developed a robust autonomous driving agent in simulation via self-play at massive scale. The simulator was designed to run in massively parallel settings, allowing us to aggressively randomize each agent's physical and behavioral characteristics and generate substantial amounts of experience.

Musings and Reflections

I'm working on sharing insights from my research and experiences in reinforcement learning, machine learning, and more.
