Overview
This project leads research at MBZUAI's Institute of Foundation Models on practical reinforcement learning (RL) techniques for improving language model reasoning and alignment. The work spans multiple reasoning domains through the LLM360 initiative.
Key Objectives
- Develop cross-domain RL approaches for LLM reasoning
- Create scalable reward signals for diverse reasoning tasks
- Build open-source models and datasets for community research
Current Work
Guru: Cross-Domain RL Reasoning
We introduced Guru, a curated RL reasoning corpus spanning six reasoning domains: Math, Code, Science, Logic, Simulation, and Tabular reasoning. This work systematically revisits established findings in RL for LLM reasoning and shows that many of them do not hold uniformly: results vary significantly from one domain to another.
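As a rough sketch of what training on a mixed-domain corpus can look like, the snippet below draws RL training batches with a fixed domain mixture. The domain names follow Guru, but the example prompts, weights, and function names are illustrative assumptions, not the project's actual data or pipeline.

```python
import random

# Toy stand-in for a six-domain corpus (domain names follow Guru;
# the examples themselves are invented for illustration).
CORPUS = {
    "math":       [{"prompt": "Compute 3 + 4.", "answer": "7"}],
    "code":       [{"prompt": "Write double(x) returning 2*x.", "answer": "def double(x): return 2 * x"}],
    "science":    [{"prompt": "State the SI unit of force.", "answer": "newton"}],
    "logic":      [{"prompt": "All A are B; x is A. Is x B?", "answer": "yes"}],
    "simulation": [{"prompt": "From 0, move +2 then -1. Position?", "answer": "1"}],
    "tabular":    [{"prompt": "Max of column [3, 9, 4]?", "answer": "9"}],
}

def sample_batch(corpus, weights, batch_size, seed=None):
    """Draw a training batch with a fixed domain mixture.

    `weights` maps domain -> unnormalized mixture ratio, so callers can
    up- or down-weight domains without renormalizing by hand.
    """
    rng = random.Random(seed)
    domains = list(weights)
    total = sum(weights[d] for d in domains)
    probs = [weights[d] / total for d in domains]
    batch = []
    for _ in range(batch_size):
        domain = rng.choices(domains, probs)[0]   # pick a domain by mixture weight
        example = rng.choice(corpus[domain])      # then a prompt within it
        batch.append({"domain": domain, **example})
    return batch

# Uniform mixture over all six domains; a real run would tune these ratios.
batch = sample_batch(CORPUS, {d: 1.0 for d in CORPUS}, batch_size=8, seed=0)
```

Fixing the seed makes batches reproducible, which helps when comparing mixture ratios across training runs.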
Key Findings:
- RL effectiveness varies substantially across reasoning domains
- Domain-specific reward design is critical for success
- Open datasets enable reproducible research in this space
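To make the second finding concrete, here is a minimal sketch of domain-specific, verifiable reward signals: exact-match on a final number for math, and unit-test pass rate for code. The function names and matching rules are illustrative assumptions, not the rewards used in this project.

```python
import re

def math_reward(response: str, gold: str) -> float:
    """Binary reward: the last number in the response must match the gold answer."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", response)
    return 1.0 if nums and nums[-1] == gold else 0.0

def code_reward(response: str, tests) -> float:
    """Fractional reward: share of unit tests the generated code passes."""
    env = {}
    try:
        exec(response, env)  # NOTE: run untrusted model output only in a sandbox
    except Exception:
        return 0.0
    passed = 0
    for call, expected in tests:
        try:
            if eval(call, env) == expected:
                passed += 1
        except Exception:
            pass  # a failing or crashing test contributes no reward
    return passed / len(tests)

# Dispatch table: each domain gets its own verifier (hypothetical names).
REWARD_FNS = {"math": math_reward, "code": code_reward}

def reward(domain, response, target):
    return REWARD_FNS[domain](response, target)
```

For example, `reward("math", "The answer is 7.", "7")` yields `1.0`, while a code response is scored by how many of its tests pass; less verifiable domains (science, logic) typically need different signals, which is one reason domain-specific reward design matters.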
Related Publications
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective (2025)
Impact
This research aims to broaden the applicability of RL beyond the traditional domains of math and code, enabling more general reasoning capabilities in foundation models. Our open-source approach ensures accessibility for the broader research community.