Overview

Leading research efforts at MBZUAI’s Institute of Foundation Models to develop practical RL techniques for improving language model reasoning and alignment. Our work spans multiple reasoning domains through the LLM360 initiative.


Key Objectives

  • Develop cross-domain RL approaches for LLM reasoning
  • Create scalable reward signals for diverse reasoning tasks
  • Build open-source models and datasets for community research


Current Work

Guru: Cross-Domain RL Reasoning

We introduced Guru, a curated RL reasoning corpus spanning six domains: Math, Code, Science, Logic, Simulation, and Tabular reasoning. Using this corpus, the work systematically revisits established findings in RL for LLM reasoning and finds that their effectiveness varies significantly across domains.

Key Findings:

  • RL effectiveness varies substantially across reasoning domains
  • Domain-specific reward design is critical for success
  • Open datasets enable reproducible research in this space
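To make the second finding concrete, here is a minimal sketch of what domain-specific reward routing can look like in an RL training loop. All function names and reward rules below are illustrative assumptions, not the Guru implementation: a math rollout is scored by exact match on its final answer line, while a code rollout is scored by the fraction of unit tests it passes.

```python
# Hypothetical sketch of per-domain reward routing for RL on LLM rollouts.
# The verifiers and their rules are illustrative, not the Guru implementation.

def math_reward(response: str, gold: str) -> float:
    """Exact match on the final line of the response (assumed answer format)."""
    lines = response.strip().splitlines()
    if not lines:
        return 0.0
    return 1.0 if lines[-1].strip() == gold.strip() else 0.0

def code_reward(response: str, tests: list) -> float:
    """Fraction of unit tests the generated program passes."""
    namespace = {}
    try:
        exec(response, namespace)  # run candidate code in a scratch namespace
    except Exception:
        return 0.0
    passed = 0
    for test in tests:  # each test raises on failure
        try:
            test(namespace)
            passed += 1
        except Exception:
            pass
    return passed / len(tests) if tests else 0.0

# One verifier per domain; other domains (science, logic, ...) would add entries.
REWARDS = {"math": math_reward, "code": code_reward}

def domain_reward(domain: str, response: str, target) -> float:
    """Route a rollout to its domain's verifier to produce a scalar reward."""
    return REWARDS[domain](response, target)
```

For example, `domain_reward("math", "step 1\n42", "42")` returns 1.0, while a code rollout earns partial credit proportional to the tests it passes. In practice the routing table is what makes cross-domain training tractable: each domain keeps its own notion of correctness while the RL algorithm sees a uniform scalar reward.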


Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective (2025)

Paper Website Code Data Models


Impact

This research aims to broaden the applicability of RL beyond the traditional domains of math and code, enabling more general reasoning capabilities in foundation models. Our open-source approach ensures accessibility for the broader research community.