Overview
This project leads research at MBZUAI's Institute of Foundation Models on practical reinforcement learning (RL) techniques for improving language model reasoning and alignment. The work spans multiple reasoning domains through the LLM360 initiative.
Key Objectives
- Develop cross-domain RL approaches for LLM reasoning
- Create scalable reward signals for diverse reasoning tasks
- Build open-source models and datasets for community research
Current Work
Guru: Cross-Domain RL Reasoning
We introduced Guru, a curated RL reasoning corpus spanning six reasoning domains: Math, Code, Science, Logic, Simulation, and Tabular reasoning. This work systematically revisits established findings in RL for LLM reasoning and shows that many of them do not hold uniformly: results vary significantly from one domain to another.
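As a rough sketch of what training on a mixed-domain corpus can look like, the snippet below draws RL training batches with a fixed domain mixture. The domain names follow Guru, but the example prompts, weights, and function names are illustrative assumptions, not the project's actual data or pipeline.

```python
import random

# Toy stand-in for a six-domain corpus (domain names follow Guru;
# the examples themselves are invented for illustration).
CORPUS = {
    "math":       [{"prompt": "Compute 3 + 4.", "answer": "7"}],
    "code":       [{"prompt": "Write double(x) returning 2*x.", "answer": "def double(x): return 2 * x"}],
    "science":    [{"prompt": "State the SI unit of force.", "answer": "newton"}],
    "logic":      [{"prompt": "All A are B; x is A. Is x B?", "answer": "yes"}],
    "simulation": [{"prompt": "From 0, move +2 then -1. Position?", "answer": "1"}],
    "tabular":    [{"prompt": "Max of column [3, 9, 4]?", "answer": "9"}],
}

def sample_batch(corpus, weights, batch_size, seed=None):
    """Draw a training batch with a fixed domain mixture.

    `weights` maps domain -> unnormalized mixture ratio, so callers can
    up- or down-weight domains without renormalizing by hand.
    """
    rng = random.Random(seed)
    domains = list(weights)
    total = sum(weights[d] for d in domains)
    probs = [weights[d] / total for d in domains]
    batch = []
    for _ in range(batch_size):
        domain = rng.choices(domains, probs)[0]   # pick a domain by mixture weight
        example = rng.choice(corpus[domain])      # then a prompt within it
        batch.append({"domain": domain, **example})
    return batch

# Uniform mixture over all six domains; a real run would tune these ratios.
batch = sample_batch(CORPUS, {d: 1.0 for d in CORPUS}, batch_size=8, seed=0)
```

Fixing the seed makes batches reproducible, which helps when comparing mixture ratios across training runs.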
Key Findings:
- RL effectiveness varies substantially across reasoning domains
- Domain-specific reward design is critical for success
- Open datasets enable reproducible research in this space
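To make the second finding concrete, here is a minimal sketch of domain-specific, verifiable reward signals: exact-match on a final number for math, and unit-test pass rate for code. The function names and matching rules are illustrative assumptions, not the rewards used in this project.

```python
import re

def math_reward(response: str, gold: str) -> float:
    """Binary reward: the last number in the response must match the gold answer."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", response)
    return 1.0 if nums and nums[-1] == gold else 0.0

def code_reward(response: str, tests) -> float:
    """Fractional reward: share of unit tests the generated code passes."""
    env = {}
    try:
        exec(response, env)  # NOTE: run untrusted model output only in a sandbox
    except Exception:
        return 0.0
    passed = 0
    for call, expected in tests:
        try:
            if eval(call, env) == expected:
                passed += 1
        except Exception:
            pass  # a failing or crashing test contributes no reward
    return passed / len(tests)

# Dispatch table: each domain gets its own verifier (hypothetical names).
REWARD_FNS = {"math": math_reward, "code": code_reward}

def reward(domain, response, target):
    return REWARD_FNS[domain](response, target)
```

For example, `reward("math", "The answer is 7.", "7")` yields `1.0`, while a code response is scored by how many of its tests pass; less verifiable domains (science, logic) typically need different signals, which is one reason domain-specific reward design matters.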
Related Publications
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective (2025)
Impact
This research aims to broaden the applicability of RL beyond the traditional domains of math and code, enabling more general reasoning capabilities in foundation models. Our open-source approach ensures accessibility for the broader research community.