Safety-Critical Offline Reinforcement Learning

Overview

This project develops risk-sensitive methods for identifying dangerous states and treatments in healthcare settings. The focus is dead-end identification, using distributional RL and conservative value estimation to improve patient safety.

Motivation

In safety-critical domains like healthcare, optimal policies may not be attainable from limited offline data. However, negative outcomes can be leveraged to identify behaviors to avoid, guarding against overoptimistic decisions.

Technical Approach

Risk-Sensitive Dead-End Identification

Distributional RL: Model value distributions rather than expected values
Conditional Value-at-Risk (CVaR): Average over the worst α-fraction of outcomes instead of the full distribution
Tunable Risk Tolerance: Adjust conservatism based on application needs
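
The three ingredients above can be illustrated with a minimal sketch. It assumes the distributional model outputs N equally weighted quantile estimates of the return for each state, which is one common parameterization and not necessarily the one used in the papers. The function name cvar_from_quantiles and the parameter alpha are hypothetical names chosen for illustration.

```python
import numpy as np

def cvar_from_quantiles(quantiles, alpha):
    """Approximate CVaR at level alpha from quantile estimates.

    quantiles: array of shape (..., N), assumed to hold N equally
               weighted quantile estimates of the return distribution.
    alpha:     risk level in (0, 1]; smaller values are more conservative,
               and alpha = 1 recovers the ordinary expected value.
    """
    q = np.sort(np.asarray(quantiles, dtype=float), axis=-1)
    n = q.shape[-1]
    # Number of lowest quantiles that make up the alpha-tail (at least one).
    k = max(1, int(np.ceil(alpha * n)))
    return q[..., :k].mean(axis=-1)

# Toy return distribution for a single state (illustrative numbers only).
example = np.array([-1.0, -0.5, 0.0, 0.5])
expected_value = cvar_from_quantiles(example, alpha=1.0)
tail_value = cvar_from_quantiles(example, alpha=0.5)
```

Lowering alpha shifts the estimate toward the pessimistic tail, which is how the risk tolerance can be tuned: the CVaR value is never larger than the expected value, so a risk threshold is crossed no later than it would be under an expectation-based estimate.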

Dead-End Discovery Framework

State Construction: Learn meaningful patient state representations
Dead-End Discovery: Identify high-risk states using distributional value estimates
Dead-End Confirmation: Validate discovered dead-ends through multiple independent models
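
The discovery and confirmation steps above can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: it assumes two independently trained distributional models, one estimating returns in [-1, 0] for negative outcomes and one estimating returns in [0, 1] for positive outcomes, and it flags a state only when both models agree. The names flag_dead_ends, delta_d, and delta_r, and all threshold values, are hypothetical.

```python
import numpy as np

def cvar(quantiles, alpha):
    """Mean of the lowest alpha-fraction of equally weighted quantiles."""
    q = np.sort(np.asarray(quantiles, dtype=float), axis=-1)
    k = max(1, int(np.ceil(alpha * q.shape[-1])))
    return q[..., :k].mean(axis=-1)

def flag_dead_ends(q_death, q_recovery, alpha=0.5, delta_d=-0.75, delta_r=0.25):
    """Flag states that both models consider high risk.

    q_death:    (T, N) quantile estimates, assumed to lie in [-1, 0].
    q_recovery: (T, N) quantile estimates, assumed to lie in [0, 1].
    alpha:      CVaR risk level; delta_d and delta_r are illustrative
                thresholds, not values taken from the papers.
    """
    risk_d = cvar(q_death, alpha)      # close to -1: negative outcome looks likely
    risk_r = cvar(q_recovery, alpha)   # close to 0: recovery looks unlikely
    # Confirmation: require agreement between the two independent models.
    return (risk_d <= delta_d) & (risk_r <= delta_r)

# Three toy states: high risk, low risk, and one where the models disagree.
q_death = np.array([[-0.90, -0.80, -0.85, -0.95],
                    [-0.10, -0.20, -0.05,  0.00],
                    [-0.90, -0.80, -0.85, -0.95]])
q_recovery = np.array([[0.10, 0.05, 0.20, 0.15],
                       [0.90, 0.80, 0.95, 0.85],
                       [0.90, 0.80, 0.95, 0.85]])
flags = flag_dead_ends(q_death, q_recovery)
```

Requiring agreement between the two models trades some sensitivity for fewer false alarms; whether that trade is appropriate depends on the clinical setting and on how the thresholds are chosen.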

Key Results

Earlier identification of dangerous states compared to expectation-based methods
Tunable risk sensitivity enables domain-expert control
Framework applicable across different healthcare settings (sepsis, diabetes, etc.)

Publications

Risk Sensitive Dead-end Identification in Safety-Critical Offline Reinforcement Learning (TMLR 2023)

Medical Dead-ends and Learning to Identify High-Risk States and Treatments (NeurIPS 2021)

Impact

This work provides practical tools for clinicians to identify potentially dangerous treatment paths, complementing existing clinical guidelines with data-driven safety analysis.