ICLR 2025 Paper Announcements

We are excited to announce that REALM will be presenting four papers at ICLR 2025! Below you can find a brief description and dedicated project website for each paper.

Discrete GCBF Proximal Policy Optimization for Multi-agent Safe Optimal Control

Songyuan Zhang¹, Oswin So¹, Mitchell Black², Chuchu Fan¹
¹Massachusetts Institute of Technology     ²MIT Lincoln Laboratory

Brief Description: This work addresses how to extend control barrier functions (CBFs) to safe multi-agent reinforcement learning (MARL). We introduce DGPPO (Discrete GCBF PPO), a framework for safe MARL that tackles unknown discrete-time dynamics, partial observability, changing neighborhoods, and input constraints, without requiring a distributed high-performance nominal policy that can already achieve the task. DGPPO learns both a discrete graph CBF, which handles neighborhood changes and input constraints, and a distributed high-performance safe policy for multi-agent systems (MAS) with unknown discrete-time dynamics. We empirically validate our claims on a suite of multi-agent tasks spanning three different simulation engines. The results suggest that, compared with existing methods, DGPPO obtains policies that achieve both high task performance (matching baselines that ignore the safety constraints) and high safety rates (matching the most conservative baselines), with a constant set of hyperparameters across all environments.
Project website: https://mit-realm.github.io/dgppo/

  • We propose a method for learning discrete CBFs (DCBFs) under unknown discrete-time dynamics and input constraints.
  • We propose the discrete graph CBF (DGCBF), which ensures safety under varying neighborhoods in the limited-sensing setting, and we extend the DCBF learning method above to the case of DGCBFs.
  • We propose Discrete GCBF PPO (DGPPO), a framework combining MARL and DGCBFs for multi-agent safe optimal control problems with unknown dynamics and limited sensing, without a known performant nominal policy.
  • Unlike existing methods that require different hyperparameter choices per environment, DGPPO achieves the lowest cost among baselines with a near-100% safety rate using a single set of hyperparameters.
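
To make the certificate condition concrete, here is a minimal sketch of the discrete-time CBF (DCBF) condition that underlies DGPPO. This is an illustration, not the paper's implementation: the h, dynamics, and alpha below are hypothetical stand-ins, whereas DGPPO learns both the certificate and the policy as neural networks.

```python
import numpy as np

def dcbf_residual(h, dynamics, x, u, alpha=0.5):
    """Residual of the DCBF condition h(x') - h(x) >= -alpha * h(x).

    A nonnegative residual means the step (x, u) keeps the state inside
    the safe set {x : h(x) >= 0}; a negative residual is a violation
    that a learning method can penalize.
    """
    x_next = dynamics(x, u)  # one discrete-time step
    return h(x_next) - h(x) + alpha * h(x)

# Toy example: a single integrator that must stay inside the unit disk.
h = lambda x: 1.0 - np.dot(x, x)            # h >= 0 <=> inside the disk
dynamics = lambda x, u, dt=0.1: x + dt * u  # x_{k+1} = x_k + dt * u_k

x = np.array([0.5, 0.0])
print(dcbf_residual(h, dynamics, x, u=np.array([-1.0, 0.0])))  # safe: >= 0
print(dcbf_residual(h, dynamics, x, u=np.array([5.0, 0.0])))   # unsafe: < 0
```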

Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming

Yilun Hao¹, Yang Zhang², Chuchu Fan¹
¹Massachusetts Institute of Technology     ²MIT-IBM Watson AI Lab

Brief Description: In this work, we observe that the core of many planning problems lies in optimization: searching for the optimal solution (the best plan) toward goals subject to constraints (the preconditions and effects of decisions). Combined with LLMs’ commonsense, reasoning, and programming capabilities, this observation opens up the possibility of a universal LLM-based approach to planning problems. We propose LLMFP, a general-purpose framework that leverages LLMs to capture key information from planning problems and to formally formulate and solve them as optimization problems from scratch, with no task-specific examples needed. We apply LLMFP to 9 planning problems, ranging from multi-constraint decision making to multi-step planning, and demonstrate that LLMFP achieves average optimal rates of 83.7% and 86.8% across the 9 tasks with GPT-4o and Claude 3.5 Sonnet, significantly outperforming the best baseline (direct planning with OpenAI o1-preview) with 37.6% and 40.7% improvements. We also validate the components of LLMFP with ablation experiments and analyze the underlying success and failure reasons.
Project website: https://sites.google.com/view/llmfp

  • We offer a novel perspective on using LLMs to solve planning problems by rigorously constructing optimization problems from scratch, akin to how human experts use optimization tools for planning.
  • We propose LLMFP, a general-purpose planning framework with zero-shot generalization capability. To our knowledge, LLMFP is the first to enable LLMs to build and solve diverse types of planning problems as optimization problems with no task-specific examples or external critics.
  • LLMFP notably achieves 83.7% and 86.8% optimal rates for GPT-4o and Claude 3.5 Sonnet, outperforming the best baseline (direct planning with OpenAI o1-preview) by 37.6% and 40.7%. We examine the effectiveness of our framework and analyze the success and failure reasons.
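
To illustrate the target representation, here is a hypothetical toy example of the kind of optimization formulation LLMFP prompts an LLM to construct. The errands, utilities, and time budget below are invented for illustration, and a brute-force search stands in for the formal solver that LLMFP's generated code would invoke.

```python
from itertools import product

# Toy planning problem: choose which errands to run, maximizing total
# utility subject to a time budget. All numbers here are invented.
utility = {"groceries": 5, "pharmacy": 3, "post_office": 2}
minutes = {"groceries": 40, "pharmacy": 15, "post_office": 25}
budget = 60  # total minutes available

best_plan, best_utility = None, float("-inf")
for choice in product([0, 1], repeat=len(utility)):       # all 2^3 candidate plans
    plan = dict(zip(utility, choice))
    if sum(minutes[t] * plan[t] for t in plan) > budget:  # time constraint
        continue
    value = sum(utility[t] * plan[t] for t in plan)       # objective
    if value > best_utility:
        best_plan, best_utility = plan, value

print(best_plan, best_utility)
# -> {'groceries': 1, 'pharmacy': 1, 'post_office': 0} 8
```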

Rare event modeling with self-regularized normalizing flows: what can we learn from a single failure?

Charles Dawson¹, Van Tran², Max Li³, Chuchu Fan¹
¹Massachusetts Institute of Technology     ²Harvard University     ³University of Michigan

Brief Description: Increased deployment of autonomous systems in fields like transportation and robotics has seen a corresponding increase in safety-critical failures. These failures can be difficult to model and debug due to the relative lack of data: compared to tens of thousands of examples from normal operations, we may have only seconds of data leading up to the failure. This scarcity makes it challenging to train generative models of rare failure events, as existing methods risk either overfitting to noise in the limited failure dataset or underfitting due to an overly strong prior. We address this challenge with CalNF, or calibrated normalizing flows, a self-regularized framework for posterior learning from limited data. CalNF achieves state-of-the-art performance on data-limited failure modeling and inverse problems and enables a first-of-its-kind case study into the root causes of the 2022 Southwest Airlines scheduling crisis.
Project website: https://openreview.net/forum?id=gQoBw7sGAu

  • Increased deployment of autonomous systems means an increased risk of safety-critical failures.
  • Failure events are hard to model due to scarce data — we might have only a handful of data points from failures, compared to thousands from normal operations.
  • Existing methods struggle to learn from scarce or imbalanced datasets, either overfitting to noise or underfitting due to strong priors.
  • We introduce CalNF (Calibrated Normalizing Flows): a self-regularized approach for learning from limited failure data. CalNF achieves state-of-the-art results in failure modeling and data-limited inverse problems.
  • We apply CalNF to learn models of real-world failures, including the 2022 Southwest Airlines scheduling crisis.
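
CalNF's flow architecture is beyond the scope of this post, but the overfit/underfit tradeoff it addresses is easy to see in a much simpler setting. The sketch below is a generic illustration, not CalNF itself: it fits Gaussians to abundant nominal data and a handful of failure samples, then picks the blend between the two on a held-out failure point, a crude stand-in for the self-calibration idea. All data here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Abundant nominal data vs. only five failure samples (synthetic).
nominal = rng.normal(0.0, 1.0, size=5000)
failure = rng.normal(3.0, 0.5, size=5)
train, held_out = failure[:4], failure[4:]

def gaussian_logpdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

mu_prior, sd_prior = nominal.mean(), nominal.std()       # strong prior
mu_fail, sd_fail = train.mean(), max(train.std(), 1e-3)  # noisy scarce fit

# Choose how far to trust the scarce failure fit by scoring held-out data:
# w = 0 underfits (prior only), w = 1 risks overfitting (failure data only).
best_w, best_ll = None, -np.inf
for w in np.linspace(0.0, 1.0, 11):
    mu = w * mu_fail + (1 - w) * mu_prior
    sd = w * sd_fail + (1 - w) * sd_prior
    ll = gaussian_logpdf(held_out, mu, sd).sum()
    if ll > best_ll:
        best_w, best_ll = w, ll

print(f"calibrated blend weight: {best_w:.1f}")
```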

Steering Large Language Models between Code Execution and Textual Reasoning

Yongchao Chen¹, Harsh Jhamtani³, Srinagesh Sharma³, Chuchu Fan², Chi Wang⁴
¹Harvard University     ²Massachusetts Institute of Technology     ³Microsoft     ⁴Google DeepMind

Brief Description: Our research highlights the limitations of textual reasoning in LLMs for tasks involving math, logic, and optimization, where code generation offers a more scalable solution. Despite advances like OpenAI’s GPT Code Interpreter and AutoGen, no optimal method yet exists to reliably steer LLMs between code and text generation. This study identifies key patterns in how LLMs choose between code and text as factors such as task complexity and model size vary, and proposes three methods to improve steering.
Project website: https://yongchao98.github.io/CodeSteer/

  • Deciding between code and text remains a challenge: No single method is optimal across tasks; trade-offs exist in accuracy, token length, and runtime.
  • Forcing code doesn’t always improve accuracy: Writing correct code is hard in some tasks, and code constraints can limit reasoning. Some generated code resembles restated textual reasoning rather than a functional implementation.
  • Task complexity and model size affect performance: Smaller models like GPT-3.5 sometimes outperform larger ones (e.g., GPT-4o) because the larger models are more confident in their textual reasoning and avoid code, leading to inverse scaling on specific tasks.
  • Mixing code, textual reasoning, and multi-turn refinement improves results: Optimized hybrid methods enhance performance, but increased code usage raises runtime and token costs.
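
To make the steering problem concrete, here is a minimal sketch of a code-vs-text routing loop in the spirit of what the paper studies. This is an assumption-laden illustration, not CodeSteer itself: call_llm is a placeholder stub rather than a real API, and a production router would add answer checking and multi-turn refinement.

```python
import re
import subprocess
import sys
import tempfile

FENCE = "`" * 3  # markdown code fence, built indirectly so this block stays valid

def call_llm(prompt: str) -> str:
    """Placeholder: a real system would query GPT-4o, Claude, etc. here."""
    return f"{FENCE}python\nprint(sum(range(1, 101)))\n{FENCE}"  # canned reply

def answer(question: str) -> str:
    # Let the model pick its modality, then execute any returned Python
    # instead of trusting textual arithmetic.
    reply = call_llm(
        f"Answer with a fenced {FENCE}python block if computation helps; "
        "otherwise answer in plain text.\n\nQuestion: " + question
    )
    match = re.search(FENCE + r"python\n(.*?)" + FENCE, reply, re.DOTALL)
    if match is None:
        return reply.strip()  # textual-reasoning path
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(match.group(1))  # code-execution path
    result = subprocess.run([sys.executable, f.name],
                            capture_output=True, text=True, timeout=10)
    return result.stdout.strip()

print(answer("What is the sum of the integers from 1 to 100?"))  # -> 5050
```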