NeurIPS 2025 Paper Announcements

We are excited to announce that REALM will be presenting two papers at NeurIPS 2025! Below you can find a brief description and dedicated project website for each paper.

Neural Combinatorial Optimization for Time Dependent Traveling Salesman Problem

Ruixiao Yang and Chuchu Fan
MIT

Website | Code

The paper addresses the Time-Dependent Traveling Salesman Problem (TDTSP), in which travel times between locations vary with departure time due to real-world factors such as traffic congestion.
We propose a novel neural architecture that uses an adjacency tensor and a dual-graph attention encoder to capture complex spatiotemporal dynamics simultaneously. The solution also includes an inference-time Mixture of Experts (MoE) and local search refinement to handle instances in which temporal dependencies significantly alter the optimal route.
The model achieves state-of-the-art performance, outperforming existing heuristics and neural baselines as problem sizes grow. Beyond performance, we find that many TDTSP instances in real-world data maintain static optimal solutions, prompting us to propose a more rigorous evaluation method focused on “selected instances” in which time-aware routing provides real benefits.

HMARL-CBF – Hierarchical Multi-Agent Reinforcement Learning with Control Barrier Functions for Safety-Critical Autonomous Systems

H M Sabbir Ahmad^1,2, Ehsan Sabouni¹, Alex Wasilkoff¹, Param Budhraja¹, Zijian Guo¹, Songyuan Zhang², Chuchu Fan²,Christos G. Cassandras¹,Wenchao Li¹
¹BU ²MIT

Website

This paper addresses the problem of safe policy learning for safety-critical multi-agent autonomous systems.
We introduce a novel centralized-training–decentralized-execution (CTDE) hierarchical multi-agent reinforcement learning framework that guarantees safety using Control Barrier Functions (CBFs). This approach enforces pointwise-in-time safety constraints throughout both training and execution across each trajectory.
Our hierarchical structure leverages skill-based decomposition, where the high-level policy coordinates cooperative behaviors across agents, while the low-level policy learns and executes safe skills guided by CBF-based safety constraints. This design guarantees safety both during training and in real-world deployment.
We validate our proposed approach on challenging conflicting environment scenarios where a large number of agents must each travel safely from their origin to their destination without colliding with other agents. Simulation results demonstrate superior performance and safety compliance, achieving a near-perfect (>= 95%) success/safety rate compared to existing benchmark methods.