Multi-level action tree rollout (MLAT-R): Efficient and accurate online multiagent policy improvement
Andrea Henshall (AERA) and Prof. Sertac Karaman present the Multi-Level Action Tree Rollout (MLAT-R) algorithm, which improves on one-agent-at-a-time methods, which are computationally efficient but rely on value estimates from a potentially poor base policy, and on Monte Carlo tree search, which gives accurate state value estimates but is often too complex for online use. MLAT-R provides guaranteed policy improvement with respect to true state values, handles any number of agents, and reduces the action space’s growth from exponential to linear in the number of agents.
Authors: Andrea Henshall, Sertac Karaman
Citation: 2024 IEEE International Conference on Robotics and Automation (ICRA)
Abstract:
Rollout algorithms are renowned for their ability to correct the suboptimalities of offline-trained base policies. In the multiagent setting, however, performing online rollout can require a number of optimizations that grows exponentially with the number of agents. One-agent-at-a-time algorithms offer a computationally efficient route to guaranteed policy improvement; however, this improvement is with respect to a state value estimate derived from a potentially poor base policy.
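For intuition, here is a minimal sketch of the two rollout regimes the abstract contrasts: exhaustive joint-action rollout versus one-agent-at-a-time rollout. All names here (`q_value`, `base_policy`, the function names themselves) are hypothetical placeholders for illustration, not the paper's implementation.

```python
from itertools import product

def joint_rollout_step(state, num_agents, actions, q_value):
    """Exhaustive joint rollout: scores every joint action,
    requiring |A|^m Q-value evaluations for m agents."""
    return max(product(actions, repeat=num_agents),
               key=lambda joint: q_value(state, joint))

def one_at_a_time_rollout_step(state, num_agents, actions, q_value, base_policy):
    """One-agent-at-a-time rollout: agent i optimizes its own action
    while earlier agents keep their already-optimized actions and
    later agents follow the base policy -- m * |A| evaluations total."""
    joint = [base_policy(state, i) for i in range(num_agents)]
    for i in range(num_agents):
        joint[i] = max(
            actions,
            key=lambda a: q_value(state, tuple(joint[:i] + [a] + joint[i + 1:])),
        )
    return tuple(joint)
```

The sequential variant is what makes online use tractable, but each agent's optimization leans on the base policy's behavior for the agents not yet processed, which is where the "potentially poor base policy" caveat enters.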
Monte Carlo tree search (MCTS) converges to the true state values; however, its exponentially large search space often limits its online use. Here, we present the Multi-Level Action Tree Rollout (MLAT-R) algorithm. MLAT-R provides 1) provable improvement over a base policy, 2) policy improvement with respect to the true state value, 3) applicability to any number of agents, and 4) an action space that grows linearly with the number of agents rather than exponentially. In this paper, we outline the algorithm, sketch a proof of its improvement over a base policy, and evaluate its performance on a challenging problem for which the base policy cannot reach a terminal state. Despite this challenging setup, our algorithm reached a terminal state in 86% of all experiments, compared to 31% for state-of-the-art one-agent-at-a-time algorithms. In experiments involving MCTS, MLAT-R reached a terminal state in 99% of experiments, compared to 92% for MCTS. MLAT-R achieved these results while considering an exponentially smaller action space than MCTS.
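To make the growth-rate claim concrete, the sketch below counts candidate actions per decision under the two regimes. It does not reproduce MLAT-R's actual tree construction, which the paper details; it only illustrates the exponential-versus-linear counting argument behind claim 4.

```python
def joint_branching(num_agents: int, actions_per_agent: int) -> int:
    # Joint-action (MCTS-style) branching: |A|^m, exponential in m.
    return actions_per_agent ** num_agents

def sequential_branching(num_agents: int, actions_per_agent: int) -> int:
    # Agent-by-agent branching, as in MLAT-R's linear-growth claim:
    # m * |A| candidate actions in total.
    return num_agents * actions_per_agent

# With 4 actions per agent: m=10 gives 1,048,576 joint actions
# but only 40 sequential candidates.
for m in (2, 5, 10):
    print(f"m={m}: joint={joint_branching(m, 4)}, "
          f"sequential={sequential_branching(m, 4)}")
```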