Planning a trip requires weighing the location, cost, and availability of hotels, transportation, restaurants, and more. A new method from the MIT-IBM Watson AI Lab combines a large language model with a solver to assist with this frequently encountered problem.

Inroads to personalized AI trip planning

Travel agents help to provide end-to-end logistics — like transportation, lodging, and meals — for businesspeople, vacationers, and everyone in between. For those looking to make their own arrangements, large language models (LLMs) seem like a strong tool for the task because of their ability to interact iteratively using natural language, provide some commonsense reasoning, collect information, and call in other tools to help. However, recent work has found that state-of-the-art LLMs struggle with complex logistical and mathematical reasoning, as well as with problems that have multiple constraints, like trip planning, where they’ve been found to provide viable solutions 4 percent of the time or less, even with additional tools and application programming interfaces (APIs).

Subsequently, a research team from MIT and the MIT-IBM Watson AI Lab reframed the issue to see if they could increase the success rate of LLM solutions for complex problems. “We believe a lot of these planning problems are naturally a combinatorial optimization problem,” where you need to satisfy several constraints in a certifiable way, says Chuchu Fan, associate professor in the MIT Department of Aeronautics and Astronautics (AeroAstro) and the Laboratory for Information and Decision Systems (LIDS). She is also a researcher in the MIT-IBM Watson AI Lab. Her team applies machine learning, control theory, and formal methods to develop safe and verifiable control systems for robotics, autonomous systems, controllers, and human-machine interactions.

Noting the transferable nature of their work for travel planning, the group sought to create a user-friendly framework that can act as an AI travel broker to help develop realistic, logical, and complete travel plans. To achieve this, the researchers combined common LLMs with algorithms and a complete satisfiability solver. Solvers are mathematical tools that rigorously check if criteria can be met and how, but they require complex computer programming for use. This makes them natural companions to LLMs for problems like these, where users want help planning in a timely manner, without the need for programming knowledge or research into travel options. Further, if a user’s constraint cannot be met, the new technique can identify and articulate where the issue lies and propose alternative measures to the user, who can then choose to accept, reject, or modify them until a valid plan is formulated, if one exists.

“Different complexities of travel planning are something everyone will have to deal with at some point. There are different needs, requirements, constraints, and real-world information that you can collect,” says Fan. “Our idea is not to ask LLMs to propose a travel plan. Instead, an LLM here is acting as a translator to translate this natural language description of the problem into a problem that a solver can handle [and then provide that to the user].”

Co-authoring a paper on the work with Fan are Yang Zhang of MIT-IBM Watson AI Lab, AeroAstro graduate student Yilun Hao, and graduate student Yongchao Chen of MIT LIDS and Harvard University. This work was recently presented at the Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics.

Breaking down the solver

Math tends to be domain-specific. For example, in natural language processing, LLMs perform regression to predict the next token, a.k.a. “word,” in a series in order to analyze or create a document. This works well for generalizing diverse human inputs. LLMs alone, however, wouldn’t work for formal verification applications, like in aerospace or cybersecurity, where circuit connections and constraint tasks need to be complete and proven, otherwise loopholes and vulnerabilities can sneak by and cause critical safety issues. Here, solvers excel, but they need rigidly formatted inputs and struggle with unsatisfiable queries. A hybrid technique, however, provides an opportunity to develop solutions for complex problems, like trip planning, in a way that’s intuitive for everyday people.

“The solver is really the key here, because when we develop these algorithms, we know exactly how the problem is being solved as an optimization problem,” says Fan. Specifically, the research group used a solver called satisfiability modulo theories (SMT), which determines whether a formula can be satisfied. “With this particular solver, it’s not just doing optimization. It’s doing reasoning over a lot of different algorithms there to understand whether the planning problem is possible or not to solve. That’s a pretty significant thing in travel planning. It’s not a very traditional mathematical optimization problem because people come up with all these limitations, constraints, restrictions,” notes Fan.
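To give a sense of what the solver contributes, here is a minimal sketch using the Python bindings of the open-source Z3 SMT solver; the budget figures and constraints are illustrative assumptions, not the paper’s actual encoding.

```python
# A minimal, illustrative satisfiability check with the Z3 SMT solver
# (pip install z3-solver). The price ranges and budget below are made up.
from z3 import Ints, Solver, sat

hotel_night, flight, meals_per_day = Ints("hotel_night flight meals_per_day")

s = Solver()
s.add(hotel_night >= 150, hotel_night <= 400)   # acceptable nightly hotel rates
s.add(flight >= 300, flight <= 900)             # acceptable round-trip airfares
s.add(meals_per_day >= 60)                      # minimum daily food spending
s.add(3 * hotel_night + flight + 3 * meals_per_day <= 1200)  # 3-day trip budget

if s.check() == sat:
    print("A feasible plan exists:", s.model())
else:
    print("No plan can satisfy every constraint.")
```

Unlike a plan drafted directly by an LLM, the solver’s answer is certifiable: if it reports a plan, every constraint provably holds, and if it reports none, none exists for that encoding.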

Translation in action

The “travel agent” works in four steps that can be repeated as needed. The researchers used GPT-4, Claude-3, or Mistral-Large as the method’s LLM. First, the LLM parses a user’s requested travel plan prompt into planning steps, noting preferences for budget, hotels, transportation, destinations, attractions, restaurants, and trip duration in days, as well as any other user prescriptions. Those steps are then converted into executable Python code (with a natural language annotation for each of the constraints), which calls APIs such as CitySearch and FlightSearch to collect data, and the SMT solver to begin executing the steps laid out in the constraint satisfaction problem. If a sound and complete solution can be found, the solver outputs the result to the LLM, which then provides a coherent itinerary to the user.
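Roughly speaking, the generated code might resemble the following sketch, in which mocked-up lists stand in for FlightSearch/CitySearch-style API results and each constraint is paired with its natural language annotation; the names and prices are hypothetical.

```python
# A hypothetical sketch of LLM-generated planning code: mocked lists stand in
# for FlightSearch/CitySearch-style API results, and each constraint carries a
# natural language annotation so conflicts can later be explained to the user.
from z3 import Int, Solver, If, Or, Sum, sat

flight_prices = [320, 450, 610]   # mocked round-trip fares (illustrative)
hotel_prices = [110, 180, 260]    # mocked nightly rates (illustrative)

flight = Int("flight_choice")     # index of the chosen flight option
hotel = Int("hotel_choice")       # index of the chosen hotel option

flight_cost = Sum([If(flight == i, p, 0) for i, p in enumerate(flight_prices)])
hotel_cost = Sum([If(hotel == i, p, 0) for i, p in enumerate(hotel_prices)])

annotated_constraints = [
    ("choose one of the returned flights",
     Or([flight == i for i in range(len(flight_prices))])),
    ("choose one of the returned hotels",
     Or([hotel == i for i in range(len(hotel_prices))])),
    ("keep the three-night trip under a $1,000 budget",
     flight_cost + 3 * hotel_cost <= 1000),
]

solver = Solver()
for note, constraint in annotated_constraints:
    solver.add(constraint)  # the annotation is kept alongside for reporting

if solver.check() == sat:
    model = solver.model()
    print("Flight option:", model[flight], "| Hotel option:", model[hotel])
```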

If one or more constraints cannot be met, the framework begins looking for an alternative. The solver outputs code identifying the conflicting constraints (with their corresponding annotations), which the LLM then presents to the user along with a potential remedy. The user can then decide how to proceed, until a solution (or the maximum number of iterations) is reached.
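When requests clash, SMT solvers such as Z3 can report which tracked assertions are mutually incompatible. The sketch below, with invented numbers and labels, illustrates how the natural language annotations could surface an irreconcilable budget, hotel standard, and trip length; the paper’s exact mechanism may differ.

```python
# An illustrative way to surface conflicting requests: each constraint is
# tracked under a human-readable label, and Z3's unsat core reports which
# labels cannot hold together. The numbers here are invented.
from z3 import Int, Solver, unsat

nightly_rate = Int("nightly_rate")
nights = Int("nights")

solver = Solver()
solver.set(unsat_core=True)

solver.assert_and_track(nights == 5, "stay exactly five nights")
solver.assert_and_track(nightly_rate >= 200, "book only hotels at $200/night or more")
solver.assert_and_track(nightly_rate * nights <= 600, "keep total lodging under $600")

if solver.check() == unsat:
    conflict = [str(label) for label in solver.unsat_core()]
    # A downstream LLM could turn these labels into a suggestion, such as
    # raising the budget, shortening the stay, or relaxing the hotel standard.
    print("These requests cannot all be satisfied:", conflict)
```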

Generalizable and robust planning

The researchers tested their method using the aforementioned LLMs against other baselines: GPT-4 by itself, OpenAI o1-preview by itself, GPT-4 with a tool to collect information, and a search algorithm that optimizes for total cost. Using the TravelPlanner dataset, which includes data for viable plans, the team looked at multiple performance metrics: how frequently a method could deliver a solution, whether the solution satisfied commonsense criteria like not visiting two cities in one day, the method’s ability to meet one or more constraints, and a final pass rate indicating that it could meet all constraints. The new technique generally achieved over a 90 percent pass rate, compared to 10 percent or lower for the baselines. The team also explored adding a JSON representation within the query step, which made it even easier for the method to provide solutions, with pass rates of 84.4 to 98.9 percent.
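As a rough illustration of how such metrics are tallied (the booleans below are invented stand-ins for per-query evaluation results, not figures from the actual benchmark):

```python
# A toy tally of pass-rate-style metrics over a handful of evaluated plans.
plans = [
    {"delivered": True,  "commonsense_ok": True,  "all_constraints_ok": True},
    {"delivered": True,  "commonsense_ok": True,  "all_constraints_ok": False},
    {"delivered": False, "commonsense_ok": False, "all_constraints_ok": False},
]

n = len(plans)
delivery_rate = sum(p["delivered"] for p in plans) / n
commonsense_rate = sum(p["commonsense_ok"] for p in plans) / n
final_pass_rate = sum(p["delivered"] and p["all_constraints_ok"] for p in plans) / n

print(f"Delivery: {delivery_rate:.0%}  Commonsense: {commonsense_rate:.0%}  "
      f"Final pass: {final_pass_rate:.0%}")
```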

The MIT-IBM team posed additional challenges for their method. They looked at how important each component of their solution was — such as removing human feedback or the solver — and how that affected plan adjustments to unsatisfiable queries within 10 or 20 iterations. They tested this on a new dataset they created, called UnsatChristmas, which includes unseen constraints, and on a modified version of TravelPlanner. On average, the MIT-IBM group’s framework achieved 78.6 and 85 percent success, rising to 81.6 and 91.7 percent with additional plan modification rounds. The researchers also analyzed how well it handled new, unseen constraints and paraphrased query-step and step-code prompts. In both cases, it performed very well, especially with an 86.7 percent pass rate for the paraphrasing trial.

Lastly, the MIT-IBM researchers applied their framework to other domains with tasks like block picking, task allocation, the traveling salesman problem, and warehouse operation. Here, the method must select numbered, colored blocks to maximize its score; optimize robot task assignment for different scenarios; plan trips that minimize distance traveled; and complete and optimize robot tasks in a warehouse.

“I think this is a very strong and innovative framework that can save a lot of time for humans, and also, it’s a very novel combination of the LLM and the solver,” says Hao.

This work was funded, in part, by the Office of Naval Research and the MIT-IBM Watson AI Lab.