Researchers improve AI agent performance on unfamiliar tasks using 'Dungeons and Dragons'




Organizations interested in deploying AI agents must first fine-tune them, especially for workflows that often feel rote. While some organizations want agents that perform only one kind of task in a single workflow, agents sometimes need to be brought into new environments in the hope that they adapt.

Researchers from Beijing University of Posts and Telecommunications have released a new method, AgentRefine, which teaches agents to self-correct, leading to more generalized and adaptive AI agents.

The researchers said that current tuning methods confine agents to the same tasks as their training dataset, or "held-in" tasks, and that those agents do not perform as well in "held-out," or new, environments. By following only the rules laid out in the training data, agents trained with these frameworks have trouble "learning" from their mistakes and cannot be made into general agents or brought into new workflows.

To combat that limitation, AgentRefine aims to create generalized agent training datasets that let the model learn from mistakes and adapt to new workflows. In a new paper, the researchers stated that the goal of AgentRefine is to "develop generalized agent-tuning data and establish the correlation between agent generalization and self-refinement." If agents can self-correct, they will not merely repeat mistakes they learned during training or carry those same errors into other environments where they are deployed.

"We find that agent-tuning on the self-refinement data boosts the agent to explore more viable actions while meeting bad situations, thereby resulting in better generalization to new agent environments," the researchers write.
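To make the idea concrete, a self-refinement training example can be pictured as a trajectory in which the agent takes a wrong step, receives feedback from the environment, and then corrects itself. The sketch below is illustrative only; the field names and task are assumptions, not the paper's actual data schema.

```python
# Illustrative shape of one self-refinement training record (field names and
# task content are assumptions for this sketch, not AgentRefine's real schema).
refinement_example = {
    "task": "Find the blue key and unlock the chest.",
    "trajectory": [
        {"role": "agent", "action": "open chest"},                 # erroneous step
        {"role": "env",   "observation": "The chest is locked."},  # feedback exposing the error
        {"role": "agent", "thought": "I need the key first.",      # self-correction
         "action": "pick up blue key"},
        {"role": "env",   "observation": "You pick up the blue key."},
        {"role": "agent", "action": "unlock chest with blue key"},
    ],
}
```

Tuning on records like this rewards recovery from a bad state rather than memorization of a single correct action sequence.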

AI agent training inspired by D&D

Taking inspiration from the tabletop role-playing game Dungeons & Dragons, the researchers created personas, scripts for the agent to follow and challenges. And yes, there is a Dungeon Master (DM).

They divided data generation for AgentRefine into three stages: script generation, trajectory generation and verification.

In script generation, the model creates a script, or guide, containing information about the environment, the tasks and the actions personas can take. (The researchers tested AgentRefine using Llama-3-8B-Instruct, Llama-3-70B-Instruct, Mistral-7B-Instruct-v0.3, GPT-4o-mini and GPT-4o.)
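A minimal sketch of this script-generation step might look like the following, assuming an OpenAI-compatible chat client and GPT-4o-mini, one of the models named above; the prompt wording and helper function are illustrative assumptions, not the paper's actual implementation.

```python
from openai import OpenAI  # assumes the official OpenAI Python client is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_script(persona: str, domain: str) -> str:
    """Ask the model to write a D&D-style 'script': an environment description,
    a task for the persona, and the actions the persona is allowed to take."""
    prompt = (
        f"You are designing a text-adventure scenario for the persona: {persona}.\n"
        f"Domain: {domain}.\n"
        "Describe the environment, state the task, and list the legal actions."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # one of the models the researchers tested
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Example call; the persona and domain are made up for illustration.
script = generate_script("a cautious librarian", "household chores simulator")
```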

In trajectory generation, the model then produces agent data that contains errors, acting as both the DM and a player. It assesses the actions it can take and checks whether they contain mistakes. The final stage, verification, examines the script and the trajectory, giving the agents trained on the data the ability to self-correct.
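The last two stages can be sketched as a loop in which one model alternates between the DM and player roles to roll out a trajectory (flawed steps are kept on purpose), and a verifier then checks the script and trajectory before the example is admitted to the tuning set. The function names, prompts and acceptance check below are assumptions for this sketch, not the researchers' code.

```python
def generate_trajectory(script: str, llm, max_turns: int = 8) -> list[dict]:
    """Roll out a trajectory with one model playing both roles.
    `llm` is any callable mapping a prompt string to a completion string."""
    trajectory = []
    state = script
    for _ in range(max_turns):
        # Player turn: choose the next action given the script and current state.
        action = llm(f"Script:\n{script}\n\nState:\n{state}\n"
                     "As the player, choose the next action.")
        # DM turn: describe the outcome of that action.
        observation = llm(f"Script:\n{script}\n\nAction: {action}\n"
                          "As the Dungeon Master, describe the result.")
        trajectory.append({"action": action, "observation": observation})
        state = observation
        if "task complete" in observation.lower():
            break
    return trajectory


def verify(script: str, trajectory: list[dict], llm) -> bool:
    """Verification stage: ask a judge model whether the trajectory obeys the
    script's rules and whether any early mistake was later corrected."""
    verdict = llm(
        f"Script:\n{script}\n\nTrajectory:\n{trajectory}\n\n"
        "Does the trajectory follow the script's rules, and is every mistake "
        "eventually corrected? Answer YES or NO."
    )
    return verdict.strip().upper().startswith("YES")


# Only (script, trajectory) pairs that pass verification would be kept as tuning data.
```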

Better and more diverse task performance

The researchers found that agents trained with the AgentRefine method and data performed better on diverse tasks and adapted better to new scenarios. These agents self-correct more readily, redirecting their actions and decision-making to avoid errors, and become more robust in the process.

In particular, AgentRefine improved the performance of all the models on held-out tasks.

Enterprises need to make agents more adaptable to new tasks, so that they do not merely repeat what they have learned, if agents are to become better decision-makers. Orchestrator agents not only "direct traffic" for multiple agents but also determine whether agents have completed tasks based on user requests.

OpenAI's o3 offers "program synthesis," which could improve task adaptability. Other orchestration and training frameworks, such as Microsoft's Magentic-One, set tasks for supervisor agents to learn when to hand off work to different agents.


