LangChain shows AI agents aren't human-level yet because they're overwhelmed by tools




Even as AI agents have shown promise, organizations have had to grapple with whether a single agent is enough, or whether they should invest in building out a wider multi-agent network that touches more points in their organization.

Orchestration framework company LangChain sought to get closer to an answer to this question. It subjected an AI agent to several experiments, which found that single agents do have a limit on context and tools before their performance begins to degrade. These experiments could lead to a better understanding of the architecture needed to maintain agents and multi-agent systems.

In a blog post, LangChain detailed a set of experiments it performed with a single ReAct agent and benchmarked its performance. The main question LangChain hoped to answer was, “At what point does a single ReAct agent become overloaded with instructions and tools, and subsequently sees performance drop?”

LangChain chose the ReAct agent framework because it is “one of the most basic agentic architectures.”

While benchmarking agentic performance can often lead to misleading results, LangChain chose to limit the test to two easily quantifiable tasks: answering questions and scheduling meetings.

“There are many existing benchmarks for tool use and tool calling, but for the purposes of this experiment, we wanted to evaluate a practical agent that we actually use,” LangChain wrote. “This agent is our internal email assistant, which is responsible for two main domains of work: responding to and scheduling meeting requests, and supporting customers with their questions.”

Parameters of LangChain's experiment

LangChain mainly used pre-built ReAct agents through its LangGraph platform. These agents featured tool-calling large language models (LLMs) that became part of the benchmark test. The LLMs included Claude 3.5 Sonnet, Llama-3.3-70B and three models from OpenAI: GPT-4o, o1 and o3-mini.

The company broke the testing down to better assess the email assistant's performance on the two tasks, creating a list of steps for it to follow. It began with the email assistant's customer support capabilities, which look at how the agent accepts an email from a client and responds with an answer.

LangChain first evaluated the tool-calling trajectory, or the sequence of tools an agent taps. If the agent followed the correct order, it passed the test. Next, researchers asked the assistant to respond to an email and used an LLM to judge its performance.
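The trajectory check described above can be sketched in a few lines of Python. This is a minimal illustration, not LangChain's evaluation code: the function name and every tool name except send_email are hypothetical, and the pass rule (an exact, in-order match) is an assumption based on the description in the article.

```python
from typing import Sequence


def trajectory_matches(expected: Sequence[str], actual: Sequence[str]) -> bool:
    """Return True only when the agent called the expected tools in the expected order."""
    return list(expected) == list(actual)


# Hypothetical tool names for a customer support email task;
# only send_email is mentioned in the article.
expected_calls = ["lookup_customer", "search_docs", "send_email"]

passing_run = ["lookup_customer", "search_docs", "send_email"]
forgot_tool = ["lookup_customer", "search_docs"]  # never sends the reply
wrong_order = ["search_docs", "lookup_customer", "send_email"]

for name, run in [("passing", passing_run), ("forgot", forgot_tool), ("order", wrong_order)]:
    print(name, trajectory_matches(expected_calls, run))
```

A run that skips send_email fails this check no matter how good the drafted reply might have been, which is the failure mode the article later reports for some models.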

For the second work domain, calendar scheduling, LangChain focused on the agent's ability to follow instructions.

“In other words, the agent needs to remember specific instructions provided, such as exactly when it should schedule meetings with different parties,” the researchers wrote.

Overloading the agent

Once they had defined the parameters, LangChain's researchers set out to stress and overwhelm the email assistant agent.

They set 30 tasks each for calendar scheduling and customer support. These were run three times (for a total of 90 runs). The researchers created a calendar scheduling agent and a customer support agent to better evaluate the tasks.
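The run count follows directly from the setup: 30 tasks repeated three times gives 90 runs per domain. A minimal sketch of such a harness is shown here, assuming a hypothetical run_task callable that returns True on a pass; the stub agent is a deterministic placeholder, not a real model.

```python
from typing import Callable, Iterable


def pass_rate(
    tasks: Iterable[int],
    run_task: Callable[[int], bool],
    repeats: int = 3,
) -> tuple[float, int]:
    """Run every task `repeats` times; return (fraction passed, total runs)."""
    outcomes = [run_task(task) for task in tasks for _ in range(repeats)]
    return sum(outcomes) / len(outcomes), len(outcomes)


def stub_agent(task_id: int) -> bool:
    """Placeholder for a real agent call: passes even-numbered tasks only."""
    return task_id % 2 == 0


rate, total_runs = pass_rate(range(30), stub_agent)
print(total_runs, rate)  # 90 0.5
```

Repeating each task smooths out run-to-run variation in model output, which is presumably why the tests were run more than once.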

“The calendar scheduling agent only has access to the calendar scheduling domain, and the customer support agent only has access to the customer support domain,” LangChain explained.

The researchers then added more domain tasks and tools to the agents to increase the number of responsibilities. These could range from human resources to technical quality assurance, legal and compliance, and a number of other areas.
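One way to picture this overloading step is as a prompt and tool list that grow with each added domain. The sketch below is an assumption about how such a setup might look, not LangChain's implementation: the domain names come from the article, while the instruction strings, the function name and every tool name except send_email are hypothetical.

```python
# Hypothetical instructions and tools per domain (illustrative only).
DOMAINS = {
    "calendar scheduling": {
        "instructions": "Schedule meetings only within the stated hours.",
        "tools": ["check_calendar", "send_email"],
    },
    "customer support": {
        "instructions": "Answer customer questions from the documentation.",
        "tools": ["search_docs", "send_email"],
    },
    "human resources": {
        "instructions": "Route leave requests to the HR queue.",
        "tools": ["file_hr_ticket"],
    },
    "technical quality assurance": {
        "instructions": "Log reproducible bugs with steps.",
        "tools": ["create_bug_report"],
    },
    "legal and compliance": {
        "instructions": "Escalate contract questions to counsel.",
        "tools": ["escalate_to_legal"],
    },
}


def build_agent_config(core_domain: str, total_domains: int) -> tuple[str, list[str]]:
    """Combine the core domain with extra domains into one prompt and tool list."""
    names = [core_domain] + [d for d in DOMAINS if d != core_domain]
    selected = names[:total_domains]
    prompt = "\n\n".join(f"{name}:\n{DOMAINS[name]['instructions']}" for name in selected)
    # dict.fromkeys removes duplicate tools while preserving order.
    tools = list(dict.fromkeys(t for name in selected for t in DOMAINS[name]["tools"]))
    return prompt, tools


small_prompt, small_tools = build_agent_config("customer support", 1)
large_prompt, large_tools = build_agent_config("customer support", 4)
print(len(small_tools), len(large_tools))
```

The task being scored stays the same in both configurations; only the amount of unrelated instruction text and the number of tools the model has to choose from changes.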

Single-agent instruction degradation

After running the evaluations, LangChain found that single agents would often become overwhelmed when asked to do too many things. They began forgetting to call tools or were unable to respond to tasks when given more instructions and context.

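LangChain found that calendar scheduling agents using GPT-4o performed worse than those using Claude-3.5-sonnet, o1 and o3-mini, and that their performance dropped off more sharply when a larger context was provided. The performance of GPT-4o calendar schedulers fell to 2% when the number of domains increased to at least seven.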

Other models did not fare much better. Llama-3.3-70B forgot to call the send_email tool, “so it failed every test case.”

Only Claude-3.5-sonnet, o1 and o3-mini all remembered to call the tool. However, o3-mini's performance degraded once irrelevant domains were added to the scheduling instructions.

The customer support agent can call on more tools, but for this test, LangChain said Claude-3.5-sonnet performed just as well as o3-mini and o1. It also showed a shallower performance drop when more domains were added. When the context window expanded, however, the Claude model performed worse.

GPT-4o also performed the worst among the models tested.

“We saw that as more context was provided, instruction following became worse,” LangChain said, noting that some of its tasks carried specific instructions for EU-based customers. “We found that these instructions would be successfully followed by agents with fewer domains, but as the number of domains increased, these instructions were more often forgotten.”

The company said it is exploring how to evaluate multi-agent architectures using the same domain-overloading method.

LangChain is already invested in the performance of agents, as it introduced the concept of “ambient agents,” or agents that run in the background and are triggered by specific events. These experiments could make it easier to figure out how best to ensure agentic performance.


