Amazon is preparing to reposition its voice-activated digital assistant Alexa as an intelligent “agent” that can complete practical tasks, as the technology group races to solve the challenges that AI systems have faced.
The $2.4tn company has spent the past two years trying to reinvent Alexa, its conversational system embedded in 500mn consumer devices around the world, so that the software's underlying “brain” is powered by generative AI.
Rohit Prasad, who leads the artificial general intelligence (AGI) group at Amazon, told the Financial Times that the voice assistant still needs to overcome several technical hurdles before launch.
This includes solving the problem of “hallucinations”, or fabricated answers, as well as response speed, or “latency”, and reliability. “Hallucinations have to be close to zero,” said Prasad. “It's still an open problem in the industry, but we're working hard on it.”
The vision of Amazon's leadership is to transform Alexa, currently used for a narrow set of simple tasks such as playing music and setting alarms, into an “agent” product that acts as a personal concierge. This can include anything from recommending restaurants to adjusting the lights in the bedroom based on a person's sleep cycles.
Alexa's overhaul has been in train since the launch of Microsoft-backed OpenAI's ChatGPT in late 2022. While Microsoft, Google, Meta and others have moved quickly to embed generative AI into their computing platforms and enhance their software services, critics have questioned whether Amazon can resolve its technical and organizational struggles early enough to compete with its rivals.
According to several employees who have worked on Amazon's voice assistant teams over the years, the effort has been plagued by problems and follows years of AI research and development.
Many former employees said the long wait for the release was largely due to the unexpected difficulty of replacing the simple, predefined algorithms Alexa was built on and combining them with more powerful but less predictable large language models.
In response, Amazon said it was “working hard to make its voice assistant even more efficient and helpful”. It added that implementing technology at this scale, across a live service and a suite of tools used by customers around the world, was unprecedented, and not as simple as overlaying an LLM on to the Alexa service.
Prasad, formerly Alexa's chief architect, said last month's release of the company's in-house Amazon Nova models, developed by his AGI team, was driven in part by specific needs for optimal speed, cost and reliability, to help AI applications like Alexa “get to the last mile, really hard”.
To function as an agent, Alexa's “brain” must be able to call hundreds of third-party applications and services, Prasad said.
“We sometimes underestimate how many services are integrated into Alexa, and it's a huge number. These applications receive billions of requests per week, so when you're trying to make reliable transactions happen quickly . . . you have to be able to do it in a cost-effective way,” he added.
The difficulty stems from Alexa users expecting quick responses as well as extremely high levels of accuracy. Those qualities are at odds with the inherently probabilistic nature of today's generative AI, statistical software that predicts words based on speech and language patterns.
Some former employees also point to struggles to preserve the assistant's original capabilities, including its consistency and functionality, while imbuing it with new attributes such as creativity and free-flowing conversation.
Because of the more personal, conversational nature of LLMs, the company also plans to hire experts to shape the AI's personality, voice and diction so it remains familiar to Alexa users, according to one person familiar with the matter.
One former senior member of the Alexa team said that while LLMs are very sophisticated, they come with risks, such as producing answers that are “totally contrived at times”.
“At the rate Amazon operates, it can happen multiple times a day,” they said, damaging its brand and reputation.
In June, Mihail Eric, a former machine learning scientist at Alexa and a founding member of its “conversation models group”, said publicly that Amazon had “dropped the ball” on becoming “the undisputed market leader in conversational AI” with Alexa.
Eric said that despite having strong scientific talent and “huge” financial resources, the company was “fraught with technical and administrative problems”, suggesting that “data was poorly defined” and “documents were missing or old”.
According to two former employees who worked on Alexa-related AI, the legacy technology underpinning the voice assistant was inflexible and difficult to change quickly, was constrained by its code base, and the engineering team was “too distributed”.
The original Alexa software, built on technology acquired from the British start-up Evi in 2012, was a question-and-answer machine that worked by searching within a defined universe of facts to find the right response, such as the day's weather or a specific song in your music library.
The new Alexa uses a bouquet of different AI models to recognize and translate voice queries and generate responses, as well as to identify policy violations, such as picking up inappropriate answers and hallucinations. Building software to translate between the legacy systems and the new AI models has been a major hurdle in the Alexa-LLM integration.
The models include Amazon's in-house software, including the latest Nova models, as well as Claude, the AI model from start-up Anthropic, in which Amazon has invested $8bn over the past 18 months.
“(T)he biggest challenge around AI agents is making sure they are safe, reliable and predictable,” Anthropic chief executive Dario Amodei told the FT last year.
Agent-like AI software must reach a point “where . . . people can trust the system”, he added. “Once we reach that point, we will release these programs.”
One current employee said other steps are still needed, such as rolling out child safety filters and testing custom integrations with Alexa like smart lights and the Ring doorbell.
“Reliability is an issue—making it work close to 100 percent of the time,” added the employee. “That's why you see us . . . or Apple or Google delivering slowly and steadily.”
Many of the third parties developing “skills”, or features, for Alexa said they were unsure when the new generative AI-enabled Alexa would be released or how to build new functions for it.
“We expect details and insights,” said Thomas Lindgren, founder of Swedish content developer Wanderword. “When we started working with them they were more open . . . then, over time, they changed.”
One partner said that after an initial period of “pressure” that Amazon placed on developers to start preparing for the next generation of Alexa, things had gone quiet.
A constant challenge for Amazon's Alexa team, which was hit by lay-offs in 2023, is how to monetize the assistant. Working out how to make the assistants “cheap enough to run at scale” will be a big task, said Jared Roesch, co-founder of AI developer OctoAI.
The options being discussed include creating a new Alexa subscription service, or taking a cut of sales of goods and services, said the former Alexa employee.
Prasad said Amazon's goal was to create a variety of AI models that could serve as “building blocks” for Alexa's various applications.
“What we're always anchored on is customers and practical AI; we're not doing science for science's sake,” Prasad said. “We're doing this . . . to deliver customer value and impact, which in this era of generative AI is more important than ever, because customers want to see a return on investment.”