To interact with the real world, AI will acquire physical intelligence


Recent artificial intelligence models are surprisingly human-like in their ability to produce text, audio, and video when prompted. Until now, however, these algorithms have largely been applied to the digital world rather than the three-dimensional physical world in which we live. Whenever we try to apply these models to the real world, even the most sophisticated struggle: consider, for example, how difficult it has been to develop autonomous cars that drive safely and reliably. These models not only fail to grasp physics but also frequently hallucinate, leading to inexplicable mistakes.

However, this is the year that AI will finally make the leap from the digital world to the real world we live in. Expanding AI beyond the digital frontier requires remaking the way machines think, combining the digital intelligence of AI with the mechanical power of robots. This is what I call "physical intelligence": a new form of intelligent machine that can understand dynamic environments, cope with unpredictable situations, and make decisions in real time. Unlike the models used by standard AI, physical intelligence is rooted in physics, in an understanding of real-world fundamentals such as cause and effect.

Such features enable physical intelligence models to interact with and adapt to different environments. In my research group at MIT, we are developing physical intelligence models that we call liquid networks. In one experiment, for example, we trained two drones, one powered by a standard AI model and another by a liquid network, to locate objects in a forest during the summer, using data collected by humans. While both drones performed equally well when asked to do exactly what they were trained to do, only the liquid-network drone completed its task reliably when asked to locate objects in different conditions, in winter or in an urban setting. This experiment shows that, unlike traditional AI systems, which stop evolving after their initial training phase, liquid networks continue to learn and adapt from experience, just as humans do.
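The liquid networks mentioned above are based on liquid time-constant neurons, in which each unit's state evolves according to a differential equation whose effective time constant depends on the incoming signal. As a rough illustration only, and not the MIT group's actual code, here is a minimal Python sketch of one such neuron layer; the class name, sizes, and parameter values are invented for the example:

```python
import numpy as np


class LiquidCell:
    """Toy liquid time-constant (LTC) neuron layer.

    The hidden state follows an ordinary differential equation whose
    effective time constant depends on the current input, so the cell's
    dynamics keep adapting at inference time rather than being frozen
    after training. All sizes and parameter values are illustrative.
    """

    def __init__(self, n_inputs, n_hidden, tau=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.1, (n_hidden, n_inputs))   # input weights
        self.W_rec = rng.normal(0.0, 0.1, (n_hidden, n_hidden))  # recurrent weights
        self.bias = np.zeros(n_hidden)
        self.A = np.ones(n_hidden)   # target state the dynamics are pulled toward
        self.tau = tau               # base time constant
        self.x = np.zeros(n_hidden)  # hidden state

    def step(self, u, dt=0.05):
        """Advance the hidden state by dt given an input vector u."""
        # Input- and state-dependent gate: this is what makes the
        # time constant "liquid" rather than fixed.
        z = self.W_in @ u + self.W_rec @ self.x + self.bias
        f = 1.0 / (1.0 + np.exp(-z))
        # ODE: dx/dt = -(1/tau + f) * x + f * A,
        # integrated here with a semi-implicit (fused) Euler step.
        self.x = (self.x + dt * f * self.A) / (1.0 + dt * (1.0 / self.tau + f))
        return self.x


# Example: drive the cell with a short random input sequence.
cell = LiquidCell(n_inputs=3, n_hidden=8)
rng = np.random.default_rng(42)
for _ in range(10):
    state = cell.step(rng.normal(size=3))
print(state)
```

Because the gate is recomputed from every new input, the hidden state's dynamics keep changing after training, which is the property the drone experiment above relies on.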

Physical intelligence can also interpret and execute complex commands derived from text or images, bridging the gap between digital instructions and real-world execution. For example, in my lab we have developed a physical intelligence system that, in less than a minute, can iteratively design and then 3D print small robots based on prompts like "a robot that can walk forward" or "a robot that can grip objects."

Other labs are also making significant breakthroughs. For example, the robotics startup Covariant, founded by UC Berkeley researcher Pieter Abbeel, is developing chatbots, similar to ChatGPT, that can control robotic arms when prompted. The company has raised more than $222 million to develop and deploy sorting robots in warehouses globally. A research team at Carnegie Mellon University also recently demonstrated that a robot with just one camera and imprecise actuation can perform dynamic and complex parkour movements, including jumping onto obstacles twice its height and across gaps twice its length, using a single neural network trained through reinforcement learning.

If 2023 was the year of text-to-image and 2024 was the year of text-to-video, 2025 will mark the era of physical intelligence, with a new generation of devices, not just robots but anything from power grids to smart homes, that can interpret what we tell them and carry out tasks in the real world.


