Nvidia is getting into world models: AI models that take inspiration from the mental models of the world that humans naturally develop.
At the Consumer Electronics Show in Las Vegas, the company announced that it is making openly available a family of world models that can predict and generate “physics-aware” videos. Nvidia calls this family Cosmos World Foundation Models, or Cosmos WFMs.
The models, which can be fine-tuned for specific applications, are available from Nvidia's API and NGC catalogs and the AI developer platform Hugging Face.
“Nvidia makes first wave of Cosmos WFMs available for physics-based simulation and synthetic data generation,” the company wrote in a blog post provided to TechCrunch. “Researchers and developers, regardless of their company size, are free to use Cosmos models under Nvidia's permissive open model license.”

The Cosmos WFM family has several models divided into three categories: Nano, for low-latency, real-time applications; Super, for “high-performance baseline” models; and Ultra, for maximum quality and fidelity output.
The models range in size from 4 billion to 14 billion parameters, with Nano being the smallest and Ultra the largest. Parameters roughly correspond to a model's problem-solving ability, and models with more parameters generally perform better than those with fewer.
As part of Cosmos WFM, Nvidia is also releasing an “upsampling model,” a video decoder optimized for augmented reality, and guardrail models, as well as fine-tuned models for applications such as generating sensor data for autonomous vehicle development. These, along with the other Cosmos WFM models, were trained on 9,000 trillion tokens from 20 million hours of real-world human interaction, environmental, industrial, robotics, and driving data, Nvidia says. (In AI, “tokens” represent bits of raw data; in this case, video footage.)
Nvidia won't say where this training data came from, but at least one report, and a lawsuit, alleges that the company trained on copyrighted YouTube videos without permission. We've reached out to Nvidia's press team for comment and will update this piece if we hear back.
Nvidia claims that Cosmos WFM models, given text or video frames, can generate “controllable high-quality” synthetic data to bootstrap the training of models for robots and self-driving cars.

“Nvidia Cosmos' suite of open models means developers can customize the WFMs with data sets, such as video recordings of autonomous vehicle trips or robots navigating a warehouse, according to the needs of their target application,” Nvidia wrote in a press release. “Cosmos WFMs are purpose-built for physical AI research and development and can generate physics-based videos from synthetic inputs such as text, images, and video, as well as robotic sensor or motion data.”
Companies including Waabi, Wayve, Foretellix, and Uber have already committed to piloting Cosmos WFMs for a variety of use cases, from video search and editing to building AI models for self-driving vehicles, Nvidia said.
It's important to note that Nvidia's world models are not “open source” in the strictest sense. To abide by one widely accepted definition of “open source” AI, an AI model needs to provide enough information about its design that a person could “substantially” recreate it, and it must disclose pertinent details about its training data, including its provenance and how the data can be obtained or licensed.
Nvidia has not released details of the Cosmos WFM training data, nor has it provided all the tools needed to recreate the models from scratch. That is probably why the tech giant refers to the models as “open” rather than open source.