Thinking Machines: The Rise of World Models

For a long time, factory robots succeeded by ignoring the world around them. A robot welding a car frame does not need to see the frame. It only needs to move its arm to a programmed position and trust that the frame arrived there. This works because the factory enforces perfect consistency. Parts arrive in exactly the right orientation. Lighting never changes. Nothing unexpected ever appears. The moment something shifts, even slightly, the robot continues its programmed motion as if nothing happened. It welds empty air or crushes a misaligned part. The robot is not stupid. It simply has no way to perceive or adapt to change. Toyota has spent decades perfecting this approach, and it works well for mass-producing the same car body for years. Foxconn uses the same method to assemble iPhones by the millions. But when Apple changes the phone design, Foxconn must reprogram every robot from scratch. That reprogramming takes weeks and costs millions.

Physical AI changes this by giving robots the ability to build an internal model of their environment. This internal model is called a world model. A world model takes in sensor data like camera images and joint angles. It then predicts what will happen next. If a robot with a world model reaches for a box and feels unexpected weight, its world model updates its prediction of where the box will be mid-reach. The robot adjusts its grip before the box slips. This prediction ability is the core difference between old automation and Physical AI. Old automation repeats. Physical AI anticipates. NVIDIA has built an entire platform called Cosmos to help robots develop these world models. Tesla is using similar technology to teach its Optimus humanoid robot to handle objects it has never seen before.

What’s a World Model? (And Why Robots Need One)

A world model is essentially a physics engine running inside a robot's control system. But instead of simulating a simplified version of physics, it learns the specific physics of the robot's body and its environment. It learns how the robot's joints move. It learns how much friction exists between the robot's gripper and common objects. It learns how quickly a pushed object accelerates. The world model builds this knowledge from real world experience. Then it uses that knowledge to forecast outcomes. Waymo self-driving cars use world models to track every pedestrian and vehicle on the road. The car does not simply see where a pedestrian is right now. It predicts where that pedestrian will be in the next two seconds. That prediction determines whether the car brakes or continues.
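
The interface can be sketched in a few lines. The toy model below uses a hand-written linear physics rule standing in for a learned neural network, and every number in it is illustrative, but the shape is the real one: current state and action in, predicted next state out, rolled forward to forecast a couple of seconds ahead.

```python
# Toy sketch of a world model as a one-step predictor. A real world
# model is a learned network; this hand-coded rule is a stand-in.

def predict_next_state(position, velocity, applied_force, mass=1.0, dt=0.1):
    """Forecast the next (position, velocity) from the current state
    and the action about to be taken."""
    acceleration = applied_force / mass
    next_velocity = velocity + acceleration * dt
    next_position = position + next_velocity * dt
    return next_position, next_velocity

# A pedestrian-style rollout: forecast two seconds ahead in 0.1 s steps.
pos, vel = 0.0, 1.0  # walking at 1 m/s
for _ in range(20):
    pos, vel = predict_next_state(pos, vel, applied_force=0.0)
print(pos)  # constant velocity: roughly 2 m ahead after 2 s
```

Chaining one-step predictions like this is how a single model yields multi-second forecasts: the output of each step becomes the input of the next.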

Consider what happens when a robot tries to pick up a full cup of water. Without a world model, the robot would execute a pre-programmed grip force and lift speed. If the cup weighs more or less than expected, the robot might drop it or crush it. With a world model, the robot imagines the lift before doing it. The model predicts that a gentle grip will cause the cup to slip. It predicts that a very tight grip will crack the plastic. It searches for a grip force that keeps the cup stable without breaking it. This prediction happens in milliseconds. The robot does not need to try and fail. It learns from imagined failure instead of real failure. Researchers at UC Berkeley demonstrated this capability with the DayDreamer project, where robots learned to pick up and move objects after less than a day of real world training. Previous methods would have required several days.
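
That grip-force search can be sketched the same way. The slip and crack thresholds below are invented for illustration; a real world model would predict these outcomes from learned contact physics rather than fixed constants, but the search logic around them is the essential idea.

```python
# Toy sketch of "imagined failure": test candidate grip forces against
# a predicted outcome before ever touching the real cup. Thresholds
# are illustrative stand-ins for a learned contact model.

SLIP_BELOW = 3.0   # N: predicted to slip under this grip force
CRACK_ABOVE = 8.0  # N: predicted to crack the plastic above this

def imagine_outcome(grip_force):
    """Stand-in for the world model's forecast of a grasp."""
    if grip_force < SLIP_BELOW:
        return "slip"
    if grip_force > CRACK_ABOVE:
        return "crack"
    return "stable"

def choose_grip_force(candidates):
    # Prefer the gentlest force the model predicts will hold.
    for force in sorted(candidates):
        if imagine_outcome(force) == "stable":
            return force
    return None

chosen = choose_grip_force([1.0, 2.5, 4.0, 6.0, 9.5])
print(chosen)  # 4.0: gentlest candidate predicted neither to slip nor crack
```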

How World Models Help Program Robots

Traditional robot programming requires an engineer to write explicit rules for every situation. The engineer measures the exact weight of a part. The engineer calculates the exact force needed to grip it. The engineer programs a specific speed for moving it. If the part changes slightly, the engineer rewrites the rules. This approach breaks down when the environment is unpredictable. A warehouse robot cannot have a rule for every possible way a box can shift on a conveyor belt. Amazon operates warehouses where millions of different products move through the same conveyor systems. Writing rules for every product is impossible. Amazon instead uses world models trained in simulation. The robot learns a general understanding of how boxes of different sizes and weights behave. It does not need a rule for a specific box. It needs a prediction of how this box will move right now.

World models replace explicit rules with learned expectations. The robot is placed in a simulation that copies the real world. The simulation includes realistic physics for gravity, friction, and collision. The robot tries millions of different actions inside this simulation. For each action, the world model predicts the outcome. The robot learns which actions produce good results without ever touching a real object. This approach works because the simulation does not need to be perfect. It only needs to be close enough that the robot can adjust when moved to the real world. BMW uses this method to train robots for their assembly lines. A BMW factory might have robots installing dashboards, attaching doors, and mounting wheels. Each task used to require separate programming. Now a single world model trained in simulation can handle multiple tasks because it understands the underlying physics of lifting, aligning, and fastening.
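
A minimal version of that trial-in-simulation loop might look like this. The "simulator" here is a one-line sliding-friction formula and the search is random sampling, both stand-ins for the far richer physics engines and learning algorithms production systems use; every constant is illustrative.

```python
import random

# Toy sketch of learning from simulated trials: try thousands of random
# push speeds in a crude simulator and keep the one whose predicted
# outcome lands closest to the target. All numbers are illustrative.

random.seed(0)

TARGET = 1.0    # desired sliding distance in metres
FRICTION = 0.4  # simulated coefficient of friction
G = 9.81

def simulate_slide(push_speed):
    # A box slides until friction dissipates its kinetic energy:
    # d = v^2 / (2 * mu * g)
    return push_speed ** 2 / (2 * FRICTION * G)

best_speed, best_error = None, float("inf")
for _ in range(10_000):
    speed = random.uniform(0.0, 5.0)
    error = abs(simulate_slide(speed) - TARGET)
    if error < best_error:
        best_speed, best_error = speed, error

# The analytic answer is v = sqrt(2 * mu * g * d), about 2.8 m/s;
# random search converges close to it without ever touching a real box.
print(round(best_speed, 2))
```

Ten thousand simulated pushes take milliseconds; ten thousand real pushes would take days and scuff a lot of boxes, which is the whole argument for training in simulation.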

The Role of Simulation in Building World Models

Simulation is the tool that makes world models practical. Training a robot entirely in the real world would take years and destroy a lot of hardware. A robot learning to walk would fall thousands of times. Each fall could damage motors or joints. In simulation, a robot can fall a million times without breaking anything. The simulation can also run faster than real time. A task that takes ten seconds in the real world might take one second in simulation. This speed lets the robot accumulate experience that would be impossible to gather otherwise. Boston Dynamics uses simulation extensively to train their Atlas and Spot robots. Before Atlas performs a backflip on camera, it has failed that backflip millions of times inside a computer. Each failure taught the world model something about balance and momentum.

The challenge is making the simulation realistic enough. A simulation that does not model friction correctly will teach the robot wrong lessons. A simulation that ignores how light reflects off surfaces will confuse a robot that relies on cameras. Engineers spend significant effort measuring the real world and copying those measurements into the simulation. They measure how much a concrete floor slows a rolling object. They measure how a rubber gripper deforms under pressure. They add random noise to every simulated sensor to mimic real world imperfections. The goal is not a perfect simulation. The goal is a simulation whose errors are predictable and can be corrected during real world training. Agility Robotics discovered this challenge when training their Digit robot to walk on gravel. Their first simulation assumed gravel behaved like a flat surface. The real robot kept slipping. They had to go back and measure how individual gravel pieces rotate and shift when stepped on.
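
The standard technique for the "add random noise to every simulated sensor" step is domain randomization: draw slightly different physical parameters and noise levels for every simulated episode so the robot never overfits to one perfect world. A toy sketch, with illustrative parameter ranges:

```python
import random

# Toy sketch of domain randomization: each simulated episode gets its
# own friction, stiffness, and sensor-noise draw. Ranges are invented
# for illustration, not taken from any real system.

random.seed(42)

def randomized_episode_params():
    return {
        "friction": random.uniform(0.3, 0.9),          # floor varies per episode
        "gripper_stiffness": random.uniform(0.8, 1.2), # hardware wear varies
        "sensor_noise_std": random.uniform(0.0, 0.02), # cameras are imperfect
    }

def noisy_reading(true_value, noise_std):
    # Mimic an imperfect real-world sensor.
    return true_value + random.gauss(0.0, noise_std)

params = randomized_episode_params()
reading = noisy_reading(1.0, params["sensor_noise_std"])
print(params["friction"], reading)
```

A policy that succeeds across the whole randomized range is far more likely to tolerate the one real world, whose parameters sit somewhere inside it.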

How Robots Use World Models to Plan Actions

A world model does more than predict what will happen next. It allows a robot to search through possible futures and choose the best one. The robot imagines several different actions. For each action, the world model forecasts the outcome. The robot selects the action that leads to the best forecasted outcome. This is called planning. It is the same basic process a human uses when reaching for a fragile object. You do not just grab. You imagine the consequences of grabbing too hard or too softly. You adjust based on that imagination. Figure AI has built humanoid robots that use this planning approach to handle objects in warehouses. The robot sees a pile of mixed boxes. It imagines three different ways to grab the top box. One way might cause the box to tip. Another way might disturb the boxes underneath. The third way looks stable. The robot executes the third way.
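
That pick-the-best-imagined-future loop is the heart of model-based planning. In the sketch below, the world model's forecast is a hard-coded lookup standing in for a learned predictor, and the grasp names and scores are invented; the planning logic around it is the real shape of the algorithm.

```python
# Toy sketch of planning by imagined rollout: score each candidate
# action with a (stand-in) world-model forecast, execute the best.

def forecast(action):
    """Stand-in for a learned model's predicted outcome score
    (higher is better: stable grasp, nothing disturbed)."""
    predicted = {
        "grab_from_side": 0.2,   # model predicts the box tips
        "grab_from_front": 0.5,  # model predicts neighbours shift
        "grab_from_top": 0.9,    # model predicts a stable lift
    }
    return predicted[action]

def plan(candidate_actions):
    # Imagine every candidate, keep the best forecasted outcome.
    return max(candidate_actions, key=forecast)

best = plan(["grab_from_side", "grab_from_front", "grab_from_top"])
print(best)  # grab_from_top
```

Real planners imagine continuous action sequences rather than three named grasps, but the structure is the same: forecast, compare, commit.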

A practical example involves a robot walking on uneven ground. Without a world model, the robot would follow a fixed stepping pattern. A crack in the floor or a small rock would break the pattern and cause a fall. With a world model, the robot forecasts where its foot will land if it takes a certain step. The model might predict that the foot will strike a rock and slide. The robot then adjusts the step length or foot angle to avoid the predicted slide. This forecast and adjustment happens for every single step. The robot is not reacting to a slip after it happens. It is avoiding the slip before it happens by imagining the future. Tesla Optimus uses this method to maintain balance while carrying objects of unknown weight. The robot does not know how heavy a box is until it lifts it. But its world model predicts how the weight will shift its center of mass. The robot adjusts its posture before the shift becomes a problem.

The Simulation to Reality Gap

The biggest problem with world models is that no simulation is perfect. Every simulation makes simplifications. Friction is never exactly constant. Light never behaves exactly as modeled. A robot that trains entirely in simulation will encounter surprises when moved to the real world. A simulated floor might be perfectly flat while a real floor has tiny bumps. A simulated box might have uniform weight while a real box has contents that shift. These differences are called the simulation-to-reality gap. Cruise, the self-driving car company, found that their simulated pedestrians behaved too politely. Real pedestrians jaywalk, step off curbs unexpectedly, and make sudden turns. Cruise had to rebuild their simulation to include these annoying but realistic behaviors.

Closing this gap requires a continuous feedback loop. The robot tries an action in the real world. The world model predicts what should have happened. The engineer compares the prediction to reality and notes the difference. That difference is used to adjust the world model. The model learns that its assumption about floor friction was wrong. It updates that assumption. The next prediction will be more accurate. This loop never ends because the real world changes over time. A floor that was dry in the morning might be wet in the afternoon. A gripper that was clean might become oily. The robot must constantly update its world model to match the current conditions. Waabi, a self-driving truck company, built their entire approach around this feedback loop. Their trucks are trained mostly in simulation. Every time a real truck encounters a situation the simulation did not predict, that situation gets added to the simulation. The simulation becomes more realistic over time rather than starting perfect.
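
The feedback loop itself is simple to sketch. In the toy version below, the model's only wrong assumption is a friction coefficient, and each "real" trial nudges it toward the value that would have explained the observed slide; real systems update thousands of learned parameters the same basic way, from far noisier observations. All numbers are illustrative.

```python
# Toy sketch of closing the sim-to-real gap: compare the model's
# prediction to the observed outcome and correct the wrong assumption.

TRUE_FRICTION = 0.55   # the real floor; unknown to the model
model_friction = 0.30  # the simulation's initial wrong guess
LEARNING_RATE = 0.5
G = 9.81

def predicted_slide(push_speed, friction):
    # Slide distance under friction: d = v^2 / (2 * mu * g)
    return push_speed ** 2 / (2 * friction * G)

for trial in range(20):
    speed = 2.0
    observed = predicted_slide(speed, TRUE_FRICTION)    # the "real" trial
    predicted = predicted_slide(speed, model_friction)  # the model's forecast
    # Slide distance scales as 1/friction, so the prediction error
    # implies the friction the model should have assumed.
    implied = model_friction * predicted / observed
    model_friction += LEARNING_RATE * (implied - model_friction)

print(round(model_friction, 3))  # → 0.55
```

After twenty trials the model's friction estimate has converged on the real floor's value, and its next prediction will match reality.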

The Future of General Purpose Robots

Most robots today are single purpose machines. They perform one task because they were programmed with rules for that specific task. A robot that packs boxes cannot also clean floors. A robot that welds car frames cannot also sort packages. World models offer a path to general purpose robots. A robot with a good world model does not need task specific rules. It needs a general understanding of physics and objects. It learns how to grip anything, not just a specific box. It learns how to walk on any surface, not just a specific factory floor. Sanctuary AI is building humanoid robots with this general purpose goal. Their robot Phoenix uses a world model that separates the concept of grasping from the concept of pouring. The robot can apply its understanding of how to hold a coffee mug to a teapot it has never seen. It does not need a new rule for the teapot.

Several companies are pursuing this vision. Tesla is building a humanoid robot designed to switch between different household and factory tasks. The robot does not require reprogramming between tasks because its world model provides general physical intelligence. Figure AI is developing robots that watch a human perform a task once and then replicate it. The robot builds a world model of the task from observation alone. Agility Robotics has deployed humanoid robots in shipping centers where they handle unexpected situations like fallen boxes or jammed conveyors. These robots still have limitations. They struggle with tasks that require fine touch or complex reasoning. But each improvement in world models expands what they can do. The goal is a robot that can enter any environment, observe for a few minutes, and then perform useful work without explicit programming. Boston Dynamics has shown early versions of this capability with Spot, which can be sent into a construction site to inspect progress. Spot does not need a map of the site beforehand. It builds a world model as it walks and uses that model to decide where to look next.

Putting It All Together

Programming robots through world models is fundamentally different from traditional automation. Traditional automation requires engineers to anticipate every possible situation and write a rule for it. This approach is brittle. It fails when the situation is slightly different than expected. World models replace rules with prediction. The robot learns how the world behaves and uses that knowledge to forecast the outcome of its actions. It can handle unexpected situations because it does not rely on pre-programmed rules.

The technology is already in use. Waymo self-driving cars use world models to navigate city streets. Amazon uses simulation to train warehouse robots. Boston Dynamics machines use world models to maintain balance on challenging terrain. Tesla trains its Optimus robot using simulated environments that would be too dangerous to create in real life. Agility Robotics has Digit robots unloading trucks in working shipping centers. These systems are not perfect. They still struggle with rare or unusual situations. But they improve continuously as their world models get better data and better training. Physical AI has arrived. It is changing how robots are built and how they behave. The next decade will see robots move from tightly controlled factories into messy human environments. World models are the reason this shift is possible.

Helping Teams Create Products That Actually Stick

Michael Sorrenti and his team at GAME PILL help companies turn ideas into products people can’t stop using. With 26+ years of experience creating games, AI experiences, and digital platforms for global brands like Disney, Marvel, and Nickelodeon, they guide teams to design and launch products that drive engagement, revenue, and growth. From AI strategy and product design to market-ready execution, the team is able and ready to turn complexity into actionable results.

#ArtificialIntelligence #WorldModels #FoundationModels #PhysicalAI #AGI #DeepLearning #RobotLearning #AIInRobotics #SimulatedWorld #AutonomousSystems #NVIDIA #GoogleGenie #CosmosAI #DayDreamerAI #Robotaxis #TechTrends #FutureOfAI #AIInnovation #AIResearch