Every major humanoid robot company is trying to solve roughly the same problem: build a machine that can perceive its environment, understand a task, and execute it with human-like dexterity. The bodies they build — the joints, the actuators, the sensors — are converging toward similar form factors. But the brains inside those bodies could not be more different.
We are at an extraordinarily early moment. The humanoid robot market is, for practical purposes, uncharted territory. No dominant standard exists. No single approach has proven itself at scale. Each major laboratory has made a deep bet on a fundamentally different philosophy of machine intelligence, and the diversity of those bets is not a sign of confusion. It is the natural state of a field where nobody yet knows the right answer.
That will not last. History suggests that when a technology market matures, commercial success drives convergence: the companies that sell in volume, deploy at scale and prove utility in the real world will start setting the standard. The rest will pivot toward the winning approach — or be left behind. It happened with operating systems, with smartphones, with cloud infrastructure. It will happen here too. The question is which AI philosophy will prove itself first.
Here is where each of the main contenders stands today.
Eight Systems at a Glance
| System | Architecture type | Key strength | Key weakness | Open / Closed | Commercial status (2026) |
|---|---|---|---|---|---|
| Helix (Figure) | VLA end-to-end | Speed, 8-hour autonomous shifts | Black box, hard to audit | Closed | Factory deployment |
| Carbon (Sanctuary) | Layered cognitive architecture | Explainability, task decomposition | Slower, higher compute cost | Closed | Enterprise pilots |
| GR00T (NVIDIA) | Foundation model + sim-to-real | Multi-manufacturer ecosystem | Dependent on simulation fidelity | Licensed | Multi-OEM integration |
| π0 (Physical Intelligence) | Diffusion policy VLA | Dexterous manipulation | Weaker on bipedal locomotion | Closed (research) | Research / early licensing |
| Optimus AI (Tesla) | FSD neural stack + Dojo | Unmatched real-world data volume | Driving ↔ manipulation transfer unproven | Closed | Internal factory use |
| Atlas AI (Boston Dynamics) | Classical control + RL overlay | Unstructured terrain reliability | Conservative, slower to adapt | Closed | Commercial (enterprise) |
| VLT (XPENG Iron) | VLA + XNGP autonomous driving stack | Proven ADAS base, open SDK | Mass production track record unproven | SDK open | Production from April 2026 |
| UnifoLM (Unitree) | VLA + open platform / app store | Ecosystem, price, volume | Less vertically integrated AI | Open (VLA-0 open source) | Highest-volume commercial |
Helix (Figure AI): The Brain That Does Not Think — It Acts
Figure AI calls its control system Helix, and the name fits. It is a single, tightly wound neural architecture that takes raw camera images and natural language instructions as input and outputs motor torques directly. No perception module. No planning module. No separate reasoning layer. One network, pixels in, actions out.
This approach — known as a Vision-Language-Action model, or VLA — eliminates the latency that would otherwise accumulate across a pipeline of separate subsystems. There is no handoff from perception to planning to execution. The reaction is as close to instantaneous as a neural network can get. In May 2026, Figure's robots completed full eight-hour autonomous factory shifts — the longest sustained autonomous operation any humanoid has demonstrated to date — running Helix throughout.
The trade-off is opacity. A VLA that goes directly from pixels to torques is, by construction, a black box. When it fails, diagnosing why it failed is genuinely hard. There is no planning layer to inspect, no symbolic representation of what the robot "thought" it was supposed to do. For industrial deployments where explainability and auditability matter, this is a real constraint. For raw performance in well-defined environments, Helix is currently among the most impressive systems running in the field.
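To make "pixels in, actions out" concrete, here is a minimal Python sketch of the interface such a policy exposes. It is not Figure's code: `ToyVLAPolicy`, the four-word vocabulary, the joint count and the random weights are all hypothetical stand-ins for a trained network. The point is only the shape of the call: one function, no intermediate plan to inspect.

```python
import math
import random

NUM_JOINTS = 4                            # hypothetical; real humanoids have dozens
VOCAB = ["pick", "place", "cup", "box"]   # hypothetical toy vocabulary


def encode_instruction(text):
    """Bag-of-words encoding of the instruction over the toy vocabulary."""
    words = text.lower().split()
    return [float(words.count(word)) for word in VOCAB]


class ToyVLAPolicy:
    """One mapping from (pixels, instruction) to torques, with no exposed plan."""

    def __init__(self, num_pixels, seed=0):
        rng = random.Random(seed)
        in_dim = num_pixels + len(VOCAB)
        # Random weights stand in for a trained network (assumption).
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                        for _ in range(NUM_JOINTS)]

    def act(self, pixels, instruction):
        features = list(pixels) + encode_instruction(instruction)
        # tanh keeps each normalised torque command inside [-1, 1].
        return [math.tanh(sum(w * f for w, f in zip(row, features)))
                for row in self.weights]


policy = ToyVLAPolicy(num_pixels=16)
frame = [0.5] * 16                        # placeholder camera frame
torques = policy.act(frame, "pick cup")
print(torques)
```

The only thing the caller ever sees is the list of torques. If the command is wrong, there is no sub-goal or plan object to examine, which is the auditability problem in miniature.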
Carbon (Sanctuary AI): The Machine That Asks Why
Sanctuary AI's approach is almost the philosophical inverse of Helix. Its system, named Carbon, is a layered cognitive architecture modelled on a common account of human reasoning: a hierarchy of processes that moves from understanding intent, to decomposing a task into sub-goals, to planning a sequence of actions, to executing individual motor commands.
Carbon can, in principle, explain each of its decisions. It maintains an explicit representation of what it is trying to accomplish and why, which means an operator can inspect its reasoning, identify where it went wrong, and correct it at the appropriate level of abstraction rather than retraining the entire network. Sanctuary has publicly claimed that Carbon can learn to automate a new task in under 24 hours of exposure.
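The layering described above can be sketched in a few lines of Python. This is an illustration of the general pattern, not Sanctuary's implementation: the `Trace` class, the layer functions and the tiny `SUBGOALS` and `ACTIONS` tables are all hypothetical. What it shows is why such a design is inspectable: every layer writes down what it decided.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Records what each layer decided, so an operator can inspect it later."""
    entries: list = field(default_factory=list)

    def log(self, layer, decision):
        self.entries.append((layer, decision))


# Hypothetical hand-written knowledge; a real system would hold far more.
SUBGOALS = {"tidy table": ["clear cup", "wipe surface"]}
ACTIONS = {
    "clear cup": ["locate cup", "grasp cup", "place cup in sink"],
    "wipe surface": ["grasp cloth", "sweep cloth over table"],
}


def understand_intent(instruction, trace):
    intent = instruction.strip().lower()
    trace.log("intent", intent)
    return intent


def decompose(intent, trace):
    subgoals = SUBGOALS.get(intent, [])
    trace.log("subgoals", subgoals)
    return subgoals


def plan(subgoals, trace):
    steps = [action for goal in subgoals for action in ACTIONS.get(goal, [])]
    trace.log("plan", steps)
    return steps


def execute(steps, trace):
    for step in steps:
        trace.log("execute", step)  # a real robot would send motor commands here


def run(instruction):
    trace = Trace()
    intent = understand_intent(instruction, trace)
    execute(plan(decompose(intent, trace), trace), trace)
    return trace


trace = run("Tidy table")
for layer, decision in trace.entries:
    print(f"{layer:>8}: {decision}")
```

If the robot wipes the table before clearing the cup, the fault is visible in the `subgoals` or `plan` entry and can be corrected there, without touching the other layers.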
The cost is speed and computational overhead. Cognitive architectures with explicit reasoning layers are slower than end-to-end neural approaches, and more expensive to run. They also require more careful engineering: every layer of the hierarchy needs to be designed, maintained and updated as the robot encounters new task categories. The payoff, however, is a system that is inherently more legible — and legibility may prove to be exactly what enterprise customers and regulators demand as humanoids move into safety-critical environments.
GR00T (NVIDIA): The Platform That Wants to Be Android
NVIDIA is not building a humanoid robot. NVIDIA is building the operating system for humanoid robots. Their GR00T foundation model — the name stands for Generalist Robot 00 Technology — is designed to be adopted by other manufacturers: Agility Robotics, Fourier Intelligence, Apptronik and others have all integrated it or announced intent to do so.
The GR00T philosophy is built on NVIDIA's core strength: simulation at scale. The model is trained extensively in Isaac Sim, NVIDIA's GPU-accelerated robotics simulator, and then transferred to real hardware through the same sim-to-real techniques we described in an earlier piece on simulation training. The bet is that by training on an almost unlimited supply of synthetic data (varied terrain, varied objects, varied physics parameters), GR00T can develop general capabilities that transfer to physical robots across different hardware configurations.
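The "varied terrain, varied objects, varied physics parameters" idea is usually called domain randomisation. A minimal sketch in plain Python follows. It does not use Isaac Sim's API; `SimParams`, the parameter names and the ranges are assumptions chosen purely for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    friction: float           # ground friction coefficient
    payload_kg: float         # mass of the object being handled
    terrain_slope_deg: float  # incline of the floor
    motor_strength: float     # multiplier on nominal actuator torque


# Hypothetical ranges, for illustration only.
RANGES = {
    "friction": (0.4, 1.2),
    "payload_kg": (0.1, 5.0),
    "terrain_slope_deg": (-10.0, 10.0),
    "motor_strength": (0.8, 1.2),
}


def sample_params(rng):
    """Draw one randomised simulation configuration."""
    return SimParams(**{name: rng.uniform(low, high)
                        for name, (low, high) in RANGES.items()})


def make_training_episodes(num_episodes, seed=0):
    """Each episode gets its own physics, so no single setting can be memorised."""
    rng = random.Random(seed)
    return [sample_params(rng) for _ in range(num_episodes)]


episodes = make_training_episodes(1000)
print(episodes[0])
```

A policy trained across all of these configurations has to work on a slippery floor with weak motors as well as on a grippy floor with strong ones. The hope is that the real robot then looks like just one more sample from the same distribution.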
This is a long-game strategy. NVIDIA is not racing to ship the best robot by 2027. It is racing to become the infrastructure layer that every robot manufacturer depends on, much as most smartphone manufacturers came to depend on Android. If it succeeds, the value accrues to NVIDIA regardless of which hardware company wins.
π0 (Physical Intelligence): When Hands Learn to Think
Physical Intelligence (the research lab, not the concept) has built its system, called π0 (pi-zero), around a technique called diffusion policy. Instead of training a network to output actions directly, π0 trains a model to iteratively refine a noisy action trajectory into a precise motor sequence: the same mathematical framework that powers modern image generation, applied to robot control.
The result is a system with unusually fine-grained manipulation capability. π0 has demonstrated robots folding laundry, cooking simple meals, and handling objects with the kind of dexterous precision that end-to-end policies predicting a single action directly tend to struggle with. The denoising process allows the model to represent multi-modal distributions of possible actions, which is useful when a task has multiple valid solutions, rather than collapsing to a single deterministic output.
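The denoising loop, and the reason it copes with several valid solutions, can be shown with a toy in pure Python. This is a sketch under heavy assumptions and not Physical Intelligence's implementation. The "demonstrations" are just two four-step trajectories (`MODES`), and `predict_clean` replaces a trained network with the exact posterior mean for that two-point dataset. The sampler is a deterministic DDIM-style update with a cosine noise schedule.

```python
import math
import random

HORIZON = 4    # number of future actions in one trajectory (toy size)
STEPS = 50     # number of refinement steps

# Two equally valid demonstrations, e.g. passing an obstacle on either side.
MODES = [[-1.0] * HORIZON, [1.0] * HORIZON]


def alpha_bar(t):
    """Cosine schedule: 1.0 at t=0 (clean signal), about 0.0 at t=STEPS (noise)."""
    return math.cos(0.5 * math.pi * t / STEPS) ** 2


def predict_clean(x, t):
    """Stand-in for a trained denoiser: the exact posterior mean E[x0 | x_t]
    when the data is an equal mixture of the two MODES (assumption)."""
    ab = alpha_bar(t)
    logits = [-sum((xi - math.sqrt(ab) * mi) ** 2 for xi, mi in zip(x, mode))
              / (2.0 * (1.0 - ab)) for mode in MODES]
    top = max(logits)
    weights = [math.exp(logit - top) for logit in logits]
    total = sum(weights)
    return [sum(w * mode[i] for w, mode in zip(weights, MODES)) / total
            for i in range(HORIZON)]


def sample_trajectory(seed):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(HORIZON)]   # start from pure noise
    for t in range(STEPS, 0, -1):
        ab_t, ab_prev = alpha_bar(t), alpha_bar(t - 1)
        x0_hat = predict_clean(x, t)
        # Noise implied by the current sample and the clean estimate.
        eps = [(xi - math.sqrt(ab_t) * x0) / math.sqrt(1.0 - ab_t)
               for xi, x0 in zip(x, x0_hat)]
        # Deterministic (DDIM-style) step to the next, less noisy level.
        x = [math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * e
             for x0, e in zip(x0_hat, eps)]
    return x


samples = [sample_trajectory(seed) for seed in range(20)]
for sample in samples[:5]:
    print([round(v, 3) for v in sample])
```

Each seed should settle on one of the two demonstrated trajectories rather than on their average. The average here would be all zeros, which in the obstacle picture means driving straight into it. That averaging failure is what a policy trained to regress a single action tends to produce on data like this.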
The current limitation is locomotion. π0 is exceptionally strong at what the hands are doing; it is less mature as a foundation for the kind of robust bipedal movement that a fully general humanoid requires. As a specialist in manipulation, however, it may be the most technically sophisticated system in its category.
Tesla Optimus: The Data Flywheel Made Physical
Tesla has not given its robot AI a branded name, which is itself a statement. For Tesla, the AI powering Optimus is not a separate product — it is an extension of the same neural network stack that runs in every Tesla vehicle on the road. The strategy has a name in the industry: the data flywheel.
Over six million Tesla cars are continuously collecting real-world sensor data — images, distances, predictions, outcomes — that feeds back into training on the Dojo supercomputer. No other robotics company on earth has access to that volume of real-world visual experience. The hypothesis is that a model trained on millions of kilometres of human-scale visual data will develop priors about the physical world — how objects move, how spaces are organised, how tasks unfold in time — that transfer meaningfully to a robot operating in similar environments.
The honest uncertainty is whether driving perception transfers to manipulation. Driving and object handling are different problems with different geometries, different timescales and different types of physical interaction. Tesla's bet is that the overlap is large enough to matter. Given the scale of their data advantage, they may be right even if the overlap is only partial.
Boston Dynamics Atlas: The School That Learned to Learn
Boston Dynamics has spent three decades building robots using classical control: precise mathematical models of robot dynamics, optimisation algorithms that plan trajectories in real time, careful hand-engineering of every movement. The results were undeniably impressive — Atlas's parkour and gymnastics demonstrations remain benchmarks against which other humanoids are measured — but the approach scaled poorly to new tasks and new environments.
The new electric Atlas, introduced in 2024, begins to layer machine learning on top of this foundation. Reinforcement learning policies handle specific locomotion challenges; the classical model-based controller provides the stability backbone. Boston Dynamics is not abandoning what made it exceptional. It is adding a new capability layer on top of hardware and control theory that no other company has had as long to develop.
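One common way to combine the two is a residual scheme: the classical controller produces the command, and a learned policy adds a small, bounded correction. The Python sketch below illustrates that pattern on a single toy joint. It is not Boston Dynamics' controller; the gains, limits, the toy dynamics and the hand-written `learned_residual` placeholder are all assumptions.

```python
def pd_torque(target, position, velocity, kp=40.0, kd=4.0):
    """Classical proportional-derivative controller for one joint."""
    return kp * (target - position) - kd * velocity


def clamp(value, low, high):
    return max(low, min(high, value))


def learned_residual(position, velocity):
    """Placeholder for an RL policy; a real one would be a trained network."""
    return 0.5 * velocity  # hypothetical hand-written stand-in


def combined_torque(target, position, velocity,
                    residual_limit=2.0, torque_limit=30.0):
    base = pd_torque(target, position, velocity)
    # The learned correction is bounded so it can never dominate the baseline.
    residual = clamp(learned_residual(position, velocity),
                     -residual_limit, residual_limit)
    return clamp(base + residual, -torque_limit, torque_limit)


def simulate(steps=2000, dt=0.001, inertia=0.05, target=1.0):
    """Toy joint with torque = inertia * acceleration, semi-implicit Euler."""
    position, velocity = 0.0, 0.0
    for _ in range(steps):
        torque = combined_torque(target, position, velocity)
        velocity += (torque / inertia) * dt
        position += velocity * dt
    return position


print(simulate())
```

The design choice is in `residual_limit`: however badly the learned part behaves, it can shift the command by at most that amount, so the stability of the classical baseline is largely preserved.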
This makes Atlas the most reliable robot in unstructured terrain — the one you would trust in a genuinely unpredictable physical environment — while also making it the most conservative in adopting new AI approaches. For some use cases, that conservatism is exactly the right call.
VLT (XPENG Iron): Autonomous Driving DNA in a Human Body
XPENG built its reputation as one of China's most technically serious electric vehicle companies, with an autonomous driving stack called XNGP that competes directly with Tesla's FSD. When XPENG decided to build a humanoid robot — the Iron, revealed in 2025 — the logical move was to transfer that stack rather than build from scratch.
Their robot AI is called VLT (Vision-Language-Task), complemented by a VLA 2.0 layer adapted directly from XNGP. The system inherits the vehicle's sensor array — LiDAR, stereo cameras, depth sensors — repurposed for a robot operating at human scale. XPENG claims 720-degree spatial awareness and 2,250 TOPS of onboard compute from three custom Turing AI chips.
What makes Iron editorially interesting is its philosophy of "extreme anthropomorphism": a flexible vertebral spine, biomimetic actuators modelled on human musculature, and full-coverage synthetic skin with embedded touch sensors. XPENG CEO He Xiaopeng has described the ambition as giving Iron "practical jobs" first and letting an open SDK handle expanding capabilities. Mass production was targeted for April 2026, with aspirations toward one million units annually by 2030. Those numbers, if achieved, would make Iron the highest-volume humanoid in history by a significant margin.
UnifoLM (Unitree): The App Store for Robot Skills
Unitree is, by almost every commercial metric, the most successful humanoid robot company operating today. Over 5,500 units shipped in 2025. Its robots appear in more peer-reviewed papers than any other commercial humanoid. The H1 holds the world record for bipedal running speed at 3.3 metres per second. The R1 was named a TIME Magazine Best Invention of 2025. And none of it is driven by a single proprietary AI system. It is driven by a deliberate philosophy of openness.
Their AI foundation is called UnifoLM, a multimodal Vision-Language-Action model that runs onboard all Unitree platforms. In March 2026, they open-sourced UnifoLM-VLA-0, built on the Qwen2.5-VL-7B foundation, providing a manipulation policy baseline for twelve task categories that any developer can extend. More importantly, they built UniStore: a marketplace where developers can publish and sell motion packages and task applications for Unitree robots, installable with a single tap.
The philosophy is unmistakably that of a platform business. Unitree is not trying to build the most sophisticated AI brain — it is trying to build the hardware and infrastructure that the most sophisticated AI brains run on. If you want to see what skills are already available across the Unitree ecosystem, our robot skills catalogue tracks the growing library of capabilities that third-party developers have published. The analogy to the early smartphone app store is not accidental. It is the explicit model.
Platform businesses tend to win in technology markets not because their core product is the best, but because they create ecosystems that are too valuable to abandon. If Unitree's open model attracts enough developers building enough skills, the value of the platform compounds independently of any single hardware or AI improvement. This is structurally different from every other approach in this comparison.
Nobody Knows Who Wins — And That Is Precisely the Point
If you have read this far looking for a verdict, here is the honest answer: there is none yet. None of these eight approaches has been validated at the scale and in the variety of environments that would prove it the right general solution. The market is too new, the deployments too limited, the task diversity too narrow for any definitive conclusions.
What we can say is this: the approach that wins will probably win for commercial reasons before it wins for technical ones. The company that first demonstrates sustained, cost-effective utility in a real enterprise environment — not a demo, not a pilot, but repeatable deployed value — will attract customers, then capital, then talent. Other manufacturers will study what made it work and incorporate those lessons. The technical diversity we see today will narrow, not because someone proved their approach correct in a laboratory, but because someone proved it correct in a warehouse, a hospital, or a factory.
What is clear today is something more important: this is one of those rare moments in technology where genuinely brilliant engineers, with genuinely different ideas, are competing in earnest for a genuinely enormous prize. End-to-end VLA that acts in milliseconds. Cognitive architectures that reason like minds. Foundation models that dream in simulation. Diffusion policies that feel their way to precision. Data flywheels a decade in the making. Classical control that learned to learn. An EV giant turning its autonomous driving DNA into a human body. An open platform treating robot skills like smartphone apps.
We don't yet know which of these ideas will define the age of humanoid robots. But it is genuinely thrilling to watch brilliant people fight for it.