Physical Data Manifesto
Unlocking Robotic Foundation Models
The history of artificial intelligence is the history of data. Language models only became powerful when we trained them on vast swaths of the internet. Vision systems only began to generalize when ImageNet and its successors arrived. Again and again, the story repeats: algorithms are downstream of data.
Robotics is no different. The frontier is held back not by a lack of cleverness, but by a lack of experience. The physical world is messy, noisy, and stubbornly infinite. Robots fail not because we lack models, but because we starve those models of the physical data they require.
We believe that the single largest bottleneck in unlocking robotic foundation models is data—physical interaction data, sensorimotor trajectories, the long tail of mistakes, the trillions of edge cases that no simulator can conjure. Without it, robotic foundation models are castles built on sand. With it, the path to general-purpose physical intelligence opens.
Our Thesis
- Data Before Research. Once data exists at scale, research accelerates. The breakthroughs in language AI were not acts of genius alone, but acts of scale. The same will be true for robotics: data density first, model elegance second.
- Physical Data Is Scarce. Unlike text scraped from the web, physical interaction data requires real robots, in real environments, under real constraints. Every second of robot operation is expensive. Every failure risks hardware. Every dataset is fragmented, proprietary, and too small.
- Simulation Is Necessary But Insufficient. Sim-to-real transfer can carry us most of the way. But the last mile of dexterity, safety, and robustness requires exposure to the friction, slippage, wear, and chaos of the real world. Simulation is an amplifier, not a substitute.
- A Shared Corpus Unlocks Everything. Imagine a "Common Crawl of the physical world": billions of trajectories, diverse robots, diverse tasks, open protocols. This is not a luxury—it is the necessary substrate for robotic foundation models.
Our Mission
Physical Data exists to build the largest, richest, and most accessible corpus of robotic data in the world.
We will:
- Operate fleets of robots across environments to collect continuous streams of multimodal data.
- Partner with hardware manufacturers, labs, and companies to aggregate fragmented datasets into a unified corpus.
- Develop standards and APIs so data is composable, searchable, and usable for foundation model training (a minimal schema sketch follows this list).
- Layer simulation on top of real-world data, not as a replacement but as an engine of diversity and augmentation.
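To make composability concrete, here is one way a shared trajectory record could be shaped. This is a minimal sketch under our own assumptions: the TrajectoryStep and Trajectory names, every field, and the Python dataclass framing are illustrative, not a published Physical Data standard.

```python
# Illustrative sketch only: a hypothetical minimal schema for multimodal
# robot trajectories. Names, fields, and types are assumptions, not an
# existing Physical Data specification.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TrajectoryStep:
    """One timestep of observation and action from a single robot."""
    timestamp_ns: int                     # wall-clock time of the observation
    robot_id: str                         # stable identifier for the embodiment
    joint_positions: list[float]          # proprioception, one entry per joint
    joint_torques: list[float]            # applied efforts at the same instant
    camera_frames: dict[str, bytes]       # encoded images keyed by camera name
    action: list[float]                   # command issued after this observation
    task_label: Optional[str] = None      # free-text task description, if known
    metadata: dict[str, str] = field(default_factory=dict)  # calibration, firmware, site


@dataclass
class Trajectory:
    """An episode: an ordered sequence of steps plus its outcome."""
    episode_id: str
    steps: list[TrajectoryStep]
    success: Optional[bool] = None        # record failures too: the long tail matters
```

Even a schema this small makes a corpus queryable by robot, task, and sensor, and it keeps failures alongside successes, which is where the long tail lives.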
This is not just about storage. It is about curation, coverage, and scale. We believe data is infrastructure. Just as roads enabled commerce and satellites enabled navigation, robotic data will enable embodied intelligence.
Why Now
- The cost of training robotic foundation models is exploding, and their hunger for data is insatiable. Without solving the data bottleneck, investment will be wasted on underfed models.
- Hardware is maturing—sensors, actuators, and edge compute are finally cheap enough to scale robotic fleets.
- Capital is available. Investors have seen what foundation models did for language and vision; they know robotics is next.
The window is open. If we do not build Physical Data now, the industry will stumble, fragmented and undernourished.
Our Belief
We believe in a future where robots are not brittle tools, but general-purpose companions in factories, warehouses, homes, hospitals, and beyond. But to get there, we must teach them through experience. Data is experience.
Robotic foundation models are inevitable. Physical Data exists to make them possible.
Call to Action
To researchers: join us in shaping the backbone of embodied AI.
To partners: share your data, contribute your fleets, become part of a common corpus.
To investors: this is the ultimate pick-and-shovel play. Without data, no robotic intelligence can stand.
To dreamers: imagine a world where robots learn as quickly and broadly as humans. This begins here.
Physical Data is the substrate of physical intelligence.