Digital AI foundations · 3 of 4

Domain representations: language is one example, not the only one

A language model works because it learns a representation of words, characters, and syntax. The same machinery works for chemistry, robotics, biology, and physics, once you choose the right representation for that domain. The frontier is multidomain foundation models.

What a representation is

A language model represents text as tokens: numerical IDs for sub-word units, punctuation, and special markers. Each ID is mapped to a learned vector (an embedding), and training arranges those vectors so that "cat" sits close to "kitten" and far from "spreadsheet" in a space with thousands of dimensions (4,096 in some models). The representation is what makes language compressible enough that a neural network can generalise across it.
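
The idea of token IDs mapped into a learned space can be sketched in a few lines of Python. This is a toy illustration only: the three-word vocabulary, the 3-dimensional vectors, and the function names are all made up for the example, and the vectors are hand-picked rather than learned. "Close" and "far" are measured here with cosine similarity, one common choice.

```python
import math

# Hypothetical toy vocabulary: token string -> integer ID.
VOCAB = {"cat": 0, "kitten": 1, "spreadsheet": 2}

# Hand-picked 3-dimensional vectors standing in for learned embeddings.
# A real model learns these values in training and uses far more dimensions.
EMBEDDINGS = [
    [0.90, 0.80, 0.10],  # cat
    [0.85, 0.90, 0.05],  # kitten
    [0.10, 0.05, 0.95],  # spreadsheet
]

def embed(token):
    """Look up the vector for a token via its integer ID."""
    return EMBEDDINGS[VOCAB[token]]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

sim_close = cosine_similarity(embed("cat"), embed("kitten"))
sim_far = cosine_similarity(embed("cat"), embed("spreadsheet"))
print(f"cat vs kitten:      {sim_close:.3f}")
print(f"cat vs spreadsheet: {sim_far:.3f}")
```

With these toy vectors, the cat/kitten similarity comes out much higher than the cat/spreadsheet similarity, which is the geometric property the paragraph above describes.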

Every domain that has structure has a representation. The question is whether you have found it.

Chemistry has a different primitive

In chemistry, the tokens are not words. They are atoms, bonds, functional groups, and reaction templates. A chemistry foundation model represents molecules in a learned space where "benzene" is close to "toluene" and far from "glucose." Once you have that representation, you can generate plausible new molecules, predict reactions, and screen drug candidates at scale.
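
One widely used text encoding for molecules is SMILES, which writes a structure as a string of atom and bond symbols. The sketch below tokenizes SMILES strings and maps the tokens to integer IDs, the same first step a language model takes with text. The regular expression is a simplified assumption that covers only the symbols needed here, not the full SMILES grammar, and the helper names are hypothetical. Learning the embedding space itself is not shown.

```python
import re

# SMILES strings (stereochemistry omitted; glucose in open-chain form).
MOLECULES = {
    "benzene": "c1ccccc1",
    "toluene": "Cc1ccccc1",
    "glucose": "OCC(O)C(O)C(O)C(O)C=O",
}

# Simplified token pattern (an assumption, not the full SMILES grammar):
# two-letter halogens, bracket atoms, single-letter atoms, ring-closure
# digits, and bond/branch symbols.
TOKEN_PATTERN = re.compile(r"Cl|Br|\[[^\]]+\]|[A-Za-z]|\d|[=#()+\-/\\@.%]")

def tokenize(smiles):
    """Split a SMILES string into tokens, rejecting unknown characters."""
    tokens = TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in {smiles!r}")
    return tokens

# Build a vocabulary from every token seen in the toy corpus.
all_tokens = sorted({t for s in MOLECULES.values() for t in tokenize(s)})
TOKEN_TO_ID = {tok: i for i, tok in enumerate(all_tokens)}

def encode(smiles):
    """Convert a SMILES string into a list of integer token IDs."""
    return [TOKEN_TO_ID[t] for t in tokenize(smiles)]

for name, smiles in MOLECULES.items():
    print(name, tokenize(smiles), encode(smiles))
```

A chemistry model would then learn a vector for each ID, or for the molecule as a whole, from large collections of such sequences.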

NVIDIA's BioNeMo is one example. Major pharma houses are building their own. The bet is the same as the bet that language-model scale would deliver capabilities: enough data plus enough compute on the right representation produces a generalisable function.

Robotics has another

In robotics, the tokens are poses, joint angles, sensor readings, and motor commands. A robotics foundation model represents the world in a learned space where similar states map to similar actions. NVIDIA's GR00T is one example for humanoids. Physical Intelligence, 1X, Figure, and Tesla are building their own.
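
Continuous quantities such as joint angles have to become discrete before they can be treated as tokens. One simple approach is uniform binning, sketched below. The bin count, the angle range, and the function names are assumptions chosen for illustration; this is not a description of how GR00T or any other named system encodes actions.

```python
import math

N_BINS = 256          # assumed number of discrete tokens per joint
ANGLE_MIN = -math.pi  # assumed joint range, in radians
ANGLE_MAX = math.pi

def angle_to_token(angle):
    """Map a joint angle to an integer token in [0, N_BINS - 1]."""
    clipped = min(max(angle, ANGLE_MIN), ANGLE_MAX)
    fraction = (clipped - ANGLE_MIN) / (ANGLE_MAX - ANGLE_MIN)
    # The upper bound would land in bin N_BINS, so clamp it into the last bin.
    return min(int(fraction * N_BINS), N_BINS - 1)

def token_to_angle(token):
    """Map a token back to the centre of its bin."""
    bin_width = (ANGLE_MAX - ANGLE_MIN) / N_BINS
    return ANGLE_MIN + (token + 0.5) * bin_width

pose = [0.0, 1.2, -0.7]  # three joint angles in radians
tokens = [angle_to_token(a) for a in pose]
recovered = [token_to_angle(t) for t in tokens]
print(tokens)
print(recovered)
```

The round trip is lossy: each recovered angle is off by at most half a bin width, which is the price of turning a continuous signal into a finite vocabulary.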

A large part of why general-purpose robotics stalled for decades is that classical approaches tried to handcraft the representation. Foundation models learn it from data: millions of demonstrations, simulated rollouts, and real-world episodes. The representation is the work product.

Why language models still help across domains

A pure chemistry model knows nothing about cancer treatment goals. A pure robotics model knows nothing about a human asking it to make breakfast. The frontier architecture is a domain-specific representation model fused with a language model that supplies human priors: "what would a person want here?"
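
One possible way to fuse two representations is to place language tokens and domain tokens in a single sequence with a shared ID space, so that one model can attend over both. The sketch below shows only that bookkeeping step. The vocabulary sizes and function names are hypothetical, and this is not a description of how any named system performs the fusion.

```python
TEXT_VOCAB_SIZE = 1000   # hypothetical size of the language vocabulary
STATE_VOCAB_SIZE = 256   # hypothetical size of the domain (state) vocabulary

def fuse(text_ids, state_ids):
    """Build one sequence: text IDs as-is, state IDs shifted past the text range."""
    if any(not 0 <= t < TEXT_VOCAB_SIZE for t in text_ids):
        raise ValueError("text ID out of range")
    if any(not 0 <= s < STATE_VOCAB_SIZE for s in state_ids):
        raise ValueError("state ID out of range")
    return list(text_ids) + [TEXT_VOCAB_SIZE + s for s in state_ids]

def split(sequence):
    """Recover the two original ID lists from a fused sequence."""
    text_ids = [t for t in sequence if t < TEXT_VOCAB_SIZE]
    state_ids = [t - TEXT_VOCAB_SIZE for t in sequence if t >= TEXT_VOCAB_SIZE]
    return text_ids, state_ids

instruction = [5, 17, 42]   # stand-in IDs for a tokenized instruction
joint_state = [128, 189]    # stand-in IDs for discretized sensor readings
fused = fuse(instruction, joint_state)
print(fused)
print(split(fused))
```

Because the two ID ranges do not overlap, the model can always tell which modality a token came from, while still learning relationships between an instruction and the state it refers to.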

NVIDIA's Alpamayo is the cleanest worked example. It is a language model fused with a world model, deployed in autonomous vehicles. Because the language model carries human priors about driving, the system needs a few million miles of training rather than the billions a pure-data approach would require. The fusion is the data-efficiency mechanism.

Source: Jensen Huang, Stanford CS153 Frontier Systems lecture, April 30, 2026 (https://cs153.stanford.edu/)

How to read the foundation-model landscape

When a lab announces a new foundation model, the question to ask is: what is the representation? Language models represent text. Multimodal models represent text and images. World models represent physical state and dynamics. Chemistry models represent molecules. Biology models represent proteins and genes. Each representation choice opens a different set of downstream capabilities.

The companies that find a representation no one else has, and the data to train against it, hold the same kind of position OpenAI held in language in 2020. The bet on what representation matters next is the bet on what AI will be useful for next.