Three pillars of any computer: storage, networking, and calculation. Built up from PN diodes and band gaps to wafer-scale accelerators, HBM stacks, and rack-level fabrics.
Margins and Nvidia-avoidance explain the motive. They don't explain the timing — why every hyperscaler stood up a chip program inside the same two years, or why a cheaper chip doesn't always buy a cheaper token.
For 64 years, computing meant a CPU fetching instructions from memory and executing them one at a time.
A simpler instruction set, codesigned with the compiler, beat the best individually-optimised chips.
A CPU has done three very different jobs in 40 years.
When chip, compiler, network, and model are designed against each other rather than in isolation, the system compounds.
Pick the wrong metric and you optimise the wrong thing.
A model serving a request does two very different things.
Each NVIDIA generation was designed to break the bottleneck the previous one exposed.
The headline accelerator number. Predicts pretraining throughput. Rarely predicts real-world inference cost.
Fraction of advertised FLOPS the workload actually consumes. Often misleading on its own. Jensen wants low MFU because it means the system is overprovisioned.
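As a rough illustration of the ratio described above, here is a minimal sketch of the MFU arithmetic. The helper names, the model size, the token rate, and the peak figure are all hypothetical. The "6 FLOPs per parameter per token" rule is a common training approximation assumed here, not something this page states.

```python
def mfu(achieved_flops_per_s: float, peak_flops_per_s: float) -> float:
    """Fraction of advertised peak FLOPS the workload actually uses."""
    if peak_flops_per_s <= 0:
        raise ValueError("peak_flops_per_s must be positive")
    return achieved_flops_per_s / peak_flops_per_s


def training_flops_per_s(params: float, tokens_per_s: float) -> float:
    """Assumed rule of thumb: about 6 FLOPs per parameter per trained token."""
    return 6.0 * params * tokens_per_s


# Hypothetical numbers, chosen only to illustrate the arithmetic.
achieved = training_flops_per_s(params=7e9, tokens_per_s=10_000)
utilisation = mfu(achieved, peak_flops_per_s=1e15)
print(f"MFU = {utilisation:.0%}")
```

The same ratio can look good or bad depending on which peak number goes in the denominator (dense vs. sparse, which precision), which is one reason it can mislead on its own.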
The number that bundles silicon, memory bandwidth, and rack architecture into a single economic signal.
What an end customer ultimately pays for. 5x range across the same chip depending on deployment shape.
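To make the spread concrete, here is a small sketch of the cost-per-token arithmetic. The hourly price and the two throughput figures are hypothetical, picked only so the result shows a 5x gap on the same chip. They are not measurements.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_s: float) -> float:
    """Dollars per one million generated tokens for one accelerator."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    tokens_per_hour = tokens_per_s * 3600
    return hourly_cost_usd / tokens_per_hour * 1e6


# Hypothetical: same chip, same hourly price, two deployment shapes.
HOURLY_COST_USD = 3.00
shapes = {
    "small batch, latency-optimised": 500.0,      # tokens/s (assumed)
    "large batch, throughput-optimised": 2500.0,  # tokens/s (assumed)
}

costs = {
    name: cost_per_million_tokens(HOURLY_COST_USD, tps)
    for name, tps in shapes.items()
}
for name, cost in costs.items():
    print(f"{name}: ${cost:.2f} per million tokens")
```

The chip and its price are fixed here. Only the achieved throughput changes, and the cost per token moves in inverse proportion.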
FLOPS, memory bandwidth, memory capacity, network. At any moment, one of these is the constraint. The right metric exposes which.
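One way to sketch the "which resource is the constraint" question is to estimate how long each resource would take on its own and pick the slowest. The `Chip` and `Step` types, the field names, and every number below are illustrative assumptions. Real systems overlap compute, memory traffic, and communication, so treat this as a first-order model only.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    peak_flops: float     # FLOP/s
    mem_bandwidth: float  # bytes/s
    mem_capacity: float   # bytes
    net_bandwidth: float  # bytes/s


@dataclass
class Step:
    flops: float               # math work in one step
    mem_bytes_moved: float     # bytes read/written from device memory
    mem_bytes_resident: float  # bytes that must fit in device memory
    net_bytes: float           # bytes exchanged with other devices


def bottleneck(chip: Chip, step: Step) -> str:
    """Name the resource that bounds this step (no-overlap approximation)."""
    if step.mem_bytes_resident > chip.mem_capacity:
        return "memory capacity"
    times = {
        "FLOPS": step.flops / chip.peak_flops,
        "memory bandwidth": step.mem_bytes_moved / chip.mem_bandwidth,
        "network": step.net_bytes / chip.net_bandwidth,
    }
    return max(times, key=times.get)


# Hypothetical accelerator and a single-token decode step for a 7B model
# stored in 2-byte weights: every weight is read once, little math per byte.
chip = Chip(peak_flops=1e15, mem_bandwidth=3e12, mem_capacity=80e9, net_bandwidth=1e11)
decode = Step(flops=1.4e10, mem_bytes_moved=1.4e10, mem_bytes_resident=2e10, net_bytes=0.0)
print(bottleneck(chip, decode))
```

With these assumed numbers the step spends far longer streaming weights than doing math, so the answer comes out as memory bandwidth rather than FLOPS.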
AI workloads reward wide, repetitive math more than clever serial control.
The useful unit is no longer a lone die. It is compute, memory, packaging, power delivery, and fabric packed tightly enough that software can pretend the whole thing is one machine.
FLOPS measure how fast a chip can do math. Many AI workloads are waiting on the weights, cache, and activations to arrive.
The frontier chip supply chain is geographically concentrated because the hardest steps compound in one place: leading-edge fabrication, advanced packaging, and supplier learning loops.
A new generation is not one miracle. It is a bundle of smaller moves across node, package, memory, precision, dataflow, networking, software, and power.
The answer depends on the job: frontier training, cheap inference, long context, software portability, supply assurance, or power-limited deployment.
A modern AI accelerator is a square cut out of a round silicon wafer.
Yield is the fraction of chips that come off the line working.
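The two ideas above, rectangular dies on a round wafer and yield as the working fraction, can be combined into a back-of-the-envelope estimate. This sketch uses a widely quoted dies-per-wafer approximation and a simple Poisson yield model. Both are assumptions brought in for illustration, and the die area and defect density are hypothetical.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Approximate whole dies per wafer: area ratio minus an edge-loss term."""
    radius = wafer_diameter_mm / 2
    by_area = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(by_area - edge_loss)


def poisson_yield(die_area_mm2: float, defects_per_mm2: float) -> float:
    """Assumed Poisson model: probability that a die has zero killer defects."""
    return math.exp(-die_area_mm2 * defects_per_mm2)


# Hypothetical inputs: 300 mm wafer, 800 mm^2 die, 0.1 defects per cm^2.
gross = gross_dies_per_wafer(wafer_diameter_mm=300, die_area_mm2=800)
fraction_good = poisson_yield(die_area_mm2=800, defects_per_mm2=0.001)
print(f"gross dies: {gross}, yield: {fraction_good:.1%}, good dies: {gross * fraction_good:.1f}")
```

Under this model a larger die loses twice: fewer candidates fit on the wafer, and each candidate is more likely to contain a defect.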
One machine prints every leading-edge AI chip on Earth.
Before EUV, lithography meant shrinking the wavelength of deep-ultraviolet excimer lasers.
Every leading-edge AI chip in production passes through a machine that costs $200 to $400 million, weighs 180 tons, and is built by a single Dutch company.
xLight wants to replace ASML's tin-droplet plasma with a free-electron laser and sell photons by the gallon.
AI training was a GPU story. Production inference, reinforcement learning, and agentic workflows are CPU stories layered on top of GPU kernels. Capacity planning has to model both.
Project Suncatcher, vacuum lasers, and the physics of orbital data centers.
Every level of memory trades capacity for latency.
SRAM is the fastest memory on a chip and the most expensive per bit. Every accelerator design is a fight over how much of the die to spend on it.
DRAM is dense and cheap to print, but a logic fab cannot be repurposed to make it. The world has SK Hynix, Samsung, and Micron, and they are sold out.
LPDDR is the low-power DRAM built for phones.
High-bandwidth memory stacks DRAM dies vertically over a logic base die and connects them to the compute with a very wide bus.
The Vera CPU swaps soldered LPDDR for socketed SOCAMM modules, turning system memory into a configuration lever.
NAND flash is non-volatile storage built from charge-trapping transistors stacked vertically.
Hard drives sit at the bottom of the storage hierarchy: slowest per access, cheapest per terabyte, and where the cold corpus of a frontier lab actually lives.
AI infrastructure is the layer that turns chips and power into usable capacity: buildings, racks, networks, cooling, schedulers, and operating discipline.
When models outgrow one accelerator, the training run becomes a communication problem wrapped around a math problem.
The unit of AI deployment has moved from server to rack because power, cooling, memory, and fabric now have to be designed together.
Cooling is not the equipment you add after compute. It sets rack density, site design, maintenance model, and the kinds of chips the building can host.
The best site is a compromise among power, fiber, latency, cooling, regulation, tax treatment, labor, and the workload that will run there.
They are buying an option on future capability: energized land, racks, chips, networks, software, depreciation curves, and priority in every constrained supply chain.
Heterogeneous training is harder than the GPU count suggests.
CPO moves optical conversion from a pluggable module onto the same substrate as the switch ASIC.
A conductor cannot be switched off. An insulator cannot be switched on. Silicon sits in the goldilocks zone where a small applied voltage can flip the material between the two — and is plentiful enough that civilization can afford to use it for everything.
A transistor is a switch with no moving parts that can also amplify.
A short walk from Stanford in 1957, eight men quit one company on the same day and started another.
Fairchild Semiconductor was the most productive technology employer in history — but it was Fairchild as a talent pump, not Fairchild as a product company.
Gordon Moore wrote a four-page magazine article in 1965 noticing that the number of components on an integrated circuit had been doubling annually.
Intel was the most valuable semiconductor company in the world for two decades.
Apple spent twelve years on PowerPC, roughly fifteen on Intel, and is now in year five of Apple Silicon.
In 1996 Nvidia had thirty days of cash and a product nobody wanted.
Texas Instruments, Analog Devices, NXP, Infineon, and a handful of others run a parallel semiconductor industry that nobody talks about because it is unfashionable, durable, and embarrassingly profitable.
Power semiconductors switch high voltages and currents — kilovolts, hundreds of amps — without burning up.
Turning a cylinder of pure silicon into a working die is a thousand-step relay run over about twelve weeks — and at each hard step, a two-to-four-supplier oligopoly holds the choke point.
The capstone. We walk through one real product — NVIDIA's GB200 Grace-Blackwell superchip and the NVL72 rack it ships in — and name every major component, where it was designed, where it was fabbed, where it was packaged, who supplies it, and what could disrupt it. By the end the reader should be able to read any chip datasheet and place every line in its actual supply chain.
A frontier accelerator stopped being one chip years ago.