Peregrinations - Compute | Peregrinations

Compute

Every computer rests on three pillars: storage, networking, and calculation. This section builds up from PN diodes and band gaps to wafer-scale accelerators, HBM stacks, and rack-level fabrics.

Most recent

When the Workload Stops Moving: Why Custom Silicon Finally Pays

Margins and Nvidia-avoidance explain the motive. They don't explain the timing — why every hyperscaler stood up a chip program inside the same two years, or why a cheaper chip doesn't always buy a cheaper token.

9 min read

Foundations

Ch. 01

Von Neumann vs neural: two ways to compute

For 64 years, computing meant a CPU fetching instructions from memory and executing them one at a time.

When the Workload Stops Moving: Why Custom Silicon Finally Pays

Foundations

Von Neumann vs neural: two ways to compute

RISC vs CISC and the birth of codesign

What is a CPU? From desktop to cloud to AI agent

Codesign: the structural moat behind the million-fold speedup

Metrics that matter: FLOPS, MFU, tokens-per-watt, tokens-per-dollar

How inference actually happens: prefill vs decode

Chip generations shaped by bottlenecks: Hopper, Blackwell, Vera, Feynman

Metrics that matter (compute)

FLOPS — necessary, not sufficient

MFU — model FLOPS utilisation

Tokens-per-watt — the inference unit metric

Tokens-per-dollar — the buyer metric

Memory bandwidth and capacity — the four-D bottleneck

Visual essay

Electrons to tokens

Calculation, in nine questions

Why does AI run on GPUs?

What's inside a modern AI accelerator?

Why does memory bandwidth matter more than FLOPS?

Why does so much AI chip supply run through Taiwan?

What does a new chip generation actually buy?

Which AI chip should I care about?

Why did wafer-scale chips take 40 years to ship?

What is yield, and why does it set the price of compute?

Why lithography is the chokepoint on AI chips

How the industry pushed transistors smaller for 30 years

How ASML's EUV scanner became the bottleneck on AI

Two startups betting they can crack EUV cost

Why is the CPU:GPU ratio rising in AI infrastructure?

Why are we putting TPUs in Sun-Synchronous Orbits?

Storage, in seven layers

How does memory hierarchy work in an AI chip?

What is SRAM, and why does an AI accelerator have so little of it?

Why do only three companies make DRAM?

What is LPDDR5X, and why is it in a data center?

What is HBM, and why does it set the price of a token?

How can NVIDIA halve a rack's memory after the chip is designed?

What is NAND flash, and where does it sit in the AI stack?

Where do hard drives still matter in AI infrastructure?

Networking and racks

What is AI infrastructure?

Why does networking become the next bottleneck?

Why is the rack the new computer?

Why is cooling an architecture choice?

Where should AI compute live?

What do hyperscalers buy when they buy compute?

What happens when a cluster mixes chip generations?

Why is indium phosphide the bottleneck behind co-packaged optics?

Semiconductors, in twelve questions

Why is everything made of silicon?

What is a transistor, and why was it the most important invention of the 20th century?

Why is it called Silicon Valley?

How did one company become two thousand?

Why "doubling every two years"?

Why did Intel dominate, then lose?

Why did Apple build its own chips?

How did Nvidia almost die — and what did they bet on?

What does analog do that digital can't?

What runs the power, not the math?

How is a chip actually made?

What does a modern AI chip look like, end-to-end?

Why does the package, not the die, decide how big an AI chip can get?
