Chips · 2 of 6

What's inside a modern AI accelerator?

The useful unit is no longer a lone die. It is compute, memory, packaging, power delivery, and fabric packed tightly enough that software can pretend the whole thing is one machine.

Where the binding constraint sits today

Frontier accelerators are constrained by the weakest physical neighbor around the compute die: HBM, package substrate, voltage delivery, cooling, or scale-up fabric.
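One way to make that concrete is a min() over subsystem ceilings: express each neighbor as the throughput the system could sustain if that part alone were the limit, and the smallest ceiling is the binding constraint. A minimal sketch follows; every number in it is an invented illustration, not a measurement of any real part.

```python
# A minimal sketch of the "weakest neighbor" framing. Each value is a
# hypothetical ceiling: the tokens/s the system could sustain if this
# subsystem were the only limit. None of these numbers are real.

def binding_constraint(ceilings: dict[str, float]) -> tuple[str, float]:
    """Return the subsystem with the lowest throughput ceiling."""
    name = min(ceilings, key=ceilings.get)
    return name, ceilings[name]

ceilings = {
    "compute (FLOPs)": 5_000.0,
    "HBM bandwidth": 3_200.0,
    "power delivery": 4_100.0,
    "cooling": 4_500.0,
    "scale-up fabric": 3_900.0,
}

part, limit = binding_constraint(ceilings)
print(f"binding constraint: {part} at {limit:,.0f} tokens/s")
# -> binding constraint: HBM bandwidth at 3,200 tokens/s
```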

The die is only the center

The compute die holds tensor cores, caches, control logic, and the on-chip network. It is where matrix multiplication happens, but it is not where the whole system is won.

A useful accelerator also needs high-bandwidth memory next to it, an interposer or advanced package under it, power stages around it, and links out to other accelerators. Each part can become the bottleneck before the math units do.

HBM is the second half of the chip

High-bandwidth memory sits beside the compute die because AI models spend their lives reading weights and activations. Fetching that data from ordinary memory would leave the expensive math units idle, waiting on bytes.

That is why the HBM line on a datasheet matters as much as peak FLOPS. A chip with more math but the same memory bandwidth may look better on a launch slide yet produce the same token rate in decode.
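A back-of-envelope roofline shows why. Assume a dense model decoding at batch size 1 with fp16 weights, so every token streams the full weight set once; all numbers below are illustrative assumptions, not taken from any datasheet.

```python
# Back-of-envelope for why the HBM line bounds decode. Assumes a dense
# model, batch size 1, fp16 weights; every decoded token reads the full
# weight set once. Numbers are illustrative, not vendor figures.

params      = 70e9      # model parameters (hypothetical 70B model)
bytes_per_w = 2         # fp16
hbm_bw      = 3.35e12   # bytes/s of HBM bandwidth (illustrative)
peak_flops  = 1.0e15    # peak FLOP/s of the math units (illustrative)

flops_per_tok = 2 * params            # one multiply-add per weight
bytes_per_tok = params * bytes_per_w  # weights streamed once per token

tok_s_compute = peak_flops / flops_per_tok
tok_s_memory  = hbm_bw / bytes_per_tok

print(f"compute ceiling: {tok_s_compute:8.0f} tokens/s")
print(f"memory  ceiling: {tok_s_memory:8.0f} tokens/s")
# The memory ceiling is orders of magnitude lower: adding FLOPS
# without adding bandwidth leaves decode throughput unchanged.
```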

Packaging is where supply gets real

Advanced packaging is the physical act of placing compute dies and memory stacks close enough to behave like one device. TSMC's CoWoS is the constraint everyone names because it is where many leading accelerators become shippable systems.

A wafer that cannot be packaged does not become deployed compute. This is why chip supply discussions keep moving from fabs to interposers, substrates, HBM stacks, and packaging capacity.
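The same minimum logic applies to supply: deployable units per month are gated by the scarcest stage, not by wafer starts. The stage capacities in this sketch are invented, expressed as accelerator-equivalents per month.

```python
# Why supply talk moves downstream from fabs: shippable units are the
# minimum across every stage of the chain. Capacities below are
# invented illustrations, in accelerator-equivalents per month.

stages = {
    "good compute dies": 120_000,
    "CoWoS interposer slots": 90_000,
    "package substrates": 110_000,
    "HBM stack sets": 100_000,
}

bottleneck = min(stages, key=stages.get)
print(f"{stages[bottleneck]:,} units/month, gated by {bottleneck}")
# -> 90,000 units/month, gated by CoWoS interposer slots
```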

Scale-up fabric turns chips into a training unit

Frontier models are too large for one accelerator. Scale-up links like NVLink, ICI, and Infinity Fabric try to make many chips act like one low-latency machine before traffic spills onto slower data-center networks.

The important number is not just per-chip throughput. It is the size and bandwidth of the scale-up domain. That boundary decides which model shapes fit cleanly and which ones pay the network tax.

72 chips: NVIDIA NVL72 scale-up domain in the local comparator
9,216 chips: TPU v7p scale-up domain in the local comparator
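To see what that boundary costs, here is a hedged sketch of a ring all-reduce whose links drop to data-center bandwidth once the group outgrows the scale-up domain. The 72-chip domain matches the NVL72 figure above; the payload and bandwidths are illustrative assumptions, not vendor numbers.

```python
# Sketch of the "network tax" at the scale-up boundary. A group that
# fits inside one scale-up domain runs all-reduces at fabric bandwidth;
# a group that spills is dominated by the slower data-center links.

def allreduce_seconds(payload_bytes: float, group: int,
                      domain: int, bw_in: float, bw_out: float) -> float:
    """Ring all-reduce time: 2*(n-1)/n of the payload crosses each
    chip's link; links beyond the domain run at the slower bandwidth."""
    bw = bw_in if group <= domain else bw_out
    return (2 * (group - 1) / group) * payload_bytes / bw

payload = 8e9      # bytes exchanged per step (hypothetical)
domain  = 72       # chips in one scale-up domain
bw_in   = 900e9    # bytes/s per chip inside the domain (illustrative)
bw_out  = 50e9     # bytes/s per chip across the data-center network

for group in (64, 128):
    t = allreduce_seconds(payload, group, domain, bw_in, bw_out)
    tag = "fits" if group <= domain else "spills"
    print(f"group={group:3d} ({tag}): {t*1e3:7.1f} ms")
# A group of 64 stays on fast links; a group of 128 pays the tax.
```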

The product is the package plus the rack

The frontier accelerator has become a rack-scale product. Buyers are not just buying chips. They are buying power envelopes, cooling assumptions, networking topology, firmware, compilers, and a software stack that can keep utilization high.

That is the non-obvious point: the chip layer is already halfway into the infrastructure layer. The winning accelerator is the one that arrives as a deployable system.