Foundations · 1 of 5

Von Neumann vs neural: two ways to compute

For more than six decades, computing meant a CPU fetching instructions from memory and executing them one at a time. Neural networks are a different machine. The shift matters because it changes the software model, not just the hardware.

Where the binding constraint sits today

The von Neumann bottleneck is the round trip between processor and memory. The neural-network workload replaces the instruction-by-instruction loop that creates that bottleneck with a fundamentally different model of computation: weights as state, matrix operations as instructions, tokens as output.

The von Neumann machine

A von Neumann computer holds program instructions and data in the same memory. The CPU fetches an instruction, decodes it, executes it, then fetches the next one. Storage and processing live in different places, connected by a bus. This serial fetch-decode-execute loop is the basic unit of work.
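To make the fetch-decode-execute loop concrete, here is a minimal Python sketch of a stored-program machine. The four-opcode instruction set (LOAD, ADD, STORE, HALT), the single accumulator, and the function name `run` are hypothetical, chosen only for illustration; no real CPU is this simple. The point is that code and data sit in the same list, and every step is a trip to that shared memory.

```python
# A toy stored-program (von Neumann) machine. The instruction set is
# hypothetical: LOAD, ADD, STORE, HALT, with one accumulator register.

def run(memory):
    """Execute the program stored in `memory` until HALT.

    Instructions and data share the same list. Returns the (mutated)
    memory and the number of memory accesses the loop performed.
    """
    pc = 0          # program counter: address of the next instruction
    acc = 0         # accumulator register inside the "CPU"
    accesses = 0    # trips across the "bus" to shared memory
    while True:
        op, arg = memory[pc]   # FETCH the instruction from memory
        accesses += 1
        pc += 1
        # DECODE the opcode, then EXECUTE it
        if op == "LOAD":
            acc = memory[arg]          # data read: another memory trip
            accesses += 1
        elif op == "ADD":
            acc += memory[arg]         # data read
            accesses += 1
        elif op == "STORE":
            memory[arg] = acc          # data write
            accesses += 1
        elif op == "HALT":
            return memory, accesses
        else:
            raise ValueError(f"unknown opcode {op!r} at address {pc - 1}")


program = [
    ("LOAD", 4),    # acc <- mem[4]
    ("ADD", 5),     # acc <- acc + mem[5]
    ("STORE", 6),   # mem[6] <- acc
    ("HALT", None),
    2, 3, 0,        # data lives in the same memory as the code
]
final, accesses = run(program)
print(final[6], accesses)
```

Even this four-instruction program makes seven memory accesses to compute a single sum, which is the round trip the bottleneck below is about.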

This is the architecture behind every general-purpose computer from the IBM System/360 in 1964 through the laptop you are reading this on. Software is precompiled: the developer writes code, compiles it to a binary, ships the binary, and the user runs it. The instructions are fixed at compile time. The machine just executes.

The bottleneck this design creates

The CPU is fast. Memory access is slow. Every instruction, and most of the data it touches, has to cross the bus, and left unmitigated that trip dominates real-world performance. Decades of CPU design (caches, branch prediction, out-of-order execution, speculative execution) are largely workarounds for this single problem: keeping the processor busy while it waits on memory.
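One standard way to quantify how much caches help is the textbook average memory access time (AMAT) formula: AMAT = hit time + miss rate × miss penalty. The sketch below assumes a 3 GHz core, a 100 ns trip to DRAM, a 4-cycle cache hit, and a 2% miss rate; these are illustrative round numbers, not measurements of any particular chip.

```python
# Average memory access time (AMAT) for a single-level cache:
#   AMAT = hit_time + miss_rate * miss_penalty
# All numbers below are illustrative assumptions, not measurements.

def amat_cycles(hit_time, miss_rate, miss_penalty):
    """Average cycles spent per memory access."""
    return hit_time + miss_rate * miss_penalty

CLOCK_GHZ = 3.0          # assumed core clock (cycles per nanosecond)
DRAM_LATENCY_NS = 100.0  # assumed round trip to main memory

# Cycles the core waits on one trip to DRAM: nanoseconds * cycles per ns.
miss_penalty = DRAM_LATENCY_NS * CLOCK_GHZ  # 300 cycles

# No cache: every access pays the full trip across the bus.
no_cache = amat_cycles(hit_time=0, miss_rate=1.0, miss_penalty=miss_penalty)
# With a cache: most accesses hit in a few cycles; 2% still go to DRAM.
with_cache = amat_cycles(hit_time=4, miss_rate=0.02, miss_penalty=miss_penalty)

print(f"no cache:   {no_cache:.0f} cycles per access")
print(f"with cache: {with_cache:.0f} cycles per access")
```

Under those assumptions an uncached access costs about 300 cycles while the cached average falls to about 10. Closing that gap is what the CPU machinery listed above exists to do.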

For workloads with messy control flow (a database, a web server, an operating system kernel), the von Neumann design is the right tool. The problem is that the most economically important workload of the 2020s is not messy control flow.

The neural-network machine

A neural network is a long chain of the same operation: multiply a matrix of inputs by a matrix of learned weights, apply a non-linearity, repeat. The "program" is the set of weight matrices. The "instruction" is the matrix multiply (matmul). The sequence of operations is fixed by the network's structure rather than by the data, so there is no unpredictable branching to speculate on and no per-instruction fetch from memory.
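A minimal sketch of that chain in plain Python, assuming a two-layer network with made-up weights. The names `matmul`, `relu`, and `forward` are illustrative, not from any library. For simplicity every layer, including the last, applies ReLU; real networks usually leave the final layer linear or use a different output function. Note that the loop in `forward` runs the same way for every input; only the numbers change.

```python
# Tiny multilayer perceptron forward pass: the "program" is the list of
# weight matrices, and the only "instruction" is matmul followed by ReLU.

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as lists of rows."""
    k = len(b)
    assert all(len(row) == k for row in a), "inner dimensions must match"
    m = len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(k)) for j in range(m)]
            for row in a]

def relu(x):
    """Element-wise max(0, v): the non-linearity applied after each layer."""
    return [[max(0.0, v) for v in row] for row in x]

def forward(x, weights):
    """Apply the same matmul -> ReLU step once per weight matrix.

    Control flow depends only on how many layers there are, never on the
    values in x.
    """
    for w in weights:
        x = relu(matmul(x, w))
    return x


W1 = [[1.0, -1.0],
      [0.5,  2.0]]   # 2 inputs -> 2 hidden units (made-up values)
W2 = [[1.0],
      [1.0]]         # 2 hidden units -> 1 output
x = [[2.0, 0.5]]     # a batch containing one input row
print(forward(x, [W1, W2]))
```

Here the second hidden unit computes -1.0 and ReLU clips it to zero, so the output is `[[2.25]]`. Swapping in different weight matrices changes what the network computes without changing a line of the code.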

This workload wants the opposite of a CPU. It wants thousands of arithmetic units running in lockstep, fed by memory that streams data in bulk rather than trickling it through caches. That is a GPU, or a TPU, or any modern AI accelerator: a wide tensor factory rather than a serial executor.
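Why a matmul can keep thousands of arithmetic units busy can be sketched with arithmetic intensity: floating-point operations performed per byte moved to or from memory. This idealized calculation assumes each n × n matrix is read or written exactly once and stored as 2-byte (fp16) elements; real kernels move somewhat more data than that, so their intensity is lower.

```python
# Arithmetic intensity = FLOPs performed / bytes moved to or from memory.
# Idealized: each matrix is touched exactly once; fp16 (2-byte) elements assumed.

def matmul_intensity(n, bytes_per_elem=2):
    """FLOPs per byte for C = A @ B with square n x n matrices."""
    flops = 2 * n ** 3                         # n^3 multiply-adds, 2 FLOPs each
    bytes_moved = 3 * n * n * bytes_per_elem   # read A, read B, write C
    return flops / bytes_moved

def vector_add_intensity(bytes_per_elem=2):
    """FLOPs per byte for c = a + b: one add per three elements moved."""
    return 1 / (3 * bytes_per_elem)

for n in (64, 1024, 4096):
    print(f"matmul n={n:>4}: {matmul_intensity(n):8.1f} FLOPs/byte")
print(f"vector add:       {vector_add_intensity():8.3f} FLOPs/byte")
```

For square fp16 matrices the ratio works out to n/3 FLOPs per byte, so it grows with matrix size, while an element-wise add stays near 0.17 no matter how long the vectors are. High intensity is what lets bulk memory streams keep a wide array of arithmetic units fed.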

Why this is "computing reinvented"

In a von Neumann world, code is shipped, and the user interacts with a frozen artifact. In a neural-network world, the model generates its output in real time, conditioned on context and responsive to intent rather than to literal instructions. The same kind of model can write code, summarize a meeting, or drive a car, depending on what it is asked to do.
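The difference between shipping code and generating output can be sketched as a toy greedy decoding loop. The vocabulary and the `SCORES` table below are hypothetical, hand-written stand-ins for learned weights, and each step conditions only on the previous token; a real language model conditions on the whole context and has vastly more parameters. What carries over is the shape: weights as state, one scoring step repeated, tokens as output.

```python
# Toy autoregressive generator: output is produced one token at a time
# from a weight table rather than read out of a precompiled program.
# VOCAB and SCORES are hypothetical, hand-written for illustration.

VOCAB = ["<s>", "the", "model", "writes", "code", "</s>"]

# SCORES[i][j]: score for emitting token j right after token i (a bigram table).
SCORES = [
    # <s>  the  model writes code  </s>
    [0.0, 2.0, 0.5, 0.1, 0.1, 0.0],   # after <s>
    [0.0, 0.0, 2.0, 0.1, 0.8, 0.1],   # after "the"
    [0.0, 0.1, 0.0, 2.0, 0.3, 0.5],   # after "model"
    [0.0, 0.4, 0.2, 0.0, 2.0, 0.1],   # after "writes"
    [0.0, 0.1, 0.1, 0.1, 0.0, 2.0],   # after "code"
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],   # after "</s>" (unused: generation stops)
]

def generate(max_tokens=10):
    """Greedy decoding: repeatedly append the highest-scoring next token."""
    tokens = [0]                                         # start with <s>
    for _ in range(max_tokens):
        row = SCORES[tokens[-1]]                         # condition on previous token
        nxt = max(range(len(row)), key=row.__getitem__)  # argmax over vocabulary
        tokens.append(nxt)
        if VOCAB[nxt] == "</s>":                         # stop at end-of-sequence
            break
    return [VOCAB[t] for t in tokens]

print(" ".join(generate()))
```

With these made-up scores the loop emits `<s> the model writes code </s>`. Changing the table, not the code, changes what gets generated.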

Jensen Huang, in his Stanford CS153 lecture, called this the first reinvention of computing in 64 years. The shift is not just architectural. It changes what software is, how it is built, how it is distributed, and what business models are possible on top of it.

Source: Jensen Huang, Stanford CS153 Frontier Systems lecture, April 30, 2026 (https://cs153.stanford.edu/)