
How did Nvidia almost die, and what did they bet on?

In 1996, Nvidia had thirty days of cash and a product nobody wanted. They survived by making one good chip (the RIVA 128) and one good business pivot (graphics for gamers, not workstations). A decade later they made a much harder bet: CUDA, a programming model for using GPUs as general compute, when nobody was asking for it. The bet took fifteen years to pay off. By 2024 it had made Nvidia the most valuable company in the world.

Where the binding constraint sits today

Some option bets pay off only if the company keeps funding them through years of looking foolish. The strategic question is twofold: which conditions let a firm hold that conviction, and what the AI-era equivalents look like now, while the next round of similar bets is still in its uncomfortable phase.

Founding and near-death

Nvidia was founded in April 1993 by Jensen Huang, Chris Malachowsky, and Curtis Priem at a Denny's in San Jose. Huang was 30 years old, had been at LSI Logic for eight years, and brought the business plan. Malachowsky and Priem had worked together at Sun Microsystems on the GX graphics architecture. The three put in $40,000 each and raised $20 million from Sequoia and Sutter Hill. The money came with a warning: Don Valentine of Sequoia told Huang on the phone, 'If you lose my money, I'll kill you.'

The first chip, the NV1, shipped in 1995. It was technically ambitious — it used quadratic surfaces instead of the polygons everyone else was settling on — and a commercial failure. The polygon-based API that Microsoft was about to standardize on (Direct3D, released 1996) was incompatible with the NV1's approach. The NV2 was cancelled. By mid-1996 Nvidia was running on fumes, had laid off half its staff, and was reportedly six months from bankruptcy.

The rescue product was the RIVA 128, shipped in August 1997. It was a more conventional polygon-based 3D accelerator, built on a new architecture, and aimed at the consumer PC gaming market — not the workstation market where Nvidia's earlier products had failed. The chip sold well, gave Nvidia eighteen months of survival cash, and was followed by the RIVA TNT (1998), the GeForce 256 (1999, the first chip Nvidia called a 'GPU'), and the GeForce 2 (2000). Each generation took more market share. By 2002 Nvidia was a public company with $1.4 billion in revenue and the dominant supplier of consumer 3D graphics.

Source: Jensen Huang's various retellings, most reliably his 2023 Stanford GSB "View From The Top" talk and the 2024 *Acquired* podcast episode on Nvidia.

The CUDA bet

By 2005, Nvidia GPUs were the most parallel processors in any consumer device. A high-end GeForce had hundreds of small floating-point units running simultaneously to render pixels. A few academics — at Stanford, at the University of Illinois, at several physics labs — had started writing scientific code that exploited this parallelism, using awkward graphics APIs to run protein-folding simulations and fluid dynamics on consumer graphics cards. The performance was real: 10x to 100x speedups over CPUs on the workloads that fit the parallel structure.

Nvidia's response, beginning around 2004 and shipping in 2006-07, was a piece of software called CUDA (Compute Unified Device Architecture) plus a hardware redesign (the Tesla architecture, also 2006, no relation to the car company) that made the GPU's compute units explicitly usable for non-graphics work. CUDA exposed the GPU as a parallel processor that could be programmed in something resembling C, rather than as a graphics pipeline that had to be tricked into doing math. Every Nvidia GPU shipped after 2007 included CUDA support, including consumer cards. Every PhD student doing simulation work could now use a $400 graphics card from Best Buy to run a workload that previously required a $50,000 cluster.
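The shift from "tricking a graphics pipeline" to "programming a parallel processor" is easier to see in code. The sketch below is a pure-Python emulation of the programming model, not CUDA itself: one small function (the kernel) is written for a single element, and a launcher runs it once per thread across a grid of blocks. The names `launch`, `vector_add`, and `add_on_emulated_gpu` are hypothetical, and the loops here run sequentially, whereas a real GPU runs the threads in parallel. In CUDA C the same global index is conventionally computed as `blockIdx.x * blockDim.x + threadIdx.x`.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Emulate a 1-D kernel launch by calling `kernel` once per (block, thread) pair.

    This runs sequentially; it only illustrates the indexing scheme.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def vector_add(block_idx, block_dim, thread_idx, a, b, out, n):
    """Kernel body: each emulated thread handles exactly one element."""
    i = block_idx * block_dim + thread_idx  # global element index for this thread
    if i < n:  # the last block may have threads past the end of the data
        out[i] = a[i] + b[i]


def add_on_emulated_gpu(a, b, block_dim=4):
    """Add two equal-length lists using the emulated launch."""
    n = len(a)
    out = [0.0] * n
    grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n elements
    launch(vector_add, grid_dim, block_dim, a, b, out, n)
    return out


print(add_on_emulated_gpu([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```

The point of the design is that the programmer writes ordinary per-element code and leaves the mapping onto thousands of hardware threads to the launch configuration. That is what made the model approachable for scientists who had no interest in graphics APIs.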

The strategic case inside Nvidia at the time was uncomfortable. CUDA was not a revenue line. Nvidia gave it away for free. The Tesla-line chips for serious compute customers were a small business, in single-digit millions of dollars of annual revenue for several years. The R&D investment to support CUDA — compilers, libraries, documentation, developer relations, a parallel programming research team — was significant and ongoing. For most of the period from 2007 to 2014, CUDA cost Nvidia more than it brought in. Huang kept funding it anyway, on the thesis that parallel programming was the future of high-performance compute and that whoever owned the leading parallel programming environment would own the resulting business.

2006: CUDA first released
2007: First Tesla compute card
~$50M: Nvidia datacenter revenue, ~2010 (rough)
~$47B: Nvidia datacenter revenue, FY2024

AlexNet, 2012 — the option starts to vest

In September 2012, three researchers — Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton at the University of Toronto — submitted a neural network called AlexNet to the ImageNet image-classification competition. AlexNet won by a margin that startled the field: 15.3% top-5 error against the next-best entry at 26.2%. The technique they used (a deep convolutional neural network trained with backpropagation) had been around for decades, but two things had changed. ImageNet had provided a large enough labeled training set. And Krizhevsky had trained the network on two consumer Nvidia GTX 580 GPUs in his bedroom over a week — a compute scale that would have been impractical even five years earlier.

The ImageNet result was the moment GPUs stopped being 'graphics chips that could occasionally do science' and became 'the substrate of deep learning.' Within two years, every serious computer-vision research group was using Nvidia hardware, almost always with CUDA. By 2015, Google had published TensorFlow (which ran on CUDA first, TPU second). By 2017 the transformer paper had come out of Google Brain, with its models trained on eight Nvidia P100 GPUs and most of the research field replicating the work on Nvidia hardware. By 2020 GPT-3 had been trained on roughly 10,000 Nvidia V100s.

The CUDA investment could have been designed for exactly this moment. When the workload arrived, the libraries existed, the tooling worked, and the deep-learning research community had already trained itself on Nvidia hardware. There was no comparable alternative. Intel had MKL-DNN, AMD had ROCm, Google had TPU+JAX, but the broader ecosystem (PhDs, libraries, papers, tutorials, GitHub repositories) ran on CUDA, and the compounding network effects of that ecosystem made it nearly impossible for a competitor to catch up.

The 2022-2024 inflection

From 2015 to 2022, Nvidia's datacenter revenue grew steadily but unspectacularly — from about $300M to about $15B over seven years. The crypto-mining boom briefly inflated GPU demand in 2017 and 2021 but did not establish a durable customer base. The real inflection was generative AI.
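To put the 2015 to 2022 figures on an annual basis, the short calculation below converts the two endpoints into a growth multiple and a compound annual growth rate. It uses only the approximate numbers quoted above ($300M and $15B over seven years). The function names `cagr` and `growth_multiple` are illustrative, and the result is only as precise as those rounded inputs.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 means 10% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def growth_multiple(start_value, end_value):
    """How many times larger the end value is than the start value."""
    return end_value / start_value


# Approximate figures quoted in the text, in millions of dollars.
datacenter_2015 = 300
datacenter_2022 = 15_000

multiple = growth_multiple(datacenter_2015, datacenter_2022)
rate = cagr(datacenter_2015, datacenter_2022, years=7)
print(f"{multiple:.0f}x over 7 years, about {rate:.0%} per year")
# prints: 50x over 7 years, about 75% per year
```

On these inputs the pre-2022 period works out to roughly 75% compound growth per year. It looks unspectacular only next to the year-over-year multiples that followed.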

ChatGPT launched in November 2022. By the second quarter of 2023, Nvidia's datacenter revenue had roughly doubled year-over-year. By Q3 2023 it was three times the prior year. By Q1 2024 it was five times. The H100, launched in 2022 as the successor to the A100, became the most demanded single product in the technology industry. Lead times stretched to 12 months. The chip's contribution margin was reported to be over 80% on a price north of $30,000 per unit. Nvidia's stock price went from roughly $150 in October 2022 to over $1,000 in May 2024 (pre-split). In June 2024, market capitalization briefly exceeded both Apple's and Microsoft's, making Nvidia the most valuable company in the world.

What had taken fifteen years to set up paid back in eighteen months. The CUDA ecosystem, the parallel hardware roadmap, the NVLink fabric, the DGX and HGX system designs, the relationship with TSMC for advanced packaging — all of it had been built quietly through years when none of it mattered very much in revenue terms. When the AI workload arrived, the ecosystem snapped into place and everything Nvidia had been quietly compounding became the only available answer to the question 'how do I train a frontier model?'

Source: Nvidia 10-Q filings FY2023-FY2025; *The Information* and Reuters coverage of H100 lead times; Krizhevsky et al., "ImageNet Classification with Deep Convolutional Neural Networks," NeurIPS 2012.

The next bet: efficiency at the limit (2025-2026)

The H100 boom answered one question: how do you train a frontier model? The bet Nvidia is placing now answers a harder one: how do you keep delivering more intelligence once Moore's Law has stopped paying for it? Bryan Catanzaro, who leads Nvidia's Nemotron model group, states the constraint plainly. The original Moore's Law, twice the transistors for the same cost every 24 months, has been economically dead for five to ten years. Transistors still shrink, but more slowly and at higher cost. The free lunch that funded four decades of compute gains is over, and the gains now have to come from somewhere else.

That somewhere else is codesign and specialisation, squeezing more useful work out of every transistor, every watt, and every byte moved. Catanzaro's framing is that the industry is now running at the limit, both economic and physical (power): "If you accept that we're going to be running at the limit, the way to get more intelligence is to be more efficient. We can't get more intelligence by applying more force." The sharpest 2026 expression of that principle is numerical precision. Nvidia pre-trained its Nemotron 3 Ultra and Super models in 4-bit arithmetic (a block-scaled format called NVFP4, where each number has only 16 possible values). Four-bit numbers are far cheaper to move and store, and Blackwell Ultra runs them at much higher throughput. Pre-training in 4 bits was a genuine research bet, because getting the numerics wrong makes the model diverge. It was also top-down: leadership funded the FP4 hardware first, then challenged the team to make pre-training work on it, precisely so the hardware's advantage would have demand waiting to meet it.
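A small sketch makes "block-scaled" and "only 16 possible values" concrete. Everything specific in it is an assumption layered on the description above rather than something stated in the source: the value grid is the standard 4-bit E2M1 floating-point grid (1 sign bit, 2 exponent bits, 1 mantissa bit), the block size of 16 matches Nvidia's published description of NVFP4, and the per-block scale is kept as an ordinary Python float for simplicity, whereas real hardware also stores the scale in a low-precision format. The function names are hypothetical, and this is not Nvidia's training recipe.

```python
# Magnitudes representable by a 4-bit E2M1 float. With the sign bit this gives
# 16 codes (8 magnitudes x 2 signs, where +0 and -0 encode the same number).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumption: elements that share one scale factor


def quantize_block(block):
    """Return (scale, codes): one shared scale plus one grid value per element."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 1.0, [0.0] * len(block)
    scale = max_abs / E2M1_MAGNITUDES[-1]  # largest magnitude maps onto 6.0
    codes = []
    for x in block:
        target = abs(x) / scale
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(-nearest if x < 0 else nearest)
    return scale, codes


def quantize(values, block_size=BLOCK_SIZE):
    """Round-trip values through block-scaled 4-bit quantization."""
    out = []
    for start in range(0, len(values), block_size):
        scale, codes = quantize_block(values[start:start + block_size])
        out.extend(scale * c for c in codes)  # dequantize for inspection
    return out


data = [6.0] * 16 + [0.001]
print(quantize(data)[16])                 # small value survives in its own block
print(quantize(data, block_size=17)[16])  # sharing a scale with 6.0 flushes it to zero
```

The last two lines show why the scale is applied per block. With so few representable values, a small number that shares a scale with a large one rounds to zero, while short blocks keep the grid matched to local magnitudes. Getting that trade-off wrong during pre-training is the kind of numerical error that makes a model diverge.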

The deeper point is why Nvidia builds its own frontier models at all, and here the second bet rhymes with the first. CUDA was funded for a decade not as a product but as the thing that would let Nvidia understand and own the workload. Nemotron plays the same role today. Catanzaro says it has "two jobs," and the first is survival: "help us understand how to build the systems of the future. The first job of Nemotron is to make sure Nvidia continues to exist." You cannot codesign a rack for a mixture-of-experts routing pattern you have never had to serve. Building frontier models is how Nvidia earns the understanding to build the next machine, a lineage that runs back to Megatron in 2017, the systems project that proved the largest transformers could be trained on Nvidia hardware rather than only on Google's TPUs. That program compounded over the following decade: the 530-billion-parameter Megatron-Turing NLG (2021, co-built with Microsoft), the openly released Nemotron-4 340B (2024), and now the hybrid Nemotron 3 family, each one teaching Nvidia more about the workload its next chip has to run.

4-bit: NVFP4 precision Nemotron 3 was pre-trained in
16: Distinct values a 4-bit number can represent
~5-10 yrs: Since Moore's Law died as an economic law

Source: Bryan Catanzaro (VP, Applied Deep Learning Research, Nvidia), The MAD Podcast with Matt Turck, July 2, 2026.

Strategic read

The Nvidia story is sometimes told as a single great bet that paid off. The more accurate framing is two great bets, a decade apart, with the second possible only because the first had been managed carefully through the years when nobody noticed. The 1996 bet was on consumer 3D: survival-grade, defensive. The 2006 bet was on general-purpose parallel compute: strategic, offensive, and entirely speculative for at least a decade.

What enabled Nvidia to hold the CUDA bet through ten years of looking foolish? Three things. First, the company was profitable on graphics throughout — CUDA was a side bet, not a do-or-die one. Second, the CEO had been there the whole time and had a personal conviction strong enough that nobody in the company seriously challenged the investment. Third, the company had no comparable internal alternative to fund — there was no parallel team building a different vision of the future that CUDA was competing against. The result was that CUDA accumulated compounding investment for fifteen years without interruption.

The investment lesson is that some of the most valuable industrial positions are built during periods when the position looks wrong to the surrounding consensus. The reason most companies do not make these bets is structural: shareholder pressure, internal politics, CEO turnover, or the difficulty of justifying an investment whose payoff is uncertain and whose timing is unknown. Nvidia avoided every one of these traps, partly by luck and partly because Jensen Huang ran the company as a private fiefdom even after the IPO. The investor asking 'who is the next Nvidia' should look for the same configuration today: a profitable core business, a long-tenured CEO with technical conviction, and an unpopular side bet on a structural technology shift that the market has not yet priced in. Several candidates exist. None of them is obvious yet, which is the point.