OpenAI Jalapeño: First Custom AI Inference Chip

June 27, 2026

On June 24, 2026, OpenAI and Broadcom unveiled Jalapeño, OpenAI's first custom-designed chip — a purpose-built accelerator for large language model (LLM) inference that OpenAI brands its first "Intelligence Processor."1 It is the opening move in a multi-generation hardware platform the two companies are building together, and the clearest sign yet that OpenAI wants to own the silicon under ChatGPT rather than only rent it from Nvidia.

In one line: Jalapeño is an application-specific chip (ASIC) that OpenAI designed from scratch to run AI models — not train them — and co-developed with Broadcom from design to manufacturing tape-out in just nine months.1

TL;DR

  • What happened: OpenAI and Broadcom revealed Jalapeño on June 24, 2026 — OpenAI's first custom chip, an ASIC built only for LLM inference.1
  • The claim: OpenAI says early testing shows "performance per watt substantially better than current state-of-the-art." It is still measuring final numbers and will publish a technical report "in the coming months."1
  • The speed: From initial design to manufacturing tape-out in nine months, which OpenAI believes is the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors, with OpenAI's own models helping accelerate the design.1
  • The specs gap: OpenAI and Broadcom did not disclose detailed specifications. Tom's Hardware, analyzing the wafer shown on stage, estimates a reticle-sized die with roughly six HBM modules — but stresses the architecture read is speculation.2
  • The backdrop: Jalapeño is the first concrete chip from the 10-gigawatt accelerator partnership the two firms announced in October 2025; deployment starts by the end of 2026 with partners including Microsoft.1,3
  • On Nvidia: Jalapeño is inference-only and cannot train models — it supplements OpenAI's Nvidia GPUs rather than replacing them.4
  • On cost: OpenAI's written announcement gave no cost figure; President Greg Brockman cited a "performance per dollar" gain.5 Separately, Broadcom CEO Hock Tan told Bloomberg that early testing showed roughly 50% lower cost per inference token versus current GPUs — a self-reported number with no disclosed baseline or independent verification.6

What You'll Learn

  • What Jalapeño is and why OpenAI calls it an "Intelligence Processor"
  • Why OpenAI built a chip just for inference instead of training
  • What was actually disclosed about the hardware — and what is still speculation
  • How OpenAI compressed a chip design into a nine-month tape-out
  • How Jalapeño fits into the 10-gigawatt Broadcom deal
  • Whether this is genuinely a threat to Nvidia

What is the OpenAI Jalapeño chip?

Jalapeño is OpenAI's first custom silicon: an application-specific integrated circuit (ASIC) designed from the ground up to run large language models, which OpenAI calls its first "Intelligence Processor."1 Unlike a graphics processing unit (GPU), which is a general-purpose accelerator that can both train and serve models, an inference ASIC like Jalapeño does one job — running a finished model to answer a user — and is shaped entirely around that job.

OpenAI describes Jalapeño as "a blank-slate design for modern LLM inference, not a general-purpose accelerator adapted from earlier AI workloads."1 The company says the architecture was informed by the systems it runs every day across ChatGPT, Codex, and the API, with the goal of combining the throughput of today's leading accelerators with latency closer to specialized inference systems — the combination that matters most for interactive products at scale.1

OpenAI designed the chip, while Broadcom (NASDAQ: AVGO) handled silicon implementation and networking, and Celestica contributed board, rack, and system integration.1 The chip was delivered on stage to OpenAI CEO Sam Altman and President Greg Brockman by Broadcom President and CEO Hock Tan and Charlie Kawwas, president of its Semiconductor Solutions Group.1

Why OpenAI built a chip only for inference

Training and inference are different problems. Training builds a model; inference is the act of running that finished model to answer a prompt. Inference is the part of the bill that scales directly with usage — every ChatGPT message, every Codex task, every API call — and OpenAI now serves that workload billions of times a day. A chip tuned only for inference can strip away everything a general-purpose GPU carries for training and squeeze more useful work out of each watt.
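To make the per-watt argument concrete, here is a minimal sketch of how throughput and power draw translate into an electricity cost per million tokens. Every number in it is a hypothetical placeholder chosen for round arithmetic, not a Jalapeño or GPU measurement; OpenAI has published no such figures.

```python
def energy_cost_per_million_tokens(tokens_per_second: float,
                                   watts: float,
                                   usd_per_kwh: float) -> float:
    """Electricity cost in USD to generate one million tokens."""
    if tokens_per_second <= 0 or watts <= 0 or usd_per_kwh < 0:
        raise ValueError("throughput and power must be positive, price non-negative")
    joules_per_token = watts / tokens_per_second             # 1 W = 1 J/s
    kwh_per_million = joules_per_token * 1_000_000 / 3_600_000  # 1 kWh = 3.6 MJ
    return kwh_per_million * usd_per_kwh


# Hypothetical accelerators drawing the same power at the same electricity price.
baseline = energy_cost_per_million_tokens(1_000, 1_000, 0.10)
improved = energy_cost_per_million_tokens(2_000, 1_000, 0.10)

print(f"baseline: ${baseline:.4f} per million tokens")
print(f"improved: ${improved:.4f} per million tokens")
```

In this toy model, doubling tokens per second at the same wattage halves the energy cost per token. Electricity is only one part of the total bill (hardware, cooling, and networking are ignored here), so this shows the direction of the effect, not its size.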

That economic logic is why OpenAI is attacking inference first. "This is a real performance improvement … on performance per watt and performance per dollar," Brockman said of the chip in a CNBC interview.5 OpenAI frames Jalapeño as part of a deliberate "full-stack" strategy: rather than only building models and products, it now designs "the infrastructure underneath them: chip architecture, kernels, memory systems, networking, scheduling, deployment systems, and product experience," so every layer can be optimized toward the same goal.1 Observers have compared the approach to Apple's vertical integration of silicon and software.7

The person running that effort is Richard Ho, who leads OpenAI's hardware program. Ho spent nearly nine years at Google as a leader on its Cloud TPU project and later ran hardware engineering at the photonics startup Lightmatter before joining OpenAI — exactly the custom-accelerator pedigree this kind of project requires.8

What is actually inside Jalapeño — and what isn't confirmed

Here is the important caveat: OpenAI and Broadcom did not disclose detailed specifications. What they published is a set of claims and a strategy, not a datasheet.

On the record, OpenAI says early testing shows performance per watt "substantially better than current state-of-the-art," that final performance is still being measured, and that a technical report will follow in the coming months.1 Architecturally, OpenAI says Jalapeño "reduces data movement and balances compute, memory, and networking resources to achieve realized utilization much closer to theoretical peak performance," with Broadcom's Tomahawk networking silicon connecting chips at scale.1 OpenAI also says engineering samples are already running real machine-learning workloads in the lab at production target frequency and power, including a model it identifies as GPT‑5.3‑Codex‑Spark.1

Everything beyond that is analysis, not disclosure. Tom's Hardware studied the wafer and package OpenAI showed on stage and estimated that the package holds one large, reticle-sized compute die (about as large as current lithography physically allows) surrounded by roughly six high-bandwidth memory (HBM) modules, plus an I/O chiplet flanked by two structural dummy dies.2 But the publication was explicit that it was "speculating": the floorplan "does look like" a systolic-array-style accelerator, yet "from the image alone, it is impossible to tell" the exact datapath.2 Reports of a specific process node (some outlets cite TSMC's 3nm) and of an exact HBM count have not been confirmed by either company. Treat those numbers as informed guesses until the technical report lands.
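For readers unfamiliar with the term, a systolic array is a grid of multiply-accumulate cells through which operands flow in lockstep, one hop per clock cycle. The sketch below simulates a generic output-stationary array in plain Python. It is a textbook illustration only: Jalapeño's actual datapath has not been disclosed, and the function name and dataflow choice here are assumptions made for the example.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a @ b.

    Cell (i, j) holds the running sum for output element (i, j). Rows of
    `a` enter from the left and columns of `b` from the top, each skewed
    by one cycle per row or column, so at cycle t cell (i, j) sees the
    operand pair with inner index k = t - i - j.
    """
    m, inner, n = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions of a and b must match")
    acc = [[0] * n for _ in range(m)]
    cycles = m + n + inner - 2  # cycles until the last cell sees its last pair
    for t in range(cycles):
        for i in range(m):
            for j in range(n):
                k = t - i - j
                if 0 <= k < inner:  # operands have reached this cell
                    acc[i][j] += a[i][k] * b[k][j]
    return acc, cycles


result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result, cycles)  # [[19, 22], [43, 50]] 4
```

The appeal of this layout for inference is that each operand is fetched from memory once and then passed from cell to cell, which is one way an accelerator can reduce data movement.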

The same caution applies to cost. The headline "50% cheaper" figure does not appear in OpenAI's written announcement, which describes performance per watt only as "substantially better than current state-of-the-art."1 The number comes from Broadcom CEO Hock Tan, who told Bloomberg that early testing showed roughly 50% lower cost per inference token than current-generation GPUs.6 Treat it as a self-reported result: it is based on OpenAI's own chosen workloads, with no disclosed comparison baseline and no independent verification. From OpenAI itself, the gain is stated only qualitatively — Brockman's "performance per dollar" — with specifics deferred to a future technical report.5
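A short sketch shows why the missing baseline matters. The same chip cost yields very different "percent cheaper" headlines depending on what it is compared against. All costs and labels below are hypothetical placeholders, not reported figures.

```python
def relative_saving(baseline_cost: float, candidate_cost: float) -> float:
    """Fractional cost reduction of the candidate versus the baseline."""
    if baseline_cost <= 0:
        raise ValueError("baseline cost must be positive")
    return 1.0 - candidate_cost / baseline_cost


# Hypothetical cost per million tokens for a custom chip (arbitrary units).
candidate = 1.00

# Hypothetical baselines: the comparison point drives the headline number.
baselines = {
    "older GPU generation": 2.50,
    "current GPU, default software": 2.00,
    "current GPU, heavily tuned": 1.25,
}

savings = {name: relative_saving(cost, candidate) for name, cost in baselines.items()}
for name, saving in savings.items():
    print(f"vs {name}: {saving:.0%} cheaper")
```

With these made-up inputs the same chip is 60%, 50%, or 20% cheaper depending on the comparison point, which is why a percentage without a disclosed baseline is hard to evaluate.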

A nine-month tape-out, accelerated by OpenAI's own models

The most striking disclosed fact is the timeline. Jalapeño went "from initial design to manufacturing tape-out in just nine months," which OpenAI calls "what we believe to be the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors."1 Tape-out is the point at which a design is finalized and handed to the fab — it is the finish line for the design phase, not for shipping, which is why deployment is still slated for the end of 2026.

Part of that speed came from an unusual source: OpenAI used its own models to accelerate parts of the design and optimization work.1,9 In OpenAI's framing, "the same models served to users are helping improve the infrastructure used to run future models," a flywheel in which AI helps design the chips that make AI cheaper to run.1 It is a self-referential pitch, and it remains a claim about a development process rather than an independently measured result. Still, a nine-month cycle for a leading-edge accelerator would be fast by any standard in this industry.

The 10-gigawatt deal behind Jalapeño

Jalapeño did not appear out of nowhere. In October 2025, OpenAI and Broadcom announced a strategic collaboration to deploy 10 gigawatts of OpenAI-designed AI accelerators, with Broadcom supplying Ethernet and networking and rack deployments scheduled to begin in the second half of 2026 and complete by the end of 2029.3 Terms were not disclosed; the Financial Times estimated the build-out could cost OpenAI on the order of $350–500 billion.3 Jalapeño is the first silicon to come out of that agreement, arriving roughly eight months after it was announced.1
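For a sense of scale, the two figures above can be combined into a rough cost per gigawatt. This is back-of-the-envelope arithmetic on the Financial Times estimate, not a disclosed contract term, and the variable names are illustrative.

```python
TOTAL_GIGAWATTS = 10                 # size of the announced deployment
FT_ESTIMATE_USD_BN = (350, 500)      # low and high ends of the FT estimate

# Billions of dollars per gigawatt at each end of the range.
usd_bn_per_gw = tuple(total / TOTAL_GIGAWATTS for total in FT_ESTIMATE_USD_BN)

print(usd_bn_per_gw)  # (35.0, 50.0)
```

Because a gigawatt is a billion watts, the same range works out to roughly $35 to $50 per watt of deployed capacity.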

Broadcom's Hock Tan tied the chip directly to that infrastructure push, saying the collaboration enables "the deployment of gigawatt scale data centers with Microsoft and other partners beginning in 2026."1 OpenAI calls Jalapeño "the first step in a multi-generation compute platform," combining its own accelerator designs with Broadcom's silicon and networking and Celestica's system integration.1

What Jalapeño means for Nvidia

It is tempting to read "OpenAI builds a chip" as "OpenAI ditches Nvidia." That is not what happened. Jalapeño is an inference-only ASIC — it cannot train models — so it sits alongside OpenAI's GPUs rather than supplanting them. OpenAI continues to buy large volumes of Nvidia hardware for training, and it has a separate compute agreement with AMD.4 The realistic near-term picture is a multi-vendor stack, not a clean break.

Where the pressure is real is inference economics. Inference is the high-volume, usage-driven part of the workload, and it is precisely the part a tightly optimized custom chip can serve more cheaply at scale. If Jalapeño delivers on its performance-per-watt claim, it chips away at the most price-sensitive slice of Nvidia's business, the same slice rivals are targeting with their own silicon. OpenAI is following a path Broadcom has already paved with the major hyperscalers: Broadcom builds the custom AI accelerators behind Google's TPUs and Meta's MTIA, and OpenAI is now among its highest-profile customers. For the wider context, see our coverage of Broadcom's custom-silicon deal with Meta, the broader custom AI chip race against Nvidia, and how Google split its TPU line into training and inference chips, a move driven by the same inference-first instinct now behind OpenAI's chip.

The Bottom Line

Jalapeño is a strategically significant chip wrapped in a deliberately thin disclosure. The headline facts (first OpenAI silicon, a nine-month tape-out, a multi-generation platform with Broadcom) are real and on the record.1 The numbers everyone wants (exact specs, process node, and the "50% cheaper" claim) are not yet confirmed, and the strongest performance language ("substantially better than current state-of-the-art") is still hedged as early testing.1,2,5 The honest read for now: OpenAI has credibly entered the custom-silicon race and aimed squarely at inference economics, but the proof will arrive with the promised technical report and the first end-of-2026 deployments. Until then, watch what ships, not what was shown on stage.


Sources

Footnotes

  1. OpenAI, "OpenAI and Broadcom unveil LLM-optimized inference chip" (June 24, 2026). https://openai.com/index/openai-broadcom-jalapeno-inference-chip/

  2. Tom's Hardware, "Broadcom and OpenAI unveil custom-built Jalapeño inference processor," wafer/package analysis (June 24, 2026). https://www.tomshardware.com/tech-industry/artificial-intelligence/broadcom-and-openai-unveil-custom-built-jalapeno-inference-processor-openais-first-chip-is-a-massive-reticle-sized-asic-built-in-an-ultra-fast-nine-month-development-cycle

  3. OpenAI, "OpenAI and Broadcom announce strategic collaboration to deploy 10 gigawatts of OpenAI-designed AI accelerators" (Oct. 13, 2025). https://openai.com/index/openai-and-broadcom-announce-strategic-collaboration/

  4. TechSpot, "OpenAI debuts Jalapeño, its first custom AI chip to cut ChatGPT costs and reduce Nvidia dependency" (June 24, 2026). https://www.techspot.com/news/112890-openai-debuts-jalapeo-custom-chip-built-cut-chatgpt.html

  5. CNBC, "OpenAI and Broadcom reveal Jalapeno, first AI chip in partnership" (June 24, 2026). https://www.cnbc.com/2026/06/24/openai-and-broadcom-reveal-jalapeno-first-ai-chip-in-partnership.html

  6. TechTimes, "OpenAI's First Custom AI Chip Targets 50% Cheaper Inference: Jalapeño Unveiled," reporting Broadcom CEO Hock Tan's comments to Bloomberg (June 24, 2026). https://www.techtimes.com/articles/319012/20260624/openais-first-custom-ai-chip-targets-50-cheaper-inference-jalapeno-unveiled.htm

  7. TechRadar, "Broadcom and OpenAI debut Jalapeño Intelligence Processor, plot an Apple-like move to 'build the full stack'" (June 24, 2026). https://www.techradar.com/pro/broadcom-and-openai-debut-jalapeno-intelligence-processor-plot-an-apple-like-move-to-build-the-full-stack

  8. Data Center Dynamics, "OpenAI appoints former Google TPU leader as head of hardware." https://www.datacenterdynamics.com/en/news/openai-appoints-former-google-tpu-leader-as-head-of-hardware-hiring-for-experts-in-data-center-facility-design/

  9. VentureBeat, "OpenAI unveils first custom AI inference chip, Jalapeño, with Broadcom — and its development was sped-up with OpenAI's own models" (June 24, 2026). https://venturebeat.com/infrastructure/openai-unveils-first-custom-ai-inference-chip-jalapeno-with-broadcom-and-its-development-was-sped-up-with-openais-own-models

Frequently Asked Questions

What is the OpenAI Jalapeño chip?

Jalapeño is OpenAI's first custom-designed chip, unveiled with Broadcom on June 24, 2026. It is an ASIC built specifically for large language model inference (running finished AI models to answer users), which OpenAI calls its first "Intelligence Processor."1