🎙️ Episode 317 • 05:55 • June 27, 2026
OpenAI Jalapeño: First Custom AI Inference Chip
AI-generated discussion by Alex and Jamie
About this episode
Join Alex and Jamie on this episode of the Nerd Level Tech AI Cast as they dive into the sizzling world of OpenAI's newly launched custom AI inference chip, aptly named Jalapeño. Discover how this innovative ASIC is designed specifically for lightning-fast language model responses, and learn why OpenAI felt the need to create their own chip in a market dominated by Nvidia. Tune in for a spicy discussion that combines tech insights with a dash of humor!
Transcript
[Alex]: Welcome back to the Nerd Level Tech AI Cast, where “nerd” is always a compliment and “level” is always set to “insane.” I’m Alex…
[Jamie]: …and I’m Jamie, your resident question-asker, meme-sharer, and person most likely to accidentally unplug the server rack.
[Alex]: You did that *one* time, Jamie.
[Jamie]: And you’ll never let me forget it. [PAUSE] Anyway, today we’re talking about something spicy, literally. OpenAI just dropped their first-ever custom AI inference chip, and they named it… Jalapeño.
[Alex]: Because apparently, “Ghost Pepper” was too much heat for the server room. [chuckles] But yeah, this is a pretty big deal. OpenAI and Broadcom teamed up to build this chip from scratch, and they did it in just nine months. That’s like, Silicon Valley pregnancy speed.
[Jamie]: Wait, nine months? For a whole new chip? I’ve seen people take longer to pick a laptop.
[Alex]: Right? And this isn’t just any chip. Jalapeño is an ASIC, which stands for Application-Specific Integrated Circuit. Basically, it’s a chip that does one job: running big language models, like ChatGPT, really fast.
[Jamie]: So, it’s not for training the models. Just for answering all our weird late-night AI questions, right?
[Alex]: Exactly. Training a model is like teaching it all the facts; inference is when you ask it to use what it’s learned. Jalapeño is tuned for inference only; OpenAI calls it their first “Intelligence Processor.” It’s designed to run those models as efficiently as possible.
[Jamie]: Okay, but why bother making their own chip? Aren’t Nvidia GPUs good enough? I mean, Nvidia’s been printing money with those things.
[Alex]: Oh, Nvidia’s still doing fine; don’t cry for Jensen just yet. [PAUSE] But here’s the rub: OpenAI’s serving billions of AI requests a day. Every ChatGPT message, every API call, every “write me a poem about quantum cats”: that’s all inference. It’s the part of the bill that scales with usage.
[Jamie]: And I bet that bill is spicy, too.
[Alex]: You have no idea. So, if you build a chip that’s laser-focused just on inference and strips out all the extras GPUs need for training, you can get way more bang for your buck, and watt.
[Jamie]: Performance per watt… performance per dollar… I feel like I’m in a late-night infomercial. “But wait, there’s more!”
[Alex]: [laughs] It does kind of sound like that. OpenAI claims Jalapeño’s performance per watt is “substantially better than current state-of-the-art.” They’re still crunching the final numbers, but early testing looks promising.
[Jamie]: Did they say how much it costs? Or is this one of those “call for a quote” situations?
[Alex]: Pretty much the latter. No official price tag, just some hints. Broadcom’s CEO told Bloomberg the cost per inference token could be 50 times lower than current GPUs, but… grain of salt. No independent verification yet, and OpenAI’s official line is “we’ll publish a technical report soon.”
[Jamie]: So, lots of hype, not much spec sheet. Did they at least show us what’s inside Jalapeño? Like, how many flamin’ hot transistors are we talking?
[Alex]: That’s the mystery. No detailed specs yet. Some chip nerds zoomed in on the wafer they showed on stage (think CSI: Semiconductor Edition) and guessed it’s got a big compute chiplet, maybe six high-bandwidth memory modules, and some fancy networking bits. But until OpenAI drops the full datasheet, it’s speculation.
[Jamie]: I love a good nerdy guessing game. But, okay, real talk: does this mean OpenAI is ditching Nvidia? Like, is this the start of the AI chip wars?
[Alex]: Not so fast. Jalapeño can’t train models; it can only run them. So, for now, it’s sitting alongside Nvidia GPUs, not replacing them. OpenAI still buys mountains of Nvidia hardware for training, plus they’ve got a deal with AMD, too. The real battle is over inference costs: who can serve more users, cheaper.
[Jamie]: It’s like the “dollar menu” of AI, and everyone’s fighting to make it cheaper to get your digital burger.
[Alex]: Perfect analogy, Jamie. [PAUSE] And OpenAI’s not alone: Google’s got their TPUs, Meta’s got their own chips, and now OpenAI’s in the mix with Broadcom as their silicon sous chef.
[Jamie]: Speaking of which, how did Broadcom get the gig? Aren’t they usually the networking folks?
[Alex]: They are, but they also do a ton of custom silicon work. In fact, they’ve built chips for Google, Meta, and now OpenAI. For Jalapeño, OpenAI handled the design, Broadcom built the silicon and networking, and Celestica did the rack and system integration. It takes a village to raise a silicon baby.
[Jamie]: [laughs] So, what’s next? When do we actually see these things in the wild?
[Alex]: OpenAI says deployment starts at the end of 2026, with Microsoft and other partners rolling them out in massive, gigawatt-scale data centers. And this is just the first chip; OpenAI and Broadcom have a multi-generation hardware roadmap, so expect more spicy chips in the future.
[Jamie]: So, bottom line: OpenAI’s building its own stack, from models all the way down to the chips, kinda like how Apple does with their iPhones?
[Alex]: Exactly. Vertical integration, for the win. They want to own the whole experience, optimize every layer, and keep their AI running fast and cheap.
[Jamie]: And hopefully not melt the datacenter in the process.
[Alex]: We can only hope. [PAUSE] Alright, that’s our quick dive into OpenAI’s Jalapeño. Hot stuff, folks. If you enjoyed this episode, be sure to subscribe, leave us a review, or send us your best AI-generated chili recipes.
[Jamie]: Thanks for tuning in to the Nerd Level Tech AI Cast, where we keep things spicy and silicon-y. See you next time!
[Outro music fades out]
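The performance-per-watt and cost-per-token claims discussed in the episode come without published figures, so what follows is only a rough sketch of how such a comparison could be worked out. Every number in it is invented for illustration and does not describe Jalapeño or any real GPU. The `Accelerator` class and both function names are hypothetical, and the calculation covers electricity only, ignoring chip purchase price, cooling, and networking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """Hypothetical description of a chip serving inference traffic."""
    name: str
    power_watts: float        # assumed sustained power draw while serving
    tokens_per_second: float  # assumed sustained inference throughput


def tokens_per_joule(chip: Accelerator) -> float:
    """Performance per watt: tokens/s divided by W (= J/s) gives tokens/J."""
    return chip.tokens_per_second / chip.power_watts


def energy_cost_per_million_tokens(chip: Accelerator, usd_per_kwh: float) -> float:
    """Electricity cost in USD to generate one million tokens."""
    joules_per_token = chip.power_watts / chip.tokens_per_second
    # 1 kWh = 3,600,000 J
    kwh_per_million_tokens = joules_per_token * 1_000_000 / 3_600_000
    return kwh_per_million_tokens * usd_per_kwh


# Made-up figures, chosen only to make the arithmetic easy to follow.
gpu = Accelerator("hypothetical GPU", power_watts=1000.0, tokens_per_second=5000.0)
asic = Accelerator("hypothetical inference ASIC", power_watts=500.0, tokens_per_second=10000.0)

USD_PER_KWH = 0.10  # assumed electricity price

for chip in (gpu, asic):
    print(
        f"{chip.name}: {tokens_per_joule(chip):.1f} tokens/J, "
        f"${energy_cost_per_million_tokens(chip, USD_PER_KWH):.4f} per million tokens"
    )

advantage = tokens_per_joule(asic) / tokens_per_joule(gpu)
print(f"Energy-efficiency advantage of the ASIC in this toy example: {advantage:.1f}x")
```

With these made-up inputs, doubling throughput while halving power yields a fourfold energy-efficiency gain. A much larger figure like the "50 times lower" cost per token quoted in the episode would presumably have to account for more than electricity, which is one reason to wait for the promised technical report.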