Yes. The weights are released under Apache 2.0 and free to download from Hugging Face and ModelScope, and during launch the model is free to run on OpenRouter's free tier ($0/$0, rate-limited) and on SiliconFlow's early access. 2 9

Is Nex-N2-Pro better than GPT-5.5?

On Nex's own benchmarks it narrowly beats GPT-5.5 on SWE-Bench Pro (58.8 vs 58.6) but trails on SWE-Bench Verified, Terminal-Bench 2.1, and especially DeepSWE (33.6 vs 70). The numbers are self-reported and not independently verified, so treat "better" as benchmark-specific and provisional. 1 3

What model is Nex-N2-Pro based on?

It is post-trained on Alibaba's Qwen3.5-397B-A17B , the flagship open-weight Qwen MoE from February 2026. The architecture, multimodality, and efficiency come from Qwen3.5; Nex AGI's contribution is the agentic post-training (Agentic Thinking). 2 7

Can you download Nex-N2-Pro?

Yes — unlike MiniMax M3, whose weights were still pending at its launch, Nex-N2-Pro's weights are live on Hugging Face and ModelScope. But the full model needs roughly two 8×H100 nodes to self-host, so most users will rely on hosted endpoints. 1 2

How many parameters does Nex-N2-Pro have?

397 billion total parameters, with 17 billion active per token thanks to its Mixture-of-Experts design — the same 397B/17B configuration as its Qwen3.5-397B-A17B base. 2 7

ai-ml

Nex-N2-Pro: Open-Weight Coder vs GPT-5.5 (2026)

June 10, 2026

#nex-n2-pro #nex agi #open-weight models #ai coding models #llm benchmarks #qwen3.5 #agentic ai

Nex-N2-Pro: Open-Weight Coder vs GPT-5.5 (2026)

Nex-N2-Pro is a free, open-weight 397-billion-parameter coding model from Nex AGI, post-trained on Alibaba's Qwen3.5.¹² On its own benchmarks it leads the open-weight pack and edges GPT-5.5 on SWE-Bench Pro, but it trails the closed frontier on the hardest agentic and reasoning tasks.¹³

Released June 2, 2026 and now spreading across free endpoints, it is the latest entry in an early-June surge of open-weight coding models — arriving a day after MiniMax M3.⁴⁵ Two things set it apart: its headline feature is an "Agentic Thinking" framework that decides on its own how hard to reason, and — unlike M3, whose weights were still pending at launch — you can actually download Nex-N2-Pro right now.¹²

TL;DR

Nex-N2-Pro is the open-weight, agentic coding model from Nex AGI, an open-source alliance initiated by the Shanghai Innovation Institute.¹⁶ It is a Mixture-of-Experts model with 397 billion total parameters but only 17 billion active per token, post-trained on Alibaba's Qwen3.5-397B-A17B base, with a 262K-token context window and image input.²⁴⁷ Its signature trick is Adaptive Thinking, which auto-adjusts reasoning depth per step and, Nex says, cuts thinking tokens by 30–50% versus always-on reasoning at equal or better quality.⁴ On Nex's own benchmark table, Nex-N2-Pro scores 80.8% on SWE-Bench Verified, 58.8% on SWE-Bench Pro, and 75.3 on Terminal-Bench 2.1, narrowly beating GPT-5.5 on SWE-Bench Pro and Claude Opus 4.7 on Terminal-Bench, while leading or matching rival open models like MiniMax M3, DeepSeek-V4-Pro, GLM-5.1, and Kimi-K2.6 on most rows.¹ Three caveats matter: every number is self-reported on Nex's own harness; the comparison baseline is Opus 4.7, not the newer Opus 4.8; and on the toughest agentic tests the gap to the closed frontier is wide (DeepSWE: 33.6 vs GPT-5.5's 70).¹³⁸ The weights are free under Apache 2.0, and it runs free on OpenRouter and SiliconFlow during launch — but self-hosting the full model needs roughly two nodes of 8×H100.¹⁹

What is Nex-N2-Pro?

Nex-N2-Pro is a large language model built for coding and agentic work, released and open-sourced on June 2, 2026 by Nex AGI (also styled "Nex").¹⁴ Nex is not a conventional startup but an open-source alliance "initiated by the Shanghai Innovation Institute," developing in the open under the nex-agi organization on GitHub and Hugging Face, with partners including Shanghai Qiji Zhifeng, Mosi Intelligence, and KuafuAI.⁶ The group ships a full agent stack — models, an agent framework (NexAU), data pipelines, and training infrastructure — of which Nex-N2 is the flagship model.⁶

Architecturally, Nex-N2-Pro is a sparse Mixture-of-Experts (MoE) model with 397 billion total parameters and 17 billion active per token, so its inference cost behaves closer to a 17B dense model than its headline size suggests.²⁴ It accepts text and images as input and produces text, supports a 262,000-token context window, and ships under the permissive Apache 2.0 license with downloadable weights on Hugging Face and ModelScope.²⁴

The crucial detail — and one the marketing soft-pedals — is that Nex-N2-Pro is post-trained on Alibaba's Qwen3.5-397B-A17B, not a base model Nex built from scratch.¹² The 397B/17B architecture, the native multimodality, and the long-context efficiency all come from Qwen3.5, Alibaba's flagship open-weight MoE from February 2026, which fuses linear attention (via Gated Delta Networks) with a sparse MoE.⁷ Nex AGI's actual contribution sits on top: the agentic post-training. The previous generation, Nex-N1, was built the same way on a different base — DeepSeek-V3.1 — so swapping in Qwen3.5 for N2 is the lineage here.⁴

What is Agentic Thinking (and Adaptive Thinking)?

Agentic Thinking is Nex's framework for unifying reasoning, tool use, and environment execution into a single loop, instead of bolting them together as separate capabilities.¹ It is the headline idea behind the model, and it has two halves.¹

Adaptive Thinking lets the model decide, on its own, when to think and how deeply — running simple actions fast while reserving thorough reasoning for the decisions that matter. Nex's specific claim is that this cuts thinking tokens by 30–50% on routine steps versus always-on reasoning, with equal or better task performance.⁴ That is the part worth caring about: long agentic runs burn most of their cost on reasoning tokens, so a model that reasons only when it needs to is cheaper to operate at the same quality — if the claim holds up under independent testing.

Coherent Thinking is the second half: carrying one consistent reasoning style across general tasks and diverse agentic work, so capability transfers cleanly between, say, terminal execution and multimodal generation.¹ In practice both behaviors are exposed through standard tooling — Nex-series models emit explicit reasoning traces and support function calling, using the same qwen3 reasoning parser and qwen3_coder tool-call parser as their Qwen base, and they advertise plug-and-play use with Claude Code, Cursor, and other agent harnesses.²⁴

Nex-N2-Pro benchmarks: strong, but self-reported

On Nex's published evaluation suite, Nex-N2-Pro's coding numbers are genuinely good. It posts 80.8% on SWE-Bench Verified (fixing real bugs in real repositories), 58.8% on SWE-Bench Pro, 75.3 on Terminal-Bench 2.1, and 1585 on GDPval — OpenAI's benchmark for real-world, economically valuable knowledge work scored by expert graders across 44 occupations.¹¹⁰ Nex frames the model as keeping "pace with top-tier models such as GPT-5.5 and Opus 4.7."¹

Here is the context most coverage drops. Every one of those numbers comes from Nex AGI's own benchmark table — its harness, its scaffolding, its choice of baselines.¹ That is standard practice for a model launch, but it makes the results claims awaiting independent replication, not settled facts. The table even shows the seams: it lists GPT-5.5 at 82.9% on SWE-Bench Verified, whereas OpenAI officially reports GPT-5.5 at 88.7% on the same benchmark, and independent harnesses like Vals.ai put GPT-5.5 lower still at ~82.6%.³¹¹ In other words, the comparison column is a mix of harness conditions, which is exactly why a self-run table should be read as a strong opening bid rather than a verdict.

Benchmark (Nex-reported)	Nex-N2-Pro	GPT-5.5	Opus 4.7	MiniMax M3	DeepSeek-V4-Pro
SWE-Bench Verified	80.8	82.9	87.6	80.5	80.6
SWE-Bench Pro	58.8	58.6	64.3	59.0	55.4
Terminal-Bench 2.1	75.3	83.4	69.7	66.0	72.0
DeepSWE	33.6	70	54	—	8
GDPval	1585	1769	1753	—	1554
GPQA Diamond	90.7	93.6	94.2	—	90.1

All figures as published by Nex AGI.¹ Higher is better in every row; GDPval is a score, the rest are percentages.

Nex-N2-Pro vs GPT-5.5 and Claude Opus: the honest read

Strip away the "keeps pace" framing and the picture is more specific. Against GPT-5.5, Nex-N2-Pro edges ahead on SWE-Bench Pro (58.8 vs 58.6) but trails on SWE-Bench Verified (80.8 vs 82.9), on Terminal-Bench 2.1 (75.3 vs 83.4), and badly on DeepSWE (33.6 vs 70).¹ Against Claude Opus 4.7, it wins Terminal-Bench 2.1 (75.3 vs 69.7) but loses SWE-Bench Verified by nearly seven points (80.8 vs 87.6) and DeepSWE by twenty (33.6 vs 54).¹ The selective wins are real; so are the losses.

Two structural caveats deepen the point. First, the comparison ceiling is Claude Opus 4.7, but Anthropic shipped Claude Opus 4.8 on May 28, 2026 — before Nex-N2-Pro's June 2 launch — and Opus 4.8 reports roughly 88.6% on SWE-Bench Verified.⁸ Measured against the Anthropic model that was actually current, the gap is wider than the table implies. It is the same flattering-baseline pattern we just flagged in the MiniMax M3 launch, where M3 was also benchmarked against Opus 4.7 rather than 4.8.⁵ Second, the DeepSWE and GDPval rows are where the closed frontier pulls clearly ahead: a 33.6 on DeepSWE against GPT-5.5's 70 is not "near-parity" on hard agentic coding, it is roughly half. (We dug into how brittle and gameable that benchmark is in our DeepSWE breakdown.)

The open-weight pack: where Nex-N2-Pro actually leads

The fairer frame is not Nex-N2-Pro versus the closed frontier — it is Nex-N2-Pro versus the other open-weight challengers, and there it looks like a leader. On Nex's table it matches or beats MiniMax M3, DeepSeek-V4-Pro, GLM-5.1, and Kimi-K2.6 on most rows: 80.8 on SWE-Bench Verified sits at the top of a tight open-weight cluster (M3 80.5, DeepSeek-V4-Pro 80.6, Kimi-K2.6 80.2); on Terminal-Bench 2.1 its 75.3 clears MiniMax M3's 66.0 and DeepSeek-V4-Pro's 72.0; and on DeepSWE its 33.6, while far behind the closed models, towers over the open field (Kimi-K2.6 24, GLM-5.1 18, DeepSeek-V4-Pro 8).¹

That positioning is the story. Over the past few months we have watched Chinese open-weight coding labs run an aggressive cost-and-capability war, from DeepSeek V4 to GLM-5.1 to Kimi K2.6. Nex-N2-Pro's pitch is to take one of the strongest open bases — Alibaba's Qwen3.5 — and squeeze more agentic performance out of it through post-training, then give it away. If the numbers replicate, it is the new front-runner of the open-weight coding pack, even though it is not a genuine threat to GPT-5.5 or Opus 4.8 on the hardest tasks.

Nex-N2-Pro pricing: free now, but heavy to host

The price is the easy part: Nex-N2-Pro is free. The weights are Apache 2.0 and downloadable from Hugging Face and ModelScope, and during the launch window the model runs at no cost on hosted endpoints — $0 input and $0 output on OpenRouter's free tier (rate-limited to roughly 50 requests/day and 20/minute) and free early access on SiliconFlow.²⁹ For a model posting frontier-adjacent coding scores, free hosted access is a strong way to drive adoption.

The catch is hardware. "Open weight" does not mean "runs on your laptop." Nex's own deployment guide launches Nex-N2-Pro across two nodes of 8× H100 GPUs (tensor-parallel 16) using its customized SGLang fork.¹ That is a serious cluster — far beyond what an individual can self-host — so for most people "free" in practice means the rate-limited hosted tiers, not local inference. The smaller sibling, Nex-N2-mini (built on Qwen3.5-35B-A3B-Base), is the more self-hostable option Nex points to, running on a single 2× H100 box, though it gives up real ground on the benchmarks (74.4 vs 80.8 on SWE-Bench Verified).¹

Bottom line

Nex-N2-Pro is a real step for open-weight coding, wrapped in a launch that oversells the comparison. The Agentic Thinking idea — reasoning only as deeply as a step requires — is a sensible answer to the cost of long agent runs, and giving the weights away free under Apache 2.0 is the kind of move that drives fast adoption.¹⁴ But the benchmark narrative leans on self-run tests against Opus 4.7, a baseline Anthropic had already replaced with Opus 4.8, and the hardest agentic tests (DeepSWE, GDPval) show the closed frontier is still comfortably ahead.¹⁸ The honest summary: Nex-N2-Pro is probably the strongest open-weight coding model you can download today, and a weaker claimant to outright GPT-5.5 parity than its own charts suggest. Independent benchmarks — and the experience of actually running it inside a real agent harness — will settle which half of that sentence matters more.

Nex AGI, "Nex-N2: An agentic model with Agentic Thinking" — official GitHub README (release, Agentic Thinking / Adaptive Thinking framework, full benchmark table, Qwen3.5 base, two-node 8×H100 deployment guide). https://github.com/nex-agi/Nex-N2 ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹ ↩¹² ↩¹³ ↩¹⁴ ↩¹⁵ ↩¹⁶ ↩¹⁷ ↩¹⁸ ↩¹⁹ ↩²⁰ ↩²¹ ↩²² ↩²³ ↩²⁴ ↩²⁵ ↩²⁶
"nex-agi/Nex-N2-Pro," Hugging Face model card (Apache-2.0 license, 397B params, MoE qwen3_5_moe architecture, image-text-to-text, benchmark table, parsers). https://huggingface.co/nex-agi/Nex-N2-Pro ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹ ↩¹² ↩¹³
"SWE-Bench Leaderboard," marc0.dev (GPT-5.5 #1 at 88.7% on SWE-Bench Verified, OpenAI-reported, released Apr 23, 2026; Claude Opus 4.7 87.6%, Apr 16, 2026), and Vals.ai independent harness (GPT-5.5 ~82.6%, Opus 4.7 ~82.0%) showing harness variance. https://www.marc0.dev/en/leaderboard ↩ ↩² ↩³ ↩⁴
"Nex-N2-Pro — Model Info, Parameters, Benchmarks," SiliconFlow (created Jun 2, 2026; 262K context window; FP8; Apache-2.0; Adaptive Thinking 30–50% thinking-token claim; prior-generation DeepSeek-V3.1-Nex-N1). https://www.siliconflow.com/models/nex-n2-pro ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹
"New Models Today — AI & LLM Releases," Price Per Token (Nex-N2-Pro free listing on OpenRouter; MiniMax M3 release context). https://pricepertoken.com/news/model-releases ↩ ↩²
"Nex: Nexus of Agentic Intelligence," Nex AGI (alliance initiated by the Shanghai Innovation Institute; partners and full-stack agent ecosystem — models, NexAU framework, data and RL infrastructure). https://nex-agi.com/en/ ↩ ↩² ↩³
"Alibaba's latest flagship Qwen3.5 models are open-weights MoE performers," DeepLearning.AI The Batch, and "Qwen3.5-397B-A17B — Everything you need to know," Artificial Analysis (Feb 2026 release, 397B total / 17B active, hybrid linear-attention + sparse MoE, native multimodal, Apache 2.0). https://www.deeplearning.ai/the-batch/alibabas-latest-flagship-models-are-open-weights-moe-performers-in-sizes-from-less-than-1b-parameters/ ↩ ↩² ↩³ ↩⁴
"Anthropic's Claude Opus 4.8 is here," VentureBeat (Opus 4.8 released May 28, 2026; ~88.6% on SWE-Bench Verified). https://venturebeat.com/technology/anthropics-claude-opus-4-8-is-here-with-3x-cheaper-fast-mode-and-near-mythos-level-alignment ↩ ↩² ↩³
"Nex AGI: Nex-N2-Pro (free) — API Pricing & Providers," OpenRouter ($0 input / $0 output free tier; free-tier rate limits). https://openrouter.ai/nex-agi/nex-n2-pro:free/pricing ↩ ↩² ↩³
"GDPval: Evaluating AI Model Performance on Real-World Economically Valuable Tasks," OpenAI (benchmark across 44 occupations in the top 9 GDP sectors, expert blind grading). https://openai.com/index/gdpval/ ↩
"SWE-bench Verified," Vals.ai independent benchmark harness (GPT-5.5 82.60%, Claude Opus 4.7 82.00%). https://www.vals.ai/benchmarks/swebench ↩

Frequently Asked Questions

Nex-N2-Pro is an open-weight, agentic coding model released June 2, 2026 by Nex AGI, an open-source alliance initiated by the Shanghai Innovation Institute. It is a 397B-parameter Mixture-of-Experts model (17B active per token) post-trained on Alibaba's Qwen3.5, with a 262K-token context window and image input. 1 2 4

Nex-N2-Pro: Open-Weight Coder vs GPT-5.5 (2026)

TL;DR

What is Nex-N2-Pro?

What is Agentic Thinking (and Adaptive Thinking)?

Nex-N2-Pro benchmarks: strong, but self-reported

Nex-N2-Pro vs GPT-5.5 and Claude Opus: the honest read

The open-weight pack: where Nex-N2-Pro actually leads

Nex-N2-Pro pricing: free now, but heavy to host

Bottom line

Frequently Asked Questions

Related Posts

MiniMax M3: Open-Weight Coding at 1/10 the Cost (2026)

MiniMax M3: Open-Weight 1M-Context Coding Model

Claude Fable 5: Anthropic's Mythos-Class Model (2026)

GPT-5.5: OpenAI's First Retrained Base Since GPT-4.5