GLM-5.1: The Open-Source Model That Beat GPT-5.4

April 19, 2026

TL;DR

Z.ai released GLM-5.1 on April 7, 2026 — a 754-billion-parameter open-weight model that scored 58.4% on SWE-bench Pro, edging past GPT-5.4 (57.7%) and Claude Opus 4.6 (57.3%) to become the first open-source model ever to lead that leaderboard. It achieved that result with zero NVIDIA hardware, trained entirely on 100,000 Huawei Ascend 910B chips. The weights are MIT-licensed, free to download, and already available via API at $0.95 per million input tokens. Its agentic execution engine can run autonomously for eight hours straight, with no human checkpoints required.

As of April 16, Claude Opus 4.7 has since moved ahead at 64.3% on SWE-bench Pro, but the fact that an open-weight model briefly held the #1 position at all is a milestone the AI industry will not forget quickly.


What You'll Learn

  • What GLM-5.1 is and how it differs from its predecessor GLM-5
  • Its benchmark performance on SWE-bench Pro and Code Arena, plus important caveats
  • How a model on the US export-control Entity List was built without a single NVIDIA chip
  • What 8-hour autonomous execution actually means in practice
  • How to access GLM-5.1 — locally, via API, and what it costs
  • How it compares to DeepSeek, Qwen, and other open-weight rivals

What Is GLM-5.1?

GLM-5.1 is the latest model in Z.ai's General Language Model series, released on April 7, 2026. It is a post-training upgrade to GLM-5, sharing the same Mixture-of-Experts (MoE) architecture but with substantially improved coding ability, tool use, and sustained autonomous execution[1].

Z.ai, formerly known as Zhipu AI, was spun out of Tsinghua University and has been building open LLMs since the original ChatGLM era. The company rebranded to Z.ai in July 2025 as it expanded internationally and shifted its focus toward agentic AI. On January 8, 2026, Z.ai debuted on the Hong Kong Stock Exchange (HKEX: 02513) as Knowledge Atlas Technology Joint Stock Company Limited, raising approximately $558 million in one of the largest AI IPOs of the year[2]. You can read about its earlier work in our inside GLM-4 capabilities post.

One geopolitical note that shapes everything about this model: Zhipu AI was added to the US Commerce Department's Entity List on January 16, 2025[3]. As a result, the company cannot legally purchase NVIDIA hardware. GLM-5.1 was trained entirely on 100,000 Huawei Ascend 910B chips — a fact that matters both for evaluating the Ascend ecosystem's maturity and for understanding why this launch is diplomatically significant[4].


Architecture: 754 Billion Parameters, 40 Billion Active

GLM-5.1 uses a Mixture-of-Experts architecture with 754 billion total parameters. During inference, only approximately 40 billion parameters are activated per token — a design that makes the model far more computationally tractable than a dense 754B model would be[1].

The MoE routing uses 256 experts with top-8 routing per token: eight specialist sub-networks handle each token, plus one shared expert that processes every token. This design is similar in principle to what Mixtral and DeepSeek-V3 pioneered in the open-source community.
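As a concrete illustration, the sketch below implements generic top-k expert routing with a shared expert in plain NumPy. It is a toy model of the technique, not Z.ai's implementation; every name in it is hypothetical.

```python
import numpy as np

def moe_route(hidden, gate_w, experts, shared_expert, k=8):
    """Toy top-k MoE routing with one shared expert.

    hidden:        (d,) token hidden state
    gate_w:        (num_experts, d) router weights
    experts:       list of callables, one per expert
    shared_expert: callable applied to every token
    """
    logits = gate_w @ hidden                 # score every expert for this token
    topk = np.argsort(logits)[-k:]           # indices of the k highest-scoring experts
    weights = np.exp(logits[topk])
    weights /= weights.sum()                 # softmax over only the selected experts
    out = sum(w * experts[i](hidden) for w, i in zip(weights, topk))
    return out + shared_expert(hidden)       # shared expert sees all tokens
```

Only the k selected experts (plus the shared one) run per token, which is why a 754B-parameter model can cost roughly as much per token as a ~40B dense model at inference time.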

Context window and output capacity:

| Spec | Value |
| --- | --- |
| Context window | ~200,000 tokens |
| Max output (official) | 131,072 tokens |
| Architecture | Mixture-of-Experts |
| Total parameters | 754 billion |
| Active per token | ~40 billion |
| Training hardware | 100,000 Huawei Ascend 910B |

Benchmark Performance

SWE-bench Pro: A Historic First for Open Source

SWE-bench Pro is a coding benchmark created by Scale AI that evaluates a model's ability to resolve real-world GitHub issues against private, contamination-resistant codebases. Each task requires at least 10 lines of code change, averaging 107.4 lines across 4.1 files[5]. It is significantly harder than the standard SWE-bench Verified, and scores that look impressive on Verified translate to much lower numbers on Pro.

According to Z.ai's internal evaluation[6], GLM-5.1 scored as follows:

| Model | SWE-bench Pro | License |
| --- | --- | --- |
| GLM-5.1 | 58.4% | MIT (open weight) |
| GPT-5.4 | 57.7% | Proprietary |
| Claude Opus 4.6 | 57.3% | Proprietary |
| Gemini 3.1 Pro | 54.2% | Proprietary |

This is the first time an open-weight model has claimed the top position on SWE-bench Pro.

Important caveat: These scores come from Z.ai's own evaluation runs, not a fully independent third-party assessment. The precise margin — less than one point over GPT-5.4 — should be treated as preliminary until independently verified. Z.ai used a 200K-token context window and their own scaffolding setup, which may differ from how other labs measure themselves. For context: Anthropic independently measured Claude Opus 4.6 at 53.4% on SWE-bench Pro using its own scaffolding. The 57.3% figure above is Z.ai's measurement of Opus 4.6 under Z.ai's setup — a 3.9-point difference that illustrates how much evaluation methodology affects scores on this benchmark.

Context update: On April 16, 2026, Anthropic released Claude Opus 4.7, which scored 64.3% on SWE-bench Pro — a 5.9-point margin over GLM-5.1[7]. GLM-5.1's time at the top lasted nine days. That does not diminish the achievement; the benchmark gap between the best open and closed models has been shrinking for two years, and briefly taking the crown confirms open source has arrived at the frontier.

Code Arena: First Open-Weight Model in the Top 3

Arena.ai independently confirmed on April 10, 2026 that GLM-5.1 holds an Elo rating of 1,530 on Code Arena — placing it third globally on their agentic webdev leaderboard[4]. This is notable because it is confirmed by an independent evaluator, not self-reported.

It is the first open-weight model to appear in the top three on Code Arena.
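If Elo ratings are unfamiliar: a rating gap maps directly to an expected head-to-head win rate under the standard Elo model. The snippet below uses illustrative ratings, not Arena.ai's published numbers for any specific pair of models.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score for A against B (win probability, counting
    draws as half a win) under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# A 30-point gap translates to only a ~54% expected win rate,
# so rivals this close trade wins in head-to-head comparisons.
print(round(elo_expected_score(1530, 1500), 3))  # 0.543
```

The practical takeaway is that leaderboard positions separated by a few dozen Elo points reflect a genuine but slim edge, not dominance.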

Other Benchmarks

Z.ai also reports strong performance across general reasoning tasks, though these scores are not independently verified:

| Benchmark | GLM-5.1 |
| --- | --- |
| LiveCodeBench | 83.6% |
| GPQA Diamond | ~88% (sources vary) |

For the most reliable cross-model comparison, refer to independent evaluators like Artificial Analysis or LM Council rather than any single vendor's figures.

For broader context on where frontier models stand on human-equivalent benchmarks, see the Stanford AI Index 2026 breakdown.


Eight Hours of Autonomous Execution

The headline agentic capability of GLM-5.1 is its ability to run autonomously for more than eight hours without human checkpoints. Z.ai demonstrated this by having the model build a complete Linux desktop environment from scratch — a task it completed through 655 autonomous plan-execute-analyze-optimize iterations in a single uninterrupted session[4].

This is longer than any publicly documented single-session autonomous run from a major model. For comparison, most current agentic frameworks assume regular human review cycles or short task horizons.

What enables this is not raw intelligence but execution architecture: GLM-5.1 was designed from the ground up for long-horizon tasks. Its 200K context window lets it hold large codebases in working memory, and its tool-use capabilities are tightly integrated rather than bolted on afterward.
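The plan-execute-analyze-optimize cycle described above can be sketched generically. This is a schematic illustration only; the callable names are hypothetical and none of this reflects Z.ai's actual stack.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)
    done: bool = False

def run_agent(state, plan, execute, analyze, optimize, max_iters=655):
    """Generic long-horizon loop: plan, execute, analyze, optimize.
    plan/execute/analyze/optimize are caller-supplied callables."""
    for _ in range(max_iters):
        step = plan(state)                   # decide the next action
        result = execute(step)               # run it (tool call, code edit, ...)
        state.history.append((step, result))
        if analyze(state, result):           # success criterion met?
            state.done = True
            break
        state = optimize(state, result)      # revise strategy before next pass
    return state
```

The loop only ends when the success check passes or the iteration budget runs out; there is no human-approval step inside it, which is what "no human checkpoints" means in practice.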

For more on how agentic coding workflows are evolving across the industry, see our post on inside AI coding agents.


Access and Pricing

API Access

GLM-5.1 is available through Z.ai's API, OpenRouter, and Together AI. Pricing on the Z.ai platform[8]:

| Token type | Price per million |
| --- | --- |
| Input | $0.95 |
| Output | $3.15 |

⚠ Prices change frequently. The values above are for illustration only and may be out of date. Always verify current pricing directly with the provider — Z.ai, OpenRouter, or Together AI — before making cost decisions.

For context, Claude Opus 4.7 costs $5.00/M input and $25.00/M output. On output tokens — which dominate cost in agentic coding tasks — GLM-5.1 at $3.15/M is roughly one-eighth the cost of Opus 4.7 at $25.00/M, a significant difference for high-volume workloads.
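To sanity-check that comparison across a whole run rather than output tokens alone, here is a small cost helper using the list prices quoted above. The token counts are hypothetical, and prices may have changed since publication.

```python
def run_cost_usd(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD for one run, with prices given per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Hypothetical agentic run: 2M input tokens, 500K output tokens.
glm  = run_cost_usd(2_000_000, 500_000, 0.95, 3.15)   # GLM-5.1 list prices above
opus = run_cost_usd(2_000_000, 500_000, 5.00, 25.00)  # Opus 4.7 list prices above
print(glm, opus)
```

On this input-heavy mix the end-to-end gap works out closer to 6.5x than 8x, because input tokens are priced much lower than output tokens on both platforms.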

Z.ai also offers a "GLM Coding Plan" for teams that want bundled access for IDE integrations, with tiers starting at approximately $10 per month (billed quarterly as $30/quarter for the Lite plan).

Local Deployment

Because GLM-5.1 is MIT-licensed, you can download the weights from Hugging Face (zai-org/GLM-5.1) and run them locally without any usage restrictions or royalty fees. You own your deployment entirely.

Running a 754B MoE model locally requires substantial hardware: although only ~40 billion parameters are active per token, the full set of weights still has to be loaded, so you will need multiple high-end GPUs or aggressive quantization. The community is actively sharing quantized versions that reduce the footprint considerably. For a guide to running large models locally, see our build local AI guide with Ollama and Qwen 3.
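To put rough numbers on that hardware requirement: all 754 billion weights must be loadable regardless of how many are active per token, so a weights-only estimate (ignoring KV cache, activations, and runtime overhead) looks like this.

```python
def weight_footprint_gb(total_params_billion, bits_per_param):
    """Approximate memory for model weights alone, in gigabytes."""
    return total_params_billion * 1e9 * bits_per_param / 8 / 1e9

print(weight_footprint_gb(754, 16))  # FP16: 1508.0 GB
print(weight_footprint_gb(754, 4))   # 4-bit quantized: 377.0 GB
```

Even at 4-bit quantization, roughly 377 GB of weights implies a multi-GPU rig or a large unified-memory machine, which is why the quantized community builds matter so much for local use.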


How GLM-5.1 Compares to Other Open-Weight Models

The open-source frontier has become genuinely competitive. GLM-5.1's primary open-weight competitors in April 2026 are DeepSeek-V3.2 (MIT license, 128K context) from DeepSeek, and Qwen 3.6-35B-A3B (Apache 2.0 license, released April 16, 2026) from Alibaba's Qwen team.

GLM-5.1's distinguishing advantages in this peer group are its larger context window (200K vs 128K), its MIT license, and the longest documented autonomous run time of any publicly available model. On the Code Arena independent leaderboard, GLM-5.1's Elo of 1,530 puts it ahead of both, though all three are in striking distance of each other.

Standardized independent SWE-bench Pro comparisons across these three models are not yet published as of this writing. Until a third-party lab evaluates all three under the same scaffolding, treat vendor-reported scores as directional indicators rather than precise rankings.

A notable pattern: the top open-weight coding models in April 2026 all come from Chinese AI labs — Z.ai, DeepSeek, and Qwen — each navigating US export restrictions in different ways. For more on how Huawei's chip ecosystem supports this development, see our Huawei Ascend chip breakdown.


The Geopolitical Dimension

Zhipu AI's placement on the US Entity List in January 2025 was intended to restrict access to advanced computing — specifically NVIDIA's H100 and H200 GPUs, which dominate frontier model training globally.

GLM-5.1 demonstrates that this restriction did not stop frontier-level training. The 100,000 Huawei Ascend 910B chips used represent a massive domestic compute cluster built without any US silicon. The model's performance — reaching Code Arena Elo 1,530 and holding the top SWE-bench Pro position for nine days — suggests the Ascend ecosystem has crossed a threshold where it can produce genuinely competitive results[4].

This has implications beyond Zhipu: it signals that China's alternative compute stack is now capable of supporting frontier model development, a conclusion that analysts tracking the US-China AI competition will be watching closely. The Stanford AI Index 2026 noted the performance gap between US and Chinese models has compressed to just 2.7 percentage points[9].


Bottom Line

GLM-5.1 is the clearest evidence yet that the gap between open-weight and closed frontier models has collapsed to nearly nothing on coding benchmarks. A model built without a single NVIDIA chip, by a company on a US trade blacklist, using hardware that was supposed to be a generation behind — briefly held the top score on the industry's hardest coding benchmark.

The nine days it spent at the top of SWE-bench Pro matter because they prove that sustained frontier performance is no longer exclusive to companies with unlimited access to US silicon. For developers, the practical message is more immediate: MIT-licensed, 200K-context, $0.95-per-million-token frontier-class coding performance is now real and available today.


References

Footnotes

  1. Z.ai. "GLM-5.1 Model Architecture." Z.AI Developer Documentation. Published April 7, 2026. https://docs.z.ai/guides/llm/glm-5.1

  2. Z.ai. "Z.ai IPO on Hong Kong Stock Exchange." January 8, 2026. https://en.wikipedia.org/wiki/Z.ai

  3. Federal Register. "Addition of Entities to and Revision of Entry on the Entity List." Published January 16, 2025. https://www.federalregister.gov/documents/2025/01/16/2025-00704/addition-of-entities-to-and-revision-of-entry-on-the-entity-list

  4. MarkTechPost. "Z.AI Introduces GLM-5.1: An Open-Weight 754B Agentic Model That Achieves SOTA on SWE-Bench Pro and Sustains 8-Hour Autonomous Execution." Published April 8, 2026. https://www.marktechpost.com/2026/04/08/z-ai-introduces-glm-5-1-an-open-weight-754b-agentic-model-that-achieves-sota-on-swe-bench-pro-and-sustains-8-hour-autonomous-execution/

  5. Scale AI. "SWE-Bench Pro: Raising the Bar for Agentic Coding." https://scale.com/blog/swe-bench-pro

  6. Awesome Agents. "GLM-5.1 Review: Open-Source Model Tops SWE-Bench Pro." Published April 2026. https://awesomeagents.ai/reviews/review-glm-5-1/

  7. NerdLevelTech. "Claude Opus 4.7: Benchmarks, Features & Pricing." Published April 17, 2026. https://nerdleveltech.com/claude-opus-4-7-benchmarks-features-pricing

  8. Z.ai Pricing. "GLM Coding Plan." https://z.ai/subscribe

  9. Stanford HAI. "Inside the AI Index: 12 Takeaways from the 2026 Report." Published April 2026. https://hai.stanford.edu/news/inside-the-ai-index-12-takeaways-from-the-2026-report

Frequently Asked Questions

Can I use GLM-5.1 commercially for free?

Yes. The MIT license permits download, modification, fine-tuning, and commercial deployment with no royalty fees or usage restrictions. You need only your own hardware or cloud budget.
