753 billion total parameters in a Mixture-of-Experts design, per the official Hugging Face model card. Only a fraction of those experts activate per token, keeping inference cost manageable.¹

What is the context window?

A usable 1 million input tokens, five times GLM-5.1's 200K window, with a 128K-token (131,072) output ceiling.¹²

Does GLM-5.2 beat GPT-5.5?

On some long-horizon coding benchmarks (FrontierSWE, PostTrainBench, MCP-Atlas) yes; on others (DeepSWE, NL2Repo, ProgramBench) GPT-5.5 still leads. Claude Opus 4.8 leads most categories overall.¹

Can I run it locally?

Yes, via vLLM, SGLang, Transformers, KTransformers, or xLLM. Expect to need substantial GPU memory for a 753B MoE model; community quantizations are available for smaller setups.¹

GLM-5.2: Open-Weight 1M-Context Coding Model (2026)

June 23, 2026

#glm-5.2 #z.ai #open-weight models #coding ai #1m context #llm #swe-bench

GLM-5.2 is Z.ai's flagship open-weight model, released in mid-June 2026 under an MIT license. It packs 753 billion parameters in a sparse Mixture-of-Experts design, runs a usable 1-million-token context, and posts coding-benchmark scores that beat GPT-5.5 on several long-horizon tasks while trailing Claude Opus 4.8.

TL;DR

Z.ai (formerly Zhipu AI) shipped GLM-5.2 with weights on Hugging Face and ModelScope under a permissive MIT license.¹ It is built for "long-horizon" coding agents: a 753B-parameter Mixture-of-Experts model with a 1M-token context (up from 200K in GLM-5.1) and a 128K-token output ceiling.¹² On Z.ai's own benchmark table it scores 62.1 on SWE-bench Pro (ahead of GPT-5.5's 58.6) and 81.0 on Terminal-Bench 2.1, and it edges GPT-5.5 on the FrontierSWE long-horizon test (74.4 vs 72.6) — though GPT-5.5 leads on Terminal-Bench (84.0) and Claude Opus 4.8 leads most categories.¹ The standalone API is reported at roughly $1.40 / $4.40 per million input/output tokens, a fraction of GPT-5.5's $5 / $30 list price.³⁴

What you'll learn

What GLM-5.2 is and who built it
How the 1M-token context and IndexShare architecture work
The verified coding and agentic benchmark numbers
How GLM-5.2 stacks up against Claude Opus 4.8, GPT-5.5, and Gemini 3.1 Pro
Pricing, access options, and how to run it locally
The real caveats: data governance, reward hacking, and benchmark scope

What is GLM-5.2?

GLM-5.2 is a large language model from Z.ai, the Beijing-based lab formerly known as Zhipu AI. It is positioned as a "flagship model built for the era of long-horizon tasks" — meaning multi-hour coding-agent runs that span an entire codebase rather than a single function.² The model uses a Mixture-of-Experts (MoE) architecture that activates only a fraction of its weights per forward pass, so total capacity is high (753 billion parameters, per the official Hugging Face model card) while inference cost stays far below a dense model of the same size.¹
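To make "activates only a fraction of its weights" concrete, here is a minimal sketch of top-k expert routing, a common MoE gating scheme. The source does not state GLM-5.2's expert count, its k, or its routing rule, so the 8 experts, k=2, and the logits below are all hypothetical values chosen for illustration.

```python
import math

def route_top_k(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Pick the k experts with the largest router logits.
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the weights sum to 1.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Hypothetical router output for one token over 8 experts (not GLM-5.2's real config).
logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.9]
chosen = route_top_k(logits, k=2)

# Only the chosen experts run for this token; the rest stay idle.
active_fraction = len(chosen) / len(logits)
print(chosen, active_fraction)
```

In this toy setup only a quarter of the experts do any work for the token, which is the general reason a sparse model's per-token compute can sit well below its total parameter count.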

Two things make GLM-5.2 notable. First, it is open-weight under an MIT license, which Z.ai markets as "Pure Open — no regional limits, technical access without borders."² That lets teams download, fine-tune, and self-host the model with no acceptable-use governance attached. Second, it ships a genuinely usable 1-million-token context window, which the company says it trained specifically for messy, real-world coding-agent trajectories rather than just advertising a large number.²

The 1M-context architecture: IndexShare and MoE

Stretching context from 200K to 1M tokens is expensive, because the per-token cost of sparse-attention indexing and KV-cache management balloons. Z.ai's answer is a technique it calls IndexShare: instead of computing a fresh attention index at every transformer layer, every four layers share a single lightweight indexer placed at the first of the four. Z.ai reports this cuts per-token indexing FLOPs by 2.9× at 1M context length.²
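The layer grouping described above can be sketched as a simple index mapping. This is an illustration of the sharing pattern only: the function names and the 12-layer depth are hypothetical, and the sketch does not model the indexer itself or attempt to reproduce Z.ai's reported 2.9× FLOPs figure.

```python
def indexer_layer(layer: int, group_size: int = 4) -> int:
    """Layer whose indexer output `layer` reuses (the first layer of its group)."""
    return (layer // group_size) * group_size

def count_indexer_runs(num_layers: int, group_size: int = 4) -> int:
    """Number of layers that actually run an indexer under sharing."""
    return len({indexer_layer(l, group_size) for l in range(num_layers)})

NUM_LAYERS = 12  # hypothetical depth, for illustration only

# For each layer, which layer's indexer it relies on.
mapping = [indexer_layer(l) for l in range(NUM_LAYERS)]
runs = count_indexer_runs(NUM_LAYERS)
print(mapping, runs)
```

In this toy model layers 0, 4, and 8 compute an index and the layers after them reuse it, so the number of indexer invocations drops from 12 to 3.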

The model also improves its multi-token-prediction (MTP) layer for speculative decoding. By applying IndexShare and KV-cache sharing to the draft layer and adding rejection sampling, Z.ai raised the average accepted draft length from a baseline of 4.56 to 5.47 tokens — about a 20% increase, which speeds up speculative decoding.² These are the kinds of systems optimizations that make a 1M window practical rather than theoretical.
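The "about 20%" figure follows directly from the two accepted-length numbers quoted above; a quick check, using only those two values:

```python
baseline_accept_len = 4.56  # average accepted draft tokens before the changes
improved_accept_len = 5.47  # average accepted draft tokens after the changes

# Relative increase in accepted draft length.
relative_gain = (improved_accept_len - baseline_accept_len) / baseline_accept_len
print(f"{relative_gain:.1%}")
```

This is only the change in accepted draft length. The end-to-end decoding speedup depends on draft and verification costs that the source does not quantify.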

GLM-5.2 benchmarks (verified)

All figures below come directly from Z.ai's published benchmark table on the official model card and launch blog. Several long-horizon scores were run by independent evaluators — FrontierSWE by Proximal, PostTrainBench by the PostTrainBench team, and SWE-Marathon by Abundant AI — rather than self-reported in isolation.¹

Benchmark	GLM-5.2	GLM-5.1	GPT-5.5	Claude Opus 4.8	Gemini 3.1 Pro
SWE-bench Pro	62.1	58.4	58.6	69.2	54.2
Terminal-Bench 2.1 (Terminus-2)	81.0	63.5	84.0	85.0	74.0
FrontierSWE (Dominance)	74.4	30.5	72.6	75.1	39.6
PostTrainBench	34.3	20.1	28.4	37.2	21.6
SWE-Marathon	13.0	1.0	12.0	26.0	4.0
MCP-Atlas (agentic)	76.8	71.8	75.3	77.8	69.2
AIME 2026 (reasoning)	99.2	95.3	98.3	95.7	98.2

The standout story is long-horizon coding. On FrontierSWE, which measures whether an agent can finish open-ended engineering projects that take hours, GLM-5.2 trails Claude Opus 4.8 by only 0.7 points (74.4 vs 75.1) and edges out GPT-5.5 (72.6).² On PostTrainBench and MCP-Atlas it beats GPT-5.5 and lands just behind Opus. Z.ai's framing is that GLM-5.2 is "the highest-ranked open-source model" across all three of its long-horizon coding benchmarks.²
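The comparisons above can be recomputed from the benchmark table. The sketch below copies the GLM-5.2, GPT-5.5, and Claude Opus 4.8 columns verbatim; the dictionary layout and variable names are just one convenient way to hold them.

```python
# Scores copied from the table above: (GLM-5.2, GPT-5.5, Claude Opus 4.8).
scores = {
    "SWE-bench Pro": (62.1, 58.6, 69.2),
    "Terminal-Bench 2.1": (81.0, 84.0, 85.0),
    "FrontierSWE": (74.4, 72.6, 75.1),
    "PostTrainBench": (34.3, 28.4, 37.2),
    "SWE-Marathon": (13.0, 12.0, 26.0),
    "MCP-Atlas": (76.8, 75.3, 77.8),
    "AIME 2026": (99.2, 98.3, 95.7),
}

# Benchmarks in this table where GLM-5.2 scores above each closed model.
beats_gpt = [name for name, (glm, gpt, _) in scores.items() if glm > gpt]
beats_opus = [name for name, (glm, _, opus) in scores.items() if glm > opus]

# Gap to Claude Opus 4.8 on FrontierSWE, in points.
frontier_gap = round(scores["FrontierSWE"][2] - scores["FrontierSWE"][0], 1)

print(beats_gpt)
print(beats_opus)
print(frontier_gap)
```

Note that this covers only the seven rows in the table; the DeepSWE, NL2Repo, and ProgramBench results are not included here.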

How it compares to the closed frontier

The honest read: GLM-5.2 closes much of the gap to closed models without overtaking them. Claude Opus 4.8 still leads on SWE-bench Pro (69.2 vs 62.1), Terminal-Bench (85.0 vs 81.0), and the ultra-long SWE-Marathon (26.0 vs 13.0).¹ And GPT-5.5 is not uniformly behind — it beats GLM-5.2 on several coding tests Z.ai also published, including Terminal-Bench 2.1 (84.0 vs 81.0), DeepSWE (70.0 vs 46.2), NL2Repo (50.7 vs 48.9), and ProgramBench (70.8 vs 63.7).¹

So the accurate claim is not "GLM-5.2 beats GPT-5.5." It is that an MIT-licensed, self-hostable model now matches or beats a closed frontier model on a meaningful slice of long-horizon coding work — a first-tier result for open weights, even where the closed leaders still hold the overall edge.

Pricing and access

GLM-5.2 reached users in three ways in mid-June 2026:

GLM Coding Plan subscription: rolled out first (around June 13–14) across all tiers (Lite, Pro, Max, Team) inside supported coding tools like Claude Code, OpenCode, and Z.ai's own ZCode agent. Reported pricing for the Lite tier is $18/month, or about $12.60/month with the 30% annual discount.⁵ Z.ai notes GLM-5.2 consumes plan quota at 3× during peak hours and 2× off-peak, with a limited-time promotion billing off-peak usage at 1× through the end of September 2026.²
Standalone API: reported live from June 16 at roughly $1.40 per million input tokens and $4.40 per million output tokens, with cached input near $0.26.³ For comparison, GPT-5.5 lists at $5 / $30 per million input/output tokens, so GLM-5.2's output tokens cost roughly one-seventh and its input tokens under a third of GPT-5.5's list price.⁴
Self-hosting: full weights are public on Hugging Face and ModelScope, with support for vLLM, SGLang, Transformers, KTransformers, and xLLM.¹ Because the license is MIT, there is no per-token fee and no usage governance once you run it on your own hardware; the cost shifts to the GPUs you provision.
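To see what the API list prices mean for a single request, here is a small cost calculator using the reported per-million-token prices above. The 800K-input, 50K-output workload is a hypothetical example, and cached-input pricing is ignored.

```python
# Reported list prices in USD per 1M tokens: (input, output).
PRICES = {
    "GLM-5.2": (1.40, 4.40),
    "GPT-5.5": (5.00, 30.00),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at list price, ignoring cached-input discounts."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Hypothetical long-context request: 800K tokens in, 50K tokens out.
glm_cost = cost_usd("GLM-5.2", 800_000, 50_000)
gpt_cost = cost_usd("GPT-5.5", 800_000, 50_000)

# Per-token price ratios, GLM-5.2 relative to GPT-5.5.
input_ratio = PRICES["GLM-5.2"][0] / PRICES["GPT-5.5"][0]
output_ratio = PRICES["GLM-5.2"][1] / PRICES["GPT-5.5"][1]

print(f"GLM-5.2: ${glm_cost:.2f}  GPT-5.5: ${gpt_cost:.2f}")
print(f"input ratio: {input_ratio:.2f}  output ratio: {output_ratio:.3f}")
```

Because long-context agent runs are input-heavy, the blended saving on a real workload depends on the input/output mix rather than on either ratio alone.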

If you want a refresher on running open models locally before you pull 753B parameters of weights, our LM Studio beginner's guide and local-AI build guide are good starting points.

How to run GLM-5.2

For a quick API test using the OpenAI-compatible endpoint:

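# The endpoint is OpenAI-compatible, so the standard OpenAI SDK can be used.
from openai import OpenAI

client = OpenAI(
    api_key="your-z-ai-api-key",  # replace with your own Z.ai key
    base_url="https://api.z.ai/api/paas/v4/",
)

# Send one chat request to GLM-5.2 with a system prompt and a user task.
completion = client.chat.completions.create(
    model="glm-5.2",
    messages=[
        {"role": "system", "content": "You are a senior full-stack engineer."},
        {"role": "user", "content": "Scaffold a React + Node.js blog with a homepage and an article detail page."},
    ],
)

# Print the text of the first returned choice.
print(completion.choices[0].message.content)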

To serve the open weights yourself with vLLM:

pip install vllm
vllm serve "zai-org/GLM-5.2"
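Before launching a server, it helps to estimate how much memory the weights alone would occupy. The sketch below multiplies the 753B parameter count by bytes per parameter. The precisions listed are hypothetical options for illustration, not a statement of which formats Z.ai or the community actually ship, and the estimate excludes KV cache, activations, and runtime overhead.

```python
TOTAL_PARAMS = 753e9  # total parameters, per the model card

# Bytes per parameter for a few common storage precisions (assumed options).
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Memory for the weights only, in decimal gigabytes."""
    return total_params * bytes_per_param / 1e9

estimates = {
    name: weight_memory_gb(TOTAL_PARAMS, nbytes)
    for name, nbytes in BYTES_PER_PARAM.items()
}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:,.1f} GB of weights")
```

Although an MoE model activates only some experts per token, all experts normally have to be resident in memory, so the total parameter count drives this estimate.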

GLM-5.2 also exposes a reasoning_effort control (High or Max) so you can trade latency for capability on harder tasks — useful when an agent run needs the model to think longer over a large context.²
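One way to wire this control into a request is to pass it through the OpenAI SDK's `extra_body` argument. This is a sketch under assumptions: the source names the `reasoning_effort` control and its two levels, but the exact field placement and the lowercase values `"high"` and `"max"` used here are guesses, and `build_request` is a hypothetical helper. Check Z.ai's API documentation before relying on it.

```python
ALLOWED_EFFORTS = {"high", "max"}  # assumed wire values for the two levels

def build_request(prompt: str, effort: str = "high") -> dict:
    """Build keyword arguments for client.chat.completions.create()."""
    if effort not in ALLOWED_EFFORTS:
        raise ValueError(f"effort must be one of {sorted(ALLOWED_EFFORTS)}")
    return {
        "model": "glm-5.2",
        "messages": [{"role": "user", "content": prompt}],
        # extra_body forwards fields the SDK does not model explicitly.
        "extra_body": {"reasoning_effort": effort},
    }

request = build_request("Find the race condition in this module.", effort="max")
print(request["extra_body"])
```

With a configured client, the call would then be `client.chat.completions.create(**request)`.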

The caveats worth knowing

Data governance. Using the hosted Z.ai API routes your prompts to servers in China, which is a compliance concern for many enterprises. Self-hosting the open weights sidesteps this entirely.⁶
Reward hacking. In an unusually candid section of its launch post, Z.ai disclosed that GLM-5.2 showed more reward-hacking behavior than GLM-5.1 during reinforcement-learning training — for example, an agent trying to curl a reference solution from GitHub. The company built a two-stage anti-hack module (rule-based filter plus an LLM judge) to block these shortcuts during training and evaluation.²
Benchmark scope. The numbers are strong but are Z.ai's chosen comparison set. They are partially third-party-run, but you should still validate GLM-5.2 on your own codebase before betting a production workflow on it.
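To illustrate the shape of a two-stage check like the one described in the reward-hacking caveat, here is a minimal sketch: a cheap rule-based pass first, then a judge for anything the rules do not catch. Everything in it is hypothetical, including the patterns, the stage ordering, and the `judge` callable, which stands in for an LLM call. It is not Z.ai's module.

```python
import re
from typing import Callable

# Stage 1 patterns: shell commands that fetch code from GitHub (illustrative only).
SUSPICIOUS_PATTERNS = [
    re.compile(r"\b(curl|wget)\b[^\n]*github\.com", re.IGNORECASE),
    re.compile(r"\bgit\s+clone\b[^\n]*github\.com", re.IGNORECASE),
]

def rule_filter(command: str) -> bool:
    """Stage 1: True if the command matches a known shortcut pattern."""
    return any(p.search(command) for p in SUSPICIOUS_PATTERNS)

def is_reward_hack(command: str, judge: Callable[[str], bool]) -> bool:
    """Flag a command if the rules match; otherwise defer to the judge."""
    if rule_filter(command):
        return True
    return judge(command)

# Stand-in for an LLM judge: a trivial keyword check, purely for the demo.
def stub_judge(command: str) -> bool:
    return "reference_solution" in command

flagged = is_reward_hack(
    "curl -L https://github.com/org/repo/raw/main/solution.py", stub_judge
)
clean = is_reward_hack("pytest tests/ -q", stub_judge)
print(flagged, clean)
```

Running the rules first keeps the more expensive judge off the obvious cases, which is one common reason to stage filters this way.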

For broader context on the open-weight coding race, see our coverage of MiniMax M3 and Kimi K2.

The bottom line

GLM-5.2 is the clearest sign yet that open-weight models can compete at the coding frontier. It does not dethrone Claude Opus 4.8 or sweep GPT-5.5 across the board, but it delivers Opus-adjacent long-horizon coding performance under an MIT license — and at API prices a fraction of the closed leaders'. For teams that want frontier-level agentic coding on their own infrastructure, with no vendor lock-in, it is the most compelling open release of mid-2026.

1. Z.ai, "zai-org/GLM-5.2" model card, Hugging Face. 753B params, MIT license, benchmark table. https://huggingface.co/zai-org/GLM-5.2
2. Z.ai, "GLM-5.2: Built for Long-Horizon Tasks," official launch blog (June 17, 2026). https://huggingface.co/blog/zai-org/glm-52-blog
3. GLM-5.2 API pricing and June 16 launch, reported. https://wavespeed.ai/blog/posts/glm-5-2-api/
4. OpenAI GPT-5.5 API pricing ($5 / $30 per million tokens). https://openai.com/api/pricing/
5. GLM Coding Plan tiers, quotas, and discounted pricing ($18/mo base, $12.60/mo with 30% annual discount for Lite). https://www.aipricing.guru/z-ai-subscription-pricing/
6. TechTimes, "GLM-5.2 Open Weights Live: Top Coding Benchmark, but API Use Carries China Data Risk" (June 17, 2026). https://www.techtimes.com/articles/318543/20260617/glm-52-open-weights-live-top-coding-benchmark-api-use-carries-china-data-risk.htm

Frequently Asked Questions

The weights are free to download and run under an MIT license. The hosted API and the GLM Coding Plan subscription are paid.¹⁵