Is China picking the open-source AI baton back up? Z.ai, also known as Zhipu AI, a Chinese AI startup that listed on the Hong Kong Stock Exchange in early 2026 at a market capitalization of 52.83 billion dollars, has released GLM-5.1 on Hugging Face under a permissive MIT License, allowing enterprises to download, customize, and deploy it commercially.
The release represents a pivotal moment in the evolution of artificial intelligence. While competitors have focused on increasing reasoning tokens for better logic, Z.ai is optimizing for something different: productive horizons. GLM-5.1 is designed to work autonomously for up to eight hours on a single task, marking a definitive shift from vibe coding to agentic engineering.
GLM-5.1’s core technological breakthrough is not just its scale, though its 754 billion parameters and 202,752-token context window are formidable, but its ability to avoid the plateau effect seen in previous models.
Z.ai’s research demonstrates that GLM-5.1 operates via what the company calls a “staircase” pattern: periods of incremental tuning within a fixed strategy, punctuated by structural changes that shift the performance frontier.
In one scenario from their technical report, the model was tasked with optimizing a high-performance vector database. The model was provided with a Rust skeleton and empty implementation stubs, then used tool-call-based agents to edit code, compile, test, and profile. While previous state-of-the-art results from models like Claude Opus 4.6 reached a performance ceiling of 3,547 queries per second, GLM-5.1 ran through 655 iterations and over 6,000 tool calls.
At iteration 90, the model shifted from full-corpus scanning to IVF cluster probing with f16 vector compression, jumping performance to 6,400 queries per second. By iteration 240, it autonomously introduced a two-stage pipeline, reaching 13,400 queries per second. The final result: 21,500 queries per second, roughly six times the best result achieved in a single 50-turn session.
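The two-stage pipeline described above follows a standard approximate-nearest-neighbor pattern: a coarse IVF probe over cluster centroids narrows the candidate set, then an exact pass reranks the survivors. The sketch below is an illustrative reconstruction of that pattern in Python, not Z.ai's Rust code; the function names are hypothetical, and the f16 compression step is omitted for brevity.

```python
import math
import random

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, n_clusters, iters=10):
    """Toy k-means IVF index: returns centroids and the vectors
    assigned to each cluster (the inverted lists)."""
    centroids = random.sample(vectors, n_clusters)
    for _ in range(iters):
        buckets = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(len(centroids)), key=lambda i: l2(v, centroids[i]))
            buckets[nearest].append(v)
        # Recompute each centroid as the mean of its bucket (keep old if empty).
        centroids = [[sum(col) / len(b) for col in zip(*b)] if b else centroids[i]
                     for i, b in enumerate(buckets)]
    return centroids, buckets

def search(query, centroids, buckets, nprobe=2, k=3):
    """Stage 1: probe only the nprobe clusters whose centroids are
    nearest the query. Stage 2: exact distance rerank of candidates."""
    order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = [v for i in order[:nprobe] for v in buckets[i]]
    return sorted(candidates, key=lambda v: l2(query, v))[:k]
```

The speedup in the article comes from stage 1: instead of scanning the full corpus per query, only a few inverted lists are visited, trading a small recall loss for a large cut in distance computations.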
This demonstrates a model that functions as its own research and development department, breaking complex problems down and running experiments with real precision.
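The edit-compile-test-profile cycle behind those 6,000 tool calls can be sketched as a plain tool-dispatch loop. This is a generic agent-harness sketch under stated assumptions, not Z.ai's implementation; the `model_step` interface and tool names are hypothetical.

```python
def run_agent(model_step, tools, max_iters=655):
    """Minimal agentic loop: the model proposes a tool call, the harness
    executes it and records the observation, and the pair is fed back as
    context for the next step, until the model signals completion."""
    history = []
    for _ in range(max_iters):
        action = model_step(history)   # e.g. {"tool": "profile", "args": {...}}
        if action is None:             # model declares the task done
            break
        observation = tools[action["tool"]](**action.get("args", {}))
        history.append((action, observation))
    return history
```

The "autonomous work time" Z.ai emphasizes is, in this framing, simply how many productive passes through a loop like this the model can sustain before drifting or stalling.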
The most impressive anecdotal benchmark was the Scenario 3 test: building a Linux-style desktop environment from scratch in eight hours. Unlike previous models that might produce a basic taskbar and a placeholder window before declaring the task complete, GLM-5.1 autonomously built out a file browser, terminal, text editor, system monitor, and even functional games, then iteratively polished the styling and interaction logic until it delivered a visually consistent, functional web application.
On SWE-Bench Pro, which evaluates a model’s ability to resolve real-world GitHub issues, GLM-5.1 achieved a score of 58.4, outperforming GPT-5.4 at 57.7, Claude Opus 4.6 at 57.3, and Gemini 3.1 Pro at 54.2.
Beyond standardized coding tests, the model showed significant gains in reasoning and agentic benchmarks. It scored 63.5 on Terminal-Bench 2.0 and reached 66.5 when paired with the Claude Code harness. On CyberGym, it achieved a 68.7 score, demonstrating a nearly 20-point lead over the previous GLM-5 model.
In the reasoning domain, it scored 31.0 on Humanity’s Last Exam, which jumped to 52.3 when the model was allowed to use external tools. On the AIME 2026 math competition benchmark, it reached 95.3, while scoring 86.2 on GPQA-Diamond for expert-level science reasoning.
GLM-5.1 is positioned as an engineering-grade tool rather than a consumer chatbot. The product offering is divided into three subscription tiers: Lite at 27 dollars per quarter, Pro at 81 dollars per quarter, and Max at 216 dollars per quarter.
For API usage, Z.ai has priced GLM-5.1 at 1.40 dollars per million input tokens and 4.40 dollars per million output tokens, significantly cheaper than Claude Opus 4.6’s 30 dollars per million total or GPT-5.4’s 17.50 dollars per million total.
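At those rates, per-workload costs are simple arithmetic. A quick estimate, using the list prices above (the token volumes in the example are hypothetical):

```python
def api_cost(input_tokens, output_tokens, in_rate=1.40, out_rate=4.40):
    """Dollar cost at per-million-token rates; defaults are GLM-5.1's
    published input and output prices."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A long agentic session consuming 10M input and 2M output tokens:
# 10 * 1.40 + 2 * 4.40 = 22.80 dollars
```

The asymmetric pricing matters for agentic workloads, which tend to be input-heavy: each tool call feeds the accumulated history back in as input tokens, so the cheaper input rate dominates the bill.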
The model is also packaged for local deployment, supporting inference frameworks including vLLM, SGLang, and xLLM. Comprehensive deployment instructions are available at the official GitHub repository, allowing developers to run the 754 billion parameter MoE model on their own infrastructure.
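For vLLM, serving typically comes down to a single command against the published weights. The snippet below is a sketch only: the Hugging Face model id and the flag values are assumptions, and a model of this size would need a multi-GPU node sized to its memory footprint; consult the repository's deployment instructions for the supported configuration.

```shell
# Sketch: vLLM exposes an OpenAI-compatible server via `vllm serve`.
# The model id and parallelism degree here are illustrative assumptions;
# max-model-len matches the advertised 202,752-token context window.
vllm serve zai-org/GLM-5.1 \
  --tensor-parallel-size 8 \
  --max-model-len 202752
```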
The licensing tells a larger story about the current state of the global AI market. GLM-5.1 has been released under the MIT License, with its model weights made publicly available on Hugging Face and ModelScope. However, GLM-5 Turbo remains proprietary and closed-source.
This reflects a growing trend among leading AI labs toward a hybrid model: using open-source models for broad distribution while keeping execution-optimized variants behind a paywall. Z.ai CEO Zhang Peng appears to be navigating this by ensuring that while the flagship’s core intelligence is open to the community, the high-speed execution infrastructure remains a revenue-driving asset.
“Agents could do about 20 steps by the end of last year,” wrote Z.ai leader Lou on X. “GLM-5.1 can do 1,700 now. Autonomous work time may be the most important curve after scaling laws.”
Whether GLM-5.1 represents a genuine inflection point for open-source AI or simply a powerful but ephemeral milestone, the release demonstrates that the open-source community now has access to capabilities that were, just months ago, the exclusive province of closed frontier models.