GLM 4.7 Flash

by Z.ai

GLM 4.7 Flash is the efficiency-focused variant of the GLM-4.7 series, using a lean 30-billion-parameter architecture with only 3 billion active parameters so it can run on modest consumer hardware. Released January 2026, it carries the reasoning-density improvements of the GLM-4.7 family with MoE sparsity tuned for local inference, low-latency use, and cost-constrained production. Despite its smaller footprint, Flash keeps the interleaved thinking design, supporting visible or implicit chain-of-thought reasoning in structured responses. Its 200K context window supports extended coding sessions and multi-file reasoning, making it effective for code generation, terminal automation, and agentic development on edge devices or resource-limited clusters. Tuned for rapid inference, GLM 4.7 Flash scores competitively on coding benchmarks while avoiding the compute overhead of the full GLM-4.7. Released under an MIT license, it targets developers building local AI coding assistants, edge-based agents, and production systems where latency, throughput, and hardware efficiency shape user experience and cost.

Key info

Input
Output
Features
Context window
203K
Max output
203K
Input price
$0.06 /1M
Output price
$0.40 /1M
  • US residency available
  • Zero data retention via Enterprise
  • No training by default
  • GDPR DPA available

Available routes

GLM 4.7 Flash runs on 2 different routes through the Opper gateway. Compare residency, ZDR, and training posture at a glance β€” full data-handling detail per route below.

ProviderRegionZero data retentionTrainingInputOutput
DeepInfraUSEnterpriseNo$0.06$0.40
NovitaUSEnterpriseNo$0.07$0.40

Training posture across routes: No training on prompts by default.

Data handling per route

Each route hosting GLM 4.7 Flash has its own privacy posture, residency, and GDPR terms. Postures are maintained by Opper with a last-verification timestamp.

DeepInfra β€” United StatesπŸ‡ΊπŸ‡Έ

Zero data retention is available via Opper Enterprise contract. No training on customer data. US; unknown.

Zero data retention
Available via Opper Enterprise contract.
Training
No training on customer data.
Logging
Limited debug logs
Third-party access
None disclosed
GDPR DPA
No DPA
Transfer mechanism
unknown

Novita β€” United StatesπŸ‡ΊπŸ‡Έ

Zero data retention is available via Opper Enterprise contract. No training on customer data. US; SCCs; DPA available.

Zero data retention
Available via Opper Enterprise contract.
Training
No training on customer data.
Logging
Abuse monitoring
Third-party access
None disclosed
GDPR DPA
DPA available
Transfer mechanism
SCCs

Benchmarks

Independent benchmark scores β€” composite indices for reasoning, coding, and math, plus individual eval scores where available.

Global rank#163 of 531 LLMs
TierEfficient
Output speed82 tok/s
First token0.86s
Intelligence Index22.9
Coding Index25.9
Reasoning & knowledge
GPQA Diamond
58%
Humanity's Last Exam
7%
Long-context reasoning
35%
Coding
SciCode
34%
Agentic & tool use
Terminal-Bench Hard
22%
τ²-Bench Telecom
99%
Math & instruction following
IFBench
61%

Get started

Call GLM 4.7 Flash through the Opper gateway with one API key. Let your coding agent set it up, or call it directly β€” Opper is drop-in compatible with the OpenAI, Anthropic, and Google AI SDKs.

Set it up with your agent

Copy this and paste it into your coding agent β€” Claude Code, Cursor, Codex, and more β€” and it'll wire up Opper for you.

Or call it directly

import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.OPPER_API_KEY,
baseURL: "https://api.opper.ai/v3/compat",
});
const completion = await client.chat.completions.create({
model: "deepinfra/zai-org/GLM-4.7-Flash",
messages: [{ role: "user", content: "Hello" }],
});
console.log(completion.choices[0].message.content);

Compare GLM 4.7 Flash with…

Side-by-side on privacy, EU hosting, pricing, and benchmarks.

Other models from Z.ai

Start building with 300+ models

One API key. Every major provider. Up and running in minutes.

Get startedView Documentation
GLM 4.7 Flash by Z.ai β€” pricing, benchmarks | Opper AI