Cerebras

Wafer-scale chip inference — among the fastest tokens-per-second in the industry.

Cerebras runs inference on its custom Wafer-Scale Engine, delivering some of the highest tokens-per-second throughput available on Llama and GPT-OSS class models. Hosting is in the US, and zero data retention is available on request via Opper Enterprise. It is the default pick when output throughput dominates the workload and US hosting is acceptable.

1 route · 4 models · US

cerebras.ai

Models on Cerebras

Every model we route through Cerebras. Compare residency, ZDR, training posture, and price at a glance — full data-handling detail per route below.

Model	Developer	Region	Zero data retention	Training	Context	Input	Output
GLM 4.7	Z.ai	US	Enterprise	No	128K	$2.25	$2.75
GPT OSS 120B	OpenAI	US	Enterprise	No	131K	$0.35	$0.75
Llama 3.1 8B	Meta	US	Enterprise	No	128K	$0.10	$0.10
Qwen 3 235B Instruct	Alibaba	US	Enterprise	No	128K	$0.60	$1.20
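
For readers who want to compare these routes programmatically, here is a minimal Python sketch that ranks the four listed models by output price, with an optional minimum context window. The `Route` class and `cheapest_by_output` function are hypothetical names used for illustration and are not part of any Opper or Cerebras API. The figures are copied from the table above; the price unit is whatever the table uses and is not assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One table row (hypothetical structure, for illustration only)."""
    model: str
    developer: str
    context_k: int       # context window in thousands of tokens, as listed
    input_price: float   # listed input price in USD (unit as in the table)
    output_price: float  # listed output price in USD (unit as in the table)


# Values copied from the table above.
ROUTES = [
    Route("GLM 4.7", "Z.ai", 128, 2.25, 2.75),
    Route("GPT OSS 120B", "OpenAI", 131, 0.35, 0.75),
    Route("Llama 3.1 8B", "Meta", 128, 0.10, 0.10),
    Route("Qwen 3 235B Instruct", "Alibaba", 128, 0.60, 1.20),
]


def cheapest_by_output(routes, min_context_k=0):
    """Return routes with at least min_context_k context, cheapest output first."""
    eligible = [r for r in routes if r.context_k >= min_context_k]
    return sorted(eligible, key=lambda r: r.output_price)


if __name__ == "__main__":
    for route in cheapest_by_output(ROUTES):
        print(f"{route.model} ({route.developer}): ${route.output_price:.2f} output")
```

Because every route on this page shares the same region, ZDR, and training posture, price and context window are the only columns that differentiate the models, which is why the sketch sorts on those alone.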

Data handling per route

Cerebras hosts on 1 route. Each route has its own privacy posture, residency, and GDPR terms. Postures are maintained by Opper with a last-verification timestamp.

United States 🇺🇸

Zero data retention is available via an Opper Enterprise contract. No training on customer data. Hosted in the US; transfers rely on SCCs; a DPA is available.

Zero data retention: Available via Opper Enterprise contract.
Training: No training on customer data.
Logging: None
Third-party access: Provider may share data with subprocessors or partners
GDPR DPA: DPA available
Transfer mechanism: SCCs