Cerebras

Wafer-scale chip inference — among the fastest tokens-per-second in the industry.

Cerebras runs inference on its custom Wafer-Scale Engine, delivering some of the highest tokens-per-second throughput available on Llama- and GPT-OSS-class models. Hosting is in the US; zero data retention is available on request via Opper Enterprise. It is the default pick when output throughput dominates the workload and US hosting is acceptable.

1 route · 4 models · US
cerebras.ai

Models on Cerebras

Every model we route through Cerebras. Compare residency, ZDR, training posture, and price at a glance — full data-handling detail per route below.

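Model | Region | Zero data retention | Training | Context | Input | Output
(name missing) | US | Enterprise | No | 131K | $0.35 | $0.75
(name missing) | US | Enterprise | No | 128K | $0.10 | $0.10
(name missing) | US | Enterprise | No | 128K | $0.60 | $1.20
(name missing) | US | Enterprise | No | 128K | $2.25 | $2.75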
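
The Input and Output prices above can be turned into a rough per-request cost estimate. The sketch below is illustrative only: it assumes the prices are in USD per one million tokens, which the table does not state, and the names `RoutePrice`, `ROWS`, and `estimate_cost` are hypothetical, not part of any Opper or Cerebras API. Rows are referenced by position because the model names are not present in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutePrice:
    """One table row: context window plus input/output prices."""
    context: str
    input_usd: float   # ASSUMPTION: USD per 1M input tokens
    output_usd: float  # ASSUMPTION: USD per 1M output tokens


# The four rows of the table, in the order listed.
ROWS = [
    RoutePrice("131K", 0.35, 0.75),
    RoutePrice("128K", 0.10, 0.10),
    RoutePrice("128K", 0.60, 1.20),
    RoutePrice("128K", 2.25, 2.75),
]


def estimate_cost(row: RoutePrice, input_tokens: int, output_tokens: int,
                  tokens_per_unit: int = 1_000_000) -> float:
    """Return the estimated USD cost of one request on the given row."""
    total = input_tokens * row.input_usd + output_tokens * row.output_usd
    return total / tokens_per_unit


if __name__ == "__main__":
    # Example: 200k input tokens and 50k output tokens on each row.
    for i, row in enumerate(ROWS, start=1):
        cost = estimate_cost(row, 200_000, 50_000)
        print(f"row {i} ({row.context}): ${cost:.4f}")
```

If the pricing unit turns out to be different, only `tokens_per_unit` needs to change.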

Data handling per route

Cerebras hosts on 1 route. Each route has its own privacy posture, residency, and GDPR terms. Postures are maintained by Opper with a last-verification timestamp.

United States 🇺🇸

Zero data retention is available via Opper Enterprise contract. No training on customer data. US; SCCs; DPA available.

Zero data retention: Available via Opper Enterprise contract.
Training: No training on customer data.
Logging: None
Third-party access: Provider may share with subprocessors / partners
GDPR DPA: DPA available
Transfer mechanism: SCCs
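
The posture fields listed for this route can be captured as a small record and checked against a workload's requirements. This is a minimal sketch under stated assumptions: `RoutePosture`, `CEREBRAS_US`, and `unmet_requirements` are hypothetical names for illustration and do not correspond to an Opper API. The field values are copied from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutePosture:
    """Hypothetical record mirroring the per-route fields on this page."""
    region: str
    zero_data_retention: str        # "enterprise" = available via contract only
    trains_on_customer_data: bool
    logging: str
    may_share_with_subprocessors: bool
    dpa_available: bool
    transfer_mechanism: str


# Values taken from the United States route described above.
CEREBRAS_US = RoutePosture(
    region="US",
    zero_data_retention="enterprise",
    trains_on_customer_data=False,
    logging="none",
    may_share_with_subprocessors=True,
    dpa_available=True,
    transfer_mechanism="SCCs",
)


def unmet_requirements(posture: RoutePosture, *, allowed_regions: set,
                       need_default_zdr: bool = False,
                       need_dpa: bool = True) -> list:
    """Return human-readable reasons the route fails the given requirements."""
    issues = []
    if posture.region not in allowed_regions:
        issues.append(f"region {posture.region} not allowed")
    if need_default_zdr and posture.zero_data_retention != "default":
        issues.append("zero data retention is not on by default")
    if posture.trains_on_customer_data:
        issues.append("provider trains on customer data")
    if need_dpa and not posture.dpa_available:
        issues.append("no DPA available")
    return issues


if __name__ == "__main__":
    print(unmet_requirements(CEREBRAS_US, allowed_regions={"US", "EU"}))
    print(unmet_requirements(CEREBRAS_US, allowed_regions={"EU"},
                             need_default_zdr=True))
```

An empty list means the route satisfies every stated requirement; each string otherwise names one gap to resolve, for example by requesting zero data retention through an Opper Enterprise contract.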
