Groq

LPU-based inference — sub-100ms first-token latencies on open-weight models.

Groq's LPU architecture delivers some of the lowest first-token latencies in the industry across Llama, GPT-OSS, Whisper, and Qwen models. Hosting is in the US, and zero data retention is available on request via Opper Enterprise. This is the fast-path route for latency-sensitive applications such as chat, agents, and interactive tool use, where time-to-first-token is the critical metric.
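Since time-to-first-token is the metric this route is chosen for, it helps to measure it in your own setup. The following is a minimal sketch, not Opper's or Groq's tooling: `measure_ttft` times any iterable of streamed text chunks, and `fake_stream` is a hypothetical stand-in for a real streaming response, so all names and delays here are assumptions for illustration.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def measure_ttft(chunks: Iterable[str]) -> Tuple[Optional[float], float, str]:
    """Consume a stream of text chunks and time it.

    Returns (seconds until the first non-empty chunk, total seconds, full text).
    The first value is None if the stream never yields a non-empty chunk.
    """
    start = time.perf_counter()
    first: Optional[float] = None
    parts = []
    for chunk in chunks:
        if first is None and chunk:
            first = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    return first, total, "".join(parts)


def fake_stream(first_delay: float, gap: float, n: int) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response."""
    time.sleep(first_delay)  # simulated wait before the first token
    for i in range(n):
        if i:
            time.sleep(gap)  # simulated wait between later tokens
        yield f"tok{i} "


if __name__ == "__main__":
    ttft, total, text = measure_ttft(fake_stream(0.05, 0.01, 5))
    print(f"TTFT {ttft * 1000:.0f} ms, total {total * 1000:.0f} ms")
```

To use it against a real endpoint, pass the chunk iterator returned by your streaming client in place of `fake_stream`, and start the iterator as close to the request as possible so that network time is included.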

1 route · 9 models · US
groq.com

Models on Groq

Every model we route through Groq. Compare residency, zero data retention (ZDR), training posture, and price at a glance; full data-handling detail for each route is listed under "Data handling per route".

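Model           Region  Zero data retention  Training  Context  Input  Output
[name missing]  US      Enterprise           No        128K     $0.15  $0.60
[name missing]  US      Enterprise           No        128K     $0.07  $0.30
[name missing]  US      Enterprise           No        128K     $0.07  $0.30
[name missing]  US      Enterprise           No        128K     $0.11  $0.34
[name missing]  US      Enterprise           No        128K     $0.59  $0.79
[name missing]  US      Enterprise           No        128K     $0.05  $0.08
[name missing]  US      Enterprise           No        131K     $0.29  $0.59
[name missing]  US      Enterprise           No        256K     $1.00  $3.00
[name missing]  US      Enterprise           No        128K     $1.00  $3.00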
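To compare rows by what a request would actually cost, the input and output prices can be combined with token counts. The sketch below assumes the prices are USD per 1 million tokens; the table does not state its unit, so check that before relying on the numbers. `Pricing` and `estimate_cost` are hypothetical names, not part of any Opper or Groq API.

```python
from dataclasses import dataclass

# ASSUMPTION: table prices are USD per 1,000,000 tokens (the unit is not stated).
TOKENS_PER_PRICE_UNIT = 1_000_000


@dataclass(frozen=True)
class Pricing:
    input_usd: float   # price for input (prompt) tokens
    output_usd: float  # price for output (completion) tokens


def estimate_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request under the assumed price unit."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * p.input_usd + output_tokens * p.output_usd
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    first_row = Pricing(input_usd=0.15, output_usd=0.60)  # first table row
    cost = estimate_cost(first_row, input_tokens=20_000, output_tokens=2_000)
    print(f"estimated cost: ${cost:.4f}")
```

Because output tokens are priced higher than input tokens in every row, workloads with long generations are more sensitive to the Output column than to the Input column.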

Data handling per route

Groq is available on 1 route. Each route has its own privacy posture, residency, and GDPR terms. Postures are maintained by Opper with a last-verification timestamp.

United States 🇺🇸

Zero data retention is available via Opper Enterprise contract. No training on customer data. Data is hosted in the US, transfers rely on SCCs, and a DPA is available.

Zero data retention: Available via Opper Enterprise contract.
Training: No training on customer data.
Logging: Abuse monitoring (30-day retention)
Third-party access: Provider may share with subprocessors / partners
GDPR DPA: DPA available
Transfer mechanism: SCCs
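If you screen routes against an internal data-handling policy, the posture fields above map naturally onto a small record. This is an illustrative sketch only: `RoutePosture`, `GROQ_US`, and `policy_gaps` are hypothetical names, the values are copied from the fields listed for the United States route, and the policy parameters are assumptions you would replace with your own.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class RoutePosture:
    region: str
    zdr: str                        # e.g. "enterprise" = ZDR via Enterprise contract
    trains_on_customer_data: bool
    log_retention_days: int         # abuse-monitoring log retention
    shares_with_subprocessors: bool
    dpa_available: bool
    transfer_mechanism: str


# Values taken from the United States route described above.
GROQ_US = RoutePosture(
    region="US",
    zdr="enterprise",
    trains_on_customer_data=False,
    log_retention_days=30,
    shares_with_subprocessors=True,
    dpa_available=True,
    transfer_mechanism="SCCs",
)


def policy_gaps(
    p: RoutePosture,
    *,
    allowed_regions: Set[str],
    max_log_retention_days: int,
    require_dpa: bool = True,
) -> List[str]:
    """Return human-readable reasons the route fails the given policy."""
    gaps = []
    if p.region not in allowed_regions:
        gaps.append(f"region {p.region} not in allowed regions")
    if p.trains_on_customer_data:
        gaps.append("provider trains on customer data")
    if p.log_retention_days > max_log_retention_days:
        gaps.append(f"logs retained {p.log_retention_days} days")
    if require_dpa and not p.dpa_available:
        gaps.append("no DPA available")
    return gaps


if __name__ == "__main__":
    print(policy_gaps(GROQ_US, allowed_regions={"US", "EU"}, max_log_retention_days=30))
```

An empty list means the route passes the policy as encoded; a policy that requires no log retention at all would flag the 30-day abuse-monitoring logs, which is the case the Enterprise zero-data-retention option addresses.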
