Groq

LPU-based inference — sub-100ms first-token latencies on open-weight models.

Groq's LPU architecture delivers some of the lowest first-token latencies in the industry across Llama, GPT-OSS, Whisper, Qwen, and Kimi models. The route is US-hosted, and zero data retention is available on request via Opper Enterprise. It is the fast-path route for latency-sensitive applications (chat, agents, interactive tool use) where time-to-first-token is the critical metric.
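Since time-to-first-token is the metric that matters here, it helps to measure it the same way for every route you compare. The sketch below times any streaming response that can be iterated chunk by chunk. The names `measure_ttft` and `fake_stream` are illustrative, not part of any SDK, and `fake_stream` is a stand-in for a real streaming call.

```python
import time
from typing import Callable, Iterable, Optional


def measure_ttft(start_stream: Callable[[], Iterable[str]]) -> Optional[float]:
    """Seconds from starting the request until the first non-empty chunk.

    Returns None if the stream ends without producing any text.
    """
    start = time.perf_counter()
    for chunk in start_stream():
        if chunk:  # skip empty keep-alive chunks
            return time.perf_counter() - start
    return None


def fake_stream():
    # Hypothetical stand-in for a streaming model response.
    time.sleep(0.05)  # simulated network and queueing delay
    yield "Hello"
    yield " world"


ttft = measure_ttft(fake_stream)
print(f"time to first token: {ttft * 1000:.0f} ms")
```

To measure a real route, pass a function that opens the streaming request and yields text chunks. Start the clock before the request is sent so that connection setup is included.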

1 route · 11 models · US

groq.com

Models on Groq

Every model we route through Groq. Compare residency, ZDR, training posture, and price at a glance — full data-handling detail per route below.

Model	Developer	Region	Zero data retention	Training	Context	Input	Output
GPT OSS 120B	OpenAI	US	Enterprise	No	128K	$0.15	$0.60
GPT OSS 20B	OpenAI	US	Enterprise	No	128K	$0.07	$0.30
Whisper Large v3	OpenAI	US	Enterprise	No	—	$0.0019 / min
Whisper Large v3 Turbo	OpenAI	US	Enterprise	No	—	$0.0007 / min
GPT OSS Safeguard 20B	OpenAI	US	Enterprise	No	128K	$0.07	$0.30
Llama 4 Scout 17B 16E Instruct	Meta	US	Enterprise	No	128K	$0.11	$0.34
Llama 3.3 70B	Meta	US	Enterprise	No	128K	$0.59	$0.79
Llama 3.1 8B Instant	Meta	US	Enterprise	No	128K	$0.05	$0.08
Qwen3 32B	Alibaba	US	Enterprise	No	131K	$0.29	$0.59
Kimi K2 Instruct 0905	Moonshot	US	Enterprise	No	256K	$1.00	$3.00
Kimi K2 Instruct	Moonshot	US	Enterprise	No	128K	$1.00	$3.00
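To compare models on price, a quick estimate of what a workload would cost is often enough. The sketch below copies three rows from the table and assumes the Input and Output figures are USD per 1M tokens, which the table itself does not state. `PRICES` and `estimate_cost` are illustrative names.

```python
# (input, output) prices copied from the table above.
# ASSUMPTION: USD per 1M tokens; the table does not state the unit.
PRICES = {
    "GPT OSS 120B": (0.15, 0.60),
    "Llama 3.3 70B": (0.59, 0.79),
    "Llama 3.1 8B Instant": (0.05, 0.08),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for the given token volumes."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# 2M input tokens and 500K output tokens on Llama 3.3 70B:
# 2 * 0.59 + 0.5 * 0.79 = 1.575
cost = estimate_cost("Llama 3.3 70B", 2_000_000, 500_000)
print(f"${cost:.3f}")  # $1.575
```

The Whisper models are priced per minute of audio, so they need a separate calculation (minutes times the per-minute rate).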

Data handling per route

Groq is served over 1 route. Each route has its own privacy posture, residency, and GDPR terms. Postures are maintained by Opper with a last-verification timestamp.

United States 🇺🇸

Zero data retention is available via Opper Enterprise contract. No training on customer data. Data is hosted in the US; transfers rely on SCCs, and a DPA is available.

Zero data retention: Available via Opper Enterprise contract.
Training: No training on customer data.
Logging: Abuse monitoring (30-day retention)
Third-party access: Provider may share with subprocessors / partners
GDPR DPA: DPA available
Transfer mechanism: SCCs
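If you need to check a route against internal policy, the posture fields above can be encoded as data and tested programmatically. This is a minimal sketch: `RoutePosture`, its field names, and `violations` are hypothetical and do not reflect any Opper or Groq schema. The values are copied from the list above.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class RoutePosture:
    # Hypothetical field names; values come from the list above.
    region: str
    zdr: str  # "enterprise" = available via Opper Enterprise contract
    trains_on_data: bool
    log_retention_days: int
    dpa_available: bool
    transfer_mechanism: str


GROQ_US = RoutePosture("US", "enterprise", False, 30, True, "SCCs")


def violations(route: RoutePosture, *, allowed_regions: Set[str],
               max_log_retention_days: int, require_dpa: bool = True) -> List[str]:
    """Return a list of policy problems; an empty list means the route passes."""
    problems = []
    if route.region not in allowed_regions:
        problems.append(f"region {route.region} not allowed")
    if route.trains_on_data:
        problems.append("provider trains on customer data")
    if route.log_retention_days > max_log_retention_days:
        problems.append(
            f"logs retained {route.log_retention_days} days "
            f"(max {max_log_retention_days})"
        )
    if require_dpa and not route.dpa_available:
        problems.append("no DPA available")
    return problems


ok = violations(GROQ_US, allowed_regions={"US", "EU"}, max_log_retention_days=30)
strict = violations(GROQ_US, allowed_regions={"EU"}, max_log_retention_days=0)
print(ok)
print(strict)
```

The second call shows a stricter policy (EU-only, no log retention) failing on both region and logging, which is the case the Enterprise zero-data-retention option is meant to address for retention.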