Three-layer model
| Layer | Scope | Purpose |
|---|---|---|
| Per-key RPM/daily | Per API key | Protect the system from a single misbehaving key (buggy loop, leaked credential) |
| Per-org aggregate RPM | Per org, across all keys | Prevent load-spreading across many keys from overwhelming upstream capacity |
| Dollar spend cap | Per org, per billing cycle | Protect the customer’s bill from runaway cost |
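The three layers can be read as successive checks on each request. The sketch below is only an illustration: the `Usage` and `Limits` types, the `check` function, the order of the checks, and the 100.0 spend cap in the usage note are assumptions, not the service's actual implementation. The limit values in the usage note are the Developer tier figures from the tables in this document.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    key_rpm: int          # requests from this key in the current minute
    key_daily: int        # requests from this key today
    org_rpm: int          # requests from all of the org's keys this minute
    org_spend_usd: float  # org spend so far in this billing cycle


@dataclass
class Limits:
    key_rpm: int
    key_daily: int
    org_rpm: int
    spend_cap_usd: float


def check(usage: Usage, limits: Limits) -> int:
    """Return the HTTP status a request would receive (assumed check order)."""
    # Layer 1: a single misbehaving key is stopped first.
    if usage.key_rpm >= limits.key_rpm or usage.key_daily >= limits.key_daily:
        return 429
    # Layer 2: traffic spread across many keys is capped per org.
    if usage.org_rpm >= limits.org_rpm:
        return 429
    # Layer 3: the spend cap protects the bill, not upstream capacity.
    if usage.org_spend_usd >= limits.spend_cap_usd:
        return 402
    return 200
```

For example, `check(Usage(10, 100, 180, 20.0), Limits(60, 5000, 180, 100.0))` is rejected by the second layer even though the individual key is well under its own limit.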
Per-key limits
| Limiter | Developer | Growth | Scale | Enterprise |
|---|---|---|---|---|
| Requests per minute (ai:chat, ai:messages) | 60 RPM | 500 RPM | 2,000 RPM | Custom |
| Daily requests | 5,000 | 50,000 | 500,000 | Custom |
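The two per-key limits interact: a key that sends at its full RPM reaches the daily limit long before the day ends. The sketch below works out how long that takes for each tier. The `PER_KEY_LIMITS` dictionary and the `minutes_until_daily_cap` helper are names invented for this illustration; only the numbers come from the table above.

```python
# Per-key limits copied from the table above (Enterprise is custom, so omitted).
PER_KEY_LIMITS = {
    "developer": {"rpm": 60, "daily": 5_000},
    "growth": {"rpm": 500, "daily": 50_000},
    "scale": {"rpm": 2_000, "daily": 500_000},
}


def minutes_until_daily_cap(tier: str) -> float:
    """Minutes of sustained full-RPM traffic before the daily limit is hit."""
    limits = PER_KEY_LIMITS[tier]
    return limits["daily"] / limits["rpm"]


for tier in PER_KEY_LIMITS:
    print(f"{tier}: {minutes_until_daily_cap(tier):.1f} min at full RPM")
```

A Growth key at a sustained 500 RPM uses its 50,000 daily requests in 100 minutes, so the RPM limit governs bursts while the daily limit governs sustained volume.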
Per-org aggregate limits
| Limiter | Developer | Growth | Scale | Enterprise |
|---|---|---|---|---|
| Org RPM (all keys combined) | 180 | 2,500 | 10,000 | Custom |
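The org limit is lower than the sum of what many keys could send individually, so adding keys past a certain point does not add throughput. The sketch below assumes, purely for illustration, that traffic is spread evenly across keys; the `fair_share_rpm` helper and both dictionaries are hypothetical names, and only the numbers come from the tables in this document.

```python
# Limits copied from the per-key and per-org tables (Enterprise is custom).
KEY_RPM = {"developer": 60, "growth": 500, "scale": 2_000}
ORG_RPM = {"developer": 180, "growth": 2_500, "scale": 10_000}


def fair_share_rpm(tier: str, active_keys: int) -> float:
    """Highest sustainable RPM per key if all keys send at the same rate."""
    if active_keys < 1:
        raise ValueError("active_keys must be at least 1")
    # Each key is bounded by its own limit and by an even share of the org limit.
    return min(KEY_RPM[tier], ORG_RPM[tier] / active_keys)
```

On the Developer tier, three keys can each run at the full 60 RPM (3 × 60 = 180), but with six keys `fair_share_rpm("developer", 6)` drops to 30 RPM per key.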
Response headers
Every AI response includes rate limit headers.
429 response
When a limit is hit, the response is:
The Retry-After header (RFC 6585) is also set to the same value.
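A client can use the Retry-After header to decide how long to wait before retrying a 429. The sketch below is one possible approach, not a prescribed client: `retry_delay`, `call_with_retry`, and the `send` callable (which returns a status code and a header mapping) are hypothetical names, and the code assumes Retry-After carries a number of seconds.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def retry_delay(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Seconds to wait, read from Retry-After; falls back to `default`."""
    # Assumption: the header is a number of seconds, not an HTTP date.
    # A plain dict is case-sensitive; real HTTP clients usually are not.
    try:
        return max(0.0, float(headers.get("Retry-After", "")))
    except ValueError:
        return default


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, waiting and retrying while the status is 429."""
    response = send()
    for _ in range(max_attempts - 1):
        if response[0] != 429:
            break
        sleep(retry_delay(response[1]))
        response = send()
    return response
```

Passing `sleep` as a parameter keeps the function testable without real delays. Capping the number of attempts prevents a client from looping indefinitely when a limit such as the daily request limit will not clear within a short wait.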
Spend cap (budget)
A dollar spend cap is a separate limit from RPM: it protects against overspending across a billing cycle, not against burst traffic. When the cap is reached, requests return 402 AI_CREDITS_EXHAUSTED.
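Because the spend cap covers a whole billing cycle, a client would typically treat a 402 differently from a 429: waiting a few seconds does not help. The sketch below shows one way to separate the two cases; the `next_action` function and its return labels are hypothetical, chosen only for this illustration.

```python
def next_action(status: int) -> str:
    """Map an HTTP status to a coarse client action (labels are hypothetical)."""
    if status == 429:
        # Rate limit: burst traffic, worth retrying after a delay.
        return "wait_and_retry"
    if status == 402:
        # Spend cap reached: retrying within the same cycle will not succeed.
        return "stop_and_alert"
    if 200 <= status < 300:
        return "ok"
    return "fail"
```

Routing 402 to an alert rather than a retry loop avoids hammering the API with requests that cannot succeed until the cap changes or the billing cycle resets.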