The two scopes
| Scope | What it protects against |
|---|---|
| Per-key | A single misbehaving key — a buggy loop or a leaked credential — overwhelming the system. |
| Per-org bucket | Load spread across many keys overwhelming upstream capacity. The org bucket is the aggregate ceiling for the whole organization. |
Per-key request limits
Per-key requests-per-minute and daily request caps scale by platform tier:
| Tier | Requests per minute | Daily requests |
|---|---|---|
| Solo | 60 | 5,000 |
| Professional | 500 | 50,000 |
| Business | 2,000 | 500,000 |
| Enterprise | 5,000 | 2,000,000 |
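
The two columns interact: at every tier, running at the full per-minute rate exhausts the daily cap long before the day ends. The sketch below works through that arithmetic from the table values. It assumes the daily cap covers a 24-hour window, which this page does not state, and the function names are illustrative only.

```python
# Per-key limits copied from the table above: (requests per minute, daily requests).
TIER_LIMITS = {
    "Solo": (60, 5_000),
    "Professional": (500, 50_000),
    "Business": (2_000, 500_000),
    "Enterprise": (5_000, 2_000_000),
}

# Assumption: the daily cap spans 24 hours (the page does not say how the day is counted).
MINUTES_PER_DAY = 24 * 60


def minutes_until_daily_cap(tier: str) -> float:
    """Minutes of traffic at the full per-minute rate before the daily cap is reached."""
    rpm, daily = TIER_LIMITS[tier]
    return daily / rpm


def sustainable_rpm(tier: str) -> float:
    """Highest steady per-minute rate that stays within the daily cap over 24 hours."""
    rpm, daily = TIER_LIMITS[tier]
    return min(rpm, daily / MINUTES_PER_DAY)


for tier in TIER_LIMITS:
    print(
        f"{tier}: {minutes_until_daily_cap(tier):.0f} min at full RPM, "
        f"{sustainable_rpm(tier):.1f} req/min sustained"
    )
```

For example, a Professional key sending 500 requests every minute would reach its 50,000-request daily cap after 100 minutes, so the per-minute limit is best read as a burst allowance rather than a sustained rate.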
Per-org tokens-per-minute caps
The org bucket also enforces a tokens-per-minute (TPM) ceiling, which scales by API tier:
| API tier | Tokens per minute |
|---|---|
| Developer | 100,000 |
| Growth | 500,000 |
| Scale | 2,000,000 |
| Enterprise | Unlimited |
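
A client can keep a local estimate of its token budget to avoid running into the TPM ceiling. The following is a minimal sketch of a client-side token bucket sized from the table above. The `TokenBudget` class is hypothetical, and the continuous-refill behavior is an assumption: this page does not describe how the server replenishes the org bucket.

```python
import time
from typing import Callable, Optional

# Per-org TPM ceilings from the table above; None stands for "Unlimited".
TPM_LIMITS = {
    "Developer": 100_000,
    "Growth": 500_000,
    "Scale": 2_000_000,
    "Enterprise": None,
}


class TokenBudget:
    """Client-side token bucket that refills at tpm / 60 tokens per second (assumed model)."""

    def __init__(self, tpm: Optional[int], clock: Callable[[], float] = time.monotonic):
        self.tpm = tpm
        self.clock = clock
        # Start with a full bucket; an unlimited tier never runs out.
        self.available = float(tpm) if tpm is not None else float("inf")
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        if self.tpm is not None:
            gained = (now - self.last) * self.tpm / 60
            self.available = min(float(self.tpm), self.available + gained)
        self.last = now

    def try_spend(self, tokens: int) -> bool:
        """Deduct tokens and return True if the budget allows it; otherwise return False."""
        self._refill()
        if tokens <= self.available:
            self.available -= tokens
            return True
        return False


budget = TokenBudget(TPM_LIMITS["Developer"])
if not budget.try_spend(1_200):
    print("Local budget exhausted; delay this request")
```

Because the ceiling applies to the whole organization, a per-process budget like this only tracks one client's share. Processes that use several keys under the same org would need to divide the limit among themselves or coordinate through shared state.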
Handling 429
When a limit is hit, the response is a 429 with a retryable error. The error body carries the wait time in error.details.retry_after_seconds, and the Retry-After header (RFC 6585) is set to the same value. To handle 429 correctly:
Read Retry-After
Wait the number of seconds in the Retry-After header (or error.details.retry_after_seconds) before retrying.
Back off if no value is present
If neither is set, use exponential backoff with a ceiling rather than retrying immediately.
Responses also include rate-limit headers (X-RateLimit-Remaining-Requests, X-RateLimit-Reset-Requests, and token-level equivalents) so you can throttle proactively before hitting a limit.
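
The retry steps above can be sketched as a single delay calculation plus a small retry loop. This is an illustration under stated assumptions, not a reference client: `retry_delay` and `call_with_retries` are hypothetical names, `send` stands in for whatever HTTP call your client makes, the 1-second base and 60-second ceiling are arbitrary choices, and the body layout is inferred from the `error.details.retry_after_seconds` path named above.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], Optional[dict]]


def retry_delay(headers: Mapping[str, str], body: Optional[dict], attempt: int,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retrying a 429 response."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    # 1. Prefer the Retry-After header, read here as a number of seconds.
    value = lowered.get("retry-after")
    if value is not None:
        try:
            return max(0.0, float(value))
        except ValueError:
            pass  # Not a plain number; fall through to the body field.

    # 2. Fall back to error.details.retry_after_seconds in the response body.
    details = ((body or {}).get("error") or {}).get("details") or {}
    seconds = details.get("retry_after_seconds")
    if isinstance(seconds, (int, float)):
        return max(0.0, float(seconds))

    # 3. Neither is set: exponential backoff with a ceiling (base and cap are assumed values).
    return min(cap, base * 2 ** attempt)


def call_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            sleep(retry_delay(headers, body, attempt))
    return status, headers, body
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. The loop returns the last 429 unchanged once `max_attempts` is exhausted, so the caller decides whether to surface the error or queue the work.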
Spend cap is a separate limit
A dollar spend cap is independent of RPM/TPM: it protects the bill across a billing cycle, not against burst traffic. When the cap is reached, requests return 402 AI_CREDITS_EXHAUSTED with an error.details.cycle_reset_at timestamp. Spend caps are configured in Settings → Billing.
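
Since a 402 will not clear by waiting a few seconds, client code should treat it differently from a 429. The sketch below separates the two cases and reads the reset time. It assumes `cycle_reset_at` is an ISO 8601 string and that a value without a timezone offset is UTC; neither is confirmed on this page, and both function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def classify_failure(status: int) -> str:
    """Map a status code to 'retry' (429), 'spend_cap' (402), or 'other'."""
    if status == 429:
        return "retry"
    if status == 402:
        return "spend_cap"
    return "other"


def cycle_reset_at(body: Optional[dict]) -> Optional[datetime]:
    """Parse error.details.cycle_reset_at, or return None if it is missing or unparseable."""
    details = ((body or {}).get("error") or {}).get("details") or {}
    raw = details.get("cycle_reset_at")
    if not isinstance(raw, str):
        return None
    try:
        # Assumption: ISO 8601 text. Older Python versions reject a trailing "Z",
        # so it is rewritten as an explicit UTC offset first.
        parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    except ValueError:
        return None
    # Assumption: a timestamp without an offset is in UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


body = {"error": {"details": {"cycle_reset_at": "2025-07-01T00:00:00Z"}}}
if classify_failure(402) == "spend_cap":
    print("Spend cap reached; requests resume after", cycle_reset_at(body))
```

A retry loop built for 429 responses should stop on `"spend_cap"` rather than back off, because the limit only lifts at the cycle reset or when the cap is raised in Settings → Billing.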
Full reference
For response-header details and per-org aggregate RPM figures, see the AI API rate-limits reference:
AI API Rate Limits: the full three-layer model, response headers, and the spend-cap behavior.