HASP enforces rate limits at two scopes — per API key and per organization — plus a separate per-cycle spend cap. All applicable limits must pass for a request to proceed.

The two scopes

| Scope | What it protects against |
| --- | --- |
| Per-key | A single misbehaving key (a buggy loop or a leaked credential) overwhelming the system. |
| Per-org bucket | Load spread across many keys overwhelming upstream capacity. The org bucket is the aggregate ceiling for the whole organization. |

A per-key cap can never exceed the org bucket: the org-tier bucket is the hard upper bound, and per-key limits are carved out beneath it. Raising a per-key cap above what the org’s tier allows is a tier-gated feature, not a per-key setting.
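The two rules above (a per-key cap is clamped to the org bucket, and every applicable limit must pass) can be pictured with a short sketch. This is only an illustration of the rule, not HASP's implementation; the function names and the example numbers are hypothetical.

```python
def effective_key_rpm(key_cap: int, org_ceiling: int) -> int:
    """Clamp a per-key cap to the org bucket, which is the hard upper bound."""
    return min(key_cap, org_ceiling)


def request_allowed(key_remaining: int, org_remaining: int, spend_remaining: float) -> bool:
    """A request proceeds only if the per-key limit, the org bucket,
    and the spend cap all still have headroom."""
    return key_remaining > 0 and org_remaining > 0 and spend_remaining > 0


# Hypothetical numbers: a key configured for 800 RPM under a 500 RPM org bucket.
print(effective_key_rpm(800, 500))
# The key has headroom, but the org bucket is empty, so the request is refused.
print(request_allowed(key_remaining=10, org_remaining=0, spend_remaining=25.0))
```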

Per-key request limits

Per-key requests-per-minute and daily request caps scale by platform tier:
| Tier | Requests per minute | Daily requests |
| --- | --- | --- |
| Solo | 60 | 5,000 |
| Professional | 500 | 50,000 |
| Business | 2,000 | 500,000 |
| Enterprise | 5,000 | 2,000,000 |

Tiers not listed fall through to the default (60 RPM / 5,000 daily).
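For client-side pacing, the table and its fall-through default could be mirrored in a small lookup. This is a sketch: the lowercase tier identifiers and the `limits_for` helper are assumptions for illustration, and only the numbers come from the table above.

```python
from typing import NamedTuple


class KeyLimits(NamedTuple):
    rpm: int    # requests per minute
    daily: int  # requests per day


# Values copied from the per-key table; the lowercase keys are an assumed spelling.
PER_KEY_LIMITS = {
    "solo": KeyLimits(60, 5_000),
    "professional": KeyLimits(500, 50_000),
    "business": KeyLimits(2_000, 500_000),
    "enterprise": KeyLimits(5_000, 2_000_000),
}
DEFAULT_LIMITS = KeyLimits(60, 5_000)


def limits_for(tier: str) -> KeyLimits:
    """Return the per-key limits for a tier; unlisted tiers get the default."""
    return PER_KEY_LIMITS.get(tier.strip().lower(), DEFAULT_LIMITS)


limits = limits_for("Business")
# Even spacing that stays under the per-minute cap: 60 s / 2,000 requests.
min_interval_seconds = 60 / limits.rpm
```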

Per-org tokens-per-minute caps

The org bucket also enforces a tokens-per-minute (TPM) ceiling, set by API tier:

| API tier | Tokens per minute |
| --- | --- |
| Developer | 100,000 |
| Growth | 500,000 |
| Scale | 2,000,000 |
| Enterprise | Unlimited |

The TPM cap is the org-level throttle that per-key allocations sit beneath.
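A client that tracks its own token usage could check a request against the TPM ceiling before sending it. The sketch below assumes lowercase tier identifiers, represents "Unlimited" as `None`, and leaves the token count to the caller; none of these names come from HASP.

```python
from typing import Optional

# Values copied from the TPM table; None stands in for "Unlimited".
TPM_BY_API_TIER: dict[str, Optional[int]] = {
    "developer": 100_000,
    "growth": 500_000,
    "scale": 2_000_000,
    "enterprise": None,
}


def fits_tpm(api_tier: str, used_this_minute: int, requested: int) -> bool:
    """Return True if `requested` more tokens fit under the tier's per-minute cap.

    `used_this_minute` is the org-wide total the caller has tracked itself.
    """
    cap = TPM_BY_API_TIER[api_tier.lower()]
    if cap is None:  # unlimited tier: nothing to check
        return True
    return used_this_minute + requested <= cap
```

Because the ceiling is org-wide, a per-process count only approximates it when several keys or services share the organization.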

Handling 429

When a limit is hit, the response is a 429 with a retryable error:
{
  "success": false,
  "error": {
    "code": "RATE_LIMITED",
    "type": "rate_limited",
    "message": "Rate limit exceeded. Retry after 12 seconds.",
    "retryable": true,
    "details": { "retry_after_seconds": 12 }
  }
}
The Retry-After header is set to the same value (RFC 6585 defines the 429 status and permits Retry-After on it; the header itself is defined in RFC 9110). To handle 429 correctly:
1. Read Retry-After: wait the number of seconds in the Retry-After header (or error.details.retry_after_seconds) before retrying.
2. Back off if no value is present: if neither is set, use exponential backoff with a ceiling rather than retrying immediately.
3. Respect the bucket: persistent 429s mean you are at the org or tier ceiling, so spread load over time or move up a tier rather than retrying harder.
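The first two steps can be sketched as a single helper that picks the wait time. It is a minimal sketch under stated assumptions: `retry_delay`, its parameters, and the 1-second base and 60-second ceiling are illustrative choices, and the random jitter is a common addition rather than something this page prescribes.

```python
import random
from typing import Mapping, Optional


def retry_delay(headers: Mapping[str, str], body: Optional[dict], attempt: int,
                base: float = 1.0, ceiling: float = 60.0) -> float:
    """Return how many seconds to wait before retrying a 429 response."""
    # Step 1: prefer the server's value. Plain dict lookups are case-sensitive;
    # most HTTP clients expose a case-insensitive header mapping.
    value = headers.get("Retry-After")
    if value is not None:
        try:
            return max(0.0, float(value))
        except ValueError:
            pass  # not a plain number of seconds; try the body next
    details = ((body or {}).get("error") or {}).get("details") or {}
    seconds = details.get("retry_after_seconds")
    if isinstance(seconds, (int, float)):
        return max(0.0, float(seconds))
    # Step 2: no value present, so back off exponentially up to a ceiling.
    # Jitter spreads out clients that would otherwise retry in lockstep.
    return random.uniform(0, min(ceiling, base * 2 ** attempt))
```

A caller would sleep for `retry_delay(response_headers, response_json, attempt)` seconds and stop after a small, fixed number of attempts, since step 3 says persistent 429s will not be fixed by retrying.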
Responses also carry rate-limit headers (X-RateLimit-Remaining-Requests, X-RateLimit-Reset-Requests, and token-level equivalents) so you can throttle proactively before hitting a limit.
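Proactive throttling with these headers might look like the sketch below. The value formats are not specified on this page, so the code assumes X-RateLimit-Remaining-Requests is an integer count and X-RateLimit-Reset-Requests is a number of seconds until the window resets; check the reference before relying on either. `pacing_delay` is a hypothetical helper.

```python
from typing import Mapping


def pacing_delay(headers: Mapping[str, str]) -> float:
    """Return a suggested pause (seconds) before the next request."""
    try:
        # Assumed formats: integer remaining count, reset given in seconds.
        remaining = int(headers["X-RateLimit-Remaining-Requests"])
        reset_in = float(headers["X-RateLimit-Reset-Requests"])
    except (KeyError, ValueError):
        return 0.0  # headers missing or in another format: do not guess
    if reset_in <= 0:
        return 0.0
    if remaining <= 0:
        return reset_in  # budget spent: wait out the rest of the window
    # Spread the remaining budget evenly over the rest of the window.
    return reset_in / remaining
```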

Spend cap is a separate limit

A dollar spend cap is independent of RPM/TPM — it protects the bill across a billing cycle, not against burst traffic. When the cap is reached, requests return 402 AI_CREDITS_EXHAUSTED with an error.details.cycle_reset_at timestamp. Spend caps are configured in Settings → Billing.
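Unlike a 429, a spend-cap response is not worth retrying until the cycle resets, so a client may want to detect it and surface the reset time. The sketch below assumes the 402 body uses the same error envelope as the 429 example, that AI_CREDITS_EXHAUSTED appears in error.code, and that cycle_reset_at is an ISO 8601 string; `spend_cap_reset` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional


def spend_cap_reset(status: int, body: dict) -> Optional[datetime]:
    """Return the cycle reset time if the response is a spend-cap 402, else None."""
    if status != 402:
        return None
    error = body.get("error") or {}
    if error.get("code") != "AI_CREDITS_EXHAUSTED":  # assumed location of the code
        return None
    raw = (error.get("details") or {}).get("cycle_reset_at")
    if not isinstance(raw, str):
        return None
    try:
        # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
        parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    except ValueError:
        return None  # not ISO 8601 after all
    if parsed.tzinfo is None:
        # Assumption: a timestamp without an offset is in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

When this returns a timestamp, the sensible reaction is to stop sending requests and alert someone who can adjust the cap in Settings → Billing, rather than to back off and retry.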

Full reference

For response-header details and per-org aggregate RPM figures, see the AI API rate-limits reference:

AI API Rate Limits

The full three-layer model, response headers, and the spend-cap behavior.