Native HASP chat endpoint. Streams by default. Requires the ai:chat scope.

Request

POST https://api.usehasp.com/v1/ai/chat
Authorization: Bearer hasp_key_live_...
Content-Type: application/json

Body parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| message | string | Yes | The user message content. |
| model | string | No | Model ID. Defaults to claude-sonnet-4-6. See Models. |
| stream | boolean | No | Stream the response via SSE. Default true. |
| conversation_id | string | No | ULID of an existing conversation to continue. Omit to start a new conversation. |
| store | boolean | No | Persist the request/response content. Default true. Set false for stateless requests; audit events are still written regardless. |
| system | string | No | System prompt. Overrides the org default if set. |
| max_tokens | integer | No | Maximum output tokens. |
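
As a rough illustration of the parameters above, the following sketch assembles a request body and leaves unset optional fields out so that the documented defaults apply. The helper name buildChatBody is hypothetical and not part of any HASP SDK; only the field names come from this page.

```javascript
// Hypothetical helper (not part of the HASP API): builds a JSON body for
// POST /v1/ai/chat from the documented body parameters.
function buildChatBody(message, options = {}) {
  if (typeof message !== 'string') {
    throw new TypeError('message is required and must be a string');
  }
  const allowed = ['model', 'stream', 'conversation_id', 'store', 'system', 'max_tokens'];
  const body = { message };
  for (const key of allowed) {
    // Skip unset fields so the server-side defaults apply; unknown keys are dropped.
    if (options[key] !== undefined) body[key] = options[key];
  }
  if (body.max_tokens !== undefined && !Number.isInteger(body.max_tokens)) {
    throw new TypeError('max_tokens must be an integer');
  }
  return body;
}
```

For example, buildChatBody('Hello', { stream: false, store: false }) produces a body for a stateless, non-streaming request.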

Non-streaming response

When stream: false:
{
  "success": true,
  "data": {
    "message": {
      "id": "msg_01JQMSG000000000000000000",
      "role": "assistant",
      "content": "The HIPAA minimum necessary standard requires..."
    },
    "conversation_id": "conv_01JQCONV00000000000000000"
  },
  "meta": {
    "request_id": "req_01JQREQ000000000000000000",
    "usage": {
      "model": "claude-sonnet-4-6",
      "input_tokens": 18,
      "output_tokens": 74,
      "sonnet_equivalent_tokens": 92,
      "cost_usd": 0.000935
    }
  }
}
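
A minimal sketch of reading that response once it has been parsed as JSON. The function name extractReply and the shape of its return value are illustrative assumptions; the input field names are taken from the sample above. The body returned on failure is not shown in this section, so the sketch does not assume any fields for it.

```javascript
// Illustrative helper (hypothetical name): picks the commonly needed fields
// out of a parsed non-streaming response.
function extractReply(payload) {
  if (!payload || payload.success !== true) {
    // The failure body shape is not documented here, so nothing is read from it.
    throw new Error('Request did not succeed');
  }
  const { message, conversation_id } = payload.data;
  const { request_id, usage } = payload.meta;
  return {
    text: message.content,
    conversationId: conversation_id, // send back as conversation_id to continue
    requestId: request_id,
    totalTokens: usage.input_tokens + usage.output_tokens,
    costUsd: usage.cost_usd,
  };
}
```

Passing the returned conversationId as conversation_id on the next request continues the same conversation.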

Streaming response (SSE)

When stream: true (default), the response is a standard text/event-stream. Each event follows this envelope:
event: <type>
id: evt_<ulid>
data: {"id":"evt_...","object":"event","type":"<type>","api_version":"2026-04-01","created":<unix>,"data":{...}}
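
In a text/event-stream, events are separated by a blank line. The sketch below parses one such event block into its event, id, and decoded data fields. parseSseEvent is a hypothetical helper and simplifies the SSE standard: comment lines and the retry field are ignored.

```javascript
// Parses one SSE event block (the lines between two blank lines) into
// { event, id, data }. Hypothetical helper; simplified from the SSE standard.
function parseSseEvent(block) {
  const result = { event: 'message', id: null, data: null };
  const dataLines = [];
  for (const rawLine of block.split(/\r?\n/)) {
    if (rawLine === '' || rawLine.startsWith(':')) continue; // blank line or comment
    const colon = rawLine.indexOf(':');
    const field = colon === -1 ? rawLine : rawLine.slice(0, colon);
    let value = colon === -1 ? '' : rawLine.slice(colon + 1);
    if (value.startsWith(' ')) value = value.slice(1); // strip one leading space
    if (field === 'event') result.event = value;
    else if (field === 'id') result.id = value;
    else if (field === 'data') dataLines.push(value);
  }
  // Multiple data lines are joined with newlines before JSON decoding.
  if (dataLines.length > 0) result.data = JSON.parse(dataLines.join('\n'));
  return result;
}
```

The decoded data object is the envelope itself, so the event payload sits one level down at data.data.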

Event sequence

A typical successful run emits events in this order:
  1. run.created — run persisted; inference not yet started
  2. run.started — upstream inference initiated
  3. message.created — assistant message row created; deltas are about to begin
  4. message.delta (repeated) — content chunks as they arrive
  5. message.completed — message finalized
  6. usage.updated — final token/credit roll-up
  7. run.completed — run finished
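On error, run.failed is emitted instead of run.completed (see run.failed under Event types). Failures that occur before a run exists use the standalone error event instead (see Error events).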
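
One way to consume this sequence is to fold each decoded event into a running summary. The sketch below assumes its input is the JSON decoded from each data line; applyEvent, initialRunState, and the state shape are this example's own inventions. Events whose payloads are not shown on this page (run.started apart from its type, message.created, message.completed, usage.updated) are not inspected.

```javascript
// Starting point for the fold. The state shape is hypothetical.
const initialRunState = { runId: null, status: 'pending', text: '', usage: null, error: null, done: false };

// Returns a new state reflecting one decoded event.
function applyEvent(state, event) {
  const next = { ...state };
  switch (event.type) {
    case 'run.created':
      next.runId = event.data.object.id;
      next.status = 'created';
      break;
    case 'run.started':
      next.status = 'started';
      break;
    case 'message.delta':
      next.text += event.data.object.delta.text;
      break;
    case 'run.completed':
      next.status = 'completed';
      next.usage = event.data.object.usage;
      next.done = true;
      break;
    case 'run.failed':
      next.status = 'failed';
      next.error = event.data.error;
      next.done = true;
      break;
    default:
      break; // payloads not shown in this section are left untouched
  }
  return next;
}
```

A caller can stop reading once done is true, then inspect status to tell success from failure.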

Event types

run.created

{
  "type": "run.created",
  "data": {
    "object": {
      "id": "run_01JQRUN000000000000000000",
      "conversation_id": "conv_01JQCONV00000000000000000",
      "model": "claude-sonnet-4-6",
      "status": "queued"
    }
  }
}

message.delta

{
  "type": "message.delta",
  "data": {
    "object": {
      "id": "msg_01JQMSG000000000000000000",
      "delta": {
        "type": "text_delta",
        "text": "The HIPAA"
      }
    }
  }
}

run.completed

{
  "type": "run.completed",
  "data": {
    "object": {
      "id": "run_01JQRUN000000000000000000",
      "status": "completed",
      "usage": {
        "input_tokens": 18,
        "output_tokens": 74,
        "sonnet_equivalent_tokens": 92,
        "cost_usd": 0.000935
      }
    }
  }
}

run.failed

run.failed uses a failure-shaped envelope in which both data.object and data.error are present:
{
  "type": "run.failed",
  "data": {
    "object": {
      "id": "run_01JQRUN000000000000000000",
      "status": "failed"
    },
    "error": {
      "code": "INFERENCE_UPSTREAM_FAILURE",
      "message": "The upstream model provider returned an error.",
      "retryable": true,
      "doc_url": "https://docs.usehasp.com/ai-api/reference/errors"
    }
  }
}

Error events

The standalone error event fires for auth/rate-limit failures that happen before a run could be created (no resource to snapshot):
event: error
data: {"type":"error","data":{"error":{"code":"RATE_LIMITED","message":"...","retryable":true}}}
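
Because failures can arrive in two shapes, a client may find it convenient to normalize them. The following is a sketch under that assumption; normalizeFailure and its return shape are hypothetical, while the input fields follow the run.failed and error samples above.

```javascript
// Normalizes run.failed and standalone error events into one shape.
// Returns null for any other event type. Hypothetical helper.
function normalizeFailure(event) {
  if (event.type !== 'run.failed' && event.type !== 'error') return null;
  const error = event.data.error;
  return {
    // Only run.failed carries data.object; a standalone error has no run.
    runId: event.type === 'run.failed' ? event.data.object.id : null,
    code: error.code,
    message: error.message,
    retryable: error.retryable === true,
  };
}
```

A null runId then signals that the request failed before any run was created.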

Consuming the stream (JavaScript)

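const response = await fetch('https://api.usehasp.com/v1/ai/chat', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer hasp_key_live_...',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ message: 'Hello', stream: true }),
});

// Fail fast on HTTP-level errors (see Error codes).
if (!response.ok) {
  throw new Error(`Chat request failed with HTTP ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Decode incrementally so multi-byte characters split across chunks survive.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split(/\r?\n/);
  buffer = lines.pop(); // keep the trailing partial line for the next chunk

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const event = JSON.parse(line.slice(6));
    if (event.type === 'message.delta') {
      process.stdout.write(event.data.object.delta.text); // Node.js only
    } else if (event.type === 'run.failed' || event.type === 'error') {
      throw new Error(`${event.data.error.code}: ${event.data.error.message}`);
    }
  }
}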

Cancellation

Close the SSE connection to cancel. The server detects the disconnect, emits run.cancel_requested and then run.cancelled, and halts the upstream call. No additional API call is needed.
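
With fetch, one way to close the connection is to abort the request through an AbortController. The sketch below assumes that approach; startCancellableChat is a hypothetical helper, and fetchImpl is injectable only so the example can be exercised without network access (pass the global fetch in real use).

```javascript
// Starts a streaming request and returns a cancel function that aborts it,
// which closes the underlying connection. Hypothetical helper.
function startCancellableChat(fetchImpl, apiKey, message) {
  const controller = new AbortController();
  const responsePromise = fetchImpl('https://api.usehasp.com/v1/ai/chat', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ message, stream: true }),
    signal: controller.signal,
  });
  return { responsePromise, cancel: () => controller.abort() };
}
```

After cancel() is called, a pending fetch or reader.read() rejects with an AbortError, so the read loop should catch that error and treat it as a normal exit.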

Models

| Model ID | Relative cost | Default access |
| --- | --- | --- |
| claude-haiku-4-5 | 0.33× Sonnet | Enabled |
| claude-sonnet-4-6 | 1.0× (anchor) | Enabled |
| claude-opus-4-7 | 5.0× Sonnet | Opt-in required |

Opus access must be enabled by an org admin in Settings → Billing → Model Access before requests using Opus are accepted. Requests to a disabled Opus model return 403 OPUS_NOT_ENABLED.

Error codes

| Code | HTTP | Description |
| --- | --- | --- |
| INVALID_API_KEY | 401 | Missing or revoked token |
| BAA_REQUIRED | 402 | No active BAA on the org |
| AI_CREDITS_EXHAUSTED | 402 | Org has exhausted its credit allotment |
| MISSING_SCOPE | 403 | Key lacks ai:chat scope |
| PHI_BLOCKED | 403 | Message contains PHI and phi_mode=block is set |
| FEATURE_NOT_HIPAA_ELIGIBLE | 403 | Requested AI feature is not HIPAA-eligible and is disabled |
| OPUS_NOT_ENABLED | 403 | Opus model not enabled for this org |
| RATE_LIMITED | 429 | RPM or daily limit exceeded; check Retry-After |
| INFERENCE_UPSTREAM_FAILURE | 502 | Upstream model provider error (retryable) |
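
The codes above suggest a simple retry policy: wait for Retry-After on 429, back off on 502, and give up on the rest. The sketch below is one possible reading, not documented HASP guidance. retryDelayMs is a hypothetical helper, the backoff values are arbitrary, and Retry-After is assumed to be a number of seconds (the HTTP-date form is not handled).

```javascript
// Returns how long to wait before retrying, in milliseconds, or null when
// the request should not be retried. attempt starts at 0. Hypothetical helper.
function retryDelayMs(status, retryAfterHeader, attempt) {
  const backoff = Math.min(1000 * 2 ** attempt, 30000); // arbitrary cap of 30 s
  if (status === 429) {
    const seconds = Number(retryAfterHeader);
    const usable = retryAfterHeader != null && retryAfterHeader !== '' &&
      Number.isFinite(seconds) && seconds >= 0;
    // Prefer the server's hint; fall back to exponential backoff without it.
    return usable ? seconds * 1000 : backoff;
  }
  if (status === 502) return backoff;
  return null; // 401, 402, 403 are not marked retryable in the table
}
```

In a fetch-based client this could be called as retryDelayMs(response.status, response.headers.get('Retry-After'), attempt).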