POST /v1/ai/chat

Native HASP chat endpoint. Streams by default. Requires the `ai:chat` scope.

Request

POST https://api.usehasp.com/v1/ai/chat
Authorization: Bearer hasp_key_live_...
Content-Type: application/json

Body parameters

Parameter	Type	Required	Description
`message`	string	Yes	The user message content.
`model`	string	No	Model ID. Defaults to `claude-sonnet-4-6`. See Models.
`stream`	boolean	No	Stream the response via SSE. Default `true`.
`conversation_id`	string	No	ULID of an existing conversation to continue. Omit to start a new conversation.
`store`	boolean	No	Persist the request/response content. Default `true`. Set `false` for stateless requests — audit events are still written regardless.
`system`	string	No	System prompt. Overrides the org default if set.
`max_tokens`	integer	No	Maximum output tokens.
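
As a rough illustration of how these parameters combine, the sketch below assembles a request body. `buildChatBody` is a hypothetical helper, not part of any HASP SDK; it leaves unset fields out of the body so the server-side defaults listed above apply.

```javascript
// Hypothetical helper: builds a JSON body for POST /v1/ai/chat.
// Only `message` is required; omitted fields fall back to server defaults.
function buildChatBody({ message, model, stream, conversationId, store, system, maxTokens }) {
  if (typeof message !== 'string') {
    throw new TypeError('message must be a string');
  }
  const body = { message };
  if (model !== undefined) body.model = model;
  if (stream !== undefined) body.stream = stream;
  if (conversationId !== undefined) body.conversation_id = conversationId;
  if (store !== undefined) body.store = store;
  if (system !== undefined) body.system = system;
  if (maxTokens !== undefined) body.max_tokens = maxTokens;
  return body;
}

// Continue an existing conversation without streaming.
const body = buildChatBody({
  message: 'Summarize the previous answer.',
  conversationId: 'conv_01JQCONV00000000000000000',
  stream: false,
});
console.log(JSON.stringify(body));
```

Pass the result through `JSON.stringify` as the `fetch` body, alongside the headers shown under Request.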

Non-streaming response

When `stream` is `false`:

{
  "success": true,
  "data": {
    "message": {
      "id": "msg_01JQMSG000000000000000000",
      "role": "assistant",
      "content": "The HIPAA minimum necessary standard requires..."
    },
    "conversation_id": "conv_01JQCONV00000000000000000"
  },
  "meta": {
    "request_id": "req_01JQREQ000000000000000000",
    "usage": {
      "model": "claude-sonnet-4-6",
      "input_tokens": 18,
      "output_tokens": 74,
      "sonnet_equivalent_tokens": 92,
      "cost_usd": 0.000935
    }
  }
}
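
A minimal sketch of reading this payload, assuming it has already been parsed from JSON. `readChatResult` is a hypothetical helper, and treating any payload without `success: true` as a failure is an assumption, since the failure shape of a non-streaming response is not shown here.

```javascript
// Hypothetical helper: pulls the assistant text and usage out of a parsed
// non-streaming response.
function readChatResult(payload) {
  // Assumption: anything other than success === true is treated as a failure.
  if (!payload || payload.success !== true) {
    throw new Error('HASP response did not report success');
  }
  const { message, conversation_id: conversationId } = payload.data;
  const { request_id: requestId, usage } = payload.meta;
  return { text: message.content, conversationId, requestId, usage };
}

// Sample payload copied from the response shown above.
const sample = {
  success: true,
  data: {
    message: {
      id: 'msg_01JQMSG000000000000000000',
      role: 'assistant',
      content: 'The HIPAA minimum necessary standard requires...',
    },
    conversation_id: 'conv_01JQCONV00000000000000000',
  },
  meta: {
    request_id: 'req_01JQREQ000000000000000000',
    usage: {
      model: 'claude-sonnet-4-6',
      input_tokens: 18,
      output_tokens: 74,
      sonnet_equivalent_tokens: 92,
      cost_usd: 0.000935,
    },
  },
};

const result = readChatResult(sample);
console.log(result.text);
console.log(`${result.usage.input_tokens} in / ${result.usage.output_tokens} out`);
```

Keeping `conversationId` lets the next request continue the same conversation, and `requestId` is worth logging for support queries.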

Streaming response (SSE)

When `stream` is `true` (the default), the response is a standard `text/event-stream`. Each event follows this envelope:

event: <type>
id: evt_<ulid>
data: {"id":"evt_...","object":"event","type":"<type>","api_version":"2026-04-01","created":<unix>,"data":{...}}
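
The envelope can be split into its fields with a small parser. The sketch below handles one event block (the text between two blank lines) using the general SSE field rules; `parseSseBlock` is a hypothetical name, and the IDs and `created` value in the sample are placeholders.

```javascript
// Parses one SSE event block into { event, id, payload }.
// Field handling follows the SSE wire format: "field: value", one optional
// space after the colon, and multiple data lines joined by "\n".
function parseSseBlock(block) {
  let event = 'message'; // SSE default when no event: field is present
  let id = null;
  const dataLines = [];
  for (const line of block.split(/\r\n|\n|\r/)) {
    if (line === '' || line.startsWith(':')) continue; // blank or comment line
    const colon = line.indexOf(':');
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? '' : line.slice(colon + 1);
    if (value.startsWith(' ')) value = value.slice(1);
    if (field === 'event') event = value;
    else if (field === 'id') id = value;
    else if (field === 'data') dataLines.push(value);
  }
  const payload = dataLines.length > 0 ? JSON.parse(dataLines.join('\n')) : null;
  return { event, id, payload };
}

// Placeholder block in the envelope shape shown above.
const block = [
  'event: message.delta',
  'id: evt_01EXAMPLE',
  'data: {"id":"evt_01EXAMPLE","object":"event","type":"message.delta","api_version":"2026-04-01","created":0,"data":{"object":{"id":"msg_01EXAMPLE","delta":{"type":"text_delta","text":"The HIPAA"}}}}',
].join('\n');

const parsed = parseSseBlock(block);
console.log(parsed.event, parsed.payload.data.object.delta.text);
```

Because the event type is repeated inside the JSON `data` payload, a client that reads only `data:` lines can still dispatch on `type`.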

Event sequence

A typical successful run emits events in this order:

run.created — run persisted; inference not yet started
run.started — upstream inference initiated
message.created — assistant message row created; deltas are about to begin
message.delta (repeated) — content chunks as they arrive
message.completed — message finalized
usage.updated — final token/credit roll-up
run.completed — run finished

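If the run fails after it has been created, `run.failed` is emitted instead of `run.completed` (see `run.failed` under Event types). Failures that occur before a run exists produce a standalone `error` event instead (see Error events).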
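
One way to consume this sequence is to fold events into a running summary. The sketch below is illustrative only: `createRunTracker` and its local status labels (`pending`, `created`, `running`) are hypothetical, and the `run.started` payload in the sample is assumed because only its `type` is read.

```javascript
// Folds a sequence of event payloads into a running summary of the run.
function createRunTracker() {
  const state = { status: 'pending', text: '', usage: null, error: null };
  return {
    state,
    apply(event) {
      switch (event.type) {
        case 'run.created':
          state.status = 'created';
          break;
        case 'run.started':
          state.status = 'running';
          break;
        case 'message.delta':
          state.text += event.data.object.delta.text; // append each chunk in order
          break;
        case 'run.completed':
          state.status = 'completed';
          state.usage = event.data.object.usage;
          break;
        case 'run.failed':
          state.status = 'failed';
          state.error = event.data.error;
          break;
        default:
          break; // message.created, message.completed, usage.updated, unknown types
      }
      return state;
    },
  };
}

const tracker = createRunTracker();
const events = [
  { type: 'run.created', data: { object: { id: 'run_01EXAMPLE', status: 'queued' } } },
  { type: 'run.started', data: { object: { id: 'run_01EXAMPLE' } } }, // assumed shape
  { type: 'message.delta', data: { object: { id: 'msg_01EXAMPLE', delta: { type: 'text_delta', text: 'The ' } } } },
  { type: 'message.delta', data: { object: { id: 'msg_01EXAMPLE', delta: { type: 'text_delta', text: 'HIPAA' } } } },
  { type: 'run.completed', data: { object: { id: 'run_01EXAMPLE', status: 'completed', usage: { input_tokens: 18, output_tokens: 74 } } } },
];
for (const event of events) tracker.apply(event);
console.log(tracker.state.status, JSON.stringify(tracker.state.text));
```

Ignoring unknown event types in the `default` branch keeps the client tolerant of event types it does not handle.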

Event types

`run.created`

{
  "type": "run.created",
  "data": {
    "object": {
      "id": "run_01JQRUN000000000000000000",
      "conversation_id": "conv_01JQCONV00000000000000000",
      "model": "claude-sonnet-4-6",
      "status": "queued"
    }
  }
}

`message.delta`

{
  "type": "message.delta",
  "data": {
    "object": {
      "id": "msg_01JQMSG000000000000000000",
      "delta": {
        "type": "text_delta",
        "text": "The HIPAA"
      }
    }
  }
}

`run.completed`

{
  "type": "run.completed",
  "data": {
    "object": {
      "id": "run_01JQRUN000000000000000000",
      "status": "completed",
      "usage": {
        "input_tokens": 18,
        "output_tokens": 74,
        "sonnet_equivalent_tokens": 92,
        "cost_usd": 0.000935
      }
    }
  }
}

`run.failed`

Failure-shape envelope — both data.object and data.error are present:

{
  "type": "run.failed",
  "data": {
    "object": {
      "id": "run_01JQRUN000000000000000000",
      "status": "failed"
    },
    "error": {
      "code": "INFERENCE_UPSTREAM_FAILURE",
      "message": "The upstream model provider returned an error.",
      "retryable": true,
      "doc_url": "https://docs.usehasp.com/ai-api/reference/errors"
    }
  }
}

Error events

The standalone `error` event fires for authentication and rate-limit failures that occur before a run is created. Because no run exists yet, the payload carries `data.error` but no `data.object`:

event: error
data: {"type":"error","data":{"error":{"code":"RATE_LIMITED","message":"...","retryable":true}}}
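
Both failure shapes carry `data.error`, so a client can normalize them in one place. The sketch below assumes the payloads match the `run.failed` and `error` examples in this section; `getStreamError` is a hypothetical helper.

```javascript
// Returns a normalized error for run.failed and standalone error events,
// or null for any other event type.
function getStreamError(event) {
  if (event.type !== 'run.failed' && event.type !== 'error') return null;
  const { code, message, retryable } = event.data.error;
  // Only run.failed carries data.object; the standalone error event has no run.
  const runId = event.data.object ? event.data.object.id : null;
  return { code, message, retryable: retryable === true, runId };
}

const failed = getStreamError({
  type: 'run.failed',
  data: {
    object: { id: 'run_01JQRUN000000000000000000', status: 'failed' },
    error: {
      code: 'INFERENCE_UPSTREAM_FAILURE',
      message: 'The upstream model provider returned an error.',
      retryable: true,
    },
  },
});
const preRun = getStreamError({
  type: 'error',
  data: { error: { code: 'RATE_LIMITED', message: '...', retryable: true } },
});
console.log(failed.code, failed.runId);
console.log(preRun.code, preRun.runId);
```

A `null` `runId` distinguishes a pre-run failure from a failed run.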

Consuming the stream (JavaScript)

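// Assumes Node.js 18+ run as an ES module (global fetch, top-level await).
const response = await fetch('https://api.usehasp.com/v1/ai/chat', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer hasp_key_live_...',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ message: 'Hello', stream: true }),
});

// Stop early on a non-2xx status (see Error codes).
if (!response.ok) {
  throw new Error(`Request failed with HTTP ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Decode incrementally; a chunk may end in the middle of a line.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split(/\r?\n/);
  buffer = lines.pop(); // keep the trailing partial line for the next chunk

  for (const line of lines) {
    if (!line.startsWith('data: ')) continue; // skip event:, id:, and blank lines
    const event = JSON.parse(line.slice(6));
    if (event.type === 'message.delta') {
      process.stdout.write(event.data.object.delta.text);
    } else if (event.type === 'run.failed' || event.type === 'error') {
      // Both failure shapes carry data.error.
      throw new Error(`${event.data.error.code}: ${event.data.error.message}`);
    }
  }
}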

Cancellation

Close the SSE connection to cancel. The server detects the disconnect, emits run.cancel_requested → run.cancelled, and halts the upstream call. No additional API call is needed.
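
In a `fetch`-based client, closing the connection usually means aborting the request. A minimal sketch using the standard `AbortController`; `startChat` is a hypothetical helper, and `fetchImpl` is injectable only so the sketch can be exercised without network access.

```javascript
// Starts a streaming chat request that can be cancelled by aborting the fetch.
function startChat(body, apiKey, fetchImpl = globalThis.fetch) {
  const controller = new AbortController();
  const response = fetchImpl('https://api.usehasp.com/v1/ai/chat', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ ...body, stream: true }),
    signal: controller.signal, // aborting this signal closes the connection
  });
  return { response, cancel: () => controller.abort() };
}

// Usage (not executed here):
// const chat = startChat({ message: 'Hello' }, 'hasp_key_live_...');
// setTimeout(chat.cancel, 2000); // stop generation after two seconds
```

After `cancel()` is called, the pending `fetch` promise or any in-flight `reader.read()` rejects with an `AbortError`, so the read loop should catch that case and treat it as a normal stop.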

Models

Model ID	Relative cost	Default access
`claude-haiku-4-5`	0.33× Sonnet	Enabled
`claude-sonnet-4-6`	1.0× (anchor)	Enabled
`claude-opus-4-7`	5.0× Sonnet	Opt-in required

Opus access must be enabled by an org admin in Settings → Billing → Model Access before requests using Opus are accepted. Requests to a disabled Opus model return 403 OPUS_NOT_ENABLED.
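
For rough budgeting, the relative costs can be applied to raw token counts. The sketch below assumes that sonnet-equivalent tokens equal input plus output tokens multiplied by the model's relative cost. That matches the Sonnet example on this page (18 + 74 = 92) but is not confirmed for the other models, so treat the result as an estimate and rely on the `usage` object for actual figures.

```javascript
// Relative cost multipliers from the Models table.
const RELATIVE_COST = {
  'claude-haiku-4-5': 0.33,
  'claude-sonnet-4-6': 1.0,
  'claude-opus-4-7': 5.0,
};

// Assumption: sonnet-equivalent tokens = (input + output) * relative cost.
function estimateSonnetEquivalentTokens(model, inputTokens, outputTokens) {
  const multiplier = RELATIVE_COST[model];
  if (multiplier === undefined) {
    throw new RangeError(`Unknown model: ${model}`);
  }
  return (inputTokens + outputTokens) * multiplier;
}

console.log(estimateSonnetEquivalentTokens('claude-sonnet-4-6', 18, 74));
console.log(estimateSonnetEquivalentTokens('claude-opus-4-7', 18, 74));
```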

Error codes

Code	HTTP	Description
`INVALID_API_KEY`	401	Missing or revoked token
`BAA_REQUIRED`	402	No active BAA on the org
`AI_CREDITS_EXHAUSTED`	402	Org has exhausted its credit allotment
`MISSING_SCOPE`	403	Key lacks `ai:chat` scope
`PHI_BLOCKED`	403	Message contains PHI and `phi_mode=block` is set
`FEATURE_NOT_HIPAA_ELIGIBLE`	403	Requested AI feature is not HIPAA-eligible and is disabled
`OPUS_NOT_ENABLED`	403	Opus model not enabled for this org
`RATE_LIMITED`	429	RPM or daily limit exceeded — check `Retry-After`
`INFERENCE_UPSTREAM_FAILURE`	502	Upstream model provider error — retryable
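
A sketch of turning these codes into a retry decision. Treating only `RATE_LIMITED` and `INFERENCE_UPSTREAM_FAILURE` as retryable follows the table above; the exponential backoff values are an assumption, and the helper reads `Retry-After` only in its delta-seconds form, not as an HTTP date. `retryDelayMs` is a hypothetical name.

```javascript
// Codes the table marks as retryable or rate-limited.
const RETRYABLE_CODES = new Set(['RATE_LIMITED', 'INFERENCE_UPSTREAM_FAILURE']);

// Returns how long to wait before retrying, in milliseconds,
// or null when the request should not be retried unchanged.
function retryDelayMs(code, retryAfterHeader, attempt) {
  if (!RETRYABLE_CODES.has(code)) return null;
  if (retryAfterHeader != null && retryAfterHeader !== '') {
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds >= 0) return seconds * 1000;
  }
  // Assumed fallback: exponential backoff of 1s, 2s, 4s, ... capped at 30s.
  return Math.min(30000, 1000 * 2 ** attempt);
}

console.log(retryDelayMs('RATE_LIMITED', '12', 0));
console.log(retryDelayMs('INFERENCE_UPSTREAM_FAILURE', null, 2));
console.log(retryDelayMs('PHI_BLOCKED', null, 0));
```

With `fetch`, the header value comes from `response.headers.get('Retry-After')`, which returns `null` when the header is absent.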
