Integration Guide

Krato sits between your app and LLM providers, enforcing per-user token budgets in real time.

Architecture

Your App  →  Krato SDK  →  Krato Server  →  LLM Provider
                  ↕               ↕
            Budget Check     PostgreSQL + Redis
  1. Before each LLM call, the SDK checks the user's budget
  2. If the budget allows, the call proceeds to the LLM provider
  3. After the call, the SDK reports token usage
  4. The dashboard shows real-time usage and budget status
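The lifecycle above can be sketched as follows. Note that checkBudget, callProvider, and reportUsage are hypothetical stand-ins for what the SDK does internally (HTTP calls to /api/v1/internal/check and /api/v1/internal/report), not part of the public API:

```typescript
type BudgetStatus = "normal" | "warning" | "rejected";

// Hypothetical stand-in: in the real SDK this is a POST to /api/v1/internal/check.
async function checkBudget(userId: string): Promise<BudgetStatus> {
  return "normal";
}

// Placeholder for the actual LLM provider call.
async function callProvider(
  messages: string[],
): Promise<{ text: string; inputTokens: number; outputTokens: number }> {
  return { text: "Hello!", inputTokens: 3, outputTokens: 2 };
}

// Hypothetical stand-in: in the real SDK this is a POST to /api/v1/internal/report.
async function reportUsage(
  userId: string,
  inputTokens: number,
  outputTokens: number,
): Promise<void> {}

async function guardedChat(userId: string, messages: string[]): Promise<string> {
  const status = await checkBudget(userId); // 1. check the budget
  if (status === "rejected") {
    throw new Error("budget exceeded"); // blocked before any provider call
  }
  const res = await callProvider(messages); // 2. call the provider
  await reportUsage(userId, res.inputTokens, res.outputTokens); // 3. report usage
  return res.text;
}
```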

Quick Start

1. Get your Project Key

Sign up at the Krato dashboard, go to Settings, and copy your Project Key.

2. Install the SDK

TypeScript

npm install krato

Python

pip install krato[openai]

Go

go get github.com/kratosdk/krato-go

3. Integrate (TypeScript)

import { Krato } from "krato"

const krato = new Krato({
  projectKey: process.env.KRATO_PROJECT_KEY!,
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
});

// Replace openai.chat.completions.create() with krato.chat()
const { result, budgetStatus, usage } = await krato.chat(
  "user_123", "gpt-4o",
  [{ role: "user", content: "Hello!" }],
);

4. Integrate (Python)

import os

from krato import Krato

krato = Krato(
  project_key=os.environ["KRATO_PROJECT_KEY"],
  provider="openai",
  api_key=os.environ["OPENAI_API_KEY"],
)

response = krato.chat("user_123", "gpt-4o", messages)

5. Set Budgets

In the dashboard → Users & Budgets, set Limit (the soft limit) and Cap (an elastic buffer above the limit). Or via the API:

curl -X PUT http://localhost:8080/api/v1/users/user_123/budget \
  -H "Authorization: Bearer krato_your_key" \
  -H "Content-Type: application/json" \
  -d '{"limit": 100000, "cap": 20000}'

API Reference

All endpoints require an Authorization: Bearer <project_key> header.

Budget Check (SDK internal)

POST /api/v1/internal/check

Request

{ "user_id": "user_123",
  "provider": "openai",
  "model": "gpt-4o" }

Response

{ "status": "normal",
  "used": 45000,
  "limit": 100000,
  "cap": 20000 }

Usage Report (SDK internal)

POST /api/v1/internal/report
{ "user_id": "user_123", "provider": "openai",
  "model": "gpt-4o",
  "input_tokens": 150, "output_tokens": 80 }

Set Budget

PUT /api/v1/users/{userID}/budget
{ "limit": 100000, "cap": 20000 }

Get Budget

GET /api/v1/users/{userID}/budget
{ "user_id": "user_123",
  "limit_tokens": 100000, "cap_tokens": 20000,
  "period": "none", "used": 45000 }

Delete Budget

DELETE /api/v1/users/{userID}/budget

Get User Usage

GET /api/v1/users/{userID}/usage

Query params: from, to (ISO timestamps), group_by (model)
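For example, a filtered, per-model usage query can be built like this (the base URL assumes a local deployment, as in the Set Budget example above):

```typescript
// Build a usage query filtered to January 2024, grouped by model.
const params = new URLSearchParams({
  from: "2024-01-01T00:00:00Z",
  to: "2024-01-31T23:59:59Z",
  group_by: "model",
});
const url = `http://localhost:8080/api/v1/users/user_123/usage?${params}`;
```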

Reset User Usage

POST /api/v1/users/{userID}/usage/reset

Usage Summary

GET /api/v1/usage/summary
{ "total_tokens": 1250000,
  "input_tokens": 500000,
  "output_tokens": 750000,
  "requests": 3420,
  "top_users": [
    { "user_id": "user_123", "total_tokens": 89000, "requests": 210 }
  ] }

Health Check

GET /health

Returns {"status": "ok"} — no auth required.

Budget Enforcement Logic

Condition                     Status    Action
used < limit                  normal    Allow
limit ≤ used < limit + cap    warning   Allow with warning
used ≥ limit + cap            rejected  Block request

When rejected, the SDK throws KratoBudgetExceededError (TS/Python) or returns BudgetExceededError (Go) before calling the LLM.
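The table above maps directly onto a small decision function. This is a sketch of the rule, not the actual server implementation:

```typescript
type BudgetStatus = "normal" | "warning" | "rejected";

function budgetStatus(used: number, limit: number, cap: number): BudgetStatus {
  if (used < limit) return "normal"; // under the soft limit
  if (used < limit + cap) return "warning"; // inside the elastic buffer
  return "rejected"; // past limit + cap: request is blocked
}

budgetStatus(45000, 100000, 20000); // "normal"
budgetStatus(110000, 100000, 20000); // "warning"
budgetStatus(120000, 100000, 20000); // "rejected"
```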

Streaming

All SDKs support streaming. Budget is checked once before the stream starts — Krato never cuts off a stream mid-response.

TypeScript

const stream = await krato.chatStream("user_123", "gpt-4o", messages);
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
console.log(stream.usage); // available after stream ends

Python

stream = krato.chat_stream("user_123", "gpt-4o", messages)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
print(stream.usage) # available after stream ends

Go

stream, _ := client.ChatStream("user_123", "gpt-4o", messages, nil)
for {
    chunk, ok := stream.Next()
    if !ok { break }
    fmt.Print(chunk)
}
usage, _ := stream.Usage()

Error Handling

Error                 When                   Behavior
Server unreachable    Network issue          Fail-open: LLM call proceeds
Budget exceeded       used ≥ limit + cap     Throws before calling LLM
Invalid project key   Wrong/missing key      401 from server
Provider error        API key, rate limit    Error propagated as-is
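The fail-open behavior can be sketched like this. Both checkBudget and the local BudgetExceededError class are hypothetical stand-ins for the SDK's internals (the real TS SDK throws KratoBudgetExceededError):

```typescript
type BudgetStatus = "normal" | "warning" | "rejected";

class BudgetExceededError extends Error {}

// Hypothetical stand-in for the SDK's budget check against the Krato server.
// Here it simulates an unreachable server.
async function checkBudget(userId: string): Promise<BudgetStatus> {
  throw new Error("ECONNREFUSED");
}

async function shouldProceed(userId: string): Promise<boolean> {
  try {
    const status = await checkBudget(userId);
    if (status === "rejected") throw new BudgetExceededError("budget exceeded");
    return true;
  } catch (err) {
    if (err instanceof BudgetExceededError) throw err; // real rejection: block
    return true; // fail-open: network errors never block the LLM call
  }
}
```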

SDK Repositories