Integration Guide
Krato sits between your app and LLM providers, enforcing per-user token budgets in real time.
Architecture
Your App → Krato SDK → Krato Server → LLM Provider
                ↕             ↕
          Budget Check   PostgreSQL + Redis

- Before each LLM call, the SDK checks the user's budget
- If the budget allows, the call proceeds to the LLM provider
- After the call, the SDK reports token usage
- The dashboard shows real-time usage and budget status
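The check-call-report loop above can be sketched as a minimal mock in Python. None of these names are real Krato APIs: `check_budget` and `report_usage` stand in for the internal check and report endpoints, and `fake_provider_call` stands in for the LLM provider.

```python
# A minimal mock of the SDK's request sequence: check, call, report.
# check_budget / report_usage / fake_provider_call are illustrative
# stand-ins, not the real SDK internals.

REPORTED = []  # usage reports the SDK would send after each call

def check_budget(user_id: str, model: str) -> str:
    # The server replies "normal", "warning", or "rejected".
    return "normal"

def report_usage(user_id: str, model: str,
                 input_tokens: int, output_tokens: int) -> None:
    REPORTED.append((user_id, model, input_tokens + output_tokens))

def fake_provider_call(messages):
    # Stand-in for the provider; pretend the call used 150 + 80 tokens.
    return {"content": "Hello!", "input_tokens": 150, "output_tokens": 80}

def chat(user_id: str, model: str, messages):
    status = check_budget(user_id, model)      # 1. budget check first
    if status == "rejected":
        raise RuntimeError("budget exceeded")  # blocked before the provider is called
    result = fake_provider_call(messages)      # 2. call proceeds
    report_usage(user_id, model,               # 3. usage reported afterwards
                 result["input_tokens"], result["output_tokens"])
    return result
```

The important property is the ordering: a rejected budget stops the request before any provider tokens are spent.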
Quick Start
1. Get your Project Key
Sign up at the Krato dashboard, go to Settings, and copy your Project Key.
2. Install the SDK
TypeScript
```shell
npm install krato
```

Python

```shell
pip install krato[openai]
```

Go

```shell
go get github.com/kratosdk/krato-go
```

3. Integrate (TypeScript)
```typescript
import { Krato } from "krato";

const krato = new Krato({
  projectKey: process.env.KRATO_PROJECT_KEY!,
  provider: "openai",
  apiKey: process.env.OPENAI_API_KEY!,
});

// Replace openai.chat.completions.create() with krato.chat()
const { result, budgetStatus, usage } = await krato.chat(
  "user_123",
  "gpt-4o",
  [{ role: "user", content: "Hello!" }],
);
```

4. Integrate (Python)
```python
import os

from krato import Krato

krato = Krato(
    project_key=os.environ["KRATO_PROJECT_KEY"],
    provider="openai",
    api_key=os.environ["OPENAI_API_KEY"],
)

messages = [{"role": "user", "content": "Hello!"}]
response = krato.chat("user_123", "gpt-4o", messages)
```

5. Set Budgets
In the dashboard → Users & Budgets, set Limit (soft) and Cap (elastic buffer). Or via the API:

```shell
curl -X PUT http://localhost:8080/api/v1/users/user_123/budget \
  -H "Authorization: Bearer krato_your_key" \
  -H "Content-Type: application/json" \
  -d '{"limit": 100000, "cap": 20000}'
```

API Reference
All endpoints require `Authorization: Bearer <project_key>`.
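As a sketch, any HTTP client can talk to these endpoints; the helper below builds the authenticated Set Budget request with Python's standard `urllib`. The base URL and `krato_your_key` are placeholders from the Quick Start, not fixed values, and `build_set_budget_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"   # placeholder from the Quick Start
PROJECT_KEY = "krato_your_key"       # placeholder project key

def build_set_budget_request(user_id: str, limit: int, cap: int) -> urllib.request.Request:
    """Build the authenticated PUT request; send it with urllib.request.urlopen()."""
    body = json.dumps({"limit": limit, "cap": cap}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/users/{user_id}/budget",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {PROJECT_KEY}",  # required on every endpoint
            "Content-Type": "application/json",
        },
    )

req = build_set_budget_request("user_123", 100_000, 20_000)
```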
Budget Check (SDK internal)
POST /api/v1/internal/check

Request

```json
{
  "user_id": "user_123",
  "provider": "openai",
  "model": "gpt-4o"
}
```

Response

```json
{
  "status": "normal",
  "used": 45000,
  "limit": 100000,
  "cap": 20000
}
```

Usage Report (SDK internal)
POST /api/v1/internal/report

```json
{
  "user_id": "user_123",
  "provider": "openai",
  "model": "gpt-4o",
  "input_tokens": 150,
  "output_tokens": 80
}
```

Set Budget

PUT /api/v1/users/{userID}/budget

```json
{"limit": 100000, "cap": 20000}
```

Get Budget
GET /api/v1/users/{userID}/budget

```json
{
  "user_id": "user_123",
  "limit_tokens": 100000,
  "cap_tokens": 20000,
  "period": "none",
  "used": 45000
}
```

Delete Budget

DELETE /api/v1/users/{userID}/budget
Get User Usage
GET /api/v1/users/{userID}/usage
Query params: from, to (ISO timestamps), group_by (model)
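For example, a usage query over a date range, grouped by model, could be composed like this (a sketch; the parameter names come from the list above, the timestamps are arbitrary):

```python
from urllib.parse import urlencode

# Query params from the Get User Usage endpoint above.
params = {
    "from": "2024-01-01T00:00:00Z",   # arbitrary example range
    "to": "2024-01-31T23:59:59Z",
    "group_by": "model",
}
url = f"http://localhost:8080/api/v1/users/user_123/usage?{urlencode(params)}"
```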
Reset User Usage
POST /api/v1/users/{userID}/usage/reset
Usage Summary
GET /api/v1/usage/summary

```json
{
  "total_tokens": 1250000,
  "input_tokens": 500000,
  "output_tokens": 750000,
  "requests": 3420,
  "top_users": [
    { "user_id": "user_123", "total_tokens": 89000, "requests": 210 }
  ]
}
```

Health Check

GET /health
Returns `{"status": "ok"}`; no auth required.
Budget Enforcement Logic
| Condition | Status | Action |
|---|---|---|
| used < limit | normal | Allow |
| limit ≤ used < limit + cap | warning | Allow with warning |
| used ≥ limit + cap | rejected | Block request |
When rejected, the SDK throws KratoBudgetExceededError (TS/Python) or returns BudgetExceededError (Go) before calling the LLM.
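The table translates directly into a small decision function. A sketch (the thresholds are from the table; the Quick Start numbers, limit 100000 and cap 20000, are used in the comments):

```python
def budget_status(used: int, limit: int, cap: int) -> str:
    """Map current usage to the status the budget check returns."""
    if used < limit:
        return "normal"      # under the soft limit: allow
    if used < limit + cap:
        return "warning"     # inside the elastic buffer: allow with warning
    return "rejected"        # buffer exhausted: block the request
```

With limit=100000 and cap=20000, usage of 45000 is normal, 110000 is a warning, and anything from 120000 up is rejected.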
Streaming
All SDKs support streaming. Budget is checked once before the stream starts — Krato never cuts off a stream mid-response.
TypeScript
```typescript
const stream = await krato.chatStream("user_123", "gpt-4o", messages);
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
}
console.log(stream.usage); // available after the stream ends
```

Python

```python
stream = krato.chat_stream("user_123", "gpt-4o", messages)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
print(stream.usage)  # available after the stream ends
```

Go

```go
stream, err := client.ChatStream("user_123", "gpt-4o", messages, nil)
if err != nil {
    return err
}
for {
    chunk, ok := stream.Next()
    if !ok {
        break
    }
    fmt.Print(chunk)
}
usage, _ := stream.Usage() // available after the stream ends
```

Error Handling
| Error | When | Behavior |
|---|---|---|
| Server unreachable | Network issue | Fail-open: LLM call proceeds |
| Budget exceeded | used ≥ limit + cap | Throws before calling LLM |
| Invalid project key | Wrong/missing key | 401 from server |
| Provider error | Bad API key, rate limit, etc. | Error propagated as-is |
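The fail-open row is the one worth internalizing: if the Krato server is unreachable, the SDKs let the LLM call through rather than take your app down. A sketch of that behavior (hypothetical names; `ConnectionError` stands in for any network failure during the budget check):

```python
def checked_call(check_budget, provider_call):
    """Fail-open: a network error during the budget check never blocks the LLM call."""
    try:
        status = check_budget()
    except ConnectionError:
        status = "unknown"   # server unreachable: proceed anyway
    if status == "rejected":
        raise RuntimeError("budget exceeded")  # only an explicit rejection blocks
    return provider_call()
```

Only an explicit `rejected` from the server blocks the request; every failure mode on Krato's side degrades to "call the provider as if Krato were not there".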