00 · OVERVIEW

One base_url.
Every provider behind it.

Krindel is a drop-in, OpenAI-compatible gateway in front of every LLM provider you use: routing and failover, per-key budgets and rate limits, PII redaction, shared caching, and a durable async queue, all governed in one place. Adopting it means changing one line.

Try it below How it works

RuntimeSingle static binary. Go stdlib core.

SDK compatTested drop-in for the official OpenAI SDKs (Python & Node)

Offline demoFully demoable offline with the built-in mock provider

VerifiabilityEvery factual claim on this page is verifiable from the repository

01 · LIVE DEMO

PII mode (gateway config: pii.mode)

Off Log Redact Block

max_tokens (request field, optional)

Model (drives the readout below)

PII detections

—

Prompt tokens (estimated)

—

Worst-case reserved output

—

Selected model — estimated cost

—

What the provider would see (PII mode: redact)

—

Estimated cost per route target (illustrative public list prices as of mid-2026 — the gateway reads prices from your config)

Loading the engine…

02 · OPERATION

Three steps. Then it's boring.

Point your SDK at Krindel. Change base_url, use a Krindel-minted virtual key, and touch nothing else. This is tested, not assumed: the official, unmodified OpenAI SDKs (openai‑python 2.44.0, openai‑node 6.45.0, versions pinned in scripts/sdk_smoke.sh) pass non-streaming calls, SSE streaming, and typed error handling against a live gateway. The completion paths also run end-to-end through a real provider.
Mint virtual keys. Workspace-scoped keys with per-key RPM/TPM limits and monthly budgets; raw keys are shown once and stored only as hashes.
Route and govern. Priority, round-robin, cheapest, or cascade routing, with retries and circuit breakers. Budgets are enforced atomically, even across a cluster.

03 · CAPABILITIES

Everything between your apps and the providers

Routing & failover

One public model name fans out to prioritized targets across providers, with retries, backoff, and per-target circuit breakers. A dead provider is skipped mid-request.

Budgets that hold under load

Worst-case cost gets reserved atomically before a request is routed. On the clustered (Postgres) backend, a concurrent burst across multiple gateway nodes still can't collectively beat a monthly budget. Reservations orphaned by a crash self-heal inside the 15-minute lease window.

Cluster-consistent rate limits

Per-key RPM and TPM token buckets. On the clustered (Redis) backend, N nodes enforce one limit, not N of them: the admission decision executes atomically on shared state.

One-way PII redaction

Emails, phone numbers, SSNs, credit cards, and more get replaced with [REDACTED_*] before anything leaves the gateway — it's the same engine running in the demo above. Off, log, redact, or block, set per deployment. There's no un-redaction, and that's deliberate.

Durable async jobs

POST /v1/jobs runs the same gate and router as the sync path. On the clustered backend, the queue is durable: kill a node mid-traffic and its accepted jobs are reclaimed and finished by the others. We test exactly that.

Shared response cache

Exact-match, workspace-isolated caching. On the clustered (Redis) backend, a completion cached by one node is a hit on all of them. Tenants never share cache entries.

Audit trail & export

Every request is recorded: tokens, cost, latency, status, PII types, retries. Stream it out as JSONL or CSV, full history on the clustered backend. Retention windows are opt-in, for when you need to prune.

Role-based admin access

Static config identities. A viewer/operator/admin role matrix over the admin API. Fail-closed. The shared admin key remains.

Zero-dependency core

The core is pure Go standard library. The one exception is the PostgreSQL driver, confined to a single package and enforced by an architecture test that fails CI on any leak. Redis gets spoken natively too, through a RESP client we wrote ourselves in about 400 lines.

04 · MEASURED

Numbers this repository produced

Krindel's constitution forbids publishing a performance claim the repo didn't measure itself. These come from make bench, an open-loop harness that boots the full gateway in-process against an identical-stack baseline.

Δp50 149µs

added latency, sync completions at 1,000 RPS sustained, 0 errors

Δp50 151µs

the same run with PII redaction on (vs 149µs off) at 1,000 RPS

25/25

live smoke checks green on a single node, and the same suite (18/18, minus the node-local TLS section and the RBAC identity checks, which need identities provisioned on the target) green through a load balancer over two nodes

0 lost

accepted jobs after SIGKILLing a node mid-traffic in the cluster validation harness

Caveat: benchmark numbers were measured on a shared 2-vCPU VM. A reference-hardware re-run is on the roadmap before these numbers appear anywhere money changes hands.

05 · AVAILABILITY

The gateway is real. The licensing is being finalized.

Krindel is under active development, with its commercial license under counsel review. Want an early conversation, or a live demo against your own provider keys (fully offline if you prefer)? Get in touch.

Get in touch

Emailadmin@krindle.team

ReproducibilityEverything on this page is reproducible from the repository: the demo is the shipping engine, the numbers are the benchmark harness output, the cluster claims are one script.

One base_url.Every provider behind it.