● Single static binary · Go standard library core

One base_url.
Every provider. Full control.

Krindle is a drop-in, OpenAI-compatible gateway in front of every LLM provider you use. Routing and failover, per-key budgets and rate limits, PII redaction, shared caching, and a durable async queue — governed in one place, adopted by changing one line.

Speaks the OpenAI wire format your SDKs already use · Fully demoable offline with the built-in mock provider · Every factual claim on this page is verifiable from the repository

●●● live — this is Krindle's actual Go engine compiled to WebAssembly, running entirely in your browser. Nothing you type leaves this page.

Loading the engine…

— HOW IT WORKS

Three steps to a governed fleet

  1. Point your SDK at Krindle. Change base_url — nothing else. The request and response shapes (including SSE streaming) are exercised end-to-end by the live smoke suite; certified unmodified-SDK drop-in is the v1.0 acceptance bar.
  2. Mint virtual keys. Workspace-scoped keys with per-key RPM/TPM limits and monthly budgets; raw keys are shown once and stored only as hashes.
  3. Route and govern. Priority, round-robin, cheapest, or cascade routing with retries and circuit breakers; budgets enforced atomically — even across a cluster.

— FEATURES

Everything between your apps and the providers

Routing & failover

One public model name fans out to prioritized targets across providers, with retries, backoff, and per-target circuit breakers. Dead providers are skipped mid-request.

Budgets that hold under load

Worst-case cost is reserved atomically before a request is routed — on the clustered (Postgres) backend a concurrent burst across multiple gateway nodes cannot collectively beat a monthly budget. Crash-orphaned reservations self-heal within the 15-minute lease window.

Cluster-consistent rate limits

Per-key RPM and TPM token buckets. On the clustered (Redis) backend, N nodes enforce one limit, not N — the whole admission decision executes atomically on shared state.

One-way PII redaction

Emails, phone numbers, SSNs, credit cards and more are replaced with [REDACTED_*] before leaving the gateway — the engine in the demo above. Off, log, redact, or block per deployment. There is deliberately no un-redaction.

Durable async jobs

POST /v1/jobs runs the same gate and router as the sync path. On the clustered backend the queue is durable: kill a node mid-traffic and its accepted jobs are reclaimed and finished by the others — we test exactly that.

Shared response cache

Exact-match, workspace-isolated caching — on the clustered (Redis) backend, a completion cached by one node is a hit on all of them. Tenants never share cache entries.

Audit trail & export

Every request is recorded: tokens, cost, latency, status, PII types, retries. Stream it out as JSONL or CSV (full history on the clustered backend); opt-in retention windows if you must prune.

Zero-dependency core

The core is pure Go standard library — the sole exception is the PostgreSQL driver, confined to one package and enforced by an architecture test that fails CI on any leak. Even Redis is spoken natively via our own ~400-line RESP client.

— MEASURED, NOT BORROWED

Numbers this repository produced

Krindle's constitution forbids publishing performance claims the repo didn't measure itself. These come from make bench — an open-loop harness that boots the full gateway in-process against an identical-stack baseline.

Δp50 149µs

added latency, sync completions at 1,000 RPS sustained, 0 errors

Δp50 151µs

the same run with PII redaction on (vs 149µs off) at 1,000 RPS

21/21

live smoke checks green on a single node — and the same suite (18/18, minus the node-local TLS section) green through a load balancer over two nodes

0 lost

accepted jobs after SIGKILLing a node mid-traffic in the cluster validation harness

Caveat, stated plainly: benchmark numbers were measured on a shared 2-vCPU VM. A reference-hardware re-run is on the roadmap before these numbers appear anywhere money changes hands.

The gateway is real. The licensing is being finalized.

Krindle is under active development with its commercial license under counsel review. If you want an early conversation — or a live demo against your own provider keys, fully offline if you prefer — get in touch.

Talk to us — admin@krindle.team

Everything on this page is reproducible from the repository: the demo is the shipping engine, the numbers are the benchmark harness output, the cluster claims are one script.