One base_url.
Every provider. Full control.
Krindle is a drop-in, OpenAI-compatible gateway in front of every LLM provider you use. Routing and failover, per-key budgets and rate limits, PII redaction, shared caching, and a durable async queue — governed in one place, adopted by changing one line.
Speaks the OpenAI wire format your SDKs already use · Fully demoable offline with the built-in mock provider · Every factual claim on this page is verifiable from the repository
— HOW IT WORKS
Three steps to a governed fleet
- Point your SDK at Krindle. Change base_url and nothing else. The request and response shapes (including SSE streaming) are exercised end-to-end by the live smoke suite; certified unmodified-SDK drop-in is the v1.0 acceptance bar.
- Mint virtual keys. Workspace-scoped keys with per-key RPM/TPM limits and monthly budgets; raw keys are shown once and stored only as hashes.
- Route and govern. Priority, round-robin, cheapest, or cascade routing with retries and circuit breakers; budgets enforced atomically — even across a cluster.
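Storing only hashes of virtual keys can be sketched in a few lines of Go. This is a minimal illustration, not Krindle's code. The `vk-` prefix, the 32-byte secret and the choice of SHA-256 are all assumptions. Because the raw key is random rather than user-chosen, a fast hash is a reasonable fit here. A slow password hash would add cost to every request for little benefit.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// keyPrefix is hypothetical; it only makes keys recognizable to humans and scanners.
const keyPrefix = "vk-"

// hashKey produces the only value that gets persisted.
func hashKey(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// mintKey returns the raw key (shown to the user exactly once) and its hash (stored).
func mintKey() (raw, stored string, err error) {
	secret := make([]byte, 32)
	if _, err = rand.Read(secret); err != nil {
		return "", "", err
	}
	raw = keyPrefix + hex.EncodeToString(secret)
	return raw, hashKey(raw), nil
}

// verifyKey hashes a presented key and compares it in constant time.
func verifyKey(presented, stored string) bool {
	return subtle.ConstantTimeCompare([]byte(hashKey(presented)), []byte(stored)) == 1
}

func main() {
	raw, stored, err := mintKey()
	if err != nil {
		panic(err)
	}
	fmt.Println("show once:", raw)
	fmt.Println("store:", stored)
	fmt.Println(verifyKey(raw, stored), verifyKey(raw+"x", stored)) // true false
}
```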
— FEATURES
Everything between your apps and the providers
Routing & failover
One public model name fans out to prioritized targets across providers, with retries, backoff, and per-target circuit breakers. Dead providers are skipped mid-request.
Budgets that hold under load
Worst-case cost is reserved atomically before a request is routed. On the clustered (Postgres) backend, a concurrent burst across multiple gateway nodes cannot collectively overrun a monthly budget. Reservations orphaned by a crash self-heal within the 15-minute lease window.
Cluster-consistent rate limits
Per-key RPM and TPM token buckets. On the clustered (Redis) backend, N nodes enforce one limit, not N — the whole admission decision executes atomically on shared state.
One-way PII redaction
Emails, phone numbers, SSNs, credit cards and more are replaced with [REDACTED_*] before leaving the gateway; this is the engine running in the demo above. Each deployment picks a mode: off, log, redact, or block. There is deliberately no un-redaction.
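Here is a toy version of the four modes. It assumes regex detection and the label suffixes shown; only the `[REDACTED_*]` shape comes from the text above. The patterns are deliberately narrow and are not Krindle's detectors.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

type mode int

const (
	off mode = iota
	logOnly
	redact
	block
)

// Illustrative patterns only (US-style phone and SSN, 16-digit cards); a
// production detector needs broader formats and checks such as Luhn.
var rules = []struct {
	label string
	re    *regexp.Regexp
}{
	{"EMAIL", regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)},
	{"CREDIT_CARD", regexp.MustCompile(`\b(?:\d{4}[ -]?){3}\d{4}\b`)},
	{"SSN", regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`)},
	{"PHONE", regexp.MustCompile(`\b\d{3}[-. ]\d{3}[-. ]\d{4}\b`)},
}

var errBlocked = errors.New("request blocked: PII detected")

// apply returns the text to forward upstream plus the PII types found.
// Redaction is one-way: no mapping back to the original values is kept.
func apply(m mode, text string) (string, []string, error) {
	if m == off {
		return text, nil, nil
	}
	var found []string
	redacted := text
	for _, r := range rules {
		if r.re.MatchString(redacted) {
			found = append(found, r.label)
			redacted = r.re.ReplaceAllString(redacted, "[REDACTED_"+r.label+"]")
		}
	}
	switch {
	case m == block && len(found) > 0:
		return "", found, errBlocked
	case m == redact:
		return redacted, found, nil
	default: // logOnly, or block with nothing found: forward unchanged, report types
		return text, found, nil
	}
}

func main() {
	in := "Mail jane.doe@example.com or call 555-123-4567; SSN 123-45-6789."
	out, found, err := apply(redact, in)
	fmt.Println(out) // Mail [REDACTED_EMAIL] or call [REDACTED_PHONE]; SSN [REDACTED_SSN].
	fmt.Println(found, err) // [EMAIL SSN PHONE] <nil>
}
```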
Durable async jobs
POST /v1/jobs runs the same gate and router as the sync path. On the clustered backend the queue is durable: kill a node mid-traffic and its accepted jobs are reclaimed and finished by the others — we test exactly that.
Shared response cache
Exact-match, workspace-isolated caching — on the clustered (Redis) backend, a completion cached by one node is a hit on all of them. Tenants never share cache entries.
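One way to get exact-match, workspace-isolated keys is to hash the workspace ID together with a canonical encoding of the request, so every node computes the same key. This is a sketch under assumptions. The fields included, the NUL separator and the `cache:` prefix are hypothetical. The shared store itself (Redis, per the text above) is omitted.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// chatRequest holds only fields that affect the completion; anything varying
// per call (request IDs, timestamps) must stay out of the key or nothing would hit.
type chatRequest struct {
	Model       string    `json:"model"`
	Messages    []message `json:"messages"`
	Temperature float64   `json:"temperature"`
	MaxTokens   int       `json:"max_tokens"`
}

// cacheKey is deterministic: json.Marshal emits struct fields in declaration
// order, so every node derives the same key for the same request. Hashing the
// workspace ID in first means two tenants never address the same entry.
func cacheKey(workspaceID string, req chatRequest) (string, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return "", err
	}
	h := sha256.New()
	h.Write([]byte(workspaceID))
	h.Write([]byte{0}) // separator; assumes workspace IDs never contain NUL
	h.Write(body)
	return "cache:" + hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	req := chatRequest{Model: "example-model", Messages: []message{{Role: "user", Content: "hi"}}}
	k1, _ := cacheKey("ws-a", req)
	k2, _ := cacheKey("ws-a", req)  // e.g. computed on another node
	k3, _ := cacheKey("ws-b", req)  // same request, different tenant
	fmt.Println(k1 == k2, k1 == k3) // true false
}
```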
Audit trail & export
Every request is recorded: tokens, cost, latency, status, PII types, retries. Stream it out as JSONL or CSV (full history on the clustered backend); opt-in retention windows if you must prune.
Zero-dependency core
The core is pure Go standard library — the sole exception is the PostgreSQL driver, confined to one package and enforced by an architecture test that fails CI on any leak. Even Redis is spoken natively via our own ~400-line RESP client.
— MEASURED, NOT BORROWED
Numbers this repository produced
Krindle's constitution forbids publishing performance claims the repo didn't measure itself. These come from make bench, an open-loop harness that boots the full gateway in-process against an identical-stack baseline.
- Added latency: sync completions at 1,000 RPS sustained, 0 errors
- The same run with PII redaction on (vs 149µs off), at 1,000 RPS
- Live smoke checks green on a single node, and the same suite (18/18, minus the node-local TLS section) green through a load balancer over two nodes
- Accepted jobs after SIGKILLing a node mid-traffic in the cluster validation harness
Caveat, stated plainly: benchmark numbers were measured on a shared 2-vCPU VM. A reference-hardware re-run is on the roadmap before these numbers appear anywhere money changes hands.
The gateway is real. The licensing is being finalized.
Krindle is under active development with its commercial license under counsel review. If you want an early conversation — or a live demo against your own provider keys, fully offline if you prefer — get in touch.
Talk to us: admin@krindle.team
Everything on this page is reproducible from the repository: the demo is the shipping engine, the numbers are the benchmark harness output, and the cluster claims are checked by one script.