00 · OVERVIEW

One base_url.
Every provider behind it.

Krindel is a drop-in, OpenAI-compatible gateway in front of every LLM provider you use: routing and failover, per-key budgets and rate limits, PII redaction, shared caching, and a durable async queue, all governed in one place. Adopting it means changing one line.

RuntimeSingle static binary. Go stdlib core.
SDK compatTested drop-in for the official OpenAI SDKs (Python & Node)
Offline demoFully demoable offline with the built-in mock provider
VerifiabilityEvery factual claim on this page is verifiable from the repository

01 · LIVE DEMO

Live

This is Krindel's actual Go engine, compiled to WebAssembly, running entirely in your browser. Nothing you type leaves this page.

PII mode

What the provider would see (PII mode: redact)

Loading the engine…

02 · OPERATION

Three steps. Then it's boring.

  1. Point your SDK at Krindel. Change base_url, use a Krindel-minted virtual key, and touch nothing else. This is tested, not assumed: the official, unmodified OpenAI SDKs (openai‑python 2.44.0, openai‑node 6.45.0, versions pinned in scripts/sdk_smoke.sh) pass non-streaming calls, SSE streaming, and typed error handling against a live gateway. The completion paths also run end-to-end through a real provider.
  2. Mint virtual keys. Workspace-scoped keys with per-key RPM/TPM limits and monthly budgets; raw keys are shown once and stored only as hashes.
  3. Route and govern. Priority, round-robin, cheapest, or cascade routing, with retries and circuit breakers. Budgets are enforced atomically, even across a cluster.

03 · CAPABILITIES

Everything between your apps and the providers

Routing & failover

One public model name fans out to prioritized targets across providers, with retries, backoff, and per-target circuit breakers. A dead provider is skipped mid-request.

Budgets that hold under load

Worst-case cost gets reserved atomically before a request is routed. On the clustered (Postgres) backend, a concurrent burst across multiple gateway nodes still can't collectively beat a monthly budget. Reservations orphaned by a crash self-heal inside the 15-minute lease window.

Cluster-consistent rate limits

Per-key RPM and TPM token buckets. On the clustered (Redis) backend, N nodes enforce one limit, not N of them: the admission decision executes atomically on shared state.

One-way PII redaction

Emails, phone numbers, SSNs, credit cards, and more get replaced with [REDACTED_*] before anything leaves the gateway — it's the same engine running in the demo above. Off, log, redact, or block, set per deployment. There's no un-redaction, and that's deliberate.

Durable async jobs

POST /v1/jobs runs the same gate and router as the sync path. On the clustered backend, the queue is durable: kill a node mid-traffic and its accepted jobs are reclaimed and finished by the others. We test exactly that.

Shared response cache

Exact-match, workspace-isolated caching. On the clustered (Redis) backend, a completion cached by one node is a hit on all of them. Tenants never share cache entries.

Audit trail & export

Every request is recorded: tokens, cost, latency, status, PII types, retries. Stream it out as JSONL or CSV, full history on the clustered backend. Retention windows are opt-in, for when you need to prune.

Role-based admin access

Static config identities. A viewer/operator/admin role matrix over the admin API. Fail-closed. The shared admin key remains.

Zero-dependency core

The core is pure Go standard library. The one exception is the PostgreSQL driver, confined to a single package and enforced by an architecture test that fails CI on any leak. Redis gets spoken natively too, through a RESP client we wrote ourselves in about 400 lines.

04 · MEASURED

Numbers this repository produced

Krindel's constitution forbids publishing a performance claim the repo didn't measure itself. These come from make bench, an open-loop harness that boots the full gateway in-process against an identical-stack baseline.

Δp50 149µs

added latency, sync completions at 1,000 RPS sustained, 0 errors

Δp50 151µs

the same run with PII redaction on (vs 149µs off) at 1,000 RPS

25/25

live smoke checks green on a single node, and the same suite (18/18, minus the node-local TLS section and the RBAC identity checks, which need identities provisioned on the target) green through a load balancer over two nodes

0 lost

accepted jobs after SIGKILLing a node mid-traffic in the cluster validation harness

Caveat: benchmark numbers were measured on a shared 2-vCPU VM. A reference-hardware re-run is on the roadmap before these numbers appear anywhere money changes hands.

05 · AVAILABILITY

The gateway is real. The licensing is being finalized.

Krindel is under active development, with its commercial license under counsel review. Want an early conversation, or a live demo against your own provider keys (fully offline if you prefer)? Get in touch.

ReproducibilityEverything on this page is reproducible from the repository: the demo is the shipping engine, the numbers are the benchmark harness output, the cluster claims are one script.