Stop overpaying
for every LLM call.
Iudex Route scores each request, routes it to the cheapest model that can answer it, and ships per-query cost, latency, and tier telemetry to one dashboard. One OpenAI-compatible endpoint.
- Average 47.2% spend reduction
- Sub-30ms routing decision
- 5 tiers — Cache → Opus 4.7
- OpenAI, Anthropic, Google, Meta
From token to tier in 28 milliseconds.
Five-stage pipeline. No queue, no batch. Every request flows through the same path so latency stays predictable — and so does your bill.
Hybrid difficulty estimator runs in 8–24ms — token shape, embedding cluster, domain hint, and historical priors.
Allocator checks remaining monthly quota, request budget, and per-route policy — picks the floor tier safely.
One of five tiers fires against the cheapest provider that owns that capability band right now.
Optional verifier cascade — schema, contradiction, judge — escalates only when the answer fails its check.
Per-query cost, latency, tier, country, and difficulty land in your dashboard. No external observability hookup.
A router, an observability stack, and a budget guardrail — in one wire.
We do one job — picking the right model — and the dashboard exists so you can audit every pick.
Difficulty-aware tier selection
Each request is scored 0–1 and assigned a tier. Pin a tier, calibrate per route, or let auto handle it.
- Translate to PolishT1$0.00021
- Pick a SKU from this imageT2$0.00187
- Re-derive the SQL planT3$0.00412
- Plan a 7-step agent runT4$0.02140
- "ok"T0$0.00000
One-line integration
Speaks the OpenAI chat-completions protocol. Anything that talks to GPT, talks to Iudex Route.
Monthly budget caps + spike alerts
The router self-throttles toward cheaper tiers as you approach the cap. No 4 a.m. surprises.
Per-country breakdown
Origin country is logged on every query — surfaced as latency, cost, and tier maps.
- US42.0%
- DE23.0%
- IN14.0%
- BR11.0%
- JP6.0%
- NG4.0%
Where every dollar lands
Audit any window: spend share, avg cost, queue depth, and the exact model that fired.
Five tiers. Each gets the queries that actually deserve it.
2 + 2 never needs an Opus chain. A novel combinatorics proof shouldn't be answered by a 3B flash model. Iudex Route scores each request 0–1 and picks the floor that still answers it.
Every pick is logged with its tier, difficulty score, latency, and the actual upstream cost. Drill into any query from the dashboard.
Routers move bytes. Gateways count them. Iudex Route picks the model.
Most tools in the LLM-ops aisle either let you call any model, or tell you what calling cost. Iudex Route does the third thing.
Source: vendor docs and 2026 LLM-gateway roundups (Braintrust, Inworld, EdenAI). Pricing accurate as of May 2026; check vendor sites for updates.
IudexRoutecutourreasoning-modelspendby47%withoutameasurableaccuracydrop.WewerepayingforOpus4.7onqueriesaflashmodelcouldanswer—nowtherouterhandlesthatforus,andwecanactuallyseewhereeverydollarwent.”
Ship cheaper inference this afternoon.
The free tier covers most prototype work. Change one base URL, keep your existing client, watch the spend chart move.