New · Tier T0 cache · 18× cheaper repeat queries

Stop overpaying
for every LLM call.

Iudex Route scores each request, routes it to the cheapest model that can answer it, and ships per-query cost, latency, and tier telemetry to one dashboard. One endpoint for any AI client.

Average 47.2% spend reduction
Sub-30ms routing decision
5 tiers — Cache → Opus 4.7
OpenAI · Anthropic · Google · Meta · xAI

Start free: 1 month · Read docs · No card · Cancel anytime

1,924 teams routing through Iudex Route this week

POST api.acbm.ai/v1/chat/completions

$ curl https://api.acbm.ai/v1/chat/completions \
    -H "Authorization: Bearer acbm_••••" \
    -H "Content-Type: application/json" \
    -d '{ "model": "auto", "messages": [
          { "role": "user", "content": "Format this address into JSON" }
        ]}'

200 OK · 184ms · routing decision 28ms

T1 · gpt-4.1-nano · openai · $0.00018

difficulty 0.12

vs flat GPT-5: −$0.0397
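The figures in the sample response can be checked by hand. The sketch below assumes the "vs flat GPT-5" number is the routed cost minus what the same call would have cost on GPT-5; the page does not define it precisely, so treat the derived values as illustrative.

```python
from decimal import Decimal

routed_cost = Decimal("0.00018")     # T1 cost shown in the sample response
delta_vs_flat = Decimal("-0.0397")   # "vs flat GPT-5" figure from the same response

# Assumption: delta = routed_cost - flat_cost, so flat_cost = routed_cost - delta.
flat_cost = routed_cost - delta_vs_flat
saving_pct = (-delta_vs_flat / flat_cost * 100).quantize(Decimal("0.1"))

print(f"implied flat GPT-5 cost: ${flat_cost}")
print(f"saving on this request: {saving_pct}%")
```

A single easy request like this one saves far more than the 47.2% average, presumably because the average also covers harder queries that stay on expensive tiers.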

47.2%

Avg cost reduction

observed across 1,924 teams

28ms

Routing decision

p95 ahead of upstream call

5 tiers

cache → Opus 4.7 / Deep Think

Universal AI endpoint

drop-in for any AI client

Routes traffic to

OpenAI · Anthropic · Google AI · Meta · Mistral · Cohere · xAI · Together · DeepSeek · Groq

● The hot path

From token to tier in 28 milliseconds.

Five-stage pipeline. No queue, no batch. Every request flows through the same path so latency stays predictable — and so does your bill.

Score

Hybrid difficulty estimator runs in 8–24ms — token shape, embedding cluster, domain hint, and historical priors.

Budget

Allocator checks remaining monthly quota, request budget, and per-route policy — picks the floor tier safely.

Route

One of five tiers fires against the cheapest provider that owns that capability band right now.

Verify

Optional verifier cascade — schema, contradiction, judge — escalates only when the answer fails its check.

Log

Per-query cost, latency, tier, country, and difficulty land in your dashboard. No external observability hookup.
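The five stages above can be sketched in a few lines. This is a toy model, not Iudex Route's implementation: `score`, `floor_tier`, the thresholds, and the `call_model` / `verify` callbacks are all hypothetical names and values chosen for illustration.

```python
import json

TIERS = ["T0", "T1", "T2", "T3", "T4"]  # cache up to the most capable tier

def score(prompt: str) -> float:
    """Hypothetical stand-in for the difficulty estimator: longer prompts score higher."""
    return min(len(prompt.split()) / 200, 1.0)

def floor_tier(difficulty: float, max_tier: int) -> int:
    """Map a 0-1 score to a tier index, capped by what the budget allows."""
    thresholds = [0.05, 0.30, 0.60, 0.85]  # made-up upper bounds for T0..T3
    return min(sum(difficulty > t for t in thresholds), max_tier)

def route(prompt, call_model, verify, max_tier=4):
    difficulty = score(prompt)                      # Score
    tier = floor_tier(difficulty, max_tier)         # Budget
    answer = call_model(TIERS[tier], prompt)        # Route
    escalations = 0
    while not verify(answer) and tier < max_tier:   # Verify: escalate only on failure
        tier += 1
        escalations += 1
        answer = call_model(TIERS[tier], prompt)
    record = {"tier": TIERS[tier], "difficulty": difficulty,
              "escalations": escalations}           # Log
    return answer, record

def fake_model(tier, prompt):
    """Pretend upstream: the cache (T0) misses, every other tier returns JSON."""
    return None if tier == "T0" else '{"street": "1 Main St"}'

def is_json(answer):
    """Schema-style check: the answer must parse as JSON."""
    if answer is None:
        return False
    try:
        json.loads(answer)
        return True
    except json.JSONDecodeError:
        return False

answer, record = route("Format this address into JSON", fake_model, is_json)
print(record)
```

In this run the short prompt lands on T0, the cache misses, and the verifier pushes the request up one tier, which is the "escalates only when the answer fails its check" behaviour in miniature.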

What you get

A router, an observability stack, and a budget guardrail — in one wire.

We do one job — picking the right model — and the dashboard exists so you can audit every pick.

Routing

Difficulty-aware tier selection

Each request is scored 0–1 and assigned a tier. Pin a tier, calibrate per route, or let auto handle it.

prompt | tier | cost
Translate to Polish | T1 | $0.00021
Pick a SKU from this image | T2 | $0.00187
Re-derive the SQL plan | T3 | $0.00412
Plan a 7-step agent run | T4 | $0.02140
"ok" | T0 | $0.00000

Drop-in

One-line integration

Works with any AI client — OpenAI, Anthropic, Google, LangChain, plain curl. If it talks to an LLM, it talks to Iudex Route.

Guardrail

Monthly budget caps + spike alerts

The router self-throttles toward cheaper tiers as you approach the cap. No 4 a.m. surprises.

Monthly budget

$68.00 / $200.00

Forecast hits cap on day 23
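One way to picture the guardrail is a straight-line burn forecast plus a tier ceiling that drops as the cap approaches. The sketch below is an assumption about how such a guardrail could work, not the product's actual policy: the cut-off fractions are invented, and `days_elapsed=8` is an illustrative guess because the page does not say how far into the month the $68.00 figure was taken.

```python
def forecast_cap_day(spent: float, cap: float, days_elapsed: float) -> float:
    """Days until spend reaches the cap if the average daily burn so far continues."""
    if spent <= 0:
        return float("inf")
    return days_elapsed * cap / spent

def max_tier_allowed(spent: float, cap: float) -> int:
    """Highest tier index (0-4) still allowed; the cut-offs are hypothetical."""
    used = spent / cap
    for limit, tier in [(0.50, 4), (0.80, 3), (0.95, 2), (1.00, 1)]:
        if used < limit:
            return tier
    return 0  # cap reached: cache only

print(max_tier_allowed(68.00, 200.00))
print(round(forecast_cap_day(68.00, 200.00, days_elapsed=8), 1))
```

With 34% of the budget used, every tier is still available; the ceiling only starts dropping once spend passes half the cap under these made-up thresholds.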

Telemetry

Per-country breakdown

Origin country is logged on every query — surfaced as latency, cost, and tier maps.

US 42.0%
DE 23.0%
IN 14.0%
BR 11.0%
JP 6.0%
NG 4.0%

Per-tier

Where every dollar lands

Audit any window: spend share, avg cost, queue depth, and the exact model that fired.

T0 | 8% | $0.00 | cache
T1 | 41% | $0.02 | haiku · gemini-flash-lite
T2 | 27% | $0.18 | gpt-4.1-mini · gemini-flash
T3 | 18% | $0.91 | sonnet · gpt-5 · gemini-pro
T4 | 6% | $4.17 | opus · deep-think · gpt-5-pro

Row-level isolated (RLS) · Export CSV / Parquet on Pro
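If you export the query log, a per-tier audit of this kind is a small aggregation. The sketch below assumes each exported row reduces to a `(tier, cost_usd)` pair; the export's real column names are not documented here, and the sample rows are invented.

```python
from collections import defaultdict

def per_tier_summary(rows):
    """rows: iterable of (tier, cost_usd).
    Returns {tier: (query_share, spend_share, avg_cost)}."""
    count = defaultdict(int)
    spend = defaultdict(float)
    for tier, cost in rows:
        count[tier] += 1
        spend[tier] += cost
    total_queries = sum(count.values())
    total_spend = sum(spend.values())
    return {
        tier: (
            count[tier] / total_queries,
            spend[tier] / total_spend if total_spend else 0.0,
            spend[tier] / count[tier],
        )
        for tier in sorted(count)
    }

# Hypothetical log rows, not data from the dashboard.
log = [("T0", 0.0), ("T1", 0.02), ("T1", 0.02), ("T3", 0.96)]
summary = per_tier_summary(log)
for tier, (q_share, s_share, avg) in summary.items():
    print(f"{tier}: {q_share:.0%} of queries, {s_share:.0%} of spend, ${avg:.2f} avg")
```

Query share and spend share can diverge sharply: in the sample rows one T3 call is a quarter of the traffic but nearly all of the cost.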

The ladder

Five tiers. Each gets the queries that actually deserve it.

2 + 2 never needs an Opus chain. A novel combinatorics proof shouldn't be answered by a 3B flash model. Iudex Route scores each request 0–1 and picks the floor that still answers it.

Tier | Models | Best for | Cost / query
Trivial repeats | Cache | Exact-match · templated responses | ~$0
Easy | Haiku 4.5 · Gemini Flash-Lite · GPT-4.1 nano | Simple factual · classification · formatting | $0.01–0.05
Moderate | GPT-4.1 mini · Gemini 2.5 Flash | Multi-step reasoning · synthesis | $0.05–0.30
Hard | Sonnet 4.6 · GPT-5 · Gemini 2.5 Pro | Complex reasoning · expert-level | $0.30–2.00
Extreme | Opus 4.7 · GPT-5 Pro · Gemini Deep Think | Novel proofs · research · agentic | $2.00–15.00

Every pick is logged with its tier, difficulty score, latency, and the actual upstream cost. Drill into any query from the dashboard.
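The cost ranges in the ladder are enough for a rough spend envelope. The sketch below copies the per-query ranges from the table, treats "~$0" as zero, and applies them to a hypothetical monthly mix; the query counts are invented, not a measured workload.

```python
# (low, high) cost per query in USD, copied from the ladder; "~$0" is treated as 0.
LADDER = {
    "Trivial repeats": (0.00, 0.00),
    "Easy": (0.01, 0.05),
    "Moderate": (0.05, 0.30),
    "Hard": (0.30, 2.00),
    "Extreme": (2.00, 15.00),
}

def spend_bounds(queries_per_tier: dict) -> tuple:
    """Lower and upper spend estimate for a given number of queries per tier."""
    low = sum(n * LADDER[tier][0] for tier, n in queries_per_tier.items())
    high = sum(n * LADDER[tier][1] for tier, n in queries_per_tier.items())
    return low, high

# Hypothetical monthly mix.
mix = {"Easy": 1000, "Moderate": 200, "Hard": 50}
low, high = spend_bounds(mix)
print(f"${low:,.2f} to ${high:,.2f}")
```

The envelope is wide because the upper tiers span a large range per query, which is why the per-query log matters more than the ladder for real budgeting.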

Where we sit

Routers move bytes. Gateways count them. Iudex Route picks the model.

Most tools in the LLM-ops aisle either let you call any model, or tell you what calling cost. Iudex Route does the third thing.

Capability | Iudex Route | OpenRouter | Helicone | Portkey
Difficulty-aware tier routing
Drop-in for any AI client
Per-query cost telemetry
Budget cap + auto-downgrade
Verifier cascade
Per-region (country) breakdown
Free trial (no card) | 1 month | credits | 10k req | 10k logs
Paid plan entry | $20/mo | PAYG | $79/mo | $49/mo

Source: vendor docs and 2026 LLM-gateway roundups (Braintrust, Inworld, EdenAI). Pricing accurate as of May 2026; check vendor sites for updates.

“Iudex Route cut our reasoning-model spend by 47% without a measurable accuracy drop. We were paying for Opus 4.7 on queries a flash model could answer; now the router handles that for us, and we can actually see where every dollar went.”

47.2% avg savings

Ship cheaper inference this afternoon.

The free tier covers most prototype work. Change one base URL, keep your existing client, watch the spend chart move.

Start free: 1 month · Read docs

# pip install openai
from openai import OpenAI

# Point the standard OpenAI client at the Iudex Route endpoint.
client = OpenAI(
    base_url="https://api.acbm.ai/v1",
    api_key="acbm_…",
)
# that's it.
