How to detect anomalies in LLM API usage (spend spikes, error storms, and rogue agents)
The spaturzu team · 30 June 2026 · 8 min read
Your LLM provider dashboard shows you yesterday's totals. It will not tell you that one agent started looping overnight, that a deploy quietly doubled your error rate, or that someone swapped in a pricier model that tripled your token bill — until the invoice lands. This post breaks LLM usage anomalies into four signals you can actually detect, explains why a per-agent baseline catches what a project-wide average misses, and shows how spaturzu watches for all four automatically.
The four ways LLM usage goes wrong
Anomalies in LLM usage are not one problem but four, and they fail differently. Two are statistical: a number moved away from normal. Two are categorical (membership) signals: something new showed up that was not there before.
1. Volume surge
A spike in request count. A retry loop, a runaway agent, or a sudden burst of real traffic, and one agent or one project is suddenly making far more calls than it ever has. Every one of those calls burns tokens. Spotted late, a surge is just a line on next month's bill you cannot explain.
2. Error-rate jump
Call volume looks normal, but a growing share of those calls is failing — 429s from a provider rate limit, 400s from a prompt template that broke in the last deploy, 500s from the provider itself. Failed calls still cost latency and retries, and a climbing error rate is usually the first visible symptom of a bad release.
3. A new model appears
Someone swaps gpt-4o-mini for a larger model in a config file, or a new agent ships pointing at a different default. The output might get better; the per-call cost can jump several-fold. The first time a model is used in production is a cost-and-behaviour event worth knowing about immediately.
4. A new agent starts calling
An agent ID appears on your key for the first time. Maybe a feature went live, maybe a test harness got pointed at production, maybe a key leaked and something you do not recognise is now making calls on it. Whatever the cause, the first appearance is the cheapest moment to notice.
Why your provider bill won't catch these
Provider invoices and usage dashboards aggregate. They roll everything on a key into a single number, settle on a daily or monthly cadence, and have no idea which of your agents made which call — the concept of "your agent" does not exist on their side. By the time an anomaly is large enough to move the total, it has already run. Detection has to happen on your own telemetry, close to real time, with the per-agent structure the provider never sees. That is exactly the structure cost attribution already gives you.
What good anomaly detection actually needs
Before the spaturzu specifics, here is what any honest detector has to get right:
- A baseline. Current behaviour only means something relative to normal: compare a short current window against a rolling lookback, never against a hardcoded number.
- Per-agent and per-project scope. A 10% rise across a whole project can hide one agent running at 20x its own normal. Each agent has to be baselined against itself, not just folded into the project average.
- Statistical thresholds. "More than the mean plus a few standard deviations" adapts to each project's traffic. A fixed "alert at 1,000 calls" is too low for one project and too high for the next.
- A warm-up period. A day-old project has no normal yet, so the new-model and new-agent signals have to hold fire until there is enough history. Otherwise everything looks new.
- Cooldowns. One ongoing incident should not alert you every minute. Repeat alerts for the same signal and scope need to be suppressed for a while.
- Speed. An alert an hour later is a post-mortem. Detection should fire in seconds, not wait for a nightly batch.
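To make the per-agent point concrete, here is a small sketch with made-up numbers. The agent names and call counts are hypothetical and chosen only so the arithmetic is easy to follow: one small agent runs at 20x its own normal while the project total moves by less than 10%.

```typescript
// Hypothetical call counts per agent for one window.
// Every name and number here is an illustrative assumption.
type Counts = Record<string, number>;

const baseline: Counts = { "support-triage": 900, "summarizer": 700, "router": 390, "nightly-report": 10 };
const current: Counts = { "support-triage": 900, "summarizer": 700, "router": 390, "nightly-report": 200 };

// Sum of all agents' calls.
function total(c: Counts): number {
  return Object.keys(c).reduce((acc, k) => acc + (c[k] ?? 0), 0);
}

// Current-to-baseline ratio for the project as a whole.
function projectRatio(base: Counts, now: Counts): number {
  return total(now) / total(base);
}

// Current-to-baseline ratio for each agent, measured against itself.
function agentRatios(base: Counts, now: Counts): Record<string, number> {
  const out: Record<string, number> = {};
  for (const agent of Object.keys(now)) {
    const b = base[agent];
    const n = now[agent];
    if (b !== undefined && n !== undefined && b > 0) out[agent] = n / b;
  }
  return out;
}
```

With these numbers the project goes from 2,000 to 2,190 calls, a ratio of about 1.095, while `nightly-report` goes from 10 to 200, a ratio of 20. A project-wide threshold would likely stay quiet; a per-agent one would not.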
How spaturzu detects all four automatically
spaturzu already ties every call to the agent and run that made it. Anomaly detection runs on that same telemetry, so it inherits per-agent scope for free — no extra instrumentation, no separate pipeline.
The two statistical signals
Volume surge and error rate are evaluated by a background sweep that compares the last 10 minutes against a 24-hour baseline, split into 10-minute buckets, at both project and per-agent scope. A volume surge fires when the current count clears the baseline mean plus several standard deviations — fewer of them as you raise sensitivity from low to high. An error-rate jump fires when the current failure share clears both an absolute floor and the recent baseline by a margin. Each event records the observed value, the baseline it was judged against, and a low / med / high severity.
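The two checks described above can be sketched roughly as follows. This is not spaturzu's implementation: the sigma multipliers, the error-rate floor and margin, and the use of a population standard deviation are all assumptions picked for illustration, and the function names are hypothetical.

```typescript
// Summary of the lookback window, e.g. 144 ten-minute buckets over 24 hours.
interface Baseline {
  mean: number;
  stddev: number;
}

// Mean and population standard deviation of per-bucket request counts.
function computeBaseline(bucketCounts: number[]): Baseline {
  if (bucketCounts.length === 0) return { mean: 0, stddev: 0 };
  const n = bucketCounts.length;
  const mean = bucketCounts.reduce((a, b) => a + b, 0) / n;
  const variance = bucketCounts.reduce((a, b) => a + (b - mean) * (b - mean), 0) / n;
  return { mean, stddev: Math.sqrt(variance) };
}

// ASSUMED multipliers: the post only says higher sensitivity needs fewer deviations.
const SIGMAS = { low: 4, med: 3, high: 2 } as const;
type Sensitivity = keyof typeof SIGMAS;

// Volume surge: current window count clears mean + k * stddev.
function isVolumeSurge(current: number, b: Baseline, s: Sensitivity): boolean {
  return current > b.mean + SIGMAS[s] * b.stddev;
}

// Error-rate jump: failure share clears an absolute floor AND the baseline rate plus a margin.
// The 5% floor and 5-point margin are ASSUMED defaults.
function isErrorRateJump(
  failed: number,
  total: number,
  baselineRate: number,
  floor: number = 0.05,
  margin: number = 0.05,
): boolean {
  if (total === 0) return false;
  const rate = failed / total;
  return rate > floor && rate > baselineRate + margin;
}
```

Requiring both the floor and the margin is what keeps a quiet agent from alerting when it goes from one failure to two: the share has to be meaningfully high in absolute terms and meaningfully above its own recent normal.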
The two membership signals
New model and new agent are caught at ingest — the instant the first such call is recorded, with no sweep involved. A short warm-up (a project needs roughly a day of history, or a couple of hundred requests) stops a brand-new account from flagging everything it sees as new.
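A membership check with a warm-up gate might look something like the sketch below. The state shape, the signal names `new_model` and `new_agent`, and the exact warm-up bounds are assumptions; the post only says roughly a day of history or a couple of hundred requests, and this sketch treats either one as sufficient.

```typescript
// Hypothetical per-project state kept by the ingest path.
interface ProjectState {
  firstSeenAtMs: number;     // timestamp of the project's first recorded call
  requestCount: number;      // calls recorded so far
  seenModels: Set<string>;
  seenAgents: Set<string>;
}

// ASSUMED warm-up bounds: about a day, or about two hundred requests.
const WARMUP_MS = 24 * 60 * 60 * 1000;
const WARMUP_REQUESTS = 200;

function isWarm(state: ProjectState, nowMs: number): boolean {
  return nowMs - state.firstSeenAtMs >= WARMUP_MS || state.requestCount >= WARMUP_REQUESTS;
}

// Records one call and returns the membership signals it triggers, if any.
// First sightings during warm-up are remembered but do not raise events.
function recordCall(state: ProjectState, model: string, agentId: string, nowMs: number): string[] {
  const warm = isWarm(state, nowMs);
  const events: string[] = [];
  if (!state.seenModels.has(model)) {
    state.seenModels.add(model);
    if (warm) events.push("new_model");
  }
  if (!state.seenAgents.has(agentId)) {
    state.seenAgents.add(agentId);
    if (warm) events.push("new_agent");
  }
  state.requestCount += 1;
  return events;
}
```

Because the sets are updated even during warm-up, models and agents seen on day one become part of the known baseline rather than firing later.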
Reactive, not just scheduled
The sweep runs every minute as a backstop, but it does not wait for the clock. When ingest notices something worth a closer look, it nudges the worker over Postgres LISTEN/NOTIFY, and that project is swept within a second or two. The result is batch-job reliability with close-to-real-time latency.
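One way to picture the combination of a timed backstop and on-demand nudges is a small queue that coalesces both sources. The class below is a hypothetical sketch, not spaturzu's worker: in a Node service, `nudge` would typically be called from a Postgres notification handler and `tick` from a one-minute timer, but neither is wired up here so the logic stays self-contained.

```typescript
// Coalesces sweep requests so a project is swept once per drain,
// however many nudges arrived for it in the meantime.
class SweepScheduler {
  private pending: Set<string> = new Set<string>();
  private listProjects: () => string[];

  constructor(listProjects: () => string[]) {
    this.listProjects = listProjects;
  }

  // Reactive path: ingest flagged this project for a closer look.
  nudge(projectId: string): void {
    this.pending.add(projectId);
  }

  // Backstop path: on each timer tick, queue every known project.
  tick(): void {
    for (const id of this.listProjects()) this.pending.add(id);
  }

  // Returns the projects to sweep now and clears the queue.
  drain(): string[] {
    const out = Array.from(this.pending);
    this.pending.clear();
    return out;
  }
}
```

The design choice worth noting is that the timer and the nudge feed the same queue: if a notification is ever missed, the next tick still picks the project up.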
Getting alerted
Every detected anomaly lands on the Anomalies page — signal, scope (the project or a named agent), severity, the observed-versus-baseline numbers, and when it fired. To get pushed somewhere, set a notify channel: a webhook, a Slack incoming hook, or an email address. A per-signal cooldown (60 minutes by default) keeps a single ongoing incident from flooding the channel.
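The cooldown behaviour can be sketched as a gate keyed by signal and scope. The class and key format below are hypothetical; only the 60-minute default comes from the description above.

```typescript
// Suppresses repeat notifications for the same signal and scope
// until the cooldown has elapsed since the last one that was sent.
class CooldownGate {
  private lastSentMs: Map<string, number> = new Map<string, number>();
  private cooldownMs: number;

  constructor(cooldownMinutes: number = 60) {
    this.cooldownMs = cooldownMinutes * 60 * 1000;
  }

  // scopeKey is an ASSUMED identifier such as "project" or an agent ID.
  shouldNotify(signal: string, scopeKey: string, nowMs: number): boolean {
    const key = signal + ":" + scopeKey;
    const last = this.lastSentMs.get(key);
    if (last !== undefined && nowMs - last < this.cooldownMs) return false;
    this.lastSentMs.set(key, nowMs);
    return true;
  }
}
```

Keying on both signal and scope means a volume surge on one agent does not silence an error-rate jump on the same agent, or a surge on a different one.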
A webhook delivery carries the full event as JSON:
{
"type": "anomaly",
"projectId": "b7e4c2a1-9f3d-4e8a-9c12-5a7b1d2e3f44",
"signal": "volume_surge",
"scope": "agent",
"agentId": "1a2b3c4d-5e6f-4071-8293-a4b5c6d7e8f9",
"agentName": "support-triage",
"severity": "high",
"observedValue": "412",
"baselineValue": "73.5000",
"details": { "windowMinutes": 10, "mean": 73.5, "stddev": 19.8 },
"detectedAt": "2026-06-30T14:02:11.430Z"
}
Slack hooks receive a formatted message instead; email gets a labelled summary. Leave the notify target blank and events still record to the dashboard. You simply will not be paged.
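On the receiving end, a webhook consumer might parse and summarise that payload along these lines. The interface is inferred from the single sample above, so optional fields and allowed values are guesses, and the function names are hypothetical. Note that `observedValue` and `baselineValue` arrive as strings in the sample and need converting before any arithmetic.

```typescript
// Shape inferred from the sample payload; not an official schema.
interface AnomalyEvent {
  type: "anomaly";
  projectId: string;
  signal: string;
  scope: string;
  agentId?: string;    // ASSUMED absent for project-scoped events
  agentName?: string;
  severity: string;
  observedValue: string;   // numeric, but serialised as a string
  baselineValue: string;
  details: Record<string, unknown>;
  detectedAt: string;      // ISO 8601 timestamp
}

// Parses a raw request body; returns null for anything that is not an anomaly event.
function parseAnomaly(body: string): AnomalyEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch (err) {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const e = data as Record<string, unknown>;
  if (e["type"] !== "anomaly" || typeof e["signal"] !== "string") return null;
  if (typeof e["observedValue"] !== "string" || typeof e["baselineValue"] !== "string") return null;
  return data as AnomalyEvent;
}

// Builds a one-line summary suitable for a log line or chat message.
function summarize(e: AnomalyEvent): string {
  const observed = Number(e.observedValue);
  const baseline = Number(e.baselineValue);
  const ratio = baseline > 0 ? (observed / baseline).toFixed(1) + "x" : "n/a";
  const who = e.scope === "agent" ? (e.agentName ?? e.agentId ?? "unknown agent") : "project";
  return "[" + e.severity + "] " + e.signal + " on " + who + ": " + observed + " vs baseline " + baseline + " (" + ratio + ")";
}
```

For the sample event this produces `[high] volume_surge on support-triage: 412 vs baseline 73.5 (5.6x)`.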
Turning it on
There is nothing to install. Detection rides on the attribution data you already send, so if your calls are tagged with .withAgent("name"), per-agent detection works from day one (after the short warm-up).
- Open the Anomalies page in the dashboard.
- Leave the signals you want enabled. All four are on by default.
- Pick a sensitivity. Medium is a good starting point; move to High for tighter thresholds once you trust your baseline.
- Optionally add a notify channel, a target, and a cooldown.
Where to go next
- New to the idea? Start with what LLM cost attribution is: the per-agent structure anomaly detection builds on.
- Wiring it up in Node? See how to track LLM API costs per agent in Node.js.
- The documentation covers all five providers (OpenAI, Anthropic, Bedrock, Gemini, Mistral) and how tagging works.
See which agent spent the money.
spaturzu attributes every OpenAI, Anthropic, Bedrock, Gemini, and Mistral call to the agent and run that made it — no proxy, no prompt changes. Free to start.