What is LLM cost attribution (and why your provider bill can't do it)

The spaturzu team · 3 June 2026 · 7 min read

Your LLM provider sends you one invoice. It says you spent $4,200 last month. It does not say that most of it came from a single nightly job that retried itself into oblivion. LLM cost attribution is how you close that gap: tying each API call's cost back to the agent, run, and feature that caused it.

What LLM cost attribution is

Cost attribution is the practice of assigning each LLM API call's cost to the thing in your system that caused it, instead of seeing only a provider-level total. It usually works on three levels:

  • Per call — the model, provider, token counts, and computed cost of one request.
  • Per agent — which logical agent made the call (your triage bot, your summariser, your nightly report job), so cost rolls up by the unit you actually reason about.
  • Per run — which unit of work the call belonged to, so a single expensive task can be traced to the individual provider calls inside it.
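
To make the three levels concrete, here is a minimal sketch of rolling per-call records up by agent and by run. The `CallRecord` shape and the `rollUp` helper are assumptions made for illustration, not spaturzu's data model or API.

```typescript
// Hypothetical record shape: one entry per provider call.
interface CallRecord {
  agent: string;   // logical agent that made the call
  runId: string;   // unit of work the call belonged to
  model: string;
  costUsd: number; // computed cost of this one request
}

// Sum cost by an arbitrary key (agent, run, model, ...).
function rollUp(
  calls: CallRecord[],
  key: (c: CallRecord) => string,
): Map<string, number> {
  const totals = new Map<string, number>();
  for (const call of calls) {
    const k = key(call);
    totals.set(k, (totals.get(k) ?? 0) + call.costUsd);
  }
  return totals;
}

// Made-up sample data.
const calls: CallRecord[] = [
  { agent: "support-triage", runId: "run-1", model: "gpt-4o-mini", costUsd: 0.25 },
  { agent: "nightly-report", runId: "run-2", model: "gpt-4o-mini", costUsd: 1.5 },
  { agent: "nightly-report", runId: "run-2", model: "gpt-4o-mini", costUsd: 2.0 },
];

const byAgent = rollUp(calls, (c) => c.agent);
const byRun = rollUp(calls, (c) => c.runId);
console.log(byAgent, byRun);
```

The same per-call records feed every view; only the grouping key changes, which is why the per-call level has to carry the agent and run identifiers in the first place.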

Why your provider dashboard can't do it

Providers bill per API key and per organisation. Their dashboards can show spend by key, by model, or by day — but never by agent or by run, because those concepts only exist inside your application. Your provider never knew your support-triage agent existed; all it saw was a stream of requests on one key. The context that makes a bill explainable — which agent, which run, which customer — lives at the call site in your code, and it has to be captured there.

What gets attributed

A good attribution layer records, for every call, the dimensions you need to answer "where did the money go?":

  • The agent name and, for nested work, the agent path (parent → child).
  • The run identifier — the unit of work the call belonged to.
  • The project and any tags you set (environment, team, version, region).
  • The model and provider, token counts, computed cost, latency, and status.
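
One way to picture these dimensions is as a single record per call. The sketch below is an assumed shape: the field names, the `formatAgentPath` and `matchesTags` helpers, and the sample values are all hypothetical, not a documented spaturzu schema.

```typescript
// Hypothetical per-call attribution record covering the dimensions above.
interface AttributionRecord {
  agentPath: string[];          // parent first, e.g. ["nightly-report", "summariser"]
  runId: string;                // unit of work the call belonged to
  project: string;
  tags: Record<string, string>; // environment, team, version, region
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  costUsd: number;
  latencyMs: number;
  status: "ok" | "error";
}

// The agent name is the last element of the path.
function agentName(record: AttributionRecord): string {
  return record.agentPath[record.agentPath.length - 1];
}

// Render nested agents as "parent → child".
function formatAgentPath(record: AttributionRecord): string {
  return record.agentPath.join(" → ");
}

// True when the record carries every wanted tag with the same value.
function matchesTags(
  record: AttributionRecord,
  wanted: Record<string, string>,
): boolean {
  return Object.entries(wanted).every(([k, v]) => record.tags[k] === v);
}

// Made-up example record.
const example: AttributionRecord = {
  agentPath: ["nightly-report", "summariser"],
  runId: "run-42",
  project: "reports",
  tags: { environment: "production", team: "data" },
  provider: "openai",
  model: "gpt-4o-mini",
  inputTokens: 1200,
  outputTokens: 300,
  costUsd: 0.5,
  latencyMs: 850,
  status: "ok",
};

console.log(formatAgentPath(example), matchesTags(example, { team: "data" }));
```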

Two ways to capture it: proxy vs in-process

There are two common architectures, and the difference matters for both latency and privacy.

A proxy or gateway routes your calls through a service that sits in the request path. It sees everything, including your prompts and the model's responses, which makes it powerful for debugging, but it adds a network hop and a third party that holds your content.

An in-process SDK wraps your provider client instead. It measures each call at the call site inside your own process, then sends only the metadata (token counts and cost) to the attribution backend. Your keys still call the provider directly, so there's no extra hop, and the prompt text never leaves your servers. That's the approach spaturzu takes.
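
The in-process pattern can be sketched as a thin wrapper around the provider call. This is an illustration of the general technique, not spaturzu's implementation: `measured`, `ProviderResponse`, `CallMetadata`, and the fake provider call are all hypothetical, and the sketch assumes the provider response reports its own token usage.

```typescript
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical normalised provider response: content plus usage.
interface ProviderResponse {
  text: string;
  usage: Usage;
}

// Only metadata leaves the process: no prompt text, no response text.
interface CallMetadata {
  agent: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  status: "ok" | "error";
}

// Wrap a provider call, time it, and emit metadata for it.
async function measured(
  agent: string,
  model: string,
  call: () => Promise<ProviderResponse>,
  emit: (m: CallMetadata) => void,
): Promise<ProviderResponse> {
  const started = Date.now();
  let response: ProviderResponse;
  try {
    response = await call();
  } catch (err) {
    // Failed calls are still recorded, with zero tokens.
    emit({ agent, model, inputTokens: 0, outputTokens: 0, latencyMs: Date.now() - started, status: "error" });
    throw err;
  }
  emit({
    agent,
    model,
    inputTokens: response.usage.inputTokens,
    outputTokens: response.usage.outputTokens,
    latencyMs: Date.now() - started,
    status: "ok",
  });
  return response; // the caller gets the full response, untouched
}

// Stand-in for a real provider call, so the sketch runs offline.
const fakeProviderCall = async (): Promise<ProviderResponse> => ({
  text: "Ticket summary",
  usage: { inputTokens: 120, outputTokens: 40 },
});

const sent: CallMetadata[] = [];
const pending = measured("support-triage", "gpt-4o-mini", fakeProviderCall, (m) => {
  sent.push(m); // a real SDK would send this to its backend
});
pending.then((res) => console.log(res.text, sent));
```

The request itself still goes straight from your process to the provider; the wrapper only observes it, which is where the "no extra hop" property comes from.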

Attribution doesn't require sending your prompts anywhere

This is the part teams in regulated industries miss: you do not need to ship prompt content to a third party to attribute cost. Token counts and a price are enough. With an in-process SDK the text is tokenised locally and only the counts and cost are sent — so you get full per-agent cost visibility without your prompts, system prompts, or responses ever touching the tool's backend.
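
To see why counts and a price are enough, here is a minimal sketch of the arithmetic. The model name and per-million-token prices are placeholders, not real provider rates, and `costUsd` is a hypothetical helper.

```typescript
// Price per one million tokens, in USD. Placeholder values only.
interface Price {
  inputPerMTok: number;
  outputPerMTok: number;
}

const PRICES: Record<string, Price> = {
  "example-model": { inputPerMTok: 1.0, outputPerMTok: 4.0 },
};

// Cost of one call, computed from token counts alone.
function costUsd(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (!price) {
    throw new Error(`No price configured for model: ${model}`);
  }
  return (
    (inputTokens * price.inputPerMTok + outputTokens * price.outputPerMTok) /
    1_000_000
  );
}

// 500k input tokens at $1/M plus 250k output tokens at $4/M.
console.log(costUsd("example-model", 500_000, 250_000)); // 1.5
```

Nothing in this calculation needs the prompt or the response text, only how many tokens each contained.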

What good attribution lets you do

  • See your most expensive agent the moment it becomes your most expensive agent.
  • Trace a costly run down to the individual calls that made it up.
  • Set daily or monthly budget caps per project or per agent.
  • Get an alert — or a hard stop — before an overage, not after the bill.
  • Understand the unit economics of a feature or customer.
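
The alert-or-hard-stop idea can be sketched as a check that runs before each call. The `Budget` shape, the `checkBudget` function, and the 80% alert threshold are assumptions for illustration; a real tool may define caps and alerts differently.

```typescript
type BudgetDecision = "allow" | "alert" | "block";

interface Budget {
  capUsd: number;          // hard cap for the period (daily or monthly)
  alertAtFraction: number; // e.g. 0.8 means alert at 80% of the cap
}

// Decide before the call is made, using an estimate of its cost.
function checkBudget(
  spentUsd: number,
  estimatedCallUsd: number,
  budget: Budget,
): BudgetDecision {
  const projected = spentUsd + estimatedCallUsd;
  if (projected > budget.capUsd) return "block"; // hard stop before the overage
  if (projected >= budget.capUsd * budget.alertAtFraction) return "alert";
  return "allow";
}

const agentBudget: Budget = { capUsd: 100, alertAtFraction: 0.8 };

console.log(checkBudget(10, 1, agentBudget));   // "allow"
console.log(checkBudget(79.5, 1, agentBudget)); // "alert"
console.log(checkBudget(99.5, 1, agentBudget)); // "block"
```

Checking the projected spend rather than the spend so far is what moves the stop ahead of the overage instead of after it.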

How to add it: swap one import

With a drop-in SDK, adding attribution is a one-line change at the call site. You swap the provider import for the instrumented one and tag the call with the agent that made it:

agent.ts
import OpenAI from "@spaturzu/sdk/openai";

const openai = new OpenAI();

// Tag the call with the agent that made it — one line, no wrapper.
await openai.withAgent("support-triage").chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Summarise this ticket" }],
});

Within seconds of the first call, the agent, its run, and the cost show up in your dashboard. For a full walkthrough — environment variables, grouping calls into runs, and a hard budget cap — see How to track LLM API costs per agent in Node.js or the documentation.

Attribution vs observability vs evals

Cost attribution is a distinct job from two adjacent ones. LLM observability tools focus on reading what the model actually said; evaluation platforms focus on whether the answers are correct. Attribution answers a third question: which agent spent the money, and how do we keep it under a cap? They're complementary; for how these overlap in practice, see spaturzu vs Helicone and spaturzu vs Langfuse.

See which agent spent the money.

spaturzu attributes every OpenAI, Anthropic, Bedrock, Gemini, and Mistral call to the agent and run that made it — no proxy, no prompt changes. Free to start.
