
Trace requests through agents

Follow a single inbound request from ingress, through the sandbox, to every LLM call it produced, using a worked debugging session.

It is 03:14. A customer reports a wrong answer from one of your agents. You have an OTLP backend, an agent name, and a rough timestamp. This guide walks you from there to "here is exactly what we sent, what came back, and who triggered it" - in about a minute.

The shape of a single invocation


Every egress span carries an invocation.id attribute. It's the same ID across every span produced by one inbound request, including multiple LLM/MCP calls and any internal HTTP the agent makes through the gateway. That ID is your join key.

Attribution: how the chain stays connected

The worker stamps invocation.id (and a few other attribution fields) into a PROXY v2 TLV on every outbound dial. The EgressGateway's ext_proc reads the TLV and sets the corresponding OTLP attributes on each captured transaction. The agent process itself doesn't have to pass any header - even if it did, the worker-side TLV is the authority.
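As a rough illustration of what "stamping a TLV" means on the wire, here is a minimal sketch of the PROXY protocol v2 TLV layout: one type byte, a two-byte big-endian length, then the value. The type constant below is hypothetical. The PROXY v2 spec reserves 0xE0 through 0xEF for custom use, but the value CLRK actually assigns to invocation.id is not stated in this guide.

```python
import struct

# Hypothetical TLV type, picked from the PROXY v2 custom range (0xE0-0xEF).
# The type CLRK really uses for invocation.id is an assumption here.
PP2_TYPE_INVOCATION_ID = 0xE0


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Encode one TLV: 1-byte type, 2-byte big-endian length, then the value."""
    return struct.pack("!BH", tlv_type, len(value)) + value


def decode_tlvs(buf: bytes) -> dict[int, bytes]:
    """Walk a TLV region and return a {type: value} mapping."""
    out: dict[int, bytes] = {}
    offset = 0
    while offset + 3 <= len(buf):
        tlv_type, length = struct.unpack_from("!BH", buf, offset)
        offset += 3
        out[tlv_type] = buf[offset:offset + length]
        offset += length
    return out


tlv = encode_tlv(PP2_TYPE_INVOCATION_ID, b"inv-123")
decoded = decode_tlvs(tlv)
print(decoded[PP2_TYPE_INVOCATION_ID].decode())
```

Because the value travels in the connection preamble rather than in an HTTP header, nothing the agent writes into its own requests can overwrite it, which is consistent with the worker-side TLV being the authority.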

CLRK also propagates traceparent and tracestate automatically: the trace context on the request received by the ingress is injected into every outbound LLM/MCP call. If the original caller was inside an existing trace, your agent's calls extend that trace.
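When you need to check by hand that two requests belong to the same trace, it helps to split the traceparent value into its fields. The sketch below parses the W3C Trace Context version 00 format (version, trace ID, parent span ID, flags). The function name is illustrative, and the sample value is the example from the W3C specification, not CLRK output.

```python
import re

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(header: str) -> dict:
    """Split a version-00 traceparent header into its four fields."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        raise ValueError(f"malformed traceparent: {header!r}")
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        raise ValueError(f"invalid traceparent: {header!r}")
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(ctx["trace_id"], ctx["sampled"])
```

Two spans belong to the same trace exactly when their trace_id fields match; the parent_id changes at every hop.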

Caller-supplied headers are not auto-promoted to span attributes. A bare X-Tenant-Id HTTP header is not delivered to the agent at all; caller context reaches the agent only when the caller sends it in CloudEvents binary form as ce-<name> (e.g. ce-x-tenant-id), which shows up in the envelope under the de-prefixed key (x-tenant-id). If you want such values on the OTLP spans, the agent has to emit its own span or write them into the response in a way you can grep on.

Worked debugging session

"Customer X reported a wrong answer from jq-bot around 03:14 UTC."

Step 1: scope the time window.

Open your OTLP backend (Honeycomb, Tempo, or your collector's UI). Filter on:

```text
agent.name = "jq-bot"
gen_ai.system = "anthropic"
time between 03:13 and 03:16
```

Step 2: find the offending span.

Look at the spans returned. Three attributes will identify "interesting" calls quickly:

  • gen_ai.response.model - is it the model you expected, or did the request silently fall back?
  • gen_ai.usage.input_tokens - was the prompt huge? (A bug in context assembly often shows up as a 50KB prompt.)
  • clrk.budget.denied - was this call refused by the budget? See Cap LLM spend per agent.

Pick the call that matches the customer's complaint window.
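If your backend can export the matching spans as JSON, the same triage can be scripted. This is a minimal sketch that assumes each span's attributes arrive as a plain dict keyed by the attribute names listed above. The function name, the threshold, the model names, and the assumption that clrk.budget.denied is truthy when a call was refused are all illustrative.

```python
def triage(attrs: dict, expected_model: str, max_input_tokens: int) -> list[str]:
    """Return the reasons a span looks suspicious (empty list if none)."""
    reasons = []
    model = attrs.get("gen_ai.response.model")
    if model is not None and model != expected_model:
        reasons.append(f"model fallback: got {model}")
    if attrs.get("gen_ai.usage.input_tokens", 0) > max_input_tokens:
        reasons.append("oversized prompt")
    # Assumption: the attribute is truthy when the budget refused the call.
    if attrs.get("clrk.budget.denied"):
        reasons.append("budget denied")
    return reasons


# Placeholder values, not real CLRK output.
sample_attrs = {
    "gen_ai.response.model": "model-b",
    "gen_ai.usage.input_tokens": 12000,
    "clrk.budget.denied": False,
}
print(triage(sample_attrs, expected_model="model-a", max_input_tokens=8000))
```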

Step 3: pull invocation.id.

Open the span detail. Copy invocation.id. This is the join key.

Step 4: see the full chain.

Re-query the OTLP backend:

```text
invocation.id = "<the-id>"
```

You now see every egress call the same invocation produced - multiple LLM calls if the agent re-prompted, MCP calls if any tools ran, and internal HTTP calls if the agent talked to your own services through the gateway.
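The same join can be done offline on exported spans. The sketch below assumes a hypothetical export shape: a list of dicts with name, start_time, and attributes fields. Your backend's real export format will differ; only the invocation.id attribute name comes from this guide.

```python
def chain_for(spans: list[dict], invocation_id: str) -> list[dict]:
    """Return every span for one invocation, ordered by start time."""
    chain = [
        s for s in spans
        if s["attributes"].get("invocation.id") == invocation_id
    ]
    return sorted(chain, key=lambda s: s["start_time"])


# Placeholder spans in a hypothetical export shape.
sample_spans = [
    {"name": "llm call 2", "start_time": 30, "attributes": {"invocation.id": "inv-1"}},
    {"name": "unrelated", "start_time": 15, "attributes": {"invocation.id": "inv-2"}},
    {"name": "llm call 1", "start_time": 10, "attributes": {"invocation.id": "inv-1"}},
    {"name": "mcp tool", "start_time": 20, "attributes": {"invocation.id": "inv-1"}},
]

for span in chain_for(sample_spans, "inv-1"):
    print(span["start_time"], span["name"])
```

Sorting by start time gives the order in which the agent made its calls, which is usually the quickest way to spot a re-prompt loop.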

Step 5: walk back to the trigger.

The inbound HTTP request to the per-TaskAgent ingress is also part of the trace - traceparent propagation tied them together. Filter on agent.name plus the timestamp range and look for the request span; that's where any caller-supplied headers (like a customer ID your auth proxy injected) live.

Two ways to attach your own context

Inside the agent

Caller context reaches the agent only when the caller sends it as a CloudEvents binary-mode header ce-<name> (for example -H 'ce-x-tenant-id: acme', or via clrk agents run-task --header). In the structured-mode JSON envelope on stdin it appears under the de-prefixed key (x-tenant-id); in binary mode (GET /v1/event) it comes back as the ce-x-tenant-id response header. A bare, non-ce-* header like X-Tenant-Id is not delivered. Read it out:

```python
import json
import sys

env = json.load(sys.stdin)
# Only present if the caller sent ce-x-tenant-id (CloudEvents binary mode).
tenant = env.get("x-tenant-id", "unknown")
```

Then use that value in your own structured logs or in API calls downstream. CLRK does not auto-promote it to span attributes, but your code can.
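One way to carry the value into structured logs is sketched below. The log_line helper and its field names are illustrative, not part of CLRK. It writes to stderr on the assumption that stdout is reserved for the agent's reply.

```python
import json
import sys


def log_line(event: str, env: dict, **fields) -> str:
    """Build one JSON log line tagged with the caller's tenant, if present."""
    record = {
        "event": event,
        # Falls back to "unknown" when the caller sent no ce-x-tenant-id.
        "tenant": env.get("x-tenant-id", "unknown"),
        **fields,
    }
    return json.dumps(record, sort_keys=True)


# Sample envelope; in an agent this would come from json.load(sys.stdin).
sample_env = {"x-tenant-id": "acme"}
print(log_line("prompt_built", sample_env, prompt_chars=1234), file=sys.stderr)
```

Logging the tenant on every line lets you grep agent logs for one customer even though the value never appears on the CLRK-emitted spans.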

Emit your own spans

If the agent runtime supports OpenTelemetry, you can emit spans that chain off the inbound traceparent. Read the envelope's traceparent attribute, configure your OTel SDK to use it as parent, and your agent-internal spans will hang off the same trace as the CLRK-emitted egress spans. Configure your SDK to ship to the same OTLP endpoint you set on the EgressGateway so everything lands in one backend.
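An OTel SDK normally handles the parenting through its W3C Trace Context propagator. To show what "use it as parent" amounts to, here is a standard-library sketch that derives a child traceparent: it keeps the trace ID and flags and generates a fresh span ID. The envelope key traceparent is taken from the paragraph above; the helper name and sample value are illustrative.

```python
import secrets


def child_traceparent(parent: str) -> str:
    """Return a traceparent for a new span that is a child of `parent`."""
    version, trace_id, _parent_span_id, flags = parent.strip().split("-")
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return f"{version}-{trace_id}-{span_id}-{flags}"


# Sample envelope; in an agent this would come from json.load(sys.stdin).
env = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
outgoing = child_traceparent(env["traceparent"])
print(outgoing)
```

The unchanged trace ID is what makes the agent-internal span land in the same trace as the CLRK-emitted egress spans.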

The metadata service

Each sandbox can reach a per-sandbox HTTP endpoint at $CLRK_METADATA_URL (and $CLRK_METADATA_URL_V6 for IPv6). Two routes:

  • GET /v1/event - returns the request envelope. Binary mode by default (CloudEvents attributes as ce-* response headers + raw body). Structured mode when you send Accept: application/cloudevents+json.
  • POST /v1/response - for spec.delivery.mode: Metadata, where the agent posts its reply back here instead of writing stdout.

/v1/event is the right tool when your agent is a long-running process that needs to fetch its current invocation context without re-parsing stdin. There is no /v1/identity, /v1/info, or similar - what's available are exactly those two endpoints today.
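A long-running agent might fetch its envelope along these lines. This is a sketch using only the standard library. It assumes $CLRK_METADATA_URL is a base URL to which /v1/event can be appended, and that the structured-mode response body is a single JSON object. The function names and the timeout are illustrative.

```python
import json
import urllib.request


def event_request(base_url: str) -> urllib.request.Request:
    """Build a GET /v1/event request that asks for structured mode."""
    url = base_url.rstrip("/") + "/v1/event"
    return urllib.request.Request(
        url, headers={"Accept": "application/cloudevents+json"}
    )


def fetch_event(base_url: str, timeout: float = 2.0) -> dict:
    """Fetch the current invocation envelope as a dict."""
    with urllib.request.urlopen(event_request(base_url), timeout=timeout) as resp:
        return json.load(resp)


# Inside a sandbox (not run here):
#   import os
#   env = fetch_event(os.environ["CLRK_METADATA_URL"])
#   tenant = env.get("x-tenant-id", "unknown")
```

Omitting the Accept header would return binary mode instead, where the attributes arrive as ce-* response headers and the body is the raw payload.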

What you can't trace today

  • Wall-clock cost of sandbox spawn/teardown. The worker doesn't emit OTel spans yet. The latency you see in OTLP starts when the first egress packet hits the gateway. Coming soon.
  • Auto-correlated agent-emitted spans. If your agent emits spans via its own OTel SDK, they correlate only via traceparent propagation - CLRK does not inject anything automatically on the agent side beyond what arrived on the inbound request.
