Trace requests through agents
Follow a single inbound request from ingress, through the sandbox, to every LLM call it produced, in a worked debugging session.
It is 03:14. A customer reports a wrong answer from one of your agents. You have an OTLP backend, an agent name, and a rough timestamp. This guide takes you from there to "here is exactly what we sent, what came back, and who triggered it" in about a minute.
The shape of a single invocation
Every egress span carries an invocation.id attribute. It's the same
ID across every span produced by one inbound request, including
multiple LLM/MCP calls and any internal HTTP the agent makes through
the gateway. That ID is your join key.
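Because every span of one request shares invocation.id, rolling spans up by that attribute gives you one row per inbound request. This is a minimal sketch. It assumes your backend can export spans as dicts with an "attributes" mapping, and that gen_ai.usage.input_tokens is exported as an integer. That export shape and the helper name summarize_invocations are hypothetical, not a CLRK API.

```python
from collections import defaultdict


def summarize_invocations(spans):
    # One row per invocation.id: how many egress calls it made and the
    # total prompt size across its LLM calls. Spans without the attribute
    # land under None so they stand out.
    rows = defaultdict(lambda: {"calls": 0, "input_tokens": 0})
    for span in spans:
        attrs = span.get("attributes", {})
        row = rows[attrs.get("invocation.id")]
        row["calls"] += 1
        # MCP or plain HTTP spans carry no token count, so default to 0.
        row["input_tokens"] += attrs.get("gen_ai.usage.input_tokens", 0)
    return dict(rows)


spans = [
    {"attributes": {"invocation.id": "inv-1", "gen_ai.usage.input_tokens": 900}},
    {"attributes": {"invocation.id": "inv-1", "gen_ai.usage.input_tokens": 1400}},
    {"attributes": {"invocation.id": "inv-2", "gen_ai.usage.input_tokens": 300}},
]
print(summarize_invocations(spans))
# {'inv-1': {'calls': 2, 'input_tokens': 2300}, 'inv-2': {'calls': 1, 'input_tokens': 300}}
```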
Attribution: how the chain stays connected
The worker stamps invocation.id (and a few other attribution
fields) into a PROXY v2 TLV on every outbound dial. The
EgressGateway's ext_proc reads the TLV and sets the corresponding
OTLP attributes on each captured transaction. The agent process
itself doesn't have to pass any header; even if it did, the
worker-side TLV is authoritative.
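The sketch below shows the PROXY protocol v2 framing and how a TLV rides on it. The header layout follows the HAProxy PROXY protocol v2 specification. This guide does not document which TLV type number CLRK uses or how it encodes invocation.id, so 0xE0 (the first type in the spec's custom range) is a hypothetical stand-in. build_header plays the worker's role and read_tlvs plays the gateway's. Both names are illustrative.

```python
import socket
import struct

# 12-byte signature that opens every PROXY protocol v2 header.
PP2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"
# Hypothetical: the spec reserves 0xE0-0xEF for custom TLVs; the type CLRK
# actually uses for invocation.id is not stated in this guide.
PP2_TYPE_INVOCATION_ID = 0xE0


def build_header(src, dst, invocation_id):
    # PROXY command over TCP/IPv4, followed by a single TLV.
    (src_ip, src_port), (dst_ip, dst_port) = src, dst
    addresses = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!HH", src_port, dst_port)
    )
    value = invocation_id.encode("utf-8")
    tlv = struct.pack("!BH", PP2_TYPE_INVOCATION_ID, len(value)) + value
    payload = addresses + tlv
    # 0x21 = version 2 + PROXY command; 0x11 = AF_INET + STREAM.
    return PP2_SIGNATURE + struct.pack("!BBH", 0x21, 0x11, len(payload)) + payload


def read_tlvs(header):
    # Return {type: value} for every TLV after the IPv4 address block.
    if header[:12] != PP2_SIGNATURE:
        raise ValueError("not a PROXY v2 header")
    _ver_cmd, fam, length = struct.unpack("!BBH", header[12:16])
    if fam != 0x11:
        raise ValueError("this sketch only handles TCP over IPv4")
    body = header[16:16 + length]
    tlvs, pos = {}, 12  # skip src addr, dst addr, src port, dst port
    while pos + 3 <= len(body):
        tlv_type, tlv_len = struct.unpack("!BH", body[pos:pos + 3])
        tlvs[tlv_type] = body[pos + 3:pos + 3 + tlv_len]
        pos += 3 + tlv_len
    return tlvs


hdr = build_header(("10.0.0.5", 40512), ("10.0.1.9", 443), "inv-7f3a")
print(read_tlvs(hdr)[PP2_TYPE_INVOCATION_ID].decode())  # inv-7f3a
```

Since the TLV travels in connection metadata rather than in an HTTP header, nothing the agent process writes can overwrite it.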
CLRK also propagates traceparent and tracestate automatically: the
inbound trace context on the request received by the ingress is
injected on every outbound LLM/MCP call. If the original caller was
inside an existing trace, your agent's calls extend that trace.
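At the header level, "extending the trace" means the outbound traceparent keeps the inbound trace-id and flags and names a new parent span. Here is a minimal sketch of that rule for W3C Trace Context version 00. It illustrates the header format only, not CLRK's implementation, and the function names are hypothetical.

```python
import re
import secrets

# W3C Trace Context, version 00: version-traceid-parentid-flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def parse_traceparent(value):
    # Return (trace_id, parent_id, flags), or None if the header is invalid.
    m = TRACEPARENT_RE.match(value.strip())
    if not m:
        return None
    trace_id, parent_id, flags = m.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid per the spec
    return trace_id, parent_id, flags


def outbound_headers(inbound):
    # Keep the trace-id and flags, mint a fresh 8-byte span id for the new
    # hop, and forward tracestate unchanged.
    parsed = parse_traceparent(inbound.get("traceparent", ""))
    if parsed is None:
        return {}
    trace_id, _, flags = parsed
    out = {"traceparent": f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"}
    if "tracestate" in inbound:
        out["tracestate"] = inbound["tracestate"]
    return out


inbound = {
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    "tracestate": "congo=t61rcWkgMzE",
}
out = outbound_headers(inbound)
print(parse_traceparent(out["traceparent"])[0] == parse_traceparent(inbound["traceparent"])[0])  # True
```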
Caller-supplied headers are not auto-promoted to span attributes.
A bare X-Tenant-Id HTTP header is not delivered to the agent at all;
caller context reaches the agent only when the caller sends it in
CloudEvents binary form as ce-<name> (e.g. ce-x-tenant-id), which
shows up in the envelope under the de-prefixed key (x-tenant-id). If
you want such values on the OTLP spans, the agent has to emit its own
span or write them into the response in a way you can grep on.
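The delivery rule above can be modelled in a few lines. This is a hypothetical illustration of the behaviour this guide describes, not CLRK's code. Only ce-* headers survive, under their de-prefixed name. Standard attributes such as ce-id or ce-type would map the same way.

```python
def envelope_extensions(headers):
    # HTTP header names are case-insensitive, so compare in lower case.
    # Anything without the ce- prefix (a bare X-Tenant-Id, for example)
    # never reaches the agent.
    out = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered.startswith("ce-"):
            out[lowered[len("ce-"):]] = value
    return out


headers = {
    "ce-x-tenant-id": "acme",
    "X-Tenant-Id": "acme",
    "Content-Type": "application/json",
}
print(envelope_extensions(headers))  # {'x-tenant-id': 'acme'}
```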
Worked debugging session
"Customer X reported a wrong answer from jq-bot around 03:14 UTC."
Step 1: scope the time window.
Open your OTLP backend (Honeycomb, Tempo, or your collector's UI). Filter on:
agent.name = "jq-bot"
gen_ai.system = "anthropic"
time between 03:13 and 03:16
Step 2: find the offending span.
Look at the spans returned. Two attributes will identify "interesting" calls quickly:
- gen_ai.response.model: is it the model you expected, or did the request silently fall back?
- gen_ai.usage.input_tokens: was the prompt huge? (A bug in context assembly often shows up as a 50KB prompt.)
- clrk.budget.denied: was this call refused by the budget? See Cap LLM spend per agent.
Pick the call that matches the customer's complaint window.
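If you export the matching spans (for example as JSON from your backend) instead of reading them by eye, Steps 1 and 2 reduce to a filter and a few checks. This sketch makes several assumptions. Spans are dicts with a start datetime and an attributes mapping. clrk.budget.denied is a boolean. The export shape, the model names model-a/model-b and the 20,000-token threshold are all placeholders.

```python
from datetime import datetime, timezone


def in_window(span, agent, system, start, end):
    # Step 1: the same three filters you would type into the backend.
    attrs = span["attributes"]
    return (
        attrs.get("agent.name") == agent
        and attrs.get("gen_ai.system") == system
        and start <= span["start"] <= end
    )


def red_flags(span, expected_model, max_input_tokens=20_000):
    # Step 2: the attributes that mark a call as interesting.
    # max_input_tokens is an arbitrary threshold; tune it to your prompts.
    attrs = span["attributes"]
    flags = []
    model = attrs.get("gen_ai.response.model")
    if model is not None and model != expected_model:
        flags.append(f"unexpected model {model}")
    tokens = attrs.get("gen_ai.usage.input_tokens", 0)
    if tokens > max_input_tokens:
        flags.append(f"large prompt ({tokens} input tokens)")
    if attrs.get("clrk.budget.denied"):  # assumes a boolean attribute
        flags.append("budget denied")
    return flags


def at(hh, mm):
    # Placeholder date; only the time of day matters here.
    return datetime(2024, 1, 1, hh, mm, tzinfo=timezone.utc)


common = {"agent.name": "jq-bot", "gen_ai.system": "anthropic"}
spans = [
    {"start": at(3, 13), "attributes": {**common, "gen_ai.response.model": "model-a",
                                        "gen_ai.usage.input_tokens": 1200}},
    {"start": at(3, 14), "attributes": {**common, "gen_ai.response.model": "model-b",
                                        "gen_ai.usage.input_tokens": 48000}},
    {"start": at(3, 20), "attributes": {**common, "clrk.budget.denied": True}},
]
window = [s for s in spans if in_window(s, "jq-bot", "anthropic", at(3, 13), at(3, 16))]
for s in window:
    print(s["start"].strftime("%H:%M"), red_flags(s, expected_model="model-a"))
# 03:13 []
# 03:14 ['unexpected model model-b', 'large prompt (48000 input tokens)']
```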
Step 3: pull invocation.id.
Open the span detail. Copy invocation.id. This is the join key.
Step 4: see the full chain.
Re-query the OTLP backend:
invocation.id = "<the-id>"
You now see every egress call the same invocation produced: multiple LLM calls if the agent re-prompted, MCP calls for any tools it ran, and internal HTTP calls if the agent talked to your own services through the gateway.
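Steps 3 and 4 together amount to "read the key off one span, then select and order everything sharing it". Here is a sketch over exported spans. It assumes each span dict carries an integer start_unix_nano and an attributes mapping. Those field names and the span names are hypothetical.

```python
def invocation_chain(picked, all_spans):
    # Step 3: the join key comes from the span you picked.
    inv_id = picked["attributes"].get("invocation.id")
    if inv_id is None:
        raise ValueError("picked span has no invocation.id attribute")
    # Step 4: every span with the same key, in the order the calls started.
    chain = [s for s in all_spans if s["attributes"].get("invocation.id") == inv_id]
    return sorted(chain, key=lambda s: s["start_unix_nano"])


spans = [
    {"name": "llm call 2", "start_unix_nano": 300, "attributes": {"invocation.id": "inv-9"}},
    {"name": "mcp call", "start_unix_nano": 200, "attributes": {"invocation.id": "inv-9"}},
    {"name": "other request", "start_unix_nano": 250, "attributes": {"invocation.id": "inv-4"}},
    {"name": "llm call 1", "start_unix_nano": 100, "attributes": {"invocation.id": "inv-9"}},
]
for s in invocation_chain(spans[0], spans):
    print(s["name"])
# llm call 1
# mcp call
# llm call 2
```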
Step 5: walk back to the trigger.
The inbound HTTP request to the per-TaskAgent ingress is also part
of the trace, because traceparent propagation ties them together. Filter
on agent.name plus the timestamp range and look for the request
span; that's where any caller-supplied headers (like a customer ID
your auth proxy injected) live.
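When your export includes parent links, walking back from the offending egress span to the ingress request span is a short loop. This sketch assumes each exported span carries span_id and parent_span_id fields. Those are hypothetical export names; OTLP itself calls them spanId and parentSpanId. It also assumes the egress spans descend from the ingress span, as the propagation described above implies.

```python
def walk_to_root(span, spans_by_id):
    # Follow parent_span_id links upward. Stop when the parent is not in the
    # export (for example, it belongs to an upstream caller) or when a link
    # would revisit a span, so a malformed export cannot loop forever.
    path, seen = [span], {span["span_id"]}
    while True:
        parent = spans_by_id.get(path[-1].get("parent_span_id"))
        if parent is None or parent["span_id"] in seen:
            return path
        path.append(parent)
        seen.add(parent["span_id"])


spans = [
    {"span_id": "a1", "parent_span_id": "caller-span", "name": "ingress request"},
    {"span_id": "b2", "parent_span_id": "a1", "name": "llm call"},
    {"span_id": "c3", "parent_span_id": "a1", "name": "mcp call"},
]
by_id = {s["span_id"]: s for s in spans}
print([s["name"] for s in walk_to_root(by_id["b2"], by_id)])
# ['llm call', 'ingress request']
```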
Two ways to attach your own context
Inside the agent
Caller context reaches the agent only when the caller sends it as a
CloudEvents binary-mode header ce-<name> (for example
-H 'ce-x-tenant-id: acme', or via clrk agents run-task --header).
In the structured-mode JSON envelope on stdin it appears under the
de-prefixed key (x-tenant-id); in binary mode (GET /v1/event) it
comes back as the ce-x-tenant-id response header. A bare,
non-ce-* header like X-Tenant-Id is not delivered. Read it out:
import json
import sys

env = json.load(sys.stdin)
# Only present if the caller sent ce-x-tenant-id (CloudEvents binary mode).
tenant = env.get("x-tenant-id", "unknown")
Then use that value in your own structured logs or in API calls downstream. CLRK does not auto-promote it to span attributes, but your code can.
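For example, a small helper can stamp the tenant onto every structured log line. It can also record the envelope's traceparent, when present, so log lines can be matched against the OTLP trace. The helper name log_event and the field names are hypothetical. Writing to stderr assumes stdout carries the agent's reply (the non-Metadata delivery mode).

```python
import json
import sys
import time


def log_event(message, envelope, **fields):
    # One JSON object per line on stderr; stdout is assumed to be reserved
    # for the agent's reply. Adjust if your delivery mode differs.
    record = {
        "ts": time.time(),
        "msg": message,
        "tenant": envelope.get("x-tenant-id", "unknown"),
        # Lets you match log lines to the OTLP trace, if the envelope has it.
        "traceparent": envelope.get("traceparent"),
        **fields,
    }
    sys.stderr.write(json.dumps(record) + "\n")
    return record


# Stand-in for env = json.load(sys.stdin) from the snippet above.
env = {"x-tenant-id": "acme"}
record = log_event("answer generated", env, answer_chars=512)
```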
Emit your own spans
If the agent runtime supports OpenTelemetry, you can emit spans that
chain off the inbound traceparent. Read the envelope's
traceparent attribute, configure your OTel SDK to use it as parent,
and your agent-internal spans will hang off the same trace as the
CLRK-emitted egress spans. Configure your SDK to ship to the same
OTLP endpoint you set on the EgressGateway so everything lands in
one backend.
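In practice you would hand the envelope's traceparent to your SDK's W3C propagator and let the exporter do the rest. In the Python SDK, that means calling opentelemetry.propagate.extract on a dict carrier and starting spans with the resulting context. The stdlib sketch below only shows which IDs go where: the new span keeps the inbound trace ID and uses the inbound span ID as its parent. Field names follow the OTLP/JSON span encoding, with the resourceSpans/scopeSpans wrapping omitted. child_span is a hypothetical helper.

```python
import secrets
import time


def child_span(envelope, name, start_ns, end_ns, attributes=None):
    # Returns None when the envelope carries no usable traceparent.
    parts = envelope.get("traceparent", "").split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        return None
    _, trace_id, parent_span_id, _ = parts
    return {
        "traceId": trace_id,                # same trace as the CLRK egress spans
        "spanId": secrets.token_hex(8),     # fresh ID for this agent-side span
        "parentSpanId": parent_span_id,     # hang off the inbound span
        "name": name,
        "startTimeUnixNano": str(start_ns),
        "endTimeUnixNano": str(end_ns),
        "attributes": [
            {"key": k, "value": {"stringValue": str(v)}}
            for k, v in (attributes or {}).items()
        ],
    }


env = {
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
    "x-tenant-id": "acme",
}
now = time.time_ns()
span = child_span(env, "assemble-context", now, now + 1_000_000, {"tenant": env["x-tenant-id"]})
print(span["traceId"], span["parentSpanId"])
# 4bf92f3577b34da6a3ce929d0e0e4736 00f067aa0ba902b7
```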
The metadata service
Each sandbox can reach a per-sandbox HTTP endpoint at
$CLRK_METADATA_URL (and $CLRK_METADATA_URL_V6 for IPv6). Two routes:
- GET /v1/event: returns the request envelope. Binary mode by default (CloudEvents attributes as ce-* response headers, plus the raw body). Structured mode when you send Accept: application/cloudevents+json.
- POST /v1/response: for spec.delivery.mode: Metadata, where the agent posts its reply back here instead of writing to stdout.
/v1/event is the right tool when your agent is a long-running
process that needs to fetch its current invocation context without
re-parsing stdin. There is no /v1/identity, /v1/info, or
similar endpoint; today, those two routes are all there is.
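Here is a stdlib sketch of calling both routes from Python. The environment variable, paths and Accept value come from this guide. The body and Content-Type you post to /v1/response are assumptions, so match whatever reply format your TaskAgent expects. All helper names are hypothetical.

```python
import json
import os
import urllib.request

STRUCTURED = "application/cloudevents+json"


def metadata_base():
    # IPv4 endpoint; for IPv6, read CLRK_METADATA_URL_V6 instead.
    return os.environ["CLRK_METADATA_URL"].rstrip("/")


def event_request(base, structured=False):
    # GET /v1/event answers in binary mode unless you ask for structured JSON.
    req = urllib.request.Request(f"{base}/v1/event", method="GET")
    if structured:
        req.add_header("Accept", STRUCTURED)
    return req


def fetch_event(base, timeout=5.0):
    # Structured mode returns the whole envelope as one JSON document.
    with urllib.request.urlopen(event_request(base, structured=True), timeout=timeout) as resp:
        return json.load(resp)


def response_request(base, body, content_type="application/json"):
    # POST /v1/response, used when spec.delivery.mode is Metadata.
    # The content type here is an assumption, not documented behaviour.
    return urllib.request.Request(
        f"{base}/v1/response",
        data=body,
        method="POST",
        headers={"Content-Type": content_type},
    )


def post_response(base, body, timeout=5.0):
    with urllib.request.urlopen(response_request(base, body), timeout=timeout) as resp:
        return resp.status
```

Inside a sandbox, a long-running agent might call fetch_event(metadata_base()) at the start of each invocation to get the structured envelope, instead of holding on to what it read from stdin.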
What you can't trace today
- Wall-clock cost of sandbox spawn/teardown. The worker doesn't emit OTel spans yet. The latency you see in OTLP starts when the first egress packet hits the gateway. Coming soon.
- Auto-correlated agent-emitted spans. If your agent emits spans
via its own OTel SDK, they correlate only through
traceparent propagation: CLRK does not inject anything on the agent side beyond what arrived on the inbound request.
Where to next
- Configure the OTLP endpoint these spans flow to. See Send telemetry to OTLP endpoints.
- Add identity extraction so agent.* attributes carry caller identity. See Authenticate users before agents for how to pair an auth proxy with CLRK's identity extractors.
- Alert on budget denials and egress denials surfaced through these spans. See Cap LLM spend per agent and Lock down agent egress.