# Trace requests through agents

> Follow a single inbound request from ingress through sandbox through every LLM call it produced. Includes a worked debugging session.

It is 03:14. A customer reports a wrong answer from one of your
agents. You have an OTLP backend, an agent name, and a rough
timestamp. This guide walks you from there to "here is exactly what
we sent, what came back, and who triggered it" - in about a minute.

## The shape of a single invocation

```mermaid
sequenceDiagram
  participant C as Client
  participant I as Ingress
  participant S as Sandbox
  participant EG as EgressGateway
  participant U as Upstream
  C->>I: POST
  I->>S: dispatch
  Note over S: agent runs
  S->>EG: HTTPS
  Note over EG: PROXY v2 TLV: invocation.id
  EG->>U: forwarded
  U-->>EG: response
  EG-->>S: response
  S-->>I: stdout
  I-->>C: HTTP response
```

Every egress span carries an `invocation.id` attribute. It's the same
ID across every span produced by one inbound request, including
multiple LLM/MCP calls and any internal HTTP the agent makes through
the gateway. That ID is your join key.

## Attribution: how the chain stays connected

The worker stamps `invocation.id` (and a few other attribution
fields) into a PROXY v2 TLV on every outbound dial. The
`EgressGateway`'s ext_proc reads the TLV and sets the corresponding
OTLP attributes on each captured transaction. The agent process
itself doesn't have to pass any header - even if it did, the
worker-side TLV is the authority.

CLRK also propagates `traceparent` and `tracestate` automatically: the
inbound trace context on the request received by the ingress is
injected on every outbound LLM/MCP call. If the original caller was
inside an existing trace, your agent's calls extend that trace.

Caller-supplied headers are **not** auto-promoted to span attributes.
A bare `X-Tenant-Id` HTTP header is not delivered to the agent at all;
caller context reaches the agent only when the caller sends it in
CloudEvents binary form as `ce-<name>` (e.g. `ce-x-tenant-id`), which
shows up in the envelope under the de-prefixed key (`x-tenant-id`). If
you want such values on the OTLP spans, the agent has to emit its own
span or write them into the response in a way you can grep on.
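
The naming rule above can be sketched in a few lines. This is only an illustration of the `ce-<name>` to de-prefixed-key mapping described here, not CLRK's implementation; the helper names `to_ce_headers` and `envelope_key` are hypothetical, and lowercasing the names is an assumption.

```python
CE_PREFIX = "ce-"


def to_ce_headers(context):
    """Caller side: turn context values into CloudEvents binary-mode headers."""
    # Lowercasing is an assumption; HTTP header names are case-insensitive.
    return {CE_PREFIX + name.lower(): str(value) for name, value in context.items()}


def envelope_key(header_name):
    """Return the de-prefixed envelope key, or None for a bare (non-ce-*) header."""
    lowered = header_name.lower()
    if lowered.startswith(CE_PREFIX):
        return lowered[len(CE_PREFIX):]
    return None  # a bare header such as X-Tenant-Id is not delivered
```

With this sketch, `to_ce_headers({"x-tenant-id": "acme"})` yields a `ce-x-tenant-id` header, and `envelope_key("X-Tenant-Id")` returns `None`, mirroring the rule that bare headers never reach the agent.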

## Worked debugging session

> "Customer X reported a wrong answer from `jq-bot` around 03:14 UTC."

**Step 1: scope the time window.**

Open your OTLP backend (Honeycomb, Tempo, or your collector's UI).
Filter on:

```
agent.name = "jq-bot"
gen_ai.system = "anthropic"
time between 03:13 and 03:16
```

**Step 2: find the offending span.**

Look at the spans returned. Three attributes will identify
"interesting" calls quickly:

- `gen_ai.response.model` - is it the model you expected, or did
  the request silently fall back?
- `gen_ai.usage.input_tokens` - was the prompt huge? (A bug in
  context assembly often shows up as a 50KB prompt.)
- `clrk.budget.denied` - was this call refused by the budget? See
  [Cap LLM spend per agent](./cap-llm-spend-per-agent).

Pick the call that matches the customer's complaint window.
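
If your backend lets you export the matching spans, the same triage can be scripted. The sketch below assumes each exported span is a dict with an `attributes` mapping; that shape, the `triage` helper, the token threshold, and the treatment of `clrk.budget.denied` as a truthy flag are all assumptions to adapt to your backend.

```python
def triage(span, expected_model, max_input_tokens=20_000):
    """Return the reasons a span looks suspicious (empty list if none)."""
    attrs = span.get("attributes", {})
    reasons = []

    # Did the request silently fall back to a different model?
    model = attrs.get("gen_ai.response.model")
    if model is not None and model != expected_model:
        reasons.append(f"unexpected model: {model}")

    # Was the prompt unusually large? The threshold is arbitrary.
    tokens = attrs.get("gen_ai.usage.input_tokens")
    if tokens is not None and tokens > max_input_tokens:
        reasons.append(f"large prompt: {tokens} input tokens")

    # Assumed to be a boolean-like flag on the span.
    if attrs.get("clrk.budget.denied"):
        reasons.append("denied by budget")

    return reasons
```

Run it over every span in the window and look first at the ones that return a non-empty list.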

**Step 3: pull `invocation.id`.**

Open the span detail. Copy `invocation.id`. This is the join key.

**Step 4: see the full chain.**

Re-query the OTLP backend:

```
invocation.id = "<the-id>"
```

You now see every egress call the same invocation produced - multiple
LLM calls if the agent re-prompted, MCP calls if any tools ran, and
internal HTTP calls if the agent talked to your own services through
the gateway.
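
When you work from an export rather than the backend UI, the join is a group-by on `invocation.id`. A minimal sketch, assuming each exported span is a dict with a sortable `start_time` and an `attributes` mapping (both field names are hypothetical and depend on your backend's export format):

```python
from collections import defaultdict


def group_by_invocation(spans):
    """Group spans by invocation.id, each chain ordered by start time."""
    chains = defaultdict(list)
    for span in spans:
        invocation_id = span.get("attributes", {}).get("invocation.id")
        if invocation_id is not None:  # skip spans without the join key
            chains[invocation_id].append(span)
    for chain in chains.values():
        chain.sort(key=lambda s: s["start_time"])
    return dict(chains)
```

Each value in the result is one invocation's egress calls in the order they happened, which is usually the order you want to read them in.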

**Step 5: walk back to the trigger.**

The inbound HTTP request to the per-TaskAgent ingress is also part
of the trace - `traceparent` propagation tied them together. Filter
on `agent.name` plus the timestamp range and look for the request
span; that's where any caller-supplied headers (like a customer ID
your auth proxy injected) live.

## Two ways to attach your own context

### Inside the agent

Caller context reaches the agent only when the caller sends it as a
CloudEvents binary-mode header `ce-<name>` (for example
`-H 'ce-x-tenant-id: acme'`, or via `clrk agents run-task --header`).
In the structured-mode JSON envelope on stdin it appears under the
de-prefixed key (`x-tenant-id`); in binary mode (`GET /v1/event`) it
comes back as the `ce-x-tenant-id` response header. A bare,
non-`ce-*` header like `X-Tenant-Id` is not delivered. To read the
value from the stdin envelope:

```python
import json, sys
env = json.load(sys.stdin)
# Only present if the caller sent ce-x-tenant-id (CloudEvents binary mode).
tenant = env.get("x-tenant-id", "unknown")
```

Then use that value in your own structured logs or in API calls
downstream. CLRK does not auto-promote it to span attributes, but
your code can.
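
One way to do that is a JSON log line per event, tagged with the tenant. The sketch below writes to stderr on the assumption that stdout carries the agent's reply in the default delivery mode; the `log_event` helper and its field names are hypothetical.

```python
import json
import sys
import time


def log_event(message, tenant, stream=sys.stderr, **fields):
    """Write one JSON log line tagged with the tenant value."""
    record = {"ts": time.time(), "msg": message, "x-tenant-id": tenant, **fields}
    # stderr by default, so the log line does not mix into the reply on stdout.
    stream.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

Calling `log_event("prompt assembled", tenant, prompt_chars=1234)` gives you a line you can grep by tenant when the spans alone are not enough.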

### Emit your own spans

If the agent runtime supports OpenTelemetry, you can emit spans that
chain off the inbound `traceparent`. Read the envelope's
`traceparent` attribute, configure your OTel SDK to use it as parent,
and your agent-internal spans will hang off the same trace as the
CLRK-emitted egress spans. Configure your SDK to ship to the same
OTLP endpoint you set on the `EgressGateway` so everything lands in
one backend.
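
Before handing the value to an SDK, it can help to check that the envelope's `traceparent` is well formed. The sketch below parses the W3C Trace Context format (`version-traceid-parentid-flags`) using only the standard library; the `parse_traceparent` helper is hypothetical, and wiring the result into your OTel SDK as the parent context is left to that SDK's own propagation API.

```python
import re

# W3C Trace Context: 2 hex version, 32 hex trace-id, 16 hex parent-id, 2 hex flags.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def parse_traceparent(value):
    """Return trace_id, parent_span_id and sampled flag, or None if invalid."""
    match = _TRACEPARENT.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the specification.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "trace_id": trace_id,
        "parent_span_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }
```

The `trace_id` returned here is the value to search for in your backend if agent-internal spans and egress spans do not appear under the same trace.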

## The metadata service

Each sandbox can reach a per-sandbox HTTP endpoint at
`$CLRK_METADATA_URL` (and `$CLRK_METADATA_URL_V6` for IPv6). Two routes:

- `GET /v1/event` - returns the request envelope. Binary mode by
  default (CloudEvents attributes as `ce-*` response headers + raw
  body). Structured mode when you send `Accept: application/cloudevents+json`.
- `POST /v1/response` - for `spec.delivery.mode: Metadata`, where the
  agent posts its reply back here instead of writing stdout.

`/v1/event` is the right tool when your agent is a long-running
process that needs to fetch its current invocation context without
re-parsing stdin. There is no `/v1/identity`, `/v1/info`, or
similar - those two endpoints are all that exists today.
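
A long-running agent could fetch the envelope in structured mode roughly as follows. This is a sketch using only the standard library; the helper names and the timeout are assumptions, and the response is assumed to be the same JSON envelope the agent would otherwise read from stdin.

```python
import json
import os
import urllib.request


def build_event_request(base_url, structured=True):
    """Build the GET /v1/event request; structured mode asks for JSON."""
    url = base_url.rstrip("/") + "/v1/event"
    headers = {"Accept": "application/cloudevents+json"} if structured else {}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_event(timeout=5.0):
    """Fetch the current invocation's envelope from the metadata service."""
    base_url = os.environ["CLRK_METADATA_URL"]  # set inside the sandbox
    request = build_event_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Without the `Accept` header the service answers in binary mode, so the attributes arrive as `ce-*` response headers and the body is the raw payload rather than JSON.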

## What you can't trace today

- **Wall-clock cost of sandbox spawn/teardown.** The worker doesn't
  emit OTel spans yet. The latency you see in OTLP starts when the
  first egress packet hits the gateway. Coming soon.
- **Auto-correlated agent-emitted spans.** If your agent emits spans
  via its own OTel SDK, they correlate only through `traceparent`
  propagation. CLRK does not inject anything automatically on the
  agent side beyond what arrived on the inbound request.

## Where to next

- Configure the OTLP endpoint these spans flow to - see [Send
  telemetry to OTLP endpoints](./send-telemetry-to-otlp).
- Add identity extraction so `agent.*` attributes carry caller
  identity - see [Authenticate users before
  agents](./authenticate-users-before-agents) for how to pair an
  auth proxy with CLRK's identity extractors.
- Alert on budget denials and egress denials surfaced through these
  spans - see [Cap LLM spend per agent](./cap-llm-spend-per-agent)
  and [Lock down agent egress](./lock-down-agent-egress).
