# Send telemetry to OTLP endpoints

> Wire CLRK's request logs and traces to your observability backend. Per-EgressGateway configuration, OTLP/HTTP, no extra collector required.

CLRK's observability story is per-`EgressGateway`. Each gateway ships
its own captured request/response records and traces to an OTLP/HTTP
endpoint you configure. There is no global controller-manager OTLP
flag; observability is declarative cluster state, not process state.

## The mental model

Every outbound request from an agent traverses an `EgressGateway`.
The gateway's `ext_proc` captures the L7 transaction (request +
response, with bounded body capture), enriches it with provider-aware
attributes (`gen_ai.*`), and ships one OTLP record per transaction.
That record lands in whatever OTLP/HTTP endpoint you point it at.

Captured request/response pairs always persist to the
controller-manager's embedded ClickHouse, regardless of
`spec.otlp.endpoint`. Setting `endpoint` only adds a best-effort
external re-export on top. Under `clrk dev`, records are also tee'd
to a local in-process receiver that lights up the TUI's `otel-logs` /
`otel-traces` panes.
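
The fan-out described above can be modeled as a small function. This is an illustrative sketch only: `record_sinks` and the sink labels are hypothetical names chosen for the example, not CLRK identifiers.

```python
def record_sinks(otlp_endpoint: str = "", dev_mode: bool = False) -> list[str]:
    """List where one captured record goes, per the rules in this section."""
    sinks = ["embedded-clickhouse"]  # always written, regardless of endpoint
    if otlp_endpoint:
        sinks.append(f"external:{otlp_endpoint}")  # best-effort re-export
    if dev_mode:
        sinks.append("dev-tui-receiver")  # extra tee under `clrk dev`
    return sinks


if __name__ == "__main__":
    print(record_sinks())
    print(record_sinks("https://api.honeycomb.io"))
    print(record_sinks(dev_mode=True))
```

The point of the model: leaving `endpoint` unset never loses records; it only skips the external copy.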

## Configure the endpoint

`EgressGateway.spec.otlp`:

```yaml
apiVersion: clrk.apoxy.dev/v1alpha1
kind: EgressGateway
metadata:
  name: prod-agents
spec:
  defaultPolicy: deny-all
  listeners:
    - name: tls-out
      protocol: TLS
      tls:
        mode: Terminate
  otlp:
    endpoint: "https://api.honeycomb.io"
    headers:
      x-honeycomb-team: "${HONEYCOMB_API_KEY}"
    captureBody:
      maxBytes: 65536
```

Two important details:

- **`endpoint` is a base URL.** CLRK appends `/v1/traces` and
  `/v1/logs` itself. Don't include the path.
- **OTLP/HTTP only today.** No gRPC option. If your collector is
  gRPC-only, front it with `otelcol-contrib` doing HTTP-in / gRPC-out.
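
To make the base-URL rule concrete, here is a minimal Python sketch of how the two signal URLs are derived from `endpoint`. `signal_urls` is a hypothetical helper, not part of CLRK, and tolerating a trailing slash is an assumption about CLRK's behavior rather than something documented here.

```python
# Hypothetical helper illustrating the base-URL rule; not CLRK code.
OTLP_PATHS = {"traces": "/v1/traces", "logs": "/v1/logs"}


def signal_urls(endpoint: str) -> dict[str, str]:
    """Return the per-signal URLs for a base OTLP/HTTP endpoint."""
    base = endpoint.rstrip("/")  # assumption: a trailing slash is tolerated
    return {signal: base + path for signal, path in OTLP_PATHS.items()}


if __name__ == "__main__":
    for signal, url in signal_urls("https://api.honeycomb.io").items():
        print(f"{signal}: {url}")
```

Passing an endpoint that already ends in `/v1/traces` through the same rule yields a doubled path (`.../v1/traces/v1/traces`), which is why the path must be left off.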

## What you'll see in spans

One span per L7 transaction through the gateway. Attributes worth
querying on:

| Namespace | Attribute | Meaning |
|---|---|---|
| `gen_ai.*` | `system` | `anthropic`, `openai`, `google_genai` |
| | `operation.name` | `chat`, `text_completion`, `embeddings` |
| | `request.model` / `response.model` | model the request asked for vs. the model that answered |
| | `usage.input_tokens` / `usage.output_tokens` | token counts from the provider response |
| | `response.stream` | true if SSE / streaming |
| `http.*` | `request.method`, `response.status_code` | standard HTTP semantics |
| `clrk.*` | `component` | always `egress-extproc` for these spans (the ingress dispatch path uses `ingress-extproc`, worker spans use `worker`) |
| | `aiproviderroute.name` / `.namespace` / `.matched` | which APR rule matched |
| | `dst.name` | DNS-bound hostname for the connection |
| | `req.bytes` / `resp.bytes` / `req.truncated` / `resp.truncated` | size and truncation flags |
| | `body.bytes` / `body.b64` / `body.truncated` / `body.usage_visible` / `body.request_rewritten` | body capture metadata |
| | `duration_ms` | end-to-end transaction time |
| | `budget.denied` / `budget.daily_used` / `budget.daily_max` | budget enforcement signal |
| | `l4.bytes_upstream` | byte counter on the L4 leg (TCP routes) |
| (agent) | `agent.kind`, `agent.namespace`, `agent.name`, `agent.uid`, `agent.revision` | which agent originated the traffic |
| | `invocation.id` | stable per-invocation ID - join key across spans |

`invocation.id` is the load-bearing one for debugging: filter on it
to walk the full chain of egress traffic produced by a single
inbound request. See [Trace requests through
agents](./trace-requests-through-agents) for a worked debugging
session.
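
If you export spans for offline analysis, the same filter is a few lines of Python. This is a sketch under assumptions: the export shape (a list of dicts with `start_ns` and a flat `attributes` map), the `spans_for_invocation` helper, and the sample values are all hypothetical, and the attribute key may carry a prefix in your backend.

```python
from typing import Any

# Assumption: the attribute key exactly as your backend displays it.
INVOCATION_KEY = "invocation.id"


def spans_for_invocation(spans: list[dict[str, Any]], invocation_id: str,
                         attr_key: str = INVOCATION_KEY) -> list[dict[str, Any]]:
    """Return the spans belonging to one invocation, oldest first."""
    matched = [
        s for s in spans
        if s.get("attributes", {}).get(attr_key) == invocation_id
    ]
    return sorted(matched, key=lambda s: s["start_ns"])


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    exported = [
        {"start_ns": 30, "attributes": {"invocation.id": "inv-a", "dst": "api.openai.com"}},
        {"start_ns": 10, "attributes": {"invocation.id": "inv-a", "dst": "api.anthropic.com"}},
        {"start_ns": 20, "attributes": {"invocation.id": "inv-b", "dst": "api.openai.com"}},
    ]
    for span in spans_for_invocation(exported, "inv-a"):
        print(span["start_ns"], span["attributes"]["dst"])
```

Sorting by start time gives the egress calls in the order the agent made them, which is usually the order you want to read them in.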

## What `clrk dev` does differently

The `clrk dev` host CLI runs its own OTLP/HTTP receiver in-process
(port 14318). Under `clrk dev`, the controller-manager is launched
with `--dev-otlp-fallback-endpoint` pointing at that receiver. The
controller-manager's OTLP receiver then tees every captured signal to
the TUI receiver, in addition to its embedded ClickHouse and any
per-`EgressGateway` `spec.otlp.endpoint` forwarder. The tee does not
override the per-gateway endpoint. That's why the example manifests
can ship with `otlp.endpoint` empty and their traffic still surfaces
in the TUI's `otel-logs` / `otel-traces` panes.

In production, no dev tee applies. Records still persist to the
controller-manager's embedded ClickHouse, and the endpoint you set on
the `EgressGateway` adds a best-effort external re-export on top of
that.

## Wire it to Honeycomb

```yaml
otlp:
  endpoint: "https://api.honeycomb.io"
  headers:
    x-honeycomb-team: "${HONEYCOMB_API_KEY}"
```

Honeycomb's OTLP/HTTP ingest accepts both traces and logs at the
auto-appended `/v1/traces` and `/v1/logs` paths. Substitute the
`${HONEYCOMB_API_KEY}` placeholder through your usual
secret-templating step before applying the manifest.

## Wire it to Grafana Tempo

```yaml
otlp:
  endpoint: "https://tempo-prod-04-prod-us-east-0.grafana.net"
  headers:
    Authorization: "Basic ${TEMPO_BASIC_AUTH}"
```

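Tempo is a tracing backend and does not store logs, so only the
`/v1/traces` export is meaningful for this endpoint. For Grafana
Cloud, `TEMPO_BASIC_AUTH` is `base64(<instance-id>:<api-key>)`.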
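
One way to compute that value is with Python's standard library. `grafana_basic_auth` is a hypothetical helper name and the credentials below are placeholders.

```python
import base64


def grafana_basic_auth(instance_id: str, api_key: str) -> str:
    """Return base64("<instance-id>:<api-key>") for the Authorization header."""
    raw = f"{instance_id}:{api_key}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Placeholder credentials; substitute your own.
    print(grafana_basic_auth("123456", "example-api-key"))
```

If you encode in a shell instead, make sure no trailing newline ends up in the input, since it changes the encoded value and the credentials will be rejected.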

## Wire it to a self-hosted collector

If you already run `otelcol-contrib`, point CLRK at it and let the
collector fan out to your destinations:

```yaml
otlp:
  endpoint: "http://otelcol.observability.svc.cluster.local:4318"
```

The collector config side:

```yaml
receivers:
  otlp:
    protocols:
      http: {}
exporters:
  otlphttp/honeycomb:
    endpoint: "https://api.honeycomb.io"
    headers:
      x-honeycomb-team: "${HONEYCOMB_API_KEY}"
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp/honeycomb]
    logs:
      receivers: [otlp]
      exporters: [otlphttp/honeycomb]
```

## What's not in OTLP today

Honest gaps so you don't go looking:

- **No worker-side spans.** Sandbox spawn, teardown, and libcontainer
  events are slog-only. Coming soon.
- **No metrics.** Only traces + logs ship over OTLP today. Counters,
  gauges, and histograms are coming soon. If you need them now,
  talk to us.
- **No sampling control.** All spans ship; no head- or tail-sampling
  knob in the spec. Use your collector for sampling if you need it.
- **Bounded body capture.** `otlp.captureBody.maxBytes` defaults to 64
  KiB per direction; bodies over that are truncated and marked
  `clrk.body.truncated=true`. Body capture only fires for
  `application/json`, `application/x-ndjson`, and `text/event-stream`
  by default; override `includeContentTypes` if you need others.
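
The body-capture rule in the last bullet can be sketched as follows. This models the documented behavior and is not CLRK's implementation: `capture_body` is a hypothetical helper, and ignoring media-type parameters such as `; charset=utf-8` when matching is an assumption.

```python
from typing import Optional

DEFAULT_MAX_BYTES = 64 * 1024  # 64 KiB per direction, the documented default
DEFAULT_CONTENT_TYPES = (
    "application/json",
    "application/x-ndjson",
    "text/event-stream",
)


def capture_body(body: bytes, content_type: str,
                 max_bytes: int = DEFAULT_MAX_BYTES,
                 include: tuple = DEFAULT_CONTENT_TYPES) -> Optional[tuple]:
    """Return (captured_bytes, truncated), or None if the type is not captured."""
    # Assumption: parameters after ";" are ignored when matching the type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type not in include:
        return None
    return body[:max_bytes], len(body) > max_bytes


if __name__ == "__main__":
    data, truncated = capture_body(b"x" * 70_000, "application/json")
    print(len(data), truncated)
    print(capture_body(b"<html></html>", "text/html"))
```

The `truncated` flag in the sketch corresponds to `clrk.body.truncated=true` on the span, so you can filter for transactions whose captured bodies are incomplete.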

## Where to next

- Walk a single request from inbound HTTP through every OTLP span it
  generates - see [Trace requests through
  agents](./trace-requests-through-agents).
- Alert on budget denials - see [Cap LLM spend per
  agent](./cap-llm-spend-per-agent).
- Alert on egress denials (or the absence of records you expected) -
  see [Lock down agent egress](./lock-down-agent-egress).
