Query fleet metrics

CLRK serves fleet stats - token usage, request rates, latency percentiles, error counts - from a read-only metrics API on the aggregated apiserver. Every value is a query-time aggregation over the same otel_traces / otel_logs tables your EgressGateway already writes (see Send telemetry to OTLP endpoints for what those spans carry).

The mental model

The API has two halves, both under the metrics.clrk.apoxy.dev group:

A catalog - the metrics resource. Each entry is a named aggregation recipe (gen_ai.tokens, egress.requests, …) with the dimensions it can be grouped by. The catalog is the LIST of this resource, so kubectl get metrics prints it and the console renders its menus from a typed object instead of hardcoded JS.
A query - the series subresource. GET metrics/{id}/series runs the recipe and returns a MetricSeriesSet: one labeled series per group, each carrying one point (a scalar) or one point per time bucket (a range).

The split mirrors the familiar pods and pods/log pattern: metrics/{id} is the descriptor (what the metric is; cheap, with no datastore hit), and metrics/{id}/series is the data (it runs the query, with parameters).
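The descriptor/data split maps onto two URL shapes. A minimal sketch of shell helpers that build both, assuming the group, version, and namespaced layout used by the raw path under Run a query; the names API_BASE, metric_path, and series_path are hypothetical and not part of any CLRK tooling.

```shell
# Hypothetical helpers. The group/version prefix is copied from the raw
# path this page uses; the namespaced descriptor path is an assumption.
API_BASE="/apis/metrics.clrk.apoxy.dev/v1alpha1"

# metric_path NAMESPACE ID -> descriptor path (what the metric is)
metric_path() {
  printf '%s/namespaces/%s/metrics/%s\n' "$API_BASE" "$1" "$2"
}

# series_path NAMESPACE ID -> data path (runs the query)
series_path() {
  printf '%s/series\n' "$(metric_path "$1" "$2")"
}

metric_path default gen_ai.tokens
series_path default gen_ai.tokens
```

Either path can then be passed to kubectl get --raw.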

Separately, the per-agent snapshot resources - taskagentmetrics and daemonagentmetrics - each hold a flat scalar rollup per agent (what the agents page shows: invocations, errors, token totals, latency). They carry their own series subresource for the same time-series query scoped to one agent.

Browse the catalog

$terminalSH

kubectl get metrics

$terminalTXT

NAME                TYPE        UNIT          SOURCE
gen_ai.tokens       Counter     tokens        traces
gen_ai.duration     Histogram   ms            traces
mcp.calls           Counter     calls         traces
mcp.duration        Histogram   ms            traces
egress.requests     Counter     requests      traces
egress.bytes        Counter     bytes         traces
agent.invocations   Counter     invocations   traces
agent.errors        Counter     errors        traces
budget.denied       Counter     denials       traces
log.severity        Counter     records       logs

A GET on one id returns its descriptor - the value type, the unit, the backing table, and the groupBy dimensions it accepts:

$terminalSH

kubectl get metric gen_ai.tokens -o yaml

The type tells you how to query it:

Counter - a monotonic count or sum (count(), sum(...)).
Histogram - a duration distribution, queried as quantile series.
Gauge - a point-in-time value (reserved; the v1 catalog has none).

Run a query

The query is a GET on the series subresource. The path carries the metric id; the rest is query parameters. kubectl get --raw is the simplest way to call it by hand:

$terminalSH

kubectl get --raw \
"/apis/metrics.clrk.apoxy.dev/v1alpha1/namespaces/default/metrics/egress.requests/series"

$terminalJSON

{
  "kind": "MetricSeriesSet",
  "apiVersion": "metrics.clrk.apoxy.dev/v1alpha1",
  "metric": "egress.requests",
  "type": "Counter",
  "unit": "requests",
  "since": "2026-06-25T06:02:16Z",
  "until": "2026-06-25T07:02:16Z",
  "series": [
    { "points": [ { "timestamp": "2026-06-25T07:02:16Z", "value": "22" } ] }
  ]
}

Scalar vs. time-series

Omit step and you get a scalar: one point per series, aggregated over the whole window. This is the shape behind the cards and counters.

Set step and you get a range: one point per toStartOfInterval bucket. This is the shape behind the charts.

$terminalSH

# one point per 5-minute bucket
kubectl get --raw \
"/apis/metrics.clrk.apoxy.dev/v1alpha1/namespaces/default/metrics/egress.requests/series?step=5m"

The response echoes the resolved step and bucket timestamps. An ungrouped query always returns exactly one series, even over an empty window (zero points on a range, a single zero point on a scalar).

Group by a dimension

groupBy splits the result into one series per distinct value of an emitted attribute. Only the dimensions listed in the metric's descriptor are valid:

$terminalSH

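# B is assumed to be the namespaced API base from the raw path above
B="/apis/metrics.clrk.apoxy.dev/v1alpha1/namespaces/default"
# requests split into 2xx / 4xx / 5xx classes
kubectl get --raw "$B/metrics/egress.requests/series?groupBy=http.response.status_class"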

Each series carries its group value under a label keyed by the groupBy dimension. A metric that reports more than one value per point (e.g. gen_ai.tokens → input + output) adds a measure label, so the result is one series per (group × measure).

Histograms

A histogram metric returns one series per requested quantile. Omit quantiles for the default p50 / p95 / p99:

$terminalSH

kubectl get --raw "$B/metrics/gen_ai.duration/series?quantiles=0.5,0.99"

Each series is labeled with its quantile; values are whole milliseconds. Combine with step for a per-quantile trend and groupBy for one set of quantiles per group.
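To estimate how many series a query returns, multiply the number of groups by the per-group fan-out: measures for a counter, quantiles for a histogram. A rough sketch, assuming the top-50 group cap described under Values, caps, and truncation applies before the fan-out; expected_series and MAX_GROUPS are hypothetical names, and the four-model count is an invented illustration.

```shell
MAX_GROUPS=50   # grouped queries keep the top 50 groups by total value

# expected_series GROUPS FANOUT
#   GROUPS: distinct groupBy values (1 when ungrouped)
#   FANOUT: measures per point (counter) or requested quantiles (histogram)
expected_series() {
  groups=$1
  fanout=$2
  if [ "$groups" -gt "$MAX_GROUPS" ]; then
    groups=$MAX_GROUPS
  fi
  echo $(( groups * fanout ))
}

expected_series 3 1    # egress.requests by status class        -> 3
expected_series 4 2    # gen_ai.tokens, 4 models x input/output -> 8
expected_series 1 3    # ungrouped histogram, default quantiles -> 3
expected_series 80 2   # 80 groups truncated to 50              -> 100
```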

The window

since and until are RFC3339 instants bounding a half-open [since, until) window. Left unset, they default to the trailing hour ending now, so a bare query still returns something sensible. To set them explicitly:

$terminalSH

kubectl get --raw "$B/metrics/egress.requests/series?since=2026-06-25T00:00:00Z&until=2026-06-25T12:00:00Z&step=1h"

A future until is clamped to now (there is no data past now); the response's until reflects the clamped value.
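The half-open window and the clamp can be mimicked client-side, for example to predict which until the response will echo. A minimal sketch, assuming every timestamp uses the same fixed-width UTC form (YYYY-MM-DDTHH:MM:SSZ) so that byte-wise ordering equals chronological ordering; clamp_until and in_window are hypothetical helpers and say nothing about how the server implements the check.

```shell
# clamp_until UNTIL NOW -> the earlier of the two instants
# Relies on byte-wise sorting, which is only valid for same-format UTC strings.
clamp_until() {
  printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | head -n 1
}

# in_window T SINCE UNTIL -> exit 0 when SINCE <= T < UNTIL (half-open)
in_window() {
  [ "$(clamp_until "$2" "$1")" = "$2" ] &&
    [ "$1" != "$3" ] &&
    [ "$(clamp_until "$1" "$3")" = "$1" ]
}

# A future until collapses to now; a past until is left alone.
clamp_until 2026-06-25T12:00:00Z 2026-06-25T07:02:16Z   # 2026-06-25T07:02:16Z
clamp_until 2026-06-25T06:00:00Z 2026-06-25T07:02:16Z   # 2026-06-25T06:00:00Z

# The upper bound itself is outside the window.
if in_window 2026-06-25T12:00:00Z 2026-06-25T00:00:00Z 2026-06-25T12:00:00Z; then
  echo "inside"
else
  echo "outside"
fi
```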

Scope

Reads are scoped, and the scope is server-enforced - you cannot widen it past what the path grants.

Fleet (metrics/{id}/series) is scoped to the request namespace. Narrow it within that namespace with scopeKind + scopeName, where scopeKind is TaskAgent, DaemonAgent, or EgressGateway:

$terminalSH

kubectl get --raw "$B/metrics/egress.requests/series?scopeKind=EgressGateway&scopeName=prod-agents&groupBy=http.response.status_class"

Per-agent (taskagentmetrics/{name}/series, daemonagentmetrics/{name}/series) is fixed to the agent the path names. The metric id moves to ?metric=, and scopeKind / scopeName are rejected (the path already fixes the scope):

$terminalSH

kubectl get --raw "$B/taskagentmetrics/my-agent/series?metric=gen_ai.tokens&groupBy=gen_ai.request.model&step=5m"
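The two scoping styles differ in where the agent and the metric id live in the URL. The sketch below builds the per-agent form from an agent kind and name. It assumes B is the namespaced API base from the raw path under Run a query (this page never defines it), and agent_series_path is a hypothetical helper. EgressGateway is refused because only the two agent kinds have a per-agent resource.

```shell
# Assumed value of B: the namespaced prefix of the raw path on this page.
B="/apis/metrics.clrk.apoxy.dev/v1alpha1/namespaces/default"

# agent_series_path KIND NAME METRIC -> per-agent series path
# The path fixes the scope, so no scopeKind/scopeName is added.
agent_series_path() {
  case "$1" in
    TaskAgent)   resource=taskagentmetrics ;;
    DaemonAgent) resource=daemonagentmetrics ;;
    *)
      echo "no per-agent resource for kind: $1" >&2
      return 1
      ;;
  esac
  printf '%s/%s/%s/series?metric=%s\n' "$B" "$resource" "$2" "$3"
}

agent_series_path TaskAgent my-agent gen_ai.tokens
```

For an EgressGateway, the fleet path with scopeKind and scopeName remains the way to narrow the query.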

Values, caps, and truncation

Values are exact. Each point's value is a Kubernetes resource.Quantity serialized as a string, so an integer counter total stays exact regardless of JSON number precision - a token sum above 2⁵³ does not round.
Bounded fan-out. A range query is capped at 1500 buckets (window / step) and a grouped query at the top 50 groups by total value; when more groups exist, the response sets truncated: true. The scanned window is capped at 31 days.
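A range query can be checked against these caps before it is sent. A minimal sketch, taking the window and step in seconds; the page gives the bucket count as window / step without stating the rounding, so this version rounds up to stay conservative. check_range and the two constants are hypothetical names.

```shell
MAX_BUCKETS=1500                     # range-query bucket cap
MAX_WINDOW_SECONDS=$((31 * 86400))   # 31-day scanned-window cap

# check_range WINDOW_SECONDS STEP_SECONDS
# Prints the bucket count; returns 0 when the query fits both caps.
check_range() {
  window=$1
  step=$2
  if [ "$step" -le 0 ]; then
    echo "step must be positive" >&2
    return 1
  fi
  if [ "$window" -gt "$MAX_WINDOW_SECONDS" ]; then
    echo "window exceeds 31 days" >&2
    return 1
  fi
  buckets=$(( (window + step - 1) / step ))   # ceiling division (assumption)
  echo "$buckets"
  [ "$buckets" -le "$MAX_BUCKETS" ]
}

check_range 43200 3600                                     # 12h at 1h steps: prints 12
check_range 86400 30 || echo "too many buckets; widen step" # prints 2880, then the warning
```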

Where to next

Understand the spans these recipes aggregate - see Send telemetry to OTLP endpoints.
Walk a single request across every span it produced - see Trace requests through agents.
The endpoint + schema reference - see HTTP APIs (the CLRK Metrics API section).