Query fleet metrics
Read fleet stats and chart data from CLRK's metrics API: a typed catalog plus a time-series query over the spans and logs your agents already emit.
CLRK serves fleet stats (token usage, request rates, latency percentiles, error counts) from a read-only metrics API on the aggregated apiserver. Every value is a query-time aggregation over the same otel_traces / otel_logs tables your EgressGateway already writes (see Send telemetry to OTLP endpoints for what those spans carry).
The mental model
The API has two halves, both under the metrics.clrk.apoxy.dev group:
- A catalog: the metrics resource. Each entry is a named aggregation recipe (gen_ai.tokens, egress.requests, …) with the dimensions it can be grouped by. The catalog is the LIST of this resource, so kubectl get metrics prints it and the console renders its menus from a typed object instead of hardcoded JS.
- A query: the series subresource. GET metrics/{id}/series runs the recipe and returns a MetricSeriesSet: one labeled series per group, each carrying one point (a scalar) or one point per time bucket (a range).
This mirrors the familiar pods + pods/log layout: metrics/{id} is the descriptor (what the metric is; cheap, with no datastore hit), and metrics/{id}/series is the data (the query itself, run with parameters).
Separately, the per-agent snapshot resources, taskagentmetrics and daemonagentmetrics, hold a flat scalar rollup per agent (what the agents page shows: invocations, errors, token totals, latency). Each carries its own series subresource that runs the same time-series query scoped to one agent.
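A minimal sketch of the descriptor/data split as client-side path helpers. The group and version are copied from the kubectl examples on this page; the helper names are hypothetical, and the per-agent path assumes taskagentmetrics and daemonagentmetrics live under the same group, version, and namespace (which the $B examples on this page suggest).

```python
from urllib.parse import quote

# Group/version copied from the kubectl examples on this page.
# ASSUMPTION: the per-agent resources share it, as the $B examples suggest.
GROUP_VERSION = "/apis/metrics.clrk.apoxy.dev/v1alpha1"
AGENT_RESOURCES = ("taskagentmetrics", "daemonagentmetrics")


def descriptor_path(namespace: str, metric_id: str) -> str:
    """metrics/{id}: the descriptor (cheap, no datastore hit)."""
    return f"{GROUP_VERSION}/namespaces/{quote(namespace)}/metrics/{quote(metric_id)}"


def series_path(namespace: str, metric_id: str) -> str:
    """metrics/{id}/series: run the recipe over the whole namespace."""
    return descriptor_path(namespace, metric_id) + "/series"


def agent_series_path(namespace: str, resource: str, agent: str) -> str:
    """{resource}/{name}/series: the same query scoped to one agent.

    The metric id is not part of this path; it is sent as ?metric= instead.
    """
    if resource not in AGENT_RESOURCES:
        raise ValueError(f"not a per-agent metrics resource: {resource}")
    return f"{GROUP_VERSION}/namespaces/{quote(namespace)}/{resource}/{quote(agent)}/series"
```

For example, series_path("default", "egress.requests") yields the same path the kubectl get --raw example under Run a query calls.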
Browse the catalog
```shell
kubectl get metrics
```

```
NAME                TYPE        UNIT          SOURCE
gen_ai.tokens       Counter     tokens        traces
gen_ai.duration     Histogram   ms            traces
mcp.calls           Counter     calls         traces
mcp.duration        Histogram   ms            traces
egress.requests     Counter     requests      traces
egress.bytes        Counter     bytes         traces
agent.invocations   Counter     invocations   traces
agent.errors        Counter     errors        traces
budget.denied       Counter     denials       traces
log.severity        Counter     records       logs
```

A GET on one id returns its descriptor: the value type, the unit, the backing table, and the groupBy dimensions it accepts:
```shell
kubectl get metric gen_ai.tokens -o yaml
```

The type tells you how to query it:

- Counter: a monotonic count or sum (count(), sum(...)).
- Histogram: a duration distribution, queried as quantile series.
- Gauge: a point-in-time value (reserved; the v1 catalog has none).
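If you script against the catalog, a small parser for the table printed by kubectl get metrics can group ids by value type, which is the first decision when building a query. This is a sketch that assumes the column layout shown (NAME, TYPE, UNIT, SOURCE) with single-token cells; parse_catalog and ids_by_type are hypothetical helpers, and for anything durable kubectl's -o json output is a sturdier input than a human-readable table.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class CatalogEntry(NamedTuple):
    name: str
    type: str
    unit: str
    source: str


def parse_catalog(table: str) -> List[CatalogEntry]:
    """Parse `kubectl get metrics` table output.

    ASSUMPTION: every cell is one whitespace-free token, as in the sample above.
    """
    lines = [line for line in table.strip().splitlines() if line.strip()]
    header = lines[0].split()
    if header != ["NAME", "TYPE", "UNIT", "SOURCE"]:
        raise ValueError(f"unexpected header: {header}")
    return [CatalogEntry(*line.split()) for line in lines[1:]]


def ids_by_type(entries: List[CatalogEntry]) -> Dict[str, List[str]]:
    """Group metric ids by value type: Counter, Histogram, or Gauge."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        grouped[entry.type].append(entry.name)
    return dict(grouped)
```

Counters go straight to a scalar or step query; Histogram ids are the ones that take quantiles.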
Run a query
The query is a GET on the series subresource. The path carries the
metric id; the rest is query parameters. kubectl get --raw is the
simplest way to call it by hand:
```shell
kubectl get --raw \
  "/apis/metrics.clrk.apoxy.dev/v1alpha1/namespaces/default/metrics/egress.requests/series"
```

```json
{
  "kind": "MetricSeriesSet",
  "apiVersion": "metrics.clrk.apoxy.dev/v1alpha1",
  "metric": "egress.requests",
  "type": "Counter",
  "unit": "requests",
  "since": "2026-06-25T06:02:16Z",
  "until": "2026-06-25T07:02:16Z",
  "series": [
    { "points": [ { "timestamp": "2026-06-25T07:02:16Z", "value": "22" } ] }
  ]
}
```

Scalar vs. time-series
Omit step and you get a scalar: one point per series, aggregated over the whole window (the cards and counters).
Set step and you get a range: one point per toStartOfInterval bucket (the charts).
```shell
# one point per 5-minute bucket
kubectl get --raw \
  "/apis/metrics.clrk.apoxy.dev/v1alpha1/namespaces/default/metrics/egress.requests/series?step=5m"
```

The response echoes the resolved step and bucket timestamps. An ungrouped query always returns exactly one series, even over an empty window (zero points on a range, a single zero point on a scalar).
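An empty window comes back as a range with zero points, which suggests buckets without data are omitted rather than zero-filled; if so, a chart client has to fill the gaps itself. The sketch below assumes two things this page does not state: that buckets align to multiples of step since the Unix epoch (toStartOfInterval-style flooring), and that a bucket missing from points can be drawn as zero. It also assumes whole-second timestamps, as in the sample response. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_ts(value: str) -> datetime:
    """Parse a whole-second RFC3339 UTC timestamp such as 2026-06-25T07:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def bucket_starts(since: datetime, until: datetime, step: timedelta) -> List[datetime]:
    """Bucket starts covering [since, until).

    ASSUMPTION: buckets align to epoch multiples of step.
    """
    t = EPOCH + ((since - EPOCH) // step) * step  # floor since to a bucket boundary
    starts = []
    while t < until:
        starts.append(t)
        t += step
    return starts


def dense_points(series: dict, since: datetime, until: datetime,
                 step: timedelta) -> List[Tuple[datetime, str]]:
    """One (bucket start, value) pair per bucket.

    ASSUMPTION: a bucket absent from "points" had no data, so it becomes "0".
    """
    returned: Dict[datetime, str] = {
        parse_ts(p["timestamp"]): p["value"] for p in series.get("points", [])
    }
    return [(t, returned.get(t, "0")) for t in bucket_starts(since, until, step)]
```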
Group by a dimension
groupBy splits the result into one series per distinct value of an
emitted attribute. Only the dimensions listed in the metric's
descriptor are valid:
```shell
# $B abbreviates the namespaced base path from the examples above
B=/apis/metrics.clrk.apoxy.dev/v1alpha1/namespaces/default
# requests split into 2xx / 4xx / 5xx classes
kubectl get --raw "$B/metrics/egress.requests/series?groupBy=http.response.status_class"
```

Each series carries its group value under a label keyed by the groupBy dimension. A metric that reports more than one value per point (e.g. gen_ai.tokens → input + output) adds a measure label, so the result is one series per (group × measure).
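To work with a grouped result in code, you can key each series by its group value and measure. A minimal sketch, assuming each series exposes its labels as a JSON object named labels (a guess: the sample response above shows only points), keyed by the groupBy dimension and, where present, measure. index_series is a hypothetical helper, and it reads values with Decimal, which only handles plain decimal strings.

```python
from decimal import Decimal
from typing import Dict, Optional, Tuple

Key = Tuple[str, Optional[str]]  # (group value, measure or None)


def index_series(series_set: dict, group_by: str) -> Dict[Key, Decimal]:
    """Map (group value, measure) to the sum of that series' point values.

    ASSUMPTION: labels live in a per-series "labels" object; the field name
    is a guess. Values must be plain decimal strings such as "22".
    """
    totals: Dict[Key, Decimal] = {}
    for series in series_set.get("series", []):
        labels = series.get("labels", {})
        key = (labels.get(group_by, ""), labels.get("measure"))  # measure is None when absent
        totals[key] = sum((Decimal(p["value"]) for p in series.get("points", [])), Decimal(0))
    return totals
```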
Histograms
A histogram metric returns one series per requested quantile. Omit quantiles to get the default p50 / p95 / p99, or pass a comma-separated list:

```shell
kubectl get --raw "$B/metrics/gen_ai.duration/series?quantiles=0.5,0.99"
```

Each series is labeled with its quantile; values are whole milliseconds. Combine with step for a per-quantile trend and with groupBy for one set of quantiles per group.
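The histogram parameters compose into one query string. A sketch using Python's urllib.parse.urlencode, keeping commas literal so the result matches the kubectl example; histogram_query is a hypothetical helper, and its [0, 1] range check is a client-side guard rather than a documented server rule.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def histogram_query(quantiles: Optional[Sequence[float]] = None,
                    step: Optional[str] = None,
                    group_by: Optional[str] = None) -> str:
    """Build the query string for a histogram series request.

    Leaving quantiles out keeps the server default (p50 / p95 / p99).
    """
    params = {}
    if quantiles:
        for q in quantiles:
            if not 0 <= q <= 1:  # client-side guard; server limits are not documented here
                raise ValueError(f"quantile out of range: {q}")
        params["quantiles"] = ",".join(format(q, "g") for q in quantiles)
    if step:
        params["step"] = step
    if group_by:
        params["groupBy"] = group_by
    return urlencode(params, safe=",")  # keep "0.5,0.99" readable
```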
The window
since and until are RFC3339 instants bounding a half-open [since, until) window. By default until is now and since is one hour earlier, so a bare query covers the trailing hour and still returns something sensible. To set an explicit window:

```shell
kubectl get --raw "$B/metrics/egress.requests/series?since=2026-06-25T00:00:00Z&until=2026-06-25T12:00:00Z&step=1h"
```

A future until is clamped to now (there is no data past now); the response's until reflects the clamped value.
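The window rules can be mirrored client-side to predict the window the server will resolve and to catch an over-cap request before sending it. A sketch under stated assumptions: it uses the 1500-bucket and 31-day caps from Values, caps, and truncation, requires both bounds or neither (this page only describes the both-omitted default), and raises rather than guessing whether the server rejects or trims an over-cap query. resolve_window is a hypothetical name.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

MAX_BUCKETS = 1500               # range-query cap: window / step
MAX_WINDOW = timedelta(days=31)  # scanned-window cap


def resolve_window(since: Optional[datetime], until: Optional[datetime], now: datetime,
                   step: Optional[timedelta] = None) -> Tuple[datetime, datetime]:
    """Predict the [since, until) window and refuse requests over the caps.

    ASSUMPTION: client-side simplification that requires both bounds or neither.
    """
    if since is None and until is None:
        since, until = now - timedelta(hours=1), now  # default: trailing hour ending now
    elif since is None or until is None:
        raise ValueError("set both since and until, or neither")
    until = min(until, now)                           # a future until is clamped to now
    if since >= until:
        raise ValueError("since must be before until")
    if until - since > MAX_WINDOW:
        raise ValueError("window exceeds the 31-day cap")
    if step is not None and (until - since) / step > MAX_BUCKETS:
        raise ValueError("window / step exceeds the 1500-bucket cap")
    return since, until
```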
Scope
Reads are scoped, and the scope is server-enforced - you cannot widen it past what the path grants.
- Fleet (metrics/{id}/series) is scoped to the request namespace. Narrow it within that namespace with scopeKind + scopeName, where scopeKind is TaskAgent, DaemonAgent, or EgressGateway:

  ```shell
  kubectl get --raw "$B/metrics/egress.requests/series?scopeKind=EgressGateway&scopeName=prod-agents&groupBy=http.response.status_class"
  ```

- Per-agent (taskagentmetrics/{name}/series, daemonagentmetrics/{name}/series) is fixed to the agent the path names. The metric id moves to ?metric=, and scopeKind/scopeName are rejected (the path already fixes the scope):

  ```shell
  kubectl get --raw "$B/taskagentmetrics/my-agent/series?metric=gen_ai.tokens&groupBy=gen_ai.request.model&step=5m"
  ```
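The two scope rules can be enforced before a request leaves the client. A minimal sketch, assuming the server accepts the parameters exactly as named in the examples above (scopeKind, scopeName, metric, plus pass-through ones such as groupBy and step); series_query is hypothetical, and requiring scopeKind and scopeName together is a client-side choice.

```python
from urllib.parse import urlencode

SCOPE_KINDS = {"TaskAgent", "DaemonAgent", "EgressGateway"}


def series_query(metric_id: str, per_agent: bool, scope_kind: str = "",
                 scope_name: str = "", **extra: str) -> str:
    """Build a series query string that respects the scope rules.

    Fleet reads carry the metric id in the path, so it is not repeated here;
    per-agent reads move it to ?metric= and may not narrow the scope further.
    """
    params = {}
    if per_agent:
        if scope_kind or scope_name:
            raise ValueError("per-agent series reject scopeKind/scopeName")
        params["metric"] = metric_id
    elif scope_kind or scope_name:
        # client-side choice: require a known kind and a name together
        if scope_kind not in SCOPE_KINDS or not scope_name:
            raise ValueError("scopeKind must be TaskAgent, DaemonAgent, or EgressGateway, with a scopeName")
        params.update(scopeKind=scope_kind, scopeName=scope_name)
    params.update(extra)  # e.g. groupBy, step
    return urlencode(params)
```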
Values, caps, and truncation
- Values are exact. Each point's value is a Kubernetes resource.Quantity serialized as a string, so an integer counter total stays exact regardless of JSON number precision: a token sum above 2⁵³ does not round.
- Bounded fan-out. A range query is capped at 1500 buckets (window / step) and a grouped query at the top 50 groups by total value; when more groups exist, the response sets truncated: true. The scanned window is capped at 31 days.
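Because each value is a resource.Quantity string, a client should parse it without a float round trip. A whole number can also serialize with an SI suffix depending on how the server builds the Quantity (the canonical decimal form of 1000 is "1k"), so plain int() is not enough in general. A minimal sketch of a parser for the Quantity string format (decimal and binary SI suffixes, plus e/E exponents) using Python's decimal module; parse_quantity is a hypothetical helper, not part of any Kubernetes client library.

```python
import re
from decimal import Decimal

# Suffix multipliers from the Kubernetes resource.Quantity string format.
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60}
_DECIMAL = {"n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"), "": Decimal(1),
            "k": Decimal("1e3"), "M": Decimal("1e6"), "G": Decimal("1e9"),
            "T": Decimal("1e12"), "P": Decimal("1e15"), "E": Decimal("1e18")}
# Binary suffixes are tried first, then an exponent, then a single SI letter.
_QUANTITY = re.compile(r"([+-]?(?:\d+(?:\.\d*)?|\.\d+))(Ki|Mi|Gi|Ti|Pi|Ei|[eE][+-]?\d+|[numkMGTPE]?)")


def parse_quantity(text: str) -> Decimal:
    """Parse a serialized Quantity exactly, never going through float."""
    m = _QUANTITY.fullmatch(text)
    if not m:
        raise ValueError(f"not a quantity: {text!r}")
    number, suffix = Decimal(m.group(1)), m.group(2)
    if suffix in _BINARY:
        return number * _BINARY[suffix]
    if suffix[:1] in ("e", "E") and len(suffix) > 1:  # decimal exponent such as e3
        return number.scaleb(int(suffix[1:]))
    return number * _DECIMAL[suffix]  # a lone "E" is the exa suffix
```

Decimal's default 28-digit precision comfortably holds totals well past 2⁵³. When summing a grouped result, also check the top-level truncated flag before treating the groups as complete.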
Where to next
- Understand the spans these recipes aggregate - see Send telemetry to OTLP endpoints.
- Walk a single request across every span it produced - see Trace requests through agents.
- The endpoint + schema reference - see HTTP APIs (the CLRK Metrics API section).