
Query fleet metrics

Read fleet stats and chart data from CLRK's metrics API: a typed catalog plus a time-series query over the spans and logs your agents already emit.

CLRK serves fleet stats - token usage, request rates, latency percentiles, error counts - from a read-only metrics API on the aggregated apiserver. Every value is a query-time aggregation over the same otel_traces / otel_logs your EgressGateway already writes (see Send telemetry to OTLP endpoints for what those spans carry).

The mental model

The API has two halves, both under the metrics.clrk.apoxy.dev group:

  • A catalog - the metrics resource. Each entry is a named aggregation recipe (gen_ai.tokens, egress.requests, …) with the dimensions it can be grouped by. The catalog is the LIST of this resource, so kubectl get metrics prints it and the console renders its menus from a typed object instead of hardcoded JS.
  • A query - the series subresource. GET metrics/{id}/series runs the recipe and returns a MetricSeriesSet: one labeled series per group, each carrying one point (a scalar) or one point per time bucket (a range).

The split mirrors the familiar pods and pods/log pattern: metrics/{id} is the descriptor (what the metric is; cheap, no datastore hit), and metrics/{id}/series is the data (runs the query, with parameters).

Separately, the per-agent snapshot resources - taskagentmetrics and daemonagentmetrics - are a flat scalar rollup per agent (the agents page: invocations, errors, token totals, latency). They carry their own series subresource for the same time-series query scoped to one agent.

Browse the catalog

$terminalSH
kubectl get metrics
$terminalTXT
NAME                TYPE        UNIT          SOURCE
gen_ai.tokens       Counter     tokens        traces
gen_ai.duration     Histogram   ms            traces
mcp.calls           Counter     calls         traces
mcp.duration        Histogram   ms            traces
egress.requests     Counter     requests      traces
egress.bytes        Counter     bytes         traces
agent.invocations   Counter     invocations   traces
agent.errors        Counter     errors        traces
budget.denied       Counter     denials       traces
log.severity        Counter     records       logs

A GET on one id returns its descriptor - the value type, the unit, the backing table, and the groupBy dimensions it accepts:

$terminalSH
kubectl get metric gen_ai.tokens -o yaml

The type tells you how to query it:

  • Counter - a monotonic count or sum (count(), sum(...)).
  • Histogram - a duration distribution, queried as quantile series.
  • Gauge - a point-in-time value (reserved; the v1 catalog has none).

Run a query

The query is a GET on the series subresource. The path carries the metric id; the rest is query parameters. kubectl get --raw is the simplest way to call it by hand:

$terminalSH
kubectl get --raw \
  "/apis/metrics.clrk.apoxy.dev/v1alpha1/namespaces/default/metrics/egress.requests/series"
$terminalJSON
{
  "kind": "MetricSeriesSet",
  "apiVersion": "metrics.clrk.apoxy.dev/v1alpha1",
  "metric": "egress.requests",
  "type": "Counter",
  "unit": "requests",
  "since": "2026-06-25T06:02:16Z",
  "until": "2026-06-25T07:02:16Z",
  "series": [
    {
      "points": [
        { "timestamp": "2026-06-25T07:02:16Z", "value": "22" }
      ]
    }
  ]
}

Scalar vs. time-series

Omit step and you get a scalar: one point per series, summed over the whole window - the cards and counters.

Set step and you get a range: one point per toStartOfInterval bucket - the charts.

$terminalSH
# one point per 5-minute bucket
kubectl get --raw \
  "/apis/metrics.clrk.apoxy.dev/v1alpha1/namespaces/default/metrics/egress.requests/series?step=5m"

The response echoes the resolved step and bucket timestamps. An ungrouped query always returns exactly one series, even over an empty window (zero points on a range, a single zero point on a scalar).
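When lining chart data up against a range response, it can help to compute a bucket start client-side. The sketch below assumes buckets are aligned to whole multiples of the step counted from the Unix epoch in UTC; that is an assumption about how the toStartOfInterval bucketing behaves here, and the bucket timestamps echoed in the response remain the source of truth. `bucket_start` is a hypothetical helper, not part of CLRK.

```shell
# bucket_start EPOCH_SECONDS STEP_SECONDS
# Hypothetical helper: floors a Unix timestamp to the start of its bucket,
# assuming buckets are aligned to multiples of the step since the epoch.
bucket_start() {
  echo $(( $1 - $1 % $2 ))
}

# 2026-06-25T07:02:16Z as epoch seconds, floored to a 5-minute (300 s) bucket.
bucket_start 1782370936 300
```

A timestamp that already sits on a bucket boundary maps to itself, so the helper is safe to apply to timestamps taken from a response.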

Group by a dimension

groupBy splits the result into one series per distinct value of an emitted attribute. Only the dimensions listed in the metric's descriptor are valid:

$terminalSH
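# B is the base path shared by the remaining examples (default namespace, as in the queries above)
B="/apis/metrics.clrk.apoxy.dev/v1alpha1/namespaces/default"

# requests split into 2xx / 4xx / 5xx classes
kubectl get --raw "$B/metrics/egress.requests/series?groupBy=http.response.status_class"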

Each series carries its group value under a label keyed by the groupBy dimension. A metric that reports more than one value per point (e.g. gen_ai.tokens → input + output) adds a measure label, so the result is one series per (group × measure).

Histograms

A histogram metric returns one series per requested quantile. Omit quantiles for the default p50 / p95 / p99:

$terminalSH
kubectl get --raw "$B/metrics/gen_ai.duration/series?quantiles=0.5,0.99"

Each series is labeled with its quantile; values are whole milliseconds. Combine with step for a per-quantile trend and groupBy for one set of quantiles per group.
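Dashboards usually name percentiles (p50, p99) while the quantiles parameter takes fractions. As a small convenience sketch, the hypothetical helper below converts percentile numbers into the comma-separated form shown above; it relies only on awk and is not part of CLRK.

```shell
# quantiles_param PERCENTILE...
# Hypothetical helper: turns percentile numbers (e.g. 50 99) into the
# comma-separated fractions the quantiles parameter takes (e.g. 0.5,0.99).
quantiles_param() {
  # Refuse an empty list rather than emit a bogus value.
  [ "$#" -gt 0 ] || return 1
  printf '%s\n' "$@" | awk '
    { printf "%s%g", (NR > 1 ? "," : ""), $1 / 100 }
    END { print "" }'
}

# Builds the value used in the query above.
quantiles_param 50 99
```

It could then be spliced into a query as `"$B/metrics/gen_ai.duration/series?quantiles=$(quantiles_param 50 95 99)"`.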

The window

since and until are RFC3339 instants bounding a half-open [since, until) window. Both default to the trailing hour ending now, so a bare query still returns something sensible:

$terminalSH
kubectl get --raw "$B/metrics/egress.requests/series?since=2026-06-25T00:00:00Z&until=2026-06-25T12:00:00Z&step=1h"

A future until is clamped to now (there is no data past now); the response's until reflects the clamped value.
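For windows relative to now, the RFC3339 instants can be generated rather than typed. The following is a minimal sketch assuming GNU date (the `-d @EPOCH` form); `rfc3339` and `window_params` are hypothetical helper names introduced for illustration only.

```shell
# rfc3339 EPOCH_SECONDS
# Formats a Unix timestamp as an RFC3339 UTC instant. Uses GNU date's
# -d @EPOCH form; BSD/macOS date spells this `date -u -r EPOCH` instead.
rfc3339() {
  date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
}

# window_params SINCE_EPOCH UNTIL_EPOCH [STEP]
# Hypothetical helper: builds the since/until (and optional step) query string.
window_params() {
  q="since=$(rfc3339 "$1")&until=$(rfc3339 "$2")"
  if [ -n "${3:-}" ]; then
    q="$q&step=$3"
  fi
  echo "$q"
}

# The trailing 6 hours ending now, in 5-minute buckets.
now=$(date -u +%s)
window_params "$(( now - 6 * 3600 ))" "$now" 5m
```

The output drops straight into a query, for example `"$B/metrics/egress.requests/series?$(window_params "$since" "$until" 1h)"`. Because the window is half-open, the until instant itself is excluded.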

Scope

Reads are scoped, and the scope is server-enforced - you cannot widen it past what the path grants.

  • Fleet (metrics/{id}/series) is scoped to the request namespace. Narrow it within that namespace with scopeKind + scopeName, where scopeKind is TaskAgent, DaemonAgent, or EgressGateway:

    $terminalSH
    kubectl get --raw "$B/metrics/egress.requests/series?scopeKind=EgressGateway&scopeName=prod-agents&groupBy=http.response.status_class"
  • Per-agent (taskagentmetrics/{name}/series, daemonagentmetrics/{name}/series) is fixed to the agent the path names. The metric id moves to ?metric=, and scopeKind / scopeName are rejected (the path already fixes the scope):

    $terminalSH
    kubectl get --raw "$B/taskagentmetrics/my-agent/series?metric=gen_ai.tokens&groupBy=gen_ai.request.model&step=5m"
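The two path shapes are easy to mix up, since the metric id sits in the path for fleet queries and in ?metric= for per-agent ones. The sketch below wraps both in hypothetical helpers (`fleet_series_path`, `agent_series_path`) that only build the path string; the client-side check of scopeKind is a convenience, and the server remains the authority on what is accepted.

```shell
# Base path for the default namespace, as used by the examples above.
B="/apis/metrics.clrk.apoxy.dev/v1alpha1/namespaces/default"

# fleet_series_path METRIC [SCOPE_KIND SCOPE_NAME]
# Fleet query: metric id in the path, optional narrowing by scope.
fleet_series_path() {
  path="$B/metrics/$1/series"
  if [ -n "${2:-}" ]; then
    case "$2" in
      TaskAgent|DaemonAgent|EgressGateway) ;;
      *) echo "unknown scopeKind: $2" >&2; return 1 ;;
    esac
    path="$path?scopeKind=$2&scopeName=$3"
  fi
  echo "$path"
}

# agent_series_path RESOURCE AGENT_NAME METRIC
# Per-agent query: RESOURCE is taskagentmetrics or daemonagentmetrics,
# and the metric id moves to the ?metric= parameter.
agent_series_path() {
  case "$1" in
    taskagentmetrics|daemonagentmetrics) ;;
    *) echo "unknown resource: $1" >&2; return 1 ;;
  esac
  echo "$B/$1/$2/series?metric=$3"
}

fleet_series_path egress.requests EgressGateway prod-agents
agent_series_path taskagentmetrics my-agent gen_ai.tokens
```

Either result can be passed to `kubectl get --raw`, with further parameters such as groupBy or step appended using `&`.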

Values, caps, and truncation

  • Values are exact. Each point's value is a Kubernetes resource.Quantity serialized as a string, so an integer counter total stays exact regardless of JSON number precision - a token sum above 2⁵³ does not round.
  • Bounded fan-out. A range query is capped at 1500 buckets (window / step) and a grouped query at the top 50 groups by total value; when more groups exist, the response sets truncated: true. The scanned window is capped at 31 days.
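The caps can be checked before a query is sent. The sketch below is a hypothetical pre-flight helper built from the limits stated above (1500 buckets, 31 days); it assumes a partial trailing bucket counts as a full one, and it does not model how the server responds when a cap is exceeded, which this page does not specify.

```shell
MAX_BUCKETS=1500
MAX_WINDOW_SECONDS=$(( 31 * 24 * 3600 ))

# check_range WINDOW_SECONDS STEP_SECONDS
# Hypothetical helper: prints the bucket count if the query fits the caps,
# otherwise reports the problem on stderr and returns non-zero.
check_range() {
  if [ "$1" -gt "$MAX_WINDOW_SECONDS" ]; then
    echo "window exceeds 31 days" >&2
    return 1
  fi
  # Ceiling division: a partial trailing bucket is assumed to count as one.
  buckets=$(( ($1 + $2 - 1) / $2 ))
  if [ "$buckets" -gt "$MAX_BUCKETS" ]; then
    echo "$buckets buckets exceeds $MAX_BUCKETS; use a larger step" >&2
    return 1
  fi
  echo "$buckets"
}

# The default one-hour window in 5-minute buckets.
check_range 3600 300
```

A full 31-day window fits with a one-hour step, while a one-day window at a 30-second step does not.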

Where to next