Rotating the kube-controller certificate

The kube-controller pod authenticates to the Apoxy control plane with a per-cluster client certificate stored in Secret apoxy/apiz-cert. Certificates are valid for 365 days. By default the controller auto-renews them when remaining validity drops below 30 days (see Auto-rotation below). This guide walks through the manual rotation flow for cases where you want rotation to be operator-driven (compromise response, compliance audits, forced re-issuance).

For background on how the cert is issued and what it identifies, see How the controller authenticates to Apoxy.

When to rotate manually

The private key may have been exposed (cluster compromise, accidental Secret export, lost backup tape).
You're rotating on a routine schedule for compliance and want the rotation event tied to a human action.
Auto-renewal is failing - apoxy_kube_controller_cert_renewals_total{result="failure"} is non-zero (see kube-controller metrics).

If you suspect compromise, rotate first, then revoke the old fingerprint - in that order. Revoking before the new cert is in place would cut the controller off from the control plane until rotation completes.

Prerequisites

The Apoxy CLI installed and authenticated (apoxy auth).
kubectl access to the target cluster.
The cluster's controller installed (apoxy k8s install).

Inspect the current cert

apoxy k8s certs list --context <kube-context>

Example output:

Cluster Secret apoxy/apiz-cert:
  Fingerprint: a5670b32a930fc4a6cbf0f9a0bfa29c92e510184
  Expires:     2026-05-13T18:33:19Z (in 364d23h)
  Status:      active

Pass --all to also list every cert the Apoxy API has on file for the project (including revoked):

apoxy k8s certs list --context <kube-context> --all

The cert currently mounted in the Secret is marked with a *.
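
If you want to record the current fingerprint before rotating, the single-cert output shown in the example above can be parsed from a script. This is a minimal sketch that assumes the `Fingerprint:` and `Expires:` lines keep the layout of that example; the function names are hypothetical, and the human-readable format may change between CLI versions.

```shell
# Hypothetical helpers: each reads `apoxy k8s certs list` output on stdin.
# They assume the "Label: value" layout from the example output.

# Print the fingerprint of the cert held in the Secret.
cert_fingerprint() {
  awk '$1 == "Fingerprint:" { print $2; exit }'
}

# Print the expiry timestamp (the second field on the Expires line).
cert_expires() {
  awk '$1 == "Expires:" { print $2; exit }'
}
```

For example, `apoxy k8s certs list --context "$KUBE_CONTEXT" | cert_fingerprint` prints only the fingerprint, which you can keep for the later revoke step.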

Rotate

apoxy k8s certs rotate \
  --context <kube-context> \
  --yes

What happens:

The CLI issues a new cert from the Apoxy control plane. The old cert remains valid.
The CLI writes the new cert material into Secret apoxy/apiz-cert, replacing tls.crt, tls.key, and ca.crt.
The CLI patches the kube-controller Deployment's pod template with annotations apoxy.dev/cert-rotated-at and apoxy.dev/cert-fingerprint, triggering a rolling restart.
With the controller's default single-replica RollingUpdate strategy, Kubernetes brings the new pod up Ready before terminating the old one. The aggregated APIService Service is configured with publishNotReadyAddresses: true, so the API stays routable through the swap.
The CLI prints both fingerprints and the follow-up revoke command.

The old pod continues to proxy with the old cert until it terminates; the Apoxy control plane accepts both certs because revocation is deferred until after the rollout completes.

Rotate without restarting the pod

apoxy k8s certs rotate \
  --context <kube-context> \
  --no-restart \
  --yes

With --no-restart the CLI writes the new Secret but does NOT patch the pod template. The running kube-controller pod hot-reloads the cert via an fsnotify watch on the projected Secret directory (/etc/apoxy/certs). Kubelet projects updated Secret contents onto the mount within ~60s; the controller picks them up, validates the new cert/key/CA, and atomically swaps the upstream HTTP transport in place.

The CLI polls the pod's /metrics endpoint for apoxy_kube_controller_cert_expiry_seconds to confirm the running process has the new cert (see kube-controller metrics for scraping and alerting). The hot-reload wait is bounded by --reload-wait (default 3min); on timeout the CLI prints how to verify manually with apoxy k8s certs list.
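
If the CLI times out, a similar check can be made by hand against the metrics text. The sketch below assumes the pod serves the standard Prometheus text exposition format; the function name is hypothetical, and how you reach /metrics (port-forward, existing scrape target) depends on your install.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints the value of apoxy_kube_controller_cert_expiry_seconds.
# "# HELP" / "# TYPE" lines are skipped because their first field is "#".
cert_expiry_metric() {
  awk '$1 ~ /^apoxy_kube_controller_cert_expiry_seconds([{]|$)/ { print $NF; exit }'
}
```

Capture the value before rotating and again after the Secret update; a changed value suggests the running process has loaded the new cert.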

--no-restart is the preferred mode for routine rotations once your install runs a controller image that supports hot-reload (any image released after the apoxy CLI v0.20 / cosmos onboarding bump that ships this guide). The classic restart path is kept for compatibility and for cases where you specifically want a fresh pod.

Rotate against unusual topologies

By default rotate refuses to operate on multi-replica or Recreate-strategy Deployments - the zero-downtime guarantee assumes single-replica RollingUpdate. If you've verified your topology can tolerate the rollout, pass --allow-disruption.
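
Before reaching for --allow-disruption, it can help to look at what rotate will see. The sketch below mirrors the documented rule (single replica, RollingUpdate). It is an illustration, not the CLI's actual check, and `topology_ok` is a hypothetical name.

```shell
# Hypothetical preflight: exits 0 only for the topology the zero-downtime
# guarantee assumes (exactly one replica, RollingUpdate strategy).
# Usage: topology_ok <replicas> <strategy-type>
topology_ok() {
  [ "$1" = "1" ] && [ "$2" = "RollingUpdate" ]
}

# Example wiring (not run here; KUBE_CONTEXT is your kube context name):
#   set -- $(kubectl --context "$KUBE_CONTEXT" -n apoxy get deploy kube-controller \
#     -o jsonpath='{.spec.replicas} {.spec.strategy.type}')
#   topology_ok "$1" "$2" || echo "rotate will refuse without --allow-disruption"
```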

Auto-rotation

Once installed, the controller renews its own cert on a slow tick. The defaults are tuned so that a healthy cluster never ends up with an expired cert, even if nobody touches it:

Tick interval: 1 hour. The renewer wakes up, reads the live cert, and checks remaining validity.
Threshold: 30 days. Below this, the controller calls cosmos's IssueServiceCert endpoint over mTLS with its current cert and writes the new material into Secret apoxy/apiz-cert. The fsnotify watcher then hot-reloads it in place (no pod restart).
Singleton: the renewer runs under a Kubernetes leader-election lease, so a future multi-replica install issues against cosmos from exactly one pod per cycle.
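
The threshold rule above comes down to one comparison per tick. The sketch below illustrates that arithmetic only; it is not the controller's code, and the function name and argument order are hypothetical.

```shell
# Illustrative only: mirrors the documented threshold rule.
# Usage: needs_renewal <not_after_epoch> <now_epoch> [threshold_days]
# Exits 0 (renew) when remaining validity is below the threshold,
# which defaults to 30 days.
needs_renewal() {
  [ $(( $1 - $2 )) -lt $(( ${3:-30} * 86400 )) ]
}
```

With GNU date, `needs_renewal "$(date -u -d 2026-05-13T18:33:19Z +%s)" "$(date -u +%s)"` answers whether a cert with that expiry would currently be inside the renewal window.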

What you'll see when auto-rotation runs:

apoxy_kube_controller_cert_renewals_total{result="success"} increments by 1.
apoxy_kube_controller_cert_expiry_seconds jumps to the new NotAfter.
A Normal CertRenewed Event lands on the kube-controller Deployment - visible via kubectl describe deploy kube-controller -n apoxy.
The previous cert stays valid in cosmos. Revocation of the displaced cert is operator-driven (apoxy k8s certs revoke) - auto-renewal deliberately doesn't revoke, so a half-completed rotation doesn't lock the controller out of cosmos.

Disabling auto-rotation

Set APOXY_CERT_RENEW_INTERVAL=-1s on the kube-controller container if you need rotation to be operator-driven only (compliance audit windows, change-control freezes). The renewer logs Cert auto-renewal disabled by configuration once and exits cleanly; manual apoxy k8s certs rotate still works.
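
One way to set the variable is `kubectl set env`. This sketch assumes you manage the Deployment directly. Changing the env edits the pod template and therefore triggers a rollout, and a later reinstall may overwrite the setting. The function name is hypothetical.

```shell
# Hypothetical wrapper: disables auto-renewal by setting the env var on the
# kube-controller Deployment in the apoxy namespace.
# Usage: disable_auto_renew <kube-context>
disable_auto_renew() {
  kubectl --context "$1" -n apoxy set env deployment/kube-controller \
    APOXY_CERT_RENEW_INTERVAL=-1s
}
```

To undo it, `kubectl set env` accepts a trailing dash to remove a variable (`APOXY_CERT_RENEW_INTERVAL-`).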

Monitoring auto-rotation

Wire the apoxy_kube_controller_cert_renewals_total{result="failure"} counter into your alerts. A non-zero rate means the controller can't reach cosmos or cosmos is rejecting the renewal. The live cert keeps working until expiry, so this is a warning, not a page, but you only have until NotAfter to fix it. See the kube-controller metrics reference for recommended alerts.
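
For a quick spot check outside your alerting stack, the failure counter can be read straight from the metrics text. This is a minimal sketch, assuming the standard Prometheus text format and a `result` label as shown above; `renewal_failures` is a hypothetical name, and a one-off reading is no substitute for a rate-based alert.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints the summed value of all renewals_total series with result="failure".
# Prints 0 when no such series is present.
renewal_failures() {
  awk '/^apoxy_kube_controller_cert_renewals_total[{]/ && /result="failure"/ { sum += $NF }
       END { print sum + 0 }'
}
```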

Verify the rotation

After rotate exits successfully:

# 1. Confirm the Secret now holds the new cert.
apoxy k8s certs list --context <kube-context>

# 2. Confirm the controller pod is Ready and using the new cert.
kubectl --context <kube-context> -n apoxy get pods -l app=kube-controller
kubectl --context <kube-context> -n apoxy get deploy kube-controller \
  -o jsonpath='{.spec.template.metadata.annotations.apoxy\.dev/cert-fingerprint}{"\n"}'

# 3. Confirm the aggregated API is healthy on a real request.
kubectl --context <kube-context> get gateways -A

The cert-fingerprint annotation on the pod template should match the new fingerprint printed by rotate. If the controller fails to start with the new cert, the rolling update halts and the old pod keeps serving - investigate before retrying.
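
To cross-check the fingerprint against the certificate bytes in the Secret, you can compute it with openssl. The sketch below rests on a loud assumption: that the 40-hex-character fingerprint Apoxy prints is the SHA-1 digest of the DER-encoded cert. The source does not confirm this, so treat a mismatch as inconclusive. The function name is hypothetical.

```shell
# Hypothetical helper: converts `openssl x509 -noout -fingerprint` output
# (e.g. "SHA1 Fingerprint=A5:67:...") on stdin into bare lowercase hex.
normalize_openssl_fp() {
  sed 's/^.*=//' | tr -d ':' | tr 'A-F' 'a-f'
}

# Example wiring (not run here; assumes SHA-1, see the note above):
#   kubectl --context "$KUBE_CONTEXT" -n apoxy get secret apiz-cert \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d \
#     | openssl x509 -noout -fingerprint -sha1 | normalize_openssl_fp
```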

Revoke the old cert

rotate deliberately leaves the old cert valid so you can fall back to it if anything looks wrong. Once you've confirmed the rotation took, revoke:

apoxy k8s certs revoke <OLD-FINGERPRINT> --user-jwt <jwt>

Or run rotate with --revoke to revoke as part of the same flow once the rollout completes:

apoxy k8s certs rotate \
  --context <kube-context> \
  --revoke \
  --user-jwt <jwt> \
  --yes

Why revoke needs a user JWT

The Apoxy API rejects API-key auth on revoke. The user JWT comes from your dashboard session: log in to dashboard.apoxy.dev, open the developer tools, and copy the bearer token Apoxy sends on its API calls (or use the dashboard's UI to expose it). Pass it via --user-jwt, set APOXY_USER_JWT, or write it to ~/.config/apoxy/user-jwt.

This is intentional: the API key sits in the Apoxy CLI config of every operator who can install the controller, and may also be present elsewhere in your infra. If a leaked API key could revoke the cert it was just used to issue, an attacker could trivially lock the controller out of the control plane. User auth ensures revoke happens with a human-tied credential.

Revocation propagates to the Apoxy ext_authz layer within ~30 seconds, after which any request presenting the revoked cert fails with 403.

Troubleshooting

`Secret apoxy/apiz-cert changed mid-rotation`

Another rotate started while this one was running. Wait for it to finish, then re-check the state with apoxy k8s certs list. The Secret update is gated on ResourceVersion to fail fast rather than silently overwrite.

`Deployment kube-controller not found`

The controller isn't installed in this cluster. Run apoxy k8s install first.

`namespace apoxy is missing annotation apoxy.dev/cluster-name`

The namespace was created some other way (or annotations were stripped). Either re-run apoxy k8s install --cluster-name <name> or set the annotation by hand:

kubectl annotate ns apoxy apoxy.dev/cluster-name=<name>

Pod fails to become Ready after rotation

The most common cause is that the new cert was issued for a different project than the one the controller expected. Check:

The project the cluster was installed against - embedded in the in-cluster ConfigMap:

kubectl --context <kube-context> -n apoxy get cm kube-controller \
  -o jsonpath='{.data.config\.yaml}' | grep currentProject

The project your current CLI session is using - currentProject in ~/.apoxy/config.yaml. If these don't match, re-select the right project in the CLI before running rotate again.
The pod's logs (kubectl -n apoxy logs deploy/kube-controller) - they show the cert load and the upstream connection attempt.

If the old pod is still healthy, you can roll back by deleting the new Secret content and waiting for the old pod's cached cert to expire - but in practice it's faster to fix the project mismatch and re-rotate.

Revoke fails with `401`/`403`

Your user JWT is missing, expired, or doesn't carry the project membership. Re-fetch it from the dashboard. API keys are intentionally rejected on this endpoint; passing one will hit this error.
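
To rule out an expired token before re-fetching, you can decode the JWT payload locally. This is a sketch under assumptions: that the token is a standard three-part JWT with a numeric `exp` claim (Unix seconds), which the source does not state, and that `base64 -d` is available. `jwt_exp` is a hypothetical name. The token is a credential, so decode it locally rather than pasting it into a web tool.

```shell
# Hypothetical helper: prints the numeric "exp" claim of a JWT, or nothing
# if the payload has no such claim. Does NOT verify the signature.
# Usage: jwt_exp <jwt>
jwt_exp() {
  # Take the payload (second dot-separated part) and map the base64url
  # alphabet back to standard base64.
  _p=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that JWTs strip.
  case $(( ${#_p} % 4 )) in
    2) _p="$_p==" ;;
    3) _p="$_p=" ;;
  esac
  printf '%s' "$_p" | base64 -d |
    sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

Compare the printed value with `date -u +%s`; if it is smaller, the token has expired.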
