Rotating the kube-controller certificate
Roll the per-cluster client certificate the Apoxy controller uses, without dropping the aggregated API.
The kube-controller pod authenticates to the Apoxy control plane with a per-cluster client certificate stored in Secret apoxy/apiz-cert. Certificates are valid for 365 days. By default the controller auto-renews them when validity drops below 30 days — see Auto-rotation below. This guide walks through the manual rotate flow for cases where you want operator-driven rolls (compromise response, compliance audits, forced re-issuance).
For background on how the cert is issued and what it identifies, see How the controller authenticates to Apoxy.
When to rotate manually
- The private key may have been exposed (cluster compromise, accidental Secret export, lost backup tape).
- You're rotating on a routine schedule for compliance and want the rotation event tied to a human action.
- Auto-renewal is failing: apoxy_kube_controller_cert_renewals_total{result="failure"} is non-zero (see kube-controller metrics).
If you suspect compromise, rotate first, then revoke the old fingerprint — in that order. Revoking before the new cert is in place would drop the controller until rotation completes.
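The ordering above can be sketched as a small wrapper. This is an illustrative sketch, not part of the Apoxy CLI: the `run` and `rotate_then_revoke` helpers and the `DRY_RUN` variable are hypothetical names, and the only commands invoked are the ones shown elsewhere in this guide.

```shell
# Hypothetical helper: execute a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Rotate first, check the aggregated API, and only then revoke the old cert.
# Args: kube context, old fingerprint, user JWT.
rotate_then_revoke() {
  ctx="$1"; old_fp="$2"; jwt="$3"
  run apoxy k8s certs rotate --context "$ctx" --yes || return 1
  run kubectl --context "$ctx" get gateways -A || return 1
  run apoxy k8s certs revoke "$old_fp" --user-jwt "$jwt"
}
```

Running it with DRY_RUN=1 prints the three commands in order without executing them, which is a cheap way to review the sequence before a real incident. Note that the dry-run output includes the JWT, so avoid capturing it in shared logs.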
Prerequisites
- The Apoxy CLI installed and authenticated (apoxy auth).
- kubectl access to the target cluster.
- The cluster's controller installed (apoxy k8s install).
Inspect the current cert
apoxy k8s certs list --context <kube-context>
Example output:
Cluster Secret apoxy/apiz-cert:
Fingerprint: a5670b32a930fc4a6cbf0f9a0bfa29c92e510184
Expires: 2026-05-13T18:33:19Z (in 364d23h)
Status: active
Pass --all to also list every cert the Apoxy API has on file for the project (including revoked):
apoxy k8s certs list --context <kube-context> --all
The cert currently mounted in the Secret is marked with a *.
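If you want to keep the current fingerprint for a later revoke, a small filter can pull it out of the default (non --all) listing. This is a sketch that assumes the output keeps the "Fingerprint: <hex>" line format from the example above; `cert_fingerprint` is a hypothetical helper name.

```shell
# Print the first fingerprint found in `apoxy k8s certs list` output on stdin.
# Assumes a line of the form "Fingerprint: <hex>", as in the example output.
cert_fingerprint() {
  awk '!seen && $1 == "Fingerprint:" { print $2; seen = 1 }'
}
```

Pipe the output of the list command into cert_fingerprint and store the result in a variable before rotating, so the old fingerprint is still at hand when you revoke.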
Rotate
apoxy k8s certs rotate \
--context <kube-context> \
--yes
What happens:
- The CLI issues a new cert from the Apoxy control plane. The old cert remains valid.
- The CLI writes the new cert material into Secret apoxy/apiz-cert, replacing tls.crt, tls.key, and ca.crt.
- The CLI patches the kube-controller Deployment's pod template with annotations apoxy.dev/cert-rotated-at and apoxy.dev/cert-fingerprint, triggering a rolling restart.
- With the controller's default single-replica RollingUpdate strategy, Kubernetes brings the new pod up Ready before terminating the old one. The aggregated APIService Service is configured with publishNotReadyAddresses: true, so the API stays routable through the swap.
- The CLI prints both fingerprints and the follow-up revoke command.
The old pod continues to proxy with the old cert until it terminates; the Apoxy control plane accepts both certs because revoke is deferred to after the rollout completes.
Rotate without restarting the pod
apoxy k8s certs rotate \
--context <kube-context> \
--no-restart \
--yes
With --no-restart the CLI writes the new Secret but does NOT patch the pod template. The running kube-controller pod hot-reloads the cert via an fsnotify watch on the projected Secret directory (/etc/apoxy/certs). Kubelet projects updated Secret contents onto the mount within ~60s; the controller picks them up, validates the new cert/key/CA, and atomically swaps the upstream HTTP transport in place.
The CLI polls the pod's /metrics endpoint for apoxy_kube_controller_cert_expiry_seconds to confirm the running process has the new cert (see kube-controller metrics for scraping and alerting). The hot-reload wait is bounded by --reload-wait (default 3min); on timeout the CLI prints how to verify manually with apoxy k8s certs list.
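To repeat the CLI's check by hand, you can filter the gauge out of a /metrics scrape. The sketch below assumes the endpoint serves the standard Prometheus text format with no trailing timestamp on the sample line; `cert_expiry_metric` is a hypothetical helper name, and how you reach the pod's /metrics endpoint depends on your setup (see kube-controller metrics).

```shell
# Print the value of apoxy_kube_controller_cert_expiry_seconds from
# Prometheus text-format metrics on stdin, rounded to whole seconds.
cert_expiry_metric() {
  awk '
    /^#/ { next }    # skip HELP and TYPE comment lines
    $1 ~ /^apoxy_kube_controller_cert_expiry_seconds([{]|$)/ {
      printf "%.0f\n", $NF    # also handles 1.7e+09 style notation
    }
  '
}
```

Comparing the value before and after a --no-restart rotation shows whether the running process has picked up the new cert.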
--no-restart is the preferred mode for routine rotations once your install is on a controller image that supports hot-reload (everything after the apoxy CLI v0.20 / cosmos onboarding bump that ships this guide). The classic restart path is kept for compatibility and for cases where you specifically want a fresh pod.
Rotate against unusual topologies
By default rotate refuses to operate on multi-replica or Recreate-strategy Deployments — the zero-downtime guarantee assumes single-replica RollingUpdate. If you've verified your topology can tolerate the rollout, pass --allow-disruption.
Auto-rotation
Once installed, the controller renews its own cert on a slow tick. The defaults are tuned so a healthy cluster never has an expired cert without anyone touching it:
- Tick interval: 1 hour. The renewer wakes up, reads the live cert, and checks remaining validity.
- Threshold: 30 days. Below this, the controller calls cosmos's IssueServiceCert endpoint over mTLS with its current cert and writes the new material into Secret apoxy/apiz-cert. The fsnotify watcher then hot-reloads it in place (no pod restart).
- Singleton: the renewer runs under a Kubernetes leader-election lease, so a future multi-replica install issues against cosmos from exactly one pod per cycle.
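The threshold check described above amounts to simple arithmetic on the cert's NotAfter. The following is a minimal sketch of that comparison, not the controller's actual code; `needs_renewal` is a hypothetical name, and it takes both timestamps as Unix seconds so it does not depend on any particular date implementation.

```shell
# Exit 0 when remaining validity is below the threshold (default 30 days).
# Args: NotAfter (Unix seconds), current time (Unix seconds), [threshold in days].
needs_renewal() {
  not_after="$1"; now="$2"; threshold_days="${3:-30}"
  remaining=$(( not_after - now ))
  [ "$remaining" -lt $(( threshold_days * 86400 )) ]
}
```

For example, `needs_renewal "$NOT_AFTER" "$(date +%s)"` succeeds once the cert has fewer than 30 days left, which is the point where the hourly tick would be expected to renew.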
What you'll see when auto-rotation runs:
- apoxy_kube_controller_cert_renewals_total{result="success"} increments by 1.
- apoxy_kube_controller_cert_expiry_seconds jumps to the new NotAfter.
- A Normal CertRenewed Event lands on the kube-controller Deployment, visible via kubectl describe deploy kube-controller -n apoxy.
- The previous cert stays valid in cosmos. Revocation of the displaced cert is operator-driven (apoxy k8s certs revoke); auto-renewal deliberately doesn't revoke, so a half-completed rotation doesn't lock the controller out of cosmos.
Disabling auto-rotation
Set APOXY_CERT_RENEW_INTERVAL=-1s on the kube-controller container if you need rotation to be operator-driven only (compliance audit windows, change-control freezes). The renewer logs Cert auto-renewal disabled by configuration once and exits cleanly; manual apoxy k8s certs rotate still works.
Monitoring auto-rotation
Wire the apoxy_kube_controller_cert_renewals_total{result="failure"} counter into your alerts. A non-zero rate means the controller can't reach cosmos or cosmos is rejecting the renewal. The live cert keeps working until expiry, so this is a warning rather than a page, but you only have until NotAfter to fix it. See the kube-controller metrics reference for recommended alerts.
Verify the rotation
After rotate exits successfully:
# 1. Confirm the Secret now holds the new cert.
apoxy k8s certs list --context <kube-context>
# 2. Confirm the controller pod is Ready and using the new cert.
kubectl --context <kube-context> -n apoxy get pods -l app=kube-controller
kubectl --context <kube-context> -n apoxy get deploy kube-controller \
-o jsonpath='{.spec.template.metadata.annotations.apoxy\.dev/cert-fingerprint}{"\n"}'
# 3. Confirm the aggregated API is healthy on a real request.
kubectl --context <kube-context> get gateways -A
The cert-fingerprint annotation on the pod template should match the new fingerprint printed by rotate. If the controller fails to start with the new cert, the rolling update halts and the old pod keeps serving; investigate before retrying.
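The fingerprint comparison in step 2 can be scripted so it fails loudly in a runbook. This is a sketch with a hypothetical helper name, `same_fingerprint`; it ignores letter case as a defensive measure, since this guide does not say whether every source prints the hex in the same case.

```shell
# Exit 0 when two hex fingerprints are non-empty and equal, ignoring case.
same_fingerprint() {
  a="$(printf '%s' "$1" | tr 'A-F' 'a-f')"
  b="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Pass it the annotation value from the jsonpath query above and the new fingerprint printed by rotate; a non-zero exit means the pod template was not patched with the cert you expect.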
Revoke the old cert
rotate deliberately leaves the old cert valid so you can fall back to it if anything looks wrong. Once you've confirmed the rotation took, revoke:
apoxy k8s certs revoke <OLD-FINGERPRINT> --user-jwt <jwt>
Or run rotate with --revoke to revoke as part of the same flow once the rollout completes:
apoxy k8s certs rotate \
--context <kube-context> \
--revoke \
--user-jwt <jwt> \
--yes
Why revoke needs a user JWT
The Apoxy API rejects API-key auth on revoke. The user JWT comes from your dashboard session — log into dashboard.apoxy.dev, open the developer tools, and copy the bearer token Apoxy sends on its API calls (or use the dashboard's UI to expose it). Pass it via --user-jwt, set APOXY_USER_JWT, or write it to ~/.config/apoxy/user-jwt.
This is intentional: the API key sits in the Apoxy CLI config of every operator who can install the controller, and may also be present elsewhere in your infra. If a leaked API key could revoke the cert it was just used to issue, an attacker could trivially lock the controller out of the control plane. User auth ensures revoke happens with a human-tied credential.
Revocation propagates to the Apoxy ext_authz layer within ~30 seconds, after which any request presenting the revoked cert fails with 403.
Troubleshooting
Secret apoxy/apiz-cert changed mid-rotation
Another rotate started while this one was running. Wait for it to finish, then re-check the state with apoxy k8s certs list. The Secret update is gated on ResourceVersion to fail fast rather than silently overwrite.
Deployment kube-controller not found
The controller isn't installed in this cluster. Run apoxy k8s install first.
namespace apoxy is missing annotation apoxy.dev/cluster-name
The namespace was created some other way (or annotations were stripped). Either re-run apoxy k8s install --cluster-name <name> or set the annotation by hand:
kubectl annotate ns apoxy apoxy.dev/cluster-name=<name>
Pod fails to become Ready after rotation
The most common cause is that the new cert was issued for a different project than the one the controller expected. Check:
- The project the cluster was installed against, embedded in the in-cluster ConfigMap:
kubectl --context <kube-context> -n apoxy get cm kube-controller \
-o jsonpath='{.data.config\.yaml}' | grep currentProject
- The project your current CLI session is using: currentProject in ~/.apoxy/config.yaml. If these don't match, re-select the right project in the CLI before running rotate again.
- The pod's logs (kubectl -n apoxy logs deploy/kube-controller): they show the cert load and the upstream connection attempt.
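The project comparison in the first two checks can be reduced to a filter plus a string test. The sketch below assumes both config files carry a single-line "currentProject: <value>" entry, as the grep in the first check implies; `current_project` is a hypothetical helper name.

```shell
# Print the currentProject value from YAML text on stdin.
# Assumes a single-line "currentProject: <value>" entry; strips double quotes.
current_project() {
  sed -n 's/^[[:space:]]*currentProject:[[:space:]]*//p' | head -n 1 | tr -d '"'
}
```

Feed it the ConfigMap's config.yaml from the kubectl query and, separately, the contents of ~/.apoxy/config.yaml, then compare the two results with `[ "$in_cluster" = "$local" ]`. If they differ, re-select the project in the CLI before rotating again.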
If the old pod is still healthy, you can roll back by deleting the new Secret content and waiting for the old pod's cached cert to expire — but in practice it's faster to fix the project mismatch and re-rotate.
Revoke fails with 401/403
Your user JWT is missing, expired, or doesn't carry the project membership. Re-fetch it from the dashboard. API keys are intentionally rejected on this endpoint; passing one will hit this error.