# Lock down agent egress

> Default-deny outbound traffic. Allowlist the upstreams your agents actually need by hostname, CIDR, and port.

A prompt-injected or compromised agent will try to phone home. The
defense is to restrict what it can reach. CLRK's `EgressGateway`
defaults to `deny-all`: egress is closed unless you open it. This
guide shows how to open exactly what you need and nothing else.

## Default is closed

`EgressGateway.spec.defaultPolicy` defaults to `deny-all`. Any
outbound destination that doesn't match an attached route is dropped
at the worker's dialer, before any connection establishes. You opt
into traffic, not out of it.
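
The decision described above can be pictured as a small function. This is a conceptual sketch in Python, not CLRK's implementation: `Conn`, `Route`, and `decide` are hypothetical names, a route is reduced to a plain predicate, and `EgressDenyPolicy` is left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Conn:
    """Hypothetical view of one outbound dial attempt."""
    dst_ip: str
    dst_port: int
    protocol: str  # "TCP" or "UDP"


# A route is modelled as a predicate over the connection (illustrative only).
Route = Callable[[Conn], bool]


def decide(conn: Conn, routes: Iterable[Route], default_policy: str = "deny-all") -> str:
    """Return "allow" if any attached route matches, else apply the default."""
    if any(route(conn) for route in routes):
        return "allow"
    # Fail closed: only an explicit allow-all default lets unmatched traffic out.
    return "allow" if default_policy == "allow-all" else "deny"


postgres_only = [lambda c: c.dst_port == 5432 and c.protocol == "TCP"]
print(decide(Conn("10.20.0.7", 5432, "TCP"), postgres_only))
print(decide(Conn("203.0.113.5", 443, "TCP"), postgres_only))
```

With the default left at `deny-all`, the second call falls through every route and is denied.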

The example manifests in `_examples/` use `allow-all` because they're
demos. For anything past the demo, leave the default alone and add
routes for what the agent actually needs.

## Two policy primitives

```mermaid
flowchart LR
  EG["EgressGateway<br/>defaultPolicy: deny-all"] --> L["EgressL4Route<br/>(allowlist)"]
  EG --> DP["EgressDenyPolicy<br/>(invert an allow)"]
```

- **`EgressL4Route`** declares allowed L4 destinations:
  `destinationCIDRs`, `destinationHostnames`, `ports`, `protocol`
  (TCP or UDP). Optional `sourceAgents` label selector scopes the
  rule to specific agents.
- **`EgressDenyPolicy`** attaches to a route via `targetRef` and
  inverts it from allow to deny, with an HTTP-layer `denyResponse`
  (default 403, custom message optional).

For the default-deny + allowlist pattern, you'll mostly use
`EgressL4Route`. `EgressDenyPolicy` is for the inverse case: a
default-allow gateway where you want to punch specific holes shut.

## Recipe: default-deny + allowlist

This is the standard production shape. Allow only the upstreams the
agent needs:

```yaml
apiVersion: clrk.apoxy.dev/v1alpha1
kind: EgressGateway
metadata:
  name: prod-agents
spec:
  defaultPolicy: deny-all
  listeners:
    - name: tcp-out
      protocol: TCP
    - name: tls-out
      protocol: TLS
      tls:
        mode: Terminate
---
# Allow TLS to Anthropic.
apiVersion: clrk.apoxy.dev/v1alpha1
kind: EgressL4Route
metadata:
  name: anthropic-tls
spec:
  parentRefs:
    - group: clrk.apoxy.dev
      kind: EgressGateway
      name: prod-agents
      sectionName: tls-out
  rules:
    - matches:
        - destinationHostnames: ["api.anthropic.com"]
          ports: [{ port: 443 }]
          protocol: TCP
---
# Allow plain TCP to one Postgres.
apiVersion: clrk.apoxy.dev/v1alpha1
kind: EgressL4Route
metadata:
  name: app-postgres
spec:
  parentRefs:
    - group: clrk.apoxy.dev
      kind: EgressGateway
      name: prod-agents
      sectionName: tcp-out
  rules:
    - matches:
        - destinationHostnames: ["pg.prod.internal"]
          ports: [{ port: 5432 }]
          protocol: TCP
```

Anything not matched by these rules - DNS lookups to attacker
infrastructure, surprise outbound HTTPS to a paste site, an `nc` to a
hostile listener - is dropped at the worker.

## How hostnames get resolved

Hostname matching is real, not advisory, but only when the agent uses
plain DNS. The worker snoops UDP/53 responses and caches
`(resolved IP) → name` bindings. On connection, the dialer attaches
the snooped hostname to the connection's PROXY v2 frame so the
L4 routing layer can match on it. SNI on TLS listeners is read too,
but SNI is agent-supplied and an attacker can lie there.
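
As a mental model of the snoop cache, here is a toy Python version. It is an assumption-laden sketch, not the worker's code: `SnoopCache` and its methods are hypothetical names, the address is from a documentation range, and real concerns such as TTL expiry and several names sharing one IP are ignored.

```python
from typing import Dict, Iterable, Optional


class SnoopCache:
    """Toy model of IP -> hostname bindings learned from plain DNS answers."""

    def __init__(self) -> None:
        self._by_ip: Dict[str, str] = {}

    def observe_answer(self, name: str, ips: Iterable[str]) -> None:
        # Record every address returned for `name` in a snooped UDP/53 response.
        normalized = name.rstrip(".").lower()
        for ip in ips:
            self._by_ip[ip] = normalized

    def hostname_for(self, dst_ip: str) -> Optional[str]:
        # At dial time: the name to attach to the connection, or None when the
        # agent got the address some other way (DoT, DoH, a hard-coded IP).
        return self._by_ip.get(dst_ip)


cache = SnoopCache()
cache.observe_answer("api.anthropic.com.", ["203.0.113.10"])  # made-up address
print(cache.hostname_for("203.0.113.10"))
print(cache.hostname_for("198.51.100.7"))
```

The second lookup returns `None`, which is the case where a hostname rule has nothing trustworthy to match against.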

**Recommendation**: have your agents use the system resolver (plain
UDP/53). Encrypted resolvers (DoT, DoH) bypass the snoop, which
means hostname rules degrade to whatever the agent declares via SNI.
That is useful for the cooperative case but weak as a security
boundary.

Even on a TLS-terminated listener (`mode: Terminate`), the
`EgressL4Route` hostname match still runs against the agent-supplied
SNI - decrypted Host / `:authority` matching is an L7 feature
(`AIProviderRoute` / the MCP route layer), not `EgressL4Route`. For a
hard egress boundary, pair the hostname rule with a
`destinationCIDRs` match, or move hostname enforcement up to the L7
route layer.

## Wildcards and CIDRs

```yaml
destinationHostnames:
  - "api.openai.com"        # exact
  - "*.azure.openai.com"    # single-label wildcard
destinationCIDRs:
  - "10.20.0.0/16"          # whole subnet
  - "203.0.113.5/32"        # single IP
ports:
  - port: 443
  - startPort: 8000         # inclusive range
    endPort: 8099
protocol: TCP
```

Wildcards use the `*.` prefix syntax of Gateway API's `Hostname`, and
the `*` covers exactly one label: `*.openai.com` matches
`api.openai.com` but **not** `eu.api.openai.com`. To cover deeper
names, add an entry per depth, such as `*.api.openai.com`.
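
The single-label rule is easy to get wrong by eye, so here is a minimal Python sketch of it. `hostname_matches` is a hypothetical helper written for this guide, not a CLRK function. Treating the bare apex (`openai.com`) as a non-match is an assumption.

```python
def hostname_matches(pattern: str, host: str) -> bool:
    """Match `host` against an exact name or a single-label `*.` wildcard."""
    pattern = pattern.rstrip(".").lower()
    host = host.rstrip(".").lower()
    if not pattern.startswith("*."):
        return host == pattern
    suffix = pattern[1:]  # "*.openai.com" -> ".openai.com"
    if not host.endswith(suffix):
        return False
    prefix = host[: -len(suffix)]  # the part the wildcard has to cover
    # Exactly one non-empty label: no dots allowed inside the prefix.
    return prefix != "" and "." not in prefix


for name in ("api.openai.com", "eu.api.openai.com", "openai.com"):
    print(name, hostname_matches("*.openai.com", name))
```

Only the first name matches; the other two need their own entries.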

CIDRs match IP only - no DNS involved. Useful for "allow this
internal VPC range" rules.
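
To sanity-check CIDR and port entries before shipping a route, the same comparisons can be reproduced with Python's standard `ipaddress` module. The helper names below are hypothetical, and the port spec shape simply mirrors the YAML fragment above. How CLRK combines the individual fields within one match is not modelled here.

```python
import ipaddress
from typing import Dict, Iterable


def ip_in_cidrs(dst_ip: str, cidrs: Iterable[str]) -> bool:
    """True if the destination address falls inside any listed CIDR."""
    addr = ipaddress.ip_address(dst_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


def port_allowed(port: int, specs: Iterable[Dict[str, int]]) -> bool:
    """True if `port` equals a single port or lies in an inclusive range."""
    for spec in specs:
        if "port" in spec and port == spec["port"]:
            return True
        if "startPort" in spec and spec["startPort"] <= port <= spec["endPort"]:
            return True
    return False


cidrs = ["10.20.0.0/16", "203.0.113.5/32"]
ports = [{"port": 443}, {"startPort": 8000, "endPort": 8099}]
print(ip_in_cidrs("10.20.3.4", cidrs), ip_in_cidrs("203.0.113.6", cidrs))
print(port_allowed(8099, ports), port_allowed(8100, ports))
```

Note that `/32` admits exactly one address and that both ends of a port range are included.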

## What a denial looks like

A denied L4 connection is refused at the worker before any backend
is selected or any data is spliced - the connection never reaches
Envoy or the upstream. Two things to know:

- **The worker emits a dedicated deny record.** A denied connection
  produces no L4 ext_proc allow-record (it never reaches Envoy), but
  the worker DOES publish an `egress.dial.denied` OTLP span and log
  record carrying `clrk.egress.deny_reason=policy` (plus `agent.name`,
  `clrk.dst.name`, and the peer address/port). Query on
  `clrk.egress.deny_reason` to find denials directly - don't rely on
  record-absence.
- **The agent sees a TCP connection failure.** Expect whatever your
  language's socket library reports when `connect()` fails with
  `ECONNREFUSED`, or when the connection closes immediately (EOF on
  the first read).
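
From inside an agent, the symptom can be classified with a short probe. This is a minimal sketch using only Python's standard `socket` module; `probe` is a hypothetical helper, and it does not cover a connection that is accepted and then closed right away, which would only show up as EOF on the first read.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connect and classify the outcome for logging."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        return "refused"  # ECONNREFUSED
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return f"error: {exc.__class__.__name__}"


if __name__ == "__main__":
    print(probe("pg.prod.internal", 5432))
```

The probe shows only what the agent observes. The authoritative signal is still the worker's `egress.dial.denied` record.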

If you want a friendlier denial - say, a 403 with a custom message
on the L7 side - use `EgressDenyPolicy` against a specific allowed
route to flip it. The `denyResponse` is HTTP-shaped (status + body),
so it only applies to L7 routes.

## Three canned allowlists

**Agent that only talks to Anthropic**:

```yaml
- destinationHostnames: ["api.anthropic.com"]
  ports: [{ port: 443 }]
  protocol: TCP
```

**Agent with customer data**:

```yaml
- destinationHostnames: ["api.anthropic.com", "s3.us-east-1.amazonaws.com"]
  ports: [{ port: 443 }]
  protocol: TCP
- destinationHostnames: ["pg.prod.internal"]
  ports: [{ port: 5432 }]
  protocol: TCP
```

**Agent that only reaches cluster-internal services**:

```yaml
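# One wildcard label only: this matches `<name>.svc.cluster.local`, not
# `<svc>.<ns>.svc.cluster.local`. For Service names, add one entry per
# namespace ("*.<ns>.svc.cluster.local") or match by IP with a CIDR rule.
- destinationHostnames: ["*.svc.cluster.local"]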
  protocol: TCP
- destinationCIDRs: ["10.0.0.0/8"]
  protocol: TCP
```

## What this does NOT do

- **Does not inspect HTTPS request bodies.** L4 is a destination-only
  matcher. For body inspection use [`AIProviderRoute`
  filters](./hide-credentials-from-agents) and the MCP route layer.
- **Does not prevent leakage to allowed hosts.** If you allow Slack,
  an agent can DM the attacker via Slack. Hostname allowlists buy
  you a lot but they don't replace output review.
- **Does not bound bandwidth.** There is no bytes-per-second policy
  today; it's coming soon. Talk to us if you need bandwidth caps
  before then.

## Where to next

- Pair the allowlist with credential injection so the agent doesn't
  even need to know the key - see [Hide credentials from
  agents](./hide-credentials-from-agents).
- Confirm denials by querying the worker's `egress.dial.denied`
  records (`clrk.egress.deny_reason=policy`), and confirm allowed
  traffic by reading its OTLP records - see [Trace requests through
  agents](./trace-requests-through-agents).
- Cap LLM-side cost on the upstreams you do allow - see [Cap LLM
  spend per agent](./cap-llm-spend-per-agent).
