# Package a custom agent

> Turn a Python, Node, or shell script into a CLRK agent: choose the right kind, honor the I/O contract, build an image, write the manifest, invoke.

You have a script that does work - analyzes a payload, calls an API,
returns a result. This guide describes the contract your script must
follow to run as a CLRK agent. The example agent is a deliberately
trivial Python word-counter so the focus stays on the plumbing.

## Decide first: TaskAgent or DaemonAgent

| Pick                | When                                                                                   |
|---------------------|----------------------------------------------------------------------------------------|
| **TaskAgent**       | A request, a cron, a webhook, or any external trigger fires the work. One run per fire. |
| **DaemonAgent**     | A long-lived loop or watcher. The process stays up; restarts on exit per `restartPolicy`. |

When in doubt, pick `TaskAgent`. Cold-start cost is real (10-30s
without a warm pool), but the request model composes uniformly with
HTTP ingress, cron triggers, attribution, OTLP, credential injection,
and budgets. `DaemonAgent` is the right call only when you genuinely
need a process that outlives requests.

`_examples/echo-bot` and `_examples/openai-bot` are `DaemonAgent`s.
`_examples/jq-bot` and `_examples/cron-bot` are `TaskAgent`s.

## The I/O contract

### Input (TaskAgent)

By default, the dispatcher writes a CloudEvents structured-mode JSON
envelope to your agent's stdin. The caller's body lives under `.data`:

```json
{
  "specversion": "1.0",
  "id": "a1b2c3d4-...",
  "source": "clrk://default/word-count",
  "type": "dev.apoxy.clrk.taskagent.invoke",
  "subject": "http",
  "time": "2026-05-19T03:22:50Z",
  "datacontenttype": "application/json",
  "data": { "text": "hello world from clrk" }
}
```

`source` is `clrk://<namespace>/<name>`. `id` is the value of the
`X-Clrk-Execution-ID` header, falling back to Envoy's `x-request-id`
and then to a generated UUID. `subject` is the trigger type from the
`X-Clrk-Trigger` header (e.g. `http`, `cron`); it is omitted entirely
when no trigger header is sent.

For cron-triggered runs, `.data` is your `spec.scheduleInput`
verbatim. Either way, `jq '.data'` on stdin lifts out the payload.
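A minimal sketch of splitting the envelope in Python, assuming it arrives on stdin in the structured-mode shape above. `parse_envelope` is a hypothetical helper, not part of any CLRK SDK; it reads `subject` with `.get()` because that field is absent when no trigger header was sent.

```python
import json
from typing import Any, Optional


def parse_envelope(raw: str) -> tuple[Any, str, Optional[str]]:
    """Split a CloudEvents structured-mode envelope into (data, id, trigger).

    Hypothetical helper: not part of any CLRK SDK.
    """
    envelope = json.loads(raw)
    data = envelope.get("data")        # caller body, or spec.scheduleInput on cron runs
    execution_id = envelope["id"]      # X-Clrk-Execution-ID, else x-request-id, else a UUID
    trigger = envelope.get("subject")  # e.g. "http", "cron"; absent without X-Clrk-Trigger
    return data, execution_id, trigger
```

In an agent you would call it as `data, run_id, trigger = parse_envelope(sys.stdin.read())`.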

If your agent prefers HTTP over stdin, set `spec.delivery.mode:
Metadata`. The dispatcher then closes stdin, and the agent fetches the
request from `$CLRK_METADATA_URL/event` and POSTs its reply to
`$CLRK_METADATA_URL/response`. `CLRK_METADATA_URL` already ends in
`/v1`, so do not append `/v1` again. This mode is useful for runtimes
that don't read stdin gracefully.
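A sketch of the Metadata-mode round trip using only the Python standard library. It assumes `/event` returns a JSON document and that `/response` accepts a JSON body; the helper names and the 10-second socket timeout are illustrative choices, not CLRK API.

```python
import json
import os
import urllib.request
from typing import Any

# Hypothetical helpers. The /event and /response paths come from the CLRK
# docs; the JSON request and reply bodies are assumptions of this sketch.


def metadata_endpoint(path: str) -> str:
    """Join a path onto CLRK_METADATA_URL, which already ends in /v1."""
    base = os.environ["CLRK_METADATA_URL"].rstrip("/")
    return f"{base}/{path.lstrip('/')}"


def fetch_event() -> Any:
    """GET the request for this execution and decode it as JSON."""
    with urllib.request.urlopen(metadata_endpoint("event"), timeout=10) as resp:
        return json.load(resp)


def post_response(result: Any) -> int:
    """POST the reply as JSON and return the HTTP status code."""
    req = urllib.request.Request(
        metadata_endpoint("response"),
        data=json.dumps(result).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

A typical agent body is then `post_response(handle(fetch_event()))`, where `handle` stands in for your own logic.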

### Input (DaemonAgent)

There is no stdin envelope - the agent is launched once and runs
until it exits. Whatever input it needs comes from
`spec.template.spec.env`, files baked into the image, or egress calls.
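For a DaemonAgent, a common pattern is to read configuration once at startup from the variables declared in the manifest. A minimal sketch: `GREETING` matches the variable in this guide's manifest, while `POLL_INTERVAL_SECONDS` and the `DaemonConfig` type are hypothetical. Defaults live in code because Dockerfile `ENV` lines do not reach the process at runtime.

```python
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class DaemonConfig:
    greeting: str
    poll_interval_s: float


def load_config(env: Mapping[str, str]) -> DaemonConfig:
    """Read settings declared under spec.template.spec.env.

    GREETING is the variable from this guide's manifest; POLL_INTERVAL_SECONDS
    is hypothetical. Defaults live here because Dockerfile ENV is stripped.
    """
    return DaemonConfig(
        greeting=env.get("GREETING", "hello"),
        poll_interval_s=float(env.get("POLL_INTERVAL_SECONDS", "30")),
    )
```

At startup, `cfg = load_config(os.environ)`; the main loop can then `time.sleep(cfg.poll_interval_s)` between iterations.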

### Output

The agent's stdout becomes the HTTP response body for TaskAgents.
Stderr is captured for logs but does not reach the caller. Set
`Content-Type` on the client side, or just write JSON and let the
caller interpret it.
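A small sketch of keeping the two streams apart in Python: the response body goes to stdout as one JSON line and is flushed explicitly, while diagnostics go to stderr. `emit` and `log` are hypothetical helper names.

```python
import json
import sys
from typing import Any, Optional, TextIO


def emit(result: Any, out: Optional[TextIO] = None) -> None:
    """Write the response body as one JSON line on stdout and flush it."""
    stream = sys.stdout if out is None else out
    stream.write(json.dumps(result) + "\n")
    stream.flush()


def log(message: str) -> None:
    """Diagnostics go to stderr: kept in logs, never sent to the caller."""
    print(message, file=sys.stderr, flush=True)
```

The optional `out` parameter exists only so the helper can be exercised against an in-memory buffer; in an agent, call `emit(result)` with no second argument.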

### Exit code and timeout

Exit 0 on success. A nonzero exit from a TaskAgent surfaces as a
5xx-class response to the caller; on a DaemonAgent, the supervisor
reacts per `spec.restartPolicy`.
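One way to honor the exit-code contract in Python is to wrap the handler so any exception becomes a stderr traceback plus a nonzero exit, and only a successful result reaches stdout. A sketch: `run` and the exit code `1` are illustrative choices, and the result is serialized before anything is printed so a non-JSON-serializable return value also fails cleanly.

```python
import json
import sys
import traceback
from typing import Any, Callable


def run(handler: Callable[[Any], Any], raw: str) -> int:
    """Run one TaskAgent execution and return the process exit code.

    Returns 0 after writing the JSON result to stdout. On any error the
    traceback goes to stderr and the code is 1, which the caller sees as a
    5xx-class response.
    """
    try:
        envelope = json.loads(raw)
        body = json.dumps(handler(envelope.get("data")))
    except Exception:
        traceback.print_exc(file=sys.stderr)
        return 1
    print(body, flush=True)
    return 0
```

In `agent.py` this becomes `sys.exit(run(handle, sys.stdin.read()))`, where `handle` is your own function taking the `.data` payload.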

`spec.timeout` (default 100s) caps wall-clock per execution. The
ingress HTTPRoute timeout is pinned to this value, so the cap holds
end-to-end. Cold-start eats some of that - size accordingly.

## What env vars the agent sees

CLRK provides a small, well-defined set. Everything else is what you
explicitly declare under `spec.template.spec.env`.

- `CLRK_METADATA_URL` - base URL of the sandbox-local metadata HTTP
  service. IPv4. Use this when `spec.delivery.mode: Metadata` is set,
  or to read the request envelope from a long-running process.
- `CLRK_METADATA_URL_V6` - same, IPv6.
- `PATH` and standard CA-trust paths so HTTPS works.
- Whatever you put under `spec.template.spec.env` with a literal
  `value:`.
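If your code talks to the metadata service, a sketch for picking the base URL, preferring the IPv4 variable and falling back to IPv6. The preference order and the `metadata_base_url` name are assumptions of this example; it also strips a trailing slash so callers can append `/event` or `/response` directly.

```python
import os
from collections.abc import Mapping
from typing import Optional


def metadata_base_url(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the metadata service base URL (ending in /v1), or None if unset.

    Prefers the IPv4 variable and falls back to IPv6; that order is a choice
    of this sketch, not a CLRK rule.
    """
    for key in ("CLRK_METADATA_URL", "CLRK_METADATA_URL_V6"):
        value = env.get(key)
        if value:
            return value.rstrip("/")
    return None
```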

What's not visible:

- Dockerfile `ENV` declarations are stripped. The placeholder pattern
  you'll see in examples (`ENV ANTHROPIC_API_KEY=clrk-injected-by-proxy`)
  only matters at image-build time for CLIs that refuse to start
  without the variable. At runtime, the manifest's `env:` block is
  authoritative.
- `spec.template.spec.env[].valueFrom.secretKeyRef` is silently
  dropped today. Do not rely on it. Surface secrets via [credential
  injection](./hide-credentials-from-agents) on the egress side.
- The image's `CMD` is ignored. The launcher uses the image's
  `ENTRYPOINT` only, and only when `spec.template.spec.command` is
  unset. If you need a different entry, set `spec.template.spec.command`
  and `spec.template.spec.args` explicitly.
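Because image `ENV` lines and `secretKeyRef` entries never reach the process, a missing variable shows up only at runtime. A sketch of failing fast with a readable message; `require_env`, `missing_env`, and exit code `2` are hypothetical choices.

```python
import os
import sys
from collections.abc import Iterable, Mapping


def missing_env(required: Iterable[str], env: Mapping[str, str]) -> list[str]:
    """Return required variable names that are unset or empty."""
    return [name for name in required if not env.get(name)]


def require_env(required: Iterable[str]) -> None:
    """Exit with code 2 and a stderr message if the manifest omitted a variable."""
    missing = missing_env(required, os.environ)
    if missing:
        print(
            "missing env vars (declare them with a literal `value:` under "
            "spec.template.spec.env): " + ", ".join(missing),
            file=sys.stderr,
            flush=True,
        )
        sys.exit(2)
```

Calling `require_env(["GREETING"])` at the top of `agent.py` turns a silent misconfiguration into a logged startup failure.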

## Write the agent

`agent.py`:

```python
#!/usr/bin/env python3
import json
import sys

envelope = json.load(sys.stdin)       # CloudEvents envelope from the dispatcher
payload = envelope.get("data") or {}  # caller body; `or {}` also covers "data": null
text = payload.get("text", "")

print(json.dumps({
    "word_count": len(text.split()),
    "envelope_id": envelope.get("id"),
}), flush=True)
```

`flush=True` matters - unflushed stdout can look like an empty
response to the caller. Three steps: load the envelope, do the work,
write JSON.
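Before building an image, you can exercise the script locally by piping in a hand-built envelope. This sketch only approximates the dispatcher: the field values mirror the example envelope in the I/O contract, and `make_envelope`, `invoke_locally`, and `AGENT_CMD` are hypothetical names. The 100-second default timeout mirrors the default `spec.timeout`.

```python
import json
import subprocess
import sys
import uuid
from datetime import datetime, timezone
from typing import Any, Optional

# Hypothetical: how to run the agent under test. Adjust the path as needed.
AGENT_CMD = [sys.executable, "agent.py"]


def make_envelope(data: Any, name: str = "word-count", namespace: str = "default",
                  trigger: Optional[str] = "http") -> dict:
    """Approximate the dispatcher's CloudEvents envelope for local runs."""
    envelope = {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": f"clrk://{namespace}/{name}",
        "type": "dev.apoxy.clrk.taskagent.invoke",
        "time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "datacontenttype": "application/json",
        "data": data,
    }
    if trigger is not None:
        envelope["subject"] = trigger  # the real envelope omits it with no trigger header
    return envelope


def invoke_locally(data: Any, cmd: Optional[list[str]] = None,
                   timeout_s: float = 100.0) -> subprocess.CompletedProcess:
    """Pipe one envelope into the agent's stdin and capture stdout/stderr."""
    return subprocess.run(
        AGENT_CMD if cmd is None else cmd,
        input=json.dumps(make_envelope(data)),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
```

For example, `r = invoke_locally({"text": "hello world from clrk"})`, then inspect `r.returncode`, `r.stdout`, and `r.stderr`. A zero exit with empty `r.stdout` usually means the output went to stderr or was never flushed.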

## Write the Dockerfile

```dockerfile
FROM python:3.12-slim

# Standard CA trust path - clrk relies on the image to ship CA certs.
# python:3.12-slim already includes ca-certificates.

COPY agent.py /agent.py

ENTRYPOINT ["python", "/agent.py"]
```

Build multi-arch and push:

```bash
docker buildx build \
  --platform=linux/amd64,linux/arm64 \
  -t <your-registry>/word-count:0.1 --push .
```

Worker pools pull whatever architecture they run on. Cover both if
you're not certain.

## Write the manifest

```yaml
apiVersion: clrk.apoxy.dev/v1alpha1
kind: TaskAgent
metadata:
  name: word-count
spec:
  workerPoolRef: default
  # Caller-visible deadline. Pinned end-to-end through the ingress
  # HTTPRoute. Default 100s; leave headroom for cold start.
  timeout: 60s
  template:
    spec:
      image: <your-registry>/word-count:0.1
      # secretKeyRef is a no-op today - use literal value for
      # config, credential injection for secrets.
      env:
        - name: GREETING
          value: "hi"
```

`metadata.name` is the Gateway name CLRK materializes for HTTP
ingress; the data-plane Service is `clrk-<name>` (here
`clrk-word-count`) in the `clrk` namespace. Keep the name DNS-safe.
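A sketch of checking a candidate name before applying, assuming the Service name is exactly `clrk-` plus `metadata.name` (as in the example above) and that it must fit a 63-character RFC 1123 DNS label, the Kubernetes limit for Service names. `service_name` is a hypothetical helper.

```python
import re

# RFC 1123 label: lowercase alphanumerics and '-', starting and ending
# with an alphanumeric character.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
_MAX_LABEL_LEN = 63


def service_name(agent_name: str) -> str:
    """Return the assumed data-plane Service name, clrk-<name>.

    Hypothetical helper. Raises ValueError when the name is not DNS-safe or
    the prefixed Service name would exceed 63 characters.
    """
    svc = f"clrk-{agent_name}"
    if not _DNS_LABEL.fullmatch(agent_name) or len(svc) > _MAX_LABEL_LEN:
        raise ValueError(f"agent name {agent_name!r} does not yield a valid Service name {svc!r}")
    return svc
```

Because of the five-character prefix, this check caps `metadata.name` at 58 characters.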

## Apply and invoke

```bash
export KUBECONFIG=~/.clrk/kubeconfig.host

clrk apply -f taskagent.yaml --local
kubectl get gateway word-count   # wait for PROGRAMMED=True

kubectl port-forward -n clrk svc/clrk-word-count 18080:80 &

curl -sS http://localhost:18080/ \
  -H 'X-Clrk-TaskAgent: default/word-count' \
  -H 'X-Clrk-Trigger: http' \
  -H 'content-type: application/json' \
  --data '{"text":"hello world from clrk"}'
# {"word_count": 4, "envelope_id": "a1b2c3d4-..."}
```
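The same call from Python with only the standard library, for scripting or tests against the port-forward. A sketch: the function names are hypothetical, and the URL, headers, and payload are copied from the curl example.

```python
import json
import urllib.request
from typing import Any

# Values copied from the curl example; the function names are hypothetical.
BASE_URL = "http://localhost:18080/"


def build_request(payload: Any, agent: str = "default/word-count",
                  trigger: str = "http") -> urllib.request.Request:
    """Build the POST the curl example sends through the port-forward."""
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Clrk-TaskAgent": agent,
            "X-Clrk-Trigger": trigger,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def invoke(payload: Any, timeout_s: float = 100.0) -> Any:
    """Send the request and decode the response body (the agent's stdout) as JSON."""
    with urllib.request.urlopen(build_request(payload), timeout=timeout_s) as resp:
        return json.load(resp)
```

With the port-forward running, `invoke({"text": "hello world from clrk"})` returns the decoded body. A 5xx caused by a nonzero agent exit raises `urllib.error.HTTPError` from `urlopen`.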

## Iterating without spinning down

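The fastest dev loop is to give every build a unique image tag and
re-apply the manifest; the example below uses the short Git commit
hash, so commit before rebuilding (uncommitted edits reuse the same
tag). The applied change triggers a new `AgentSandboxRevision`, the
worker pulls, and the next request lands on the new sandbox: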

```bash
TAG=$(git rev-parse --short HEAD)
docker buildx build --platform=linux/amd64 \
  -t <your-registry>/word-count:$TAG --push .
sed -i.bak "s|word-count:.*|word-count:$TAG|" taskagent.yaml
clrk apply -f taskagent.yaml --local
```

Reusing a tag such as `:latest` looks like a no-op to the apply diff,
because the manifest text doesn't change.
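To tag by build content instead of by commit, here is a sketch that hashes the files you pass in and rewrites the manifest's `image:` line. `content_tag`, `retag_manifest`, and the 12-character truncation are hypothetical choices; pass paths relative to the build context so the hash is stable across machines. Unlike the `sed` one-liner, the regex is anchored on the full image repository, so a registry host with a port is handled.

```python
import hashlib
import re
from pathlib import Path


def content_tag(paths: list[Path], length: int = 12) -> str:
    """Hash file names and contents so any change to the inputs yields a new tag."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(path.as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator so name/content boundaries can't blur
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()[:length]


def retag_manifest(manifest: str, image_repo: str, tag: str) -> str:
    """Point every `image: <image_repo>:<old-tag>` line at the new tag."""
    pattern = re.compile(rf"(image:\s*{re.escape(image_repo)}):\S+")
    return pattern.sub(lambda m: f"{m.group(1)}:{tag}", manifest)
```

For example, compute `content_tag([Path("agent.py"), Path("Dockerfile")])`, push the image under that tag, then write `retag_manifest(...)` output back to `taskagent.yaml` before `clrk apply`.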

## Common failure modes

- **Caller sees a gateway timeout.** Cold-start exceeded
  `spec.timeout`. Bump it, or set `spec.warmPoolSize` to keep a
  ready sandbox.
- **Caller sees an empty body but 200 OK.** Your script wrote to
  stderr instead of stdout, or buffered stdout without flushing.
  Python: `print(..., flush=True)`. Shell: ensure the last line
  writes to fd 1, not fd 2.
- **TLS calls inside the sandbox fail with "certificate verify failed".**
  Your base image lacks a CA bundle. Don't use `FROM scratch`; use a
  base that ships CA certs or install them: Alpine via `apk add
  ca-certificates`, Debian/Ubuntu via `apt-get install ca-certificates`.
  The official `python:*-slim` images already include them.
- **The sandbox exits immediately.** Your script raised at startup.
  Agent stdout/stderr is captured by the worker. Read it with
  `clrk agents logs <name>` in both `clrk dev` and prod; under
  `clrk dev`, the worker pane also shows the agent's stderr.
- **Env var from `valueFrom.secretKeyRef` is missing.** That field
  is silently dropped today. Use literal `value` for non-secrets and
  [credential injection](./hide-credentials-from-agents) for secrets.
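For the TLS failure mode, a best-effort startup check can flag a missing trust store before the first HTTPS call. A sketch using `ssl.get_default_verify_paths()`: it inspects OpenSSL's default locations only, so libraries that ship their own bundle (for example `certifi`, used by `requests`) may still work when it reports `False`. The helper names are hypothetical.

```python
import os
import ssl
import sys


def ca_bundle_present() -> bool:
    """Best-effort check for OpenSSL's default trust store in this image."""
    paths = ssl.get_default_verify_paths()
    if paths.cafile:  # resolved bundle path, or None if the file doesn't exist
        return True
    capath = paths.capath  # resolved cert directory, or None if missing
    return bool(capath) and os.path.isdir(capath) and len(os.listdir(capath)) > 0


def warn_if_no_ca_bundle() -> None:
    """Hypothetical startup hook: log to stderr so the caller never sees it."""
    if not ca_bundle_present():
        print("no CA bundle found; HTTPS calls may fail certificate verification",
              file=sys.stderr, flush=True)
```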

## Where to next

- Hide an API key the agent calls out to - see [Hide credentials from
  agents](./hide-credentials-from-agents).
- Restrict outbound destinations to an allowlist - see [Lock down
  agent egress](./lock-down-agent-egress).
- Trigger this agent on a schedule - see [Schedule recurring
  agents](./schedule-recurring-agents).
- Wire OTLP into your observability stack - see [Send telemetry to
  OTLP endpoints](./send-telemetry-to-otlp).
