# @paperclipai/plugin-kubernetes (alpha) First-party Paperclip sandbox-provider plugin for Kubernetes. **Alpha:** the default backend (`sandbox-cr`) is built on `kubernetes-sigs/agent-sandbox` v1alpha1 — expect breaking changes as that CRD evolves toward Beta. A stable fallback backend (`job`, using `batch/v1` Job) is available for clusters without agent-sandbox installed, but it does NOT support multi-command exec (paperclip-server's adapter-install pattern requires sandbox-cr). ## Prerequisites ### For `sandbox-cr` backend (default, recommended) 1. A Kubernetes cluster running k8s 1.27+ 2. [`kubernetes-sigs/agent-sandbox`](https://github.com/kubernetes-sigs/agent-sandbox) controller installed in the cluster (alpha — installs the `sandboxes.agents.x-k8s.io/v1alpha1` CRD and controller) 3. Paperclip-server running with access to the cluster (in-cluster via `inCluster: true` or external via `kubeconfig`) ### For `job` backend (stable fallback) 1. A Kubernetes cluster running k8s 1.27+ 2. Paperclip-server with cluster access — no additional controllers or CRDs required ## Installation ```bash paperclipai plugin install @paperclipai/plugin-kubernetes ``` Or, for local development: ```bash paperclipai plugin install --local /path/to/paperclip/packages/plugins/sandbox-providers/kubernetes ``` ## Backends The plugin supports two backend modes, selected via the `backend` config field: | Backend | Default | Stability | Multi-command exec | Requires | |---|---|---|---|---| | `sandbox-cr` | Yes | Alpha | Yes | `kubernetes-sigs/agent-sandbox` controller | | `job` | No | Stable | No | Nothing beyond k8s 1.27+ | **`sandbox-cr` (default):** Creates a `Sandbox` CR (`agents.x-k8s.io/v1alpha1`) whose controller provisions a long-lived pod running `sleep infinity`. paperclip-server execs individual commands into the running pod — this is the multi-command adapter-install pattern. When you `releaseLease`, the Sandbox CR is deleted and the controller tears down the pod. **`job` (stable fallback):** Creates a `batch/v1` Job. The container entrypoint runs once and exits — no multi-command exec possible. Use this when you cannot install agent-sandbox, or when you need strictly stable Kubernetes APIs. Note: paperclip-server's adapter-install pattern will not work in job mode. ### Migrating from `job` to `sandbox-cr` 1. Install the agent-sandbox controller: `kubectl apply -f https://github.com/kubernetes-sigs/agent-sandbox/releases/latest/download/install.yaml` 2. Update your environment config to set `backend: "sandbox-cr"` (or remove `backend` since `sandbox-cr` is the default) 3. New leases will use the Sandbox CR backend. Existing leases created with `job` mode continue to use job semantics until they are released. ## Configuration Create a `sandbox` environment with `driver: kubernetes`. One of these auth fields is required: - `inCluster: true` — use the in-pod ServiceAccount credentials (when paperclip-server runs inside the same cluster). - `kubeconfig: ` — inline kubeconfig (stored as a company secret). - `kubeconfigSecretRef: ` — reference to an existing Paperclip secret. Common optional fields: | Field | Default | Purpose | |---|---|---| | `backend` | `"sandbox-cr"` | `sandbox-cr` (alpha, requires agent-sandbox controller) or `job` (stable, one-shot entrypoint). | | `adapterType` | `"claude_local"` | One of the supported adapter types (claude_local, codex_local, gemini_local, cursor_local, opencode_local, acpx_local, pi_local). Determines runtime image + env keys + egress allow-list. | | `namespacePrefix` | `"paperclip-"` | Prefix for the per-company tenant namespace. | | `paperclipServerNamespace` | `"paperclip"` | Namespace where paperclip-server pods run. Generated egress policies use this so agent pods can call back to the server. | | `companySlug` | derived from companyId | Override the auto-derived company slug. | | `imageRegistry` | (none) | Override the default registry for agent runtime images. | | `imageAllowList` | `[]` | Glob patterns of allowed `target.imageOverride` values. Empty = no override permitted. | | `imagePullSecrets` | `[]` | Names of pre-created Docker image pull secrets in the tenant namespace. | | `egressAllowFqdns` | `[]` | Additional FQDNs (beyond adapter defaults like `api.anthropic.com`). | | `egressAllowCidrs` | `[]` | Additional CIDRs to allow HTTPS egress to. CIDR egress is restricted to TCP port 443. | | `egressMode` | `"standard"` | `standard` (NetworkPolicy + CIDRs, plus public HTTPS fallback when adapter FQDNs are configured) or `cilium` (CiliumNetworkPolicy + exact FQDN allow-list). | | `runtimeClassName` | (none) | e.g. `kata-fc` for Firecracker-backed microVMs. Cluster must have the RuntimeClass installed. | | `serviceAccountAnnotations` | `{}` | Annotations applied to per-tenant ServiceAccount (e.g. IRSA `eks.amazonaws.com/role-arn`). | | `jobTtlSecondsAfterFinished` | `900` | Seconds after a Job completes before garbage-collection. | | `podActivityDeadlineSec` | `3600` | Hard ceiling on a single run's wall-clock time. | Full JSON Schema in `src/manifest.ts`. ## What gets created in your cluster For each company that runs agents (created lazily on first dispatch): ``` Namespace paperclip-{companySlug} (PSS: restricted enforce + audit) ServiceAccount paperclip-tenant-sa Role paperclip-tenant-role (only get pods/log) RoleBinding paperclip-tenant-rb ResourceQuota paperclip-quota (pods, requests/limits cpu+memory) LimitRange paperclip-limits (container max/min/default/defaultRequest) NetworkPolicy paperclip-deny-all (deny ingress + egress baseline) NetworkPolicy paperclip-egress-allow (DNS + paperclip-server callback + user CIDRs + public HTTPS fallback for adapter FQDNs) OR CiliumNetworkPolicy paperclip-egress-fqdn if egressMode=cilium ``` Standard Kubernetes NetworkPolicy cannot match FQDNs. In `egressMode: "standard"`, adapter-default FQDNs such as `api.anthropic.com` trigger a public IPv4 HTTPS fallback that excludes private and link-local ranges, so default agent runs can reach model APIs without opening intra-cluster/private-network egress. Use `egressMode: "cilium"` when you need exact FQDN enforcement. For each agent run (sandbox-cr backend): ``` Sandbox CR pc-{ulid} (agents.x-k8s.io/v1alpha1; explicit delete on release) Pod pc-{ulid}-{podSuffix} (managed by Sandbox controller; torn down on CR delete) Secret pc-{ulid}-env (owned by Sandbox CR; cascade-deleted) ``` ## Fast workspace uploads The `sandbox-cr` backend recognizes the chunked base64 upload protocol emitted by `@paperclipai/adapter-utils` for workspace, skill, and config-seed file transfers. Instead of running one Kubernetes exec per base64 chunk, the plugin buffers the upload in worker memory and flushes the final payload through a single `head -c | base64 -d` exec with stdin. The interceptor is intentionally narrow: only the exact `mkdir`/`printf`/`base64 -d` command shape generated by adapter-utils is optimized. Unknown commands, missing init state, or uploads over the 100 MB buffer cap fall back to normal exec behavior. For each agent run (job backend): ``` Job pc-{ulid} (backoffLimit: 0, ttlSecondsAfterFinished from config) Pod pc-{ulid}-{podSuffix} (owned by Job; cascade-deleted) Secret pc-{ulid}-env (owned by Job; cascade-deleted) ``` ## Security baseline Every agent pod is: - non-root (`runAsUser: 1000`, `runAsGroup: 1000`, `runAsNonRoot: true`) - drops ALL Linux capabilities, `allowPrivilegeEscalation: false` - `readOnlyRootFilesystem: true` with explicit `emptyDir` mounts for `/workspace`, `/home/paperclip`, `/home/paperclip/.cache`, `/tmp` - `seccompProfile: RuntimeDefault` - Tini as PID 1 (reaps zombies, forwards signals) - `fsGroupChangePolicy: OnRootMismatch` (fast PVC startup; openclaw-operator lesson) - `automountServiceAccountToken: false` Plus per-namespace `pod-security.kubernetes.io/enforce: restricted` and a deny-all NetworkPolicy baseline with explicit egress allow-list (DNS, paperclip-server, CIDRs, and either Cilium FQDN rules or standard-mode public HTTPS fallback). The per-run Secret carrying the bootstrap token and adapter API keys has `ownerReferences` pointing at the owning Sandbox CR or Job, so releasing the lease cascades cleanly to the Pod and Secret. ## Optional Kata-FC microVM isolation For stronger isolation, install [Kata Containers](https://github.com/kata-containers/kata-containers) with the Firecracker hypervisor, then set `runtimeClassName: kata-fc` in the plugin config. Each agent pod will run inside a Firecracker microVM. Requires nested-virt-capable nodes (bare-metal or specific cloud instance types). ## Roadmap - **Phase A (done):** `sandbox-cr` backend — multi-command exec via agent-sandbox Sandbox CRD. - **Phase B:** Warm pool support — pre-provisioned Sandbox CRs for sub-second cold starts. The `SandboxOrchestrator` interface reserves optional `pause?`/`resume?` extension slots. - **Phase C:** Kata-FC + snapshots — `runtimeClassName: kata-fc` with VM snapshot for fast restore. - **Phase D:** Contribute back to agent-sandbox upstream if their Beta model diverges from our needs. The `SandboxOrchestrator` interface (`src/sandbox-orchestrator.ts`) is the clean swap point — a new implementation can be added without touching `plugin.ts` business logic. ## Lessons learned (from openclaw-operator) This plugin adopts patterns from `openclaw-rocks/openclaw-operator`: - Tini PID 1 (issue #471 — zombie helper processes) - Read-only rootFS with explicit writable mounts (issue #456 — ~/.config not writable) - Strategic merge on reconcile (issue #446 — preserve third-party annotations) - Multi-storage-class testing (issue #448 — `local-path-provisioner` differences) - Image version compat matrix (issue #462 — runtime deps cannot resolve after upgrade) ## Development ```bash cd packages/plugins/sandbox-providers/kubernetes pnpm install --ignore-workspace pnpm test # unit tests only (fast) pnpm typecheck pnpm build ``` To run the kind-cluster integration test (requires `kubectl --context kind-paperclip` and a pre-loaded alpine image; see `test/integration/end-to-end-run.test.ts`): ```bash RUN_K8S_INTEGRATION_TESTS=1 pnpm test test/integration/end-to-end-run.test.ts ```