Alpha: the default backend (sandbox-cr) is built on kubernetes-sigs/agent-sandbox v1alpha1 — expect breaking changes as that CRD evolves toward Beta. A stable fallback backend (job, using batch/v1 Job) is available for clusters without agent-sandbox installed, but it does NOT support multi-command exec (paperclip-server's adapter-install pattern requires sandbox-cr).

Prerequisites

For `sandbox-cr` backend (default, recommended)

A Kubernetes cluster running k8s 1.27+
kubernetes-sigs/agent-sandbox controller installed in the cluster (alpha — installs the sandboxes.agents.x-k8s.io/v1alpha1 CRD and controller)
Paperclip-server running with access to the cluster (in-cluster via inCluster: true or external via kubeconfig)

For `job` backend (stable fallback)

A Kubernetes cluster running k8s 1.27+
Paperclip-server with cluster access — no additional controllers or CRDs required

Installation

paperclipai plugin install @paperclipai/plugin-kubernetes

Or, for local development:

paperclipai plugin install --local /path/to/paperclip/packages/plugins/sandbox-providers/kubernetes

Backends

The plugin supports two backend modes, selected via the backend config field:

Backend	Default	Stability	Multi-command exec	Requires
`sandbox-cr`	Yes	Alpha	Yes	`kubernetes-sigs/agent-sandbox` controller
`job`	No	Stable	No	Nothing beyond k8s 1.27+

sandbox-cr (default): Creates a Sandbox CR (agents.x-k8s.io/v1alpha1) whose controller provisions a long-lived pod running sleep infinity. paperclip-server execs individual commands into the running pod — this is the multi-command adapter-install pattern. When you releaseLease, the Sandbox CR is deleted and the controller tears down the pod.

job (stable fallback): Creates a batch/v1 Job. The container entrypoint runs once and exits — no multi-command exec possible. Use this when you cannot install agent-sandbox, or when you need strictly stable Kubernetes APIs. Note: paperclip-server's adapter-install pattern will not work in job mode.

Migrating from `job` to `sandbox-cr`

Install the agent-sandbox controller: kubectl apply -f https://github.com/kubernetes-sigs/agent-sandbox/releases/latest/download/install.yaml
Update your environment config to set backend: "sandbox-cr" (or remove backend since sandbox-cr is the default)
New leases will use the Sandbox CR backend. Existing leases created with job mode continue to use job semantics until they are released.

Configuration

Create a sandbox environment with driver: kubernetes. One of these auth fields is required:

inCluster: true — use the in-pod ServiceAccount credentials (when paperclip-server runs inside the same cluster).
kubeconfig: <YAML> — inline kubeconfig (stored as a company secret).
kubeconfigSecretRef: <secret-uuid> — reference to an existing Paperclip secret.

Common optional fields:

Field	Default	Purpose
`backend`	`"sandbox-cr"`	`sandbox-cr` (alpha, requires agent-sandbox controller) or `job` (stable, one-shot entrypoint).
`adapterType`	`"claude_local"`	One of the supported adapter types (claude_local, codex_local, gemini_local, cursor_local, opencode_local, acpx_local, pi_local). Determines runtime image + env keys + egress allow-list.
`namespacePrefix`	`"paperclip-"`	Prefix for the per-company tenant namespace.
`paperclipServerNamespace`	`"paperclip"`	Namespace where paperclip-server pods run. Generated egress policies use this so agent pods can call back to the server.
`companySlug`	derived from companyId	Override the auto-derived company slug.
`imageRegistry`	(none)	Override the default registry for agent runtime images.
`imageAllowList`	`[]`	Glob patterns of allowed `target.imageOverride` values. Empty = no override permitted.
`imagePullSecrets`	`[]`	Names of pre-created Docker image pull secrets in the tenant namespace.
`egressAllowFqdns`	`[]`	Additional FQDNs (beyond adapter defaults like `api.anthropic.com`).
`egressAllowCidrs`	`[]`	Additional CIDRs to allow HTTPS egress to. CIDR egress is restricted to TCP port 443.
`egressMode`	`"standard"`	`standard` (NetworkPolicy + CIDRs, plus public HTTPS fallback when adapter FQDNs are configured) or `cilium` (CiliumNetworkPolicy + exact FQDN allow-list).
`runtimeClassName`	(none)	e.g. `kata-fc` for Firecracker-backed microVMs. Cluster must have the RuntimeClass installed.
`serviceAccountAnnotations`	`{}`	Annotations applied to per-tenant ServiceAccount (e.g. IRSA `eks.amazonaws.com/role-arn`).
`jobTtlSecondsAfterFinished`	`900`	Seconds after a Job completes before garbage-collection.
`podActivityDeadlineSec`	`3600`	Hard ceiling on a single run's wall-clock time.

Full JSON Schema in src/manifest.ts.

What gets created in your cluster

For each company that runs agents (created lazily on first dispatch):

Namespace          paperclip-{companySlug}        (PSS: restricted enforce + audit)
ServiceAccount     paperclip-tenant-sa
Role               paperclip-tenant-role          (only get pods/log)
RoleBinding        paperclip-tenant-rb
ResourceQuota      paperclip-quota                (pods, requests/limits cpu+memory)
LimitRange         paperclip-limits               (container max/min/default/defaultRequest)
NetworkPolicy      paperclip-deny-all             (deny ingress + egress baseline)
NetworkPolicy      paperclip-egress-allow         (DNS + paperclip-server callback + user CIDRs + public HTTPS fallback for adapter FQDNs)
                   OR CiliumNetworkPolicy paperclip-egress-fqdn if egressMode=cilium

Standard Kubernetes NetworkPolicy cannot match FQDNs. In egressMode: "standard", adapter-default FQDNs such as api.anthropic.com trigger a public IPv4 HTTPS fallback that excludes private and link-local ranges, so default agent runs can reach model APIs without opening intra-cluster/private-network egress. Use egressMode: "cilium" when you need exact FQDN enforcement.

For each agent run (sandbox-cr backend):

Sandbox CR         pc-{ulid}                       (agents.x-k8s.io/v1alpha1; explicit delete on release)
Pod                pc-{ulid}-{podSuffix}           (managed by Sandbox controller; torn down on CR delete)
Secret             pc-{ulid}-env                   (owned by Sandbox CR; cascade-deleted)

Fast workspace uploads

The sandbox-cr backend recognizes the chunked base64 upload protocol emitted by @paperclipai/adapter-utils for workspace, skill, and config-seed file transfers. Instead of running one Kubernetes exec per base64 chunk, the plugin buffers the upload in worker memory and flushes the final payload through a single head -c <bytes> | base64 -d exec with stdin.

The interceptor is intentionally narrow: only the exact mkdir/printf/base64 -d command shape generated by adapter-utils is optimized. Unknown commands and missing init state fall back to normal exec behavior. Uploads over the 100 MB buffer cap fail fast instead of falling back, because earlier chunks were already acknowledged without being written to the pod.

For each agent run (job backend):

Job                pc-{ulid}                       (backoffLimit: 0, ttlSecondsAfterFinished from config)
Pod                pc-{ulid}-{podSuffix}           (owned by Job; cascade-deleted)
Secret             pc-{ulid}-env                   (owned by Job; cascade-deleted)

Security baseline

Every agent pod is:

non-root (runAsUser: 1000, runAsGroup: 1000, runAsNonRoot: true)
drops ALL Linux capabilities, allowPrivilegeEscalation: false
readOnlyRootFilesystem: true with explicit emptyDir mounts for /workspace, /home/paperclip, /home/paperclip/.cache, /tmp
seccompProfile: RuntimeDefault
Tini as PID 1 (reaps zombies, forwards signals)
fsGroupChangePolicy: OnRootMismatch (fast PVC startup; openclaw-operator lesson)
automountServiceAccountToken: false

Plus per-namespace pod-security.kubernetes.io/enforce: restricted and a deny-all NetworkPolicy baseline with explicit egress allow-list (DNS, paperclip-server, CIDRs, and either Cilium FQDN rules or standard-mode public HTTPS fallback).

The per-run Secret carrying the bootstrap token and adapter API keys has ownerReferences pointing at the owning Sandbox CR or Job, so releasing the lease cascades cleanly to the Pod and Secret.

Optional Kata-FC microVM isolation

For stronger isolation, install Kata Containers with the Firecracker hypervisor, then set runtimeClassName: kata-fc in the plugin config. Each agent pod will run inside a Firecracker microVM. Requires nested-virt-capable nodes (bare-metal or specific cloud instance types).

Roadmap

Phase A (done): sandbox-cr backend — multi-command exec via agent-sandbox Sandbox CRD.
Phase B: Warm pool support — pre-provisioned Sandbox CRs for sub-second cold starts. The SandboxOrchestrator interface reserves optional pause?/resume? extension slots.
Phase C: Kata-FC + snapshots — runtimeClassName: kata-fc with VM snapshot for fast restore.
Phase D: Contribute back to agent-sandbox upstream if their Beta model diverges from our needs. The SandboxOrchestrator interface (src/sandbox-orchestrator.ts) is the clean swap point — a new implementation can be added without touching plugin.ts business logic.

Lessons learned (from openclaw-operator)

This plugin adopts patterns from openclaw-rocks/openclaw-operator:

Tini PID 1 (issue #471 — zombie helper processes)
Read-only rootFS with explicit writable mounts (issue #456 — ~/.config not writable)
Strategic merge on reconcile (issue #446 — preserve third-party annotations)
Multi-storage-class testing (issue #448 — local-path-provisioner differences)
Image version compat matrix (issue #462 — runtime deps cannot resolve after upgrade)

Development

cd packages/plugins/sandbox-providers/kubernetes
pnpm install --ignore-workspace
pnpm test           # unit tests only (fast)
pnpm typecheck
pnpm build

To run the kind-cluster integration test (requires kubectl --context kind-paperclip and a pre-loaded alpine image; see test/integration/end-to-end-run.test.ts):

RUN_K8S_INTEGRATION_TESTS=1 pnpm test test/integration/end-to-end-run.test.ts

README.md

@paperclipai/plugin-kubernetes (alpha)