@paperclipai/plugin-kubernetes (alpha)
First-party Paperclip sandbox-provider plugin for Kubernetes.
Alpha: the default backend (sandbox-cr) is built on kubernetes-sigs/agent-sandbox v1alpha1 — expect breaking changes as that CRD evolves toward Beta. A stable fallback backend (job, using batch/v1 Job) is available for clusters without agent-sandbox installed, but it does NOT support multi-command exec (paperclip-server's adapter-install pattern requires sandbox-cr).
Prerequisites
For sandbox-cr backend (default, recommended)
- A Kubernetes cluster running k8s 1.27+
- The kubernetes-sigs/agent-sandbox controller installed in the cluster (alpha; installs the sandboxes.agents.x-k8s.io/v1alpha1 CRD and controller)
- Paperclip-server running with access to the cluster (in-cluster via inCluster: true or external via kubeconfig)
For job backend (stable fallback)
- A Kubernetes cluster running k8s 1.27+
- Paperclip-server with cluster access — no additional controllers or CRDs required
Installation
paperclipai plugin install @paperclipai/plugin-kubernetes
Or, for local development:
paperclipai plugin install --local /path/to/paperclip/packages/plugins/sandbox-providers/kubernetes
Backends
The plugin supports two backend modes, selected via the backend config field:
| Backend | Default | Stability | Multi-command exec | Requires |
|---|---|---|---|---|
| sandbox-cr | Yes | Alpha | Yes | kubernetes-sigs/agent-sandbox controller |
| job | No | Stable | No | Nothing beyond k8s 1.27+ |
sandbox-cr (default): Creates a Sandbox CR (agents.x-k8s.io/v1alpha1) whose controller provisions a long-lived pod running sleep infinity. paperclip-server execs individual commands into the running pod — this is the multi-command adapter-install pattern. When you releaseLease, the Sandbox CR is deleted and the controller tears down the pod.
job (stable fallback): Creates a batch/v1 Job. The container entrypoint runs once and exits — no multi-command exec possible. Use this when you cannot install agent-sandbox, or when you need strictly stable Kubernetes APIs. Note: paperclip-server's adapter-install pattern will not work in job mode.
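The difference between the two backends can be summarized as a small capability table in code. This is a minimal sketch, not the plugin's implementation: the names BackendCapabilities, BACKENDS, resolveBackend, and assertSupportsExec are hypothetical and chosen only for illustration.

```typescript
// Hypothetical names throughout; only the backend values and their
// properties come from the table above.
type BackendName = "sandbox-cr" | "job";

interface BackendCapabilities {
  stability: "alpha" | "stable";
  multiCommandExec: boolean;
  requiresAgentSandbox: boolean;
}

const BACKENDS: Record<BackendName, BackendCapabilities> = {
  "sandbox-cr": { stability: "alpha", multiCommandExec: true, requiresAgentSandbox: true },
  job: { stability: "stable", multiCommandExec: false, requiresAgentSandbox: false },
};

// Fall back to the documented default when no backend is configured.
function resolveBackend(configured?: BackendName): BackendName {
  return configured ?? "sandbox-cr";
}

// Fail early when a caller needs exec but the backend runs a one-shot entrypoint.
function assertSupportsExec(backend: BackendName): void {
  if (!BACKENDS[backend].multiCommandExec) {
    throw new Error(`backend "${backend}" does not support multi-command exec`);
  }
}
```

A caller that relies on the adapter-install pattern could run assertSupportsExec(resolveBackend(config.backend)) before acquiring a lease, so a job-mode misconfiguration surfaces immediately.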
Migrating from job to sandbox-cr
- Install the agent-sandbox controller:
  kubectl apply -f https://github.com/kubernetes-sigs/agent-sandbox/releases/latest/download/install.yaml
- Update your environment config to set backend: "sandbox-cr" (or remove backend, since sandbox-cr is the default).
- New leases will use the Sandbox CR backend. Existing leases created with job mode continue to use job semantics until they are released.
Configuration
Create a sandbox environment with driver: kubernetes. One of these auth fields is required:
- inCluster: true uses the in-pod ServiceAccount credentials (for when paperclip-server runs inside the same cluster).
- kubeconfig: <YAML> supplies an inline kubeconfig (stored as a company secret).
- kubeconfigSecretRef: <secret-uuid> references an existing Paperclip secret.
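The auth requirement can be checked before any cluster call is made. The sketch below assumes that exactly one of the three fields should be set; the text only says one is required, so the plugin may be more lenient. AuthSource and pickAuthSource are hypothetical names.

```typescript
interface AuthConfig {
  inCluster?: boolean;
  kubeconfig?: string;
  kubeconfigSecretRef?: string;
}

type AuthSource =
  | { kind: "inCluster" }
  | { kind: "kubeconfig"; yaml: string }
  | { kind: "secretRef"; secretId: string };

// Assumption: exactly one auth field is expected. The real plugin may instead
// apply a precedence order when several are present.
function pickAuthSource(cfg: AuthConfig): AuthSource {
  const sources: AuthSource[] = [];
  if (cfg.inCluster === true) sources.push({ kind: "inCluster" });
  if (cfg.kubeconfig) sources.push({ kind: "kubeconfig", yaml: cfg.kubeconfig });
  if (cfg.kubeconfigSecretRef) {
    sources.push({ kind: "secretRef", secretId: cfg.kubeconfigSecretRef });
  }
  const [only] = sources;
  if (sources.length !== 1 || only === undefined) {
    throw new Error(`expected exactly one auth field, found ${sources.length}`);
  }
  return only;
}
```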
Common optional fields:
| Field | Default | Purpose |
|---|---|---|
| backend | "sandbox-cr" | sandbox-cr (alpha, requires agent-sandbox controller) or job (stable, one-shot entrypoint). |
| adapterType | "claude_local" | One of the supported adapter types (claude_local, codex_local, gemini_local, cursor_local, opencode_local, acpx_local, pi_local). Determines runtime image + env keys + egress allow-list. |
| namespacePrefix | "paperclip-" | Prefix for the per-company tenant namespace. |
| paperclipServerNamespace | "paperclip" | Namespace where paperclip-server pods run. Generated egress policies use this so agent pods can call back to the server. |
| companySlug | derived from companyId | Override the auto-derived company slug. |
| imageRegistry | (none) | Override the default registry for agent runtime images. |
| imageAllowList | [] | Glob patterns of allowed target.imageOverride values. Empty = no override permitted. |
| imagePullSecrets | [] | Names of pre-created Docker image pull secrets in the tenant namespace. |
| egressAllowFqdns | [] | Additional FQDNs (beyond adapter defaults like api.anthropic.com). |
| egressAllowCidrs | [] | Additional CIDRs to allow HTTPS egress to. CIDR egress is restricted to TCP port 443. |
| egressMode | "standard" | standard (NetworkPolicy + CIDRs, plus public HTTPS fallback when adapter FQDNs are configured) or cilium (CiliumNetworkPolicy + exact FQDN allow-list). |
| runtimeClassName | (none) | e.g. kata-fc for Firecracker-backed microVMs. Cluster must have the RuntimeClass installed. |
| serviceAccountAnnotations | {} | Annotations applied to per-tenant ServiceAccount (e.g. IRSA eks.amazonaws.com/role-arn). |
| jobTtlSecondsAfterFinished | 900 | Seconds after a Job completes before garbage-collection. |
| podActivityDeadlineSec | 3600 | Hard ceiling on a single run's wall-clock time. |
Full JSON Schema in src/manifest.ts.
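As an illustration of how the documented defaults combine with overrides, here is a minimal sketch covering a subset of the fields above. The interface and the helpers withDefaults and tenantNamespace are hypothetical; the authoritative shape is the JSON Schema in src/manifest.ts.

```typescript
// Subset of the documented fields; default values are taken from the table above.
interface KubernetesSandboxConfig {
  backend: "sandbox-cr" | "job";
  adapterType: string;
  namespacePrefix: string;
  paperclipServerNamespace: string;
  imageAllowList: string[];
  egressAllowFqdns: string[];
  egressAllowCidrs: string[];
  egressMode: "standard" | "cilium";
  runtimeClassName?: string;
  jobTtlSecondsAfterFinished: number;
  podActivityDeadlineSec: number;
}

const DEFAULTS: KubernetesSandboxConfig = {
  backend: "sandbox-cr",
  adapterType: "claude_local",
  namespacePrefix: "paperclip-",
  paperclipServerNamespace: "paperclip",
  imageAllowList: [],
  egressAllowFqdns: [],
  egressAllowCidrs: [],
  egressMode: "standard",
  jobTtlSecondsAfterFinished: 900,
  podActivityDeadlineSec: 3600,
};

// Hypothetical helper: shallow-merge user overrides over the defaults.
function withDefaults(overrides: Partial<KubernetesSandboxConfig>): KubernetesSandboxConfig {
  return { ...DEFAULTS, ...overrides };
}

// Hypothetical helper: the tenant namespace is the prefix plus the company slug.
function tenantNamespace(cfg: KubernetesSandboxConfig, companySlug: string): string {
  return `${cfg.namespacePrefix}${companySlug}`;
}

// Example: exact FQDN enforcement with microVM isolation for one extra host.
const example = withDefaults({
  egressMode: "cilium",
  egressAllowFqdns: ["registry.npmjs.org"],
  runtimeClassName: "kata-fc",
});
```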
What gets created in your cluster
For each company that runs agents (created lazily on first dispatch):
Namespace paperclip-{companySlug} (PSS: restricted enforce + audit)
ServiceAccount paperclip-tenant-sa
Role paperclip-tenant-role (only get pods/log)
RoleBinding paperclip-tenant-rb
ResourceQuota paperclip-quota (pods, requests/limits cpu+memory)
LimitRange paperclip-limits (container max/min/default/defaultRequest)
NetworkPolicy paperclip-deny-all (deny ingress + egress baseline)
NetworkPolicy paperclip-egress-allow (DNS + paperclip-server callback + user CIDRs + public HTTPS fallback for adapter FQDNs)
OR CiliumNetworkPolicy paperclip-egress-fqdn if egressMode=cilium
Standard Kubernetes NetworkPolicy cannot match FQDNs. In egressMode: "standard", adapter-default FQDNs such as api.anthropic.com trigger a public IPv4 HTTPS fallback that excludes private and link-local ranges, so default agent runs can reach model APIs without opening intra-cluster/private-network egress. Use egressMode: "cilium" when you need exact FQDN enforcement.
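The standard-mode fallback can be pictured as one NetworkPolicy egress rule with an ipBlock that allows all IPv4 addresses except internal ranges. The sketch below is an assumption about the shape, not the plugin's generated policy: the excluded list (RFC 1918 plus 169.254.0.0/16) is a guess at "private and link-local", the function names are hypothetical, and the DNS and paperclip-server callback rules are omitted.

```typescript
interface IpBlock {
  cidr: string;
  except?: string[];
}

interface EgressRule {
  to: { ipBlock: IpBlock }[];
  ports: { protocol: "TCP" | "UDP"; port: number }[];
}

// Assumed exclusion list; the real plugin may exclude additional ranges.
const EXCLUDED_RANGES = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "169.254.0.0/16"];

// All CIDR egress is restricted to TCP port 443.
function httpsRule(ipBlock: IpBlock): EgressRule {
  return { to: [{ ipBlock }], ports: [{ protocol: "TCP", port: 443 }] };
}

function standardEgressRules(userCidrs: string[], hasAdapterFqdns: boolean): EgressRule[] {
  const rules = userCidrs.map((cidr) => httpsRule({ cidr }));
  if (hasAdapterFqdns) {
    // Public HTTPS fallback: NetworkPolicy cannot match FQDNs, so allow
    // public IPv4 and carve out the internal ranges instead.
    rules.push(httpsRule({ cidr: "0.0.0.0/0", except: EXCLUDED_RANGES }));
  }
  return rules;
}
```

The trade-off is visible in the rule itself: any public HTTPS endpoint is reachable in standard mode, which is why cilium mode is the option for exact FQDN enforcement.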
For each agent run (sandbox-cr backend):
Sandbox CR pc-{ulid} (agents.x-k8s.io/v1alpha1; explicit delete on release)
Pod pc-{ulid}-{podSuffix} (managed by Sandbox controller; torn down on CR delete)
Secret pc-{ulid}-env (owned by Sandbox CR; cascade-deleted)
Fast workspace uploads
The sandbox-cr backend recognizes the chunked base64 upload protocol emitted by @paperclipai/adapter-utils for workspace, skill, and config-seed file transfers. Instead of running one Kubernetes exec per base64 chunk, the plugin buffers the upload in worker memory and flushes the final payload through a single head -c <bytes> | base64 -d exec with stdin.
The interceptor is intentionally narrow: only the exact mkdir/printf/base64 -d command shape generated by adapter-utils is optimized. Unknown commands and missing init state fall back to normal exec behavior. Uploads over the 100 MB buffer cap fail fast instead of falling back, because earlier chunks were already acknowledged without being written to the pod.
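The buffering behavior described above can be sketched as a small class that acknowledges chunks in memory and emits one command at the end. This is illustrative only: UploadBuffer is a hypothetical name, the cap is assumed to apply to the base64 text rather than the decoded bytes, and the redirect to a target path is an assumption about how the final exec writes the file.

```typescript
const MAX_BUFFER_BYTES = 100 * 1024 * 1024; // 100 MB cap from the text above

class UploadBuffer {
  private chunks: string[] = [];
  private size = 0;

  constructor(private readonly cap: number = MAX_BUFFER_BYTES) {}

  // Acknowledge a chunk without touching the pod. Past the cap, fail fast:
  // earlier chunks were never written, so falling back to exec is not safe.
  append(base64Chunk: string): void {
    const next = this.size + base64Chunk.length;
    if (next > this.cap) {
      throw new Error(`upload exceeds ${this.cap} byte buffer cap`);
    }
    this.chunks.push(base64Chunk);
    this.size = next;
  }

  // Build the single exec: the command plus the payload to send on stdin.
  // Base64 text is ASCII, so string length equals byte length.
  flush(targetPath: string): { command: string; stdin: string } {
    const stdin = this.chunks.join("");
    const quoted = `'${targetPath.replace(/'/g, `'\\''`)}'`;
    const command = `head -c ${stdin.length} | base64 -d > ${quoted}`;
    this.chunks = [];
    this.size = 0;
    return { command, stdin };
  }
}
```

With this shape, an upload costs one Kubernetes exec regardless of how many chunks the adapter emitted.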
For each agent run (job backend):
Job pc-{ulid} (backoffLimit: 0, ttlSecondsAfterFinished from config)
Pod pc-{ulid}-{podSuffix} (owned by Job; cascade-deleted)
Secret pc-{ulid}-env (owned by Job; cascade-deleted)
Security baseline
Every agent pod is:
- non-root (runAsUser: 1000, runAsGroup: 1000, runAsNonRoot: true)
- drops ALL Linux capabilities, allowPrivilegeEscalation: false
- readOnlyRootFilesystem: true with explicit emptyDir mounts for /workspace, /home/paperclip, /home/paperclip/.cache, /tmp
- seccompProfile: RuntimeDefault
- Tini as PID 1 (reaps zombies, forwards signals)
- fsGroupChangePolicy: OnRootMismatch (fast PVC startup; openclaw-operator lesson)
- automountServiceAccountToken: false
Plus per-namespace pod-security.kubernetes.io/enforce: restricted and a deny-all NetworkPolicy baseline with explicit egress allow-list (DNS, paperclip-server, CIDRs, and either Cilium FQDN rules or standard-mode public HTTPS fallback).
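For orientation, the pod-level settings listed above map onto a pod spec roughly as follows. This is a sketch under assumptions: the split between pod-level and container-level securityContext, the container name agent, and the volume names are all hypothetical, and Tini is left out because it is part of the image entrypoint rather than the spec.

```typescript
// Writable paths come from the list above; everything else on a read-only root.
const WRITABLE_PATHS = ["/workspace", "/home/paperclip", "/home/paperclip/.cache", "/tmp"];

// Hypothetical helper that assembles a hardened pod spec for a given image.
function hardenedPodSpec(image: string) {
  return {
    automountServiceAccountToken: false,
    securityContext: {
      runAsUser: 1000,
      runAsGroup: 1000,
      runAsNonRoot: true,
      fsGroupChangePolicy: "OnRootMismatch",
      seccompProfile: { type: "RuntimeDefault" },
    },
    containers: [
      {
        name: "agent", // placeholder name
        image,
        securityContext: {
          allowPrivilegeEscalation: false,
          readOnlyRootFilesystem: true,
          capabilities: { drop: ["ALL"] },
        },
        // One emptyDir mount per writable path.
        volumeMounts: WRITABLE_PATHS.map((mountPath, i) => ({ name: `scratch-${i}`, mountPath })),
      },
    ],
    volumes: WRITABLE_PATHS.map((_, i) => ({ name: `scratch-${i}`, emptyDir: {} })),
  };
}
```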
The per-run Secret carrying the bootstrap token and adapter API keys has ownerReferences pointing at the owning Sandbox CR or Job, so releasing the lease cascades cleanly to the Pod and Secret.
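The cascade relies on standard Kubernetes ownerReferences. A minimal sketch of a per-run Secret manifest pointing at its owner might look like the following; envSecretManifest and the Owner type are hypothetical, and the uid must be read from the owning object after it has been created.

```typescript
interface Owner {
  apiVersion: string; // "agents.x-k8s.io/v1alpha1" for a Sandbox, "batch/v1" for a Job
  kind: string;       // "Sandbox" or "Job"
  name: string;       // pc-{ulid}
  uid: string;        // assigned by the API server on creation
}

// Hypothetical helper: build the pc-{ulid}-env Secret owned by the run's
// Sandbox CR or Job, so deleting the owner garbage-collects the Secret.
function envSecretManifest(owner: Owner, namespace: string, env: Record<string, string>) {
  return {
    apiVersion: "v1",
    kind: "Secret",
    metadata: {
      name: `${owner.name}-env`,
      namespace,
      ownerReferences: [
        { apiVersion: owner.apiVersion, kind: owner.kind, name: owner.name, uid: owner.uid },
      ],
    },
    type: "Opaque",
    stringData: env,
  };
}
```

Because owner and dependent must live in the same namespace, the Secret is created in the tenant namespace alongside the Sandbox CR or Job.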
Optional Kata-FC microVM isolation
For stronger isolation, install Kata Containers with the Firecracker hypervisor, then set runtimeClassName: kata-fc in the plugin config. Each agent pod will run inside a Firecracker microVM. Requires nested-virt-capable nodes (bare-metal or specific cloud instance types).
Roadmap
- Phase A (done): sandbox-cr backend, providing multi-command exec via the agent-sandbox Sandbox CRD.
- Phase B: Warm pool support, with pre-provisioned Sandbox CRs for sub-second cold starts. The SandboxOrchestrator interface reserves optional pause?/resume? extension slots.
- Phase C: Kata-FC + snapshots, using runtimeClassName: kata-fc with VM snapshot for fast restore.
- Phase D: Contribute back to agent-sandbox upstream if their Beta model diverges from our needs. The SandboxOrchestrator interface (src/sandbox-orchestrator.ts) is the clean swap point: a new implementation can be added without touching plugin.ts business logic.
Lessons learned (from openclaw-operator)
This plugin adopts patterns from openclaw-rocks/openclaw-operator:
- Tini PID 1 (issue #471: zombie helper processes)
- Read-only rootFS with explicit writable mounts (issue #456: ~/.config not writable)
- Strategic merge on reconcile (issue #446: preserve third-party annotations)
- Multi-storage-class testing (issue #448: local-path-provisioner differences)
- Image version compat matrix (issue #462: runtime deps cannot resolve after upgrade)
Development
cd packages/plugins/sandbox-providers/kubernetes
pnpm install --ignore-workspace
pnpm test # unit tests only (fast)
pnpm typecheck
pnpm build
To run the kind-cluster integration test (requires kubectl --context kind-paperclip and a pre-loaded alpine image; see test/integration/end-to-end-run.test.ts):
RUN_K8S_INTEGRATION_TESTS=1 pnpm test test/integration/end-to-end-run.test.ts