diff --git a/doc/plans/2026-04-08-agent-os-technical-report.md b/doc/plans/2026-04-08-agent-os-technical-report.md new file mode 100644 index 00000000..923cf7d2 --- /dev/null +++ b/doc/plans/2026-04-08-agent-os-technical-report.md @@ -0,0 +1,397 @@ +# Agent OS Technical Report for Paperclip + +Date: 2026-04-08 +Analyzed upstream: `rivet-dev/agent-os` at commit `0063cdccd1dcb1c8e211670cd05482d70d26a5c4` (`0063cdc`), dated 2026-04-06 + +## Executive summary + +`agent-os` is not a competitor to Paperclip's core product. It is an execution substrate: an embedded, VM-like runtime for agents, tools, filesystems, and session orchestration. Paperclip is a control plane: company scoping, task hierarchy, approvals, budgets, activity logs, workspaces, and governance. + +The strongest takeaway is not "copy agent-os wholesale." The strongest takeaway is that Paperclip could selectively use its runtime ideas to improve local agent execution safety, reproducibility, and portability while keeping all company/task/governance logic in Paperclip. + +My recommendation is: + +1. Do not merge agent-os concepts into the Paperclip core product model. +2. Do evaluate an optional `agentos_local` execution adapter or internal runtime experiment. +3. Borrow a few design patterns aggressively: + - layered/snapshotted execution filesystems + - explicit capability-based runtime permissions + - a better host-tools bridge for controlled tool execution + - a normalized session capability model for agent adapters +4. Do not import its workflow/cron/queue abstractions into Paperclip core until they are reconciled with Paperclip's issue/comment/governance model. + +## What agent-os actually is + +From the repo layout and implementation, `agent-os` is a mixed TypeScript/Rust system that provides: + +- an `AgentOs` TypeScript API for creating isolated agent VMs +- a Rust kernel/sidecar that virtualizes filesystem, processes, PTYs, pipes, permissions, and networking +- an ACP-based session model for agent runtimes such as Pi, OpenCode, and Claude-style adapters +- a registry of WASM command packages and mount plugins +- optional host toolkits, cron scheduling, and filesystem mounts + +The repo is substantial already: + +- monorepo with `packages/`, `crates/`, and `registry/` +- roughly 1,200 files just across `packages/`, `crates/`, and `registry/` +- mixed implementation model: TypeScript public API plus Rust kernel/sidecar internals + +## Architecture notes + +### 1. Public runtime surface + +The main API lives in `packages/core/src/agent-os.ts` and exports an `AgentOs` class with methods such as: + +- `create()` +- `createSession()` +- `prompt()` +- `exec()` +- `spawn()` +- `snapshotRootFilesystem()` +- cron scheduling helpers + +This is an execution API, not a coordination API. + +### 2. Virtualized kernel model + +The kernel is implemented in Rust under `crates/kernel/src/`. It models: + +- virtual filesystem +- process table +- PTYs and pipes +- resource accounting +- permissioned filesystem access +- network permission checks + +That gives `agent-os` a much stronger isolation story than Paperclip's current "launch a host CLI in a workspace" local adapter approach. + +### 3. Layered filesystem and snapshots + +The filesystem design is one of the most reusable ideas. `agent-os` uses: + +- a bundled base filesystem +- a writable overlay +- optional mounted filesystems +- snapshot export/import for reusing root states + +This is cleaner than treating every execution workspace as a mutable checkout plus ad hoc cleanup. It enables reproducible starting states and cheap isolation. + +### 4. Capability-based permissions + +The kernel-level permission vocabulary is strong and concrete: + +- filesystem operations +- network operations +- child-process execution +- environment access + +The Rust kernel defaults are deny-oriented, but the high-level JS API currently serializes permissive defaults unless the caller provides a policy. That is an important nuance: the primitive is security-minded, but the product surface is still convenience-first. + +### 5. Host-tools bridge + +`agent-os` exposes host-side tools via a toolkit abstraction (`hostTool`, `toolKit`) and a local RPC bridge. This is a strong pattern because it gives the agent explicit, typed tools rather than ambient shell access to everything on the host. + +### 6. ACP session abstraction + +The session model is more uniform than most agent wrappers. It includes: + +- capabilities +- mode/config options +- permission requests +- sequenced session events +- JSON-RPC transport through ACP adapters + +This is directly relevant to Paperclip because our adapter layer still normalizes each CLI agent in a fairly bespoke way. + +## Paperclip anchor points + +The most relevant current Paperclip surfaces for any future `agent-os` integration are: + +- `packages/adapter-utils/src/types.ts` + - shared adapter contract, session metadata, runtime service reporting, environment tests, and optional `detectModel()` +- `server/src/services/heartbeat.ts` + - heartbeat execution, adapter invocation, cost capture, workspace realization, and issue-comment summaries +- `server/src/services/execution-workspaces.ts` + - execution workspace lifecycle and git readiness/cleanup logic +- `server/src/services/plugin-loader.ts` + - dynamic plugin activation, host capability boundaries, and runtime extension loading +- local adapters such as `packages/adapters/codex-local/src/server/execute.ts` and peers + - current host-CLI execution model that an `agent-os` runtime experiment would complement or replace for selected agents + +## What Paperclip can learn from it + +### 1. A safer local execution substrate + +Paperclip's local adapters currently run host CLIs in managed workspaces and rely on adapter-specific behavior plus process-level controls. That is pragmatic, but weakly isolated. + +`agent-os` shows a path toward: + +- running local agent tooling in a constrained runtime +- applying explicit network/filesystem/env policies +- reducing accidental host leakage +- making adapter behavior more portable across machines + +Best use in Paperclip: + +- as an optional runtime beneath local adapters +- or as a new adapter family for agents that can run inside ACP-compatible `agent-os` sessions + +This fits Paperclip because it improves execution safety without changing the control-plane model. + +### 2. Snapshotted execution roots instead of only mutable workspaces + +Paperclip already has strong execution-workspace concepts, but they are repo/worktree-centric. `agent-os` adds a stronger "start from known lower layers, write into a disposable upper layer" model. + +That could improve: + +- reproducible issue starts +- disposable task sandboxes +- faster reset/cleanup +- "resume from snapshot" behavior for recurring routines +- safe preview environments for risky agent operations + +This is especially interesting for tasks that do not need a full git worktree. + +### 3. A capability vocabulary for runtime governance + +Paperclip has governance at the company/task level: + +- approvals +- budgets +- activity logs +- actor permissions +- company scoping + +It has less structure at the runtime capability level. `agent-os` offers a clear vocabulary that Paperclip could adopt even without adopting the runtime itself: + +- `fs.read`, `fs.write`, `fs.mount_sensitive` +- `network.fetch`, `network.http`, `network.listen`, `network.dns` +- child process execution +- env access + +That vocabulary would improve: + +- adapter configuration schemas +- policy UIs +- execution review surfaces +- future approval gates for governed actions + +### 4. Typed host tools instead of shelling out for everything + +Paperclip's plugin system and adapters already have the beginnings of a controlled extension surface. `agent-os` reinforces the value of exposing capabilities as typed tools rather than raw shell access. + +Concrete Paperclip uses: + +- board-approved toolkits for sensitive operations +- company-scoped service tools +- plugin-defined tools with explicit schemas +- safer execution for common actions like git metadata inspection, preview lookups, deployment status checks, or document generation + +This aligns well with Paperclip's governance story. + +### 5. Better adapter normalization around sessions and capabilities + +Paperclip's adapter contract already supports execution results, session params, environment tests, skill syncing, quota windows, and optional `detectModel()`. But much of the per-agent behavior is still adapter-specific. + +`agent-os` suggests a cleaner normalization target: + +- a standard capability map +- a consistent event stream model +- explicit mode/config surfaces +- explicit permission request semantics + +Paperclip does not need ACP everywhere, but it would benefit from a more formal internal session capability model inspired by this. + +### 6. On-demand heavy sandbox escalation + +One of the best architectural choices in `agent-os` is that it does not pretend every workload fits the lightweight runtime. It has a sandbox extension for workloads that need a fuller environment. + +Paperclip can adopt that philosophy directly: + +- lightweight execution by default +- escalate to full worktree / container / remote sandbox only when needed +- keep the escalation explicit in the issue/run model + +That is better than forcing all tasks into the heaviest environment up front. + +## What does not fit Paperclip well + +### 1. Its built-in orchestration primitives overlap the wrong layer + +`agent-os` includes cron/session/workflow style primitives inside the runtime package. Paperclip already has higher-level orchestration concepts: + +- issues/comments +- heartbeat runs +- approvals +- company/org structure +- execution workspaces +- budget enforcement + +If Paperclip copied `agent-os` cron/workflow/queue ideas directly into core, we would likely duplicate orchestration across two layers. That would blur ownership and make debugging harder. + +Paperclip should keep orchestration authoritative at the control-plane layer. + +### 2. It is not company-scoped or governance-native + +`agent-os` is runtime-first, not company-first. It has no native concepts for: + +- company boundaries +- board/operator actor types +- audit logs for business actions +- issue hierarchy +- approval routing +- budget hard-stop behavior + +Those are Paperclip's differentiators. They should not be displaced by runtime abstractions. + +### 3. It introduces meaningful implementation complexity + +Adopting `agent-os` deeply would add: + +- Rust build/runtime complexity +- sidecar lifecycle management +- new failure modes across JS/Rust boundaries +- more packaging and platform compatibility work +- another abstraction layer for debugging already-complex local adapters + +This is justified only if we want stronger local isolation or portability. It is not justified as a general refactor. + +### 4. Its security model is not a drop-in governance solution + +The permission model is good, but it is low-level. Paperclip would still need to answer: + +- who can authorize a capability +- how approval decisions are logged +- how policies are scoped by company/project/issue/agent +- how runtime permissions interact with budgets and task status + +In other words, `agent-os` can supply enforcement primitives, not the control policy system itself. + +### 5. The agent compatibility story is still selective + +The repo is explicit that some runtimes are planned, partial, or still being adapted. In practice this means: + +- good ideas for ACP-native or compatible agents +- less certainty for every CLI agent we support today +- real integration work for Codex/Cursor/Gemini-style Paperclip adapters + +So the main near-term value is not universal replacement. It is selective use where compatibility is strong. + +## Concrete recommendations for Paperclip + +### Recommendation A: prototype an optional `agentos_local` adapter + +This is the highest-value experiment. + +Goal: + +- run one supported agent type inside `agent-os` +- keep Paperclip heartbeat/task/workspace/budget logic unchanged +- evaluate startup time, isolation, transcript quality, and operational complexity + +Good first target: + +- `pi_local` or `opencode_local` + +Why not start with Codex: + +- Paperclip's Codex adapter is already important and carries repo-specific behavior +- `agent-os`'s Codex story is present in the registry/docs, but the safest path is to validate the runtime on a less central adapter first + +Success criteria: + +- heartbeat can invoke the adapter reliably +- session resume works across heartbeats +- Paperclip still records logs, summaries, cost metadata, and issue comments normally +- runtime permissions can be configured without breaking common tasks + +### Recommendation B: adopt capability vocabulary into adapter configs + +Even without using `agent-os`, Paperclip should consider standardizing adapter/runtime permissions around a vocabulary like: + +- filesystem +- network +- subprocess/tool execution +- environment access + +This would improve: + +- schema-driven adapter UIs +- future approvals +- observability +- policy portability across adapters + +### Recommendation C: explore snapshot-backed execution workspaces + +Paperclip should evaluate whether some execution workspaces can be backed by: + +- a reusable lower snapshot +- a disposable upper layer +- optional mounts for project data or artifacts + +This is most valuable for: + +- non-repo tasks +- repeatable routines +- preview/test environments +- isolation-heavy local execution + +It is less urgent for full repo editing flows that already benefit from git worktrees. + +### Recommendation D: strengthen typed tool surfaces + +Paperclip plugins and adapters should continue moving toward explicit typed tools over ad hoc shell access. `agent-os` confirms that this is the right direction. + +This is a good fit for: + +- plugin tools +- workspace runtime services +- governed operations that need approval or auditability + +### Recommendation E: do not import runtime-level workflows into Paperclip core + +Paperclip should not copy `agent-os` cron/workflow/queue concepts into core orchestration yet. + +If we want them later, they must map cleanly onto: + +- issues +- comments +- heartbeats +- approvals +- budgets +- activity logs + +Without that mapping, they would create a second orchestration system inside the product. + +## A practical integration map + +### Best near-term fits + +- optional local adapter runtime +- runtime capability schema +- typed host-tool ideas for plugins/adapters +- snapshot ideas for disposable execution roots + +### Medium-term fits + +- stronger session capability normalization across adapters +- policy-aware runtime permission UI +- selective ACP-inspired event normalization + +### Poor fits right now + +- moving Paperclip orchestration into agent-os workflows +- replacing company/task/governance models with runtime constructs +- making Rust sidecars a mandatory dependency for all local execution + +## Bottom line + +`agent-os` is useful to Paperclip as an execution technology reference, not as a product model. + +Paperclip should treat it the same way it treats sandboxes or agent CLIs: + +- execution substrate underneath the control plane +- optional where the tradeoff is worth it +- never the source of truth for company/task/governance state + +If we do one thing from this report, it should be a narrowly scoped `agentos_local` experiment plus a design pass on capability-based runtime permissions. Those two ideas have the best upside and the lowest architectural risk.