From 5758aba91e4c1b9b995400d7b7af0578d34f0237 Mon Sep 17 00:00:00 2001
From: dotta <dotta@example.com>
Date: Wed, 8 Apr 2026 17:43:58 -0500
Subject: [PATCH] docs: add agent-os follow-up plan

---
 .../2026-04-08-agent-os-follow-up-plan.md     | 261 ++++++++++++++++++
 1 file changed, 261 insertions(+)
 create mode 100644 doc/plans/2026-04-08-agent-os-follow-up-plan.md

diff --git a/doc/plans/2026-04-08-agent-os-follow-up-plan.md b/doc/plans/2026-04-08-agent-os-follow-up-plan.md
new file mode 100644
index 00000000..52029943
--- /dev/null
+++ b/doc/plans/2026-04-08-agent-os-follow-up-plan.md
@@ -0,0 +1,261 @@
+# PAP-1229 Agent OS Follow-up Plan
+
+Date: 2026-04-08
+Related issue: `PAP-1229`
+Companion analysis: `doc/plans/2026-04-08-agent-os-technical-report.md`
+
+## Goal
+
+Turn the `agent-os` research into a low-risk Paperclip execution plan that preserves Paperclip's control-plane model while testing the few runtime ideas that appear worth adopting.
+
+## Decision summary
+
+Paperclip should not absorb `agent-os` as a product model or orchestration layer.
+
+Paperclip should evaluate `agent-os` in three narrow areas:
+
+1. optional agent runtime for selected local adapters
+2. capability-based runtime permission vocabulary
+3. snapshot-backed disposable execution roots
+
+Everything else should stay out of scope unless those three experiments produce strong evidence.
+
+## Success condition
+
+This work is successful when Paperclip has:
+
+- a clear yes/no answer on whether `agent-os` is worth supporting as an execution substrate
+- a concrete adapter/runtime experiment with measurable results
+- a proposed runtime capability model that fits current Paperclip adapters
+- a clear decision on whether snapshot-backed execution roots are worth integrating
+
+## Non-goals
+
+Do not:
+
+- replace Paperclip heartbeats, issues, comments, approvals, or budgets with `agent-os` primitives
+- introduce Rust/sidecar requirements for all local execution paths
+- migrate all adapters at once
+- add runtime workflow/queue abstractions to Paperclip core
+
+## Existing Paperclip integration points
+
+The plan should stay anchored to these existing surfaces:
+
+- `packages/adapter-utils/src/types.ts`
+  - adapter contract, runtime service reporting, session metadata, and capability normalization targets
+- `server/src/services/heartbeat.ts`
+  - execution entry point, log capture, issue comment summaries, and cost reporting
+- `server/src/services/execution-workspaces.ts`
+  - current workspace lifecycle and git-oriented cleanup/readiness model
+- `server/src/services/plugin-loader.ts`
+  - typed host capability boundary and extension loading patterns
+- local adapter implementations in `packages/adapters/*/src/server/`
+  - current execution behavior to compare against an `agent-os`-backed path
+
+## Phase plan
+
+### Phase 0: constraints and experiment design
+
+Objective:
+
+- make the evaluation falsifiable before writing integration code
+
+Deliverables:
+
+- short experiment brief added to this document or a child issue
+- chosen first runtime target: `pi_local` or `opencode_local`
+- baseline metrics definition
+
+Questions to lock down:
+
+- what exact developer experience should improve
+- what security/isolation property we expect to gain
+- what failure modes are unacceptable
+- whether the prototype is adapter-only or a deeper spike into an internal runtime abstraction
+
+Exit criteria:
+
+- a single first target chosen
+- measurable comparison criteria agreed on
+
+Recommended metrics:
+
+- cold start latency
+- session resume reliability across heartbeats
+- transcript/log quality
+- implementation complexity
+- operational complexity on local dev machines
+
+### Phase 1: `agentos_local` spike
+
+Objective:
+
+- prove that Paperclip can drive one local agent through an `agent-os` runtime without breaking heartbeat semantics
+
+Suggested scope:
+
+- implement a new experimental adapter, `agentos_local`, or a feature-flagged runtime path under one existing adapter
+- start with `pi_local` or `opencode_local`
+- keep Paperclip's existing heartbeat, issue, workspace, and comment flow authoritative
+
+Minimum implementation shape:
+
+- adapter accepts model/runtime config
+- `server/src/services/heartbeat.ts` still owns run lifecycle
+- execution result still maps into existing `AdapterExecutionResult`
+- session state still fits current `sessionParams` / `sessionDisplayId` flow
+
+What to verify:
+
+- checkout and heartbeat flow still work end to end
+- resume across multiple heartbeats works
+- logs/transcripts remain readable in the UI
+- failure paths surface cleanly in issue comments and run logs
+
+Exit criteria:
+
+- one agent type can run reliably through the new path
+- documented comparison against the existing local adapter path
+- explicit recommendation: continue, pause, or abandon
+
+### Phase 2: capability-based runtime permissions
+
+Objective:
+
+- introduce a Paperclip-native capability vocabulary without coupling the product to `agent-os`
+
+Suggested scope:
+
+- extend the adapter config schema with a vocabulary for runtime permissions
+- prototype normalized capabilities such as:
+  - `fs.read`
+  - `fs.write`
+  - `network.fetch`
+  - `network.listen`
+  - `process.spawn`
+  - `env.read`
+
+Integration targets:
+
+- `packages/adapter-utils/src/types.ts`
+- adapter config-schema support
+- server-side runtime config validation
+- future board-facing UI for permissions, if needed
+
+What to avoid:
+
+- building a full human-facing policy UI before the vocabulary is proven useful
+- forcing every adapter to implement capability enforcement immediately
+
+Exit criteria:
+
+- documented capability schema
+- one adapter path using it meaningfully
+- clear compatibility story for non-`agent-os` adapters
+
+### Phase 3: snapshot-backed execution root experiment
+
+Objective:
+
+- determine whether a layered/snapshotted root model improves some Paperclip workloads
+
+Suggested scope:
+
+- evaluate it first only for disposable tasks and tasks that are not repo-heavy
+- keep git worktree-based repo editing as the default for codebase tasks
+
+Promising use cases:
+
+- routine-style runs
+- ephemeral preview/test environments
+- isolated document/artifact generation
+- tasks that do not need full git history or branch semantics
+
+Integration targets:
+
+- `server/src/services/execution-workspaces.ts`
+- workspace realization paths called from `server/src/services/heartbeat.ts`
+
+Exit criteria:
+
+- clear statement on which workload classes benefit
+- clear statement on which workloads should stay on worktrees
+- go/no-go decision for broader implementation
+
+### Phase 4: typed host tool evaluation
+
+Objective:
+
+- identify where Paperclip should prefer explicit typed tools over ambient shell access
+
+Suggested scope:
+
+- compare `agent-os` host-toolkit ideas with existing plugin and runtime-service surfaces
+- choose 1-2 sensitive operations that should become typed tools
+
+Good candidates:
+
+- git metadata/status inspection
+- runtime service inspection
+- deployment/preview status retrieval
+- generated artifact publishing
+
+Exit criteria:
+
+- one concrete proposal for typed-tool adoption in Paperclip
+- clear statement on whether this belongs in plugins, adapters, or core services
+
+## Recommended sequencing
+
+Recommended order:
+
+1. Phase 0
+2. Phase 1
+3. Phase 2
+4. Phase 3
+5. Phase 4
+
+Reasoning:
+
+- Phase 1 is the fastest way to invalidate or validate the entire `agent-os` direction
+- Phase 2 is valuable even if Phase 1 is abandoned
+- Phase 3 should wait until there is confidence that the runtime approach is operationally worthwhile
+- Phase 4 is useful independently but should be informed by what Phase 1 and Phase 2 expose
+
+## Risks
+
+### Technical risk
+
+- `agent-os` introduces Rust sidecar and packaging complexity that may outweigh runtime benefits
+
+### Product risk
+
+- runtime experimentation could blur the boundary between Paperclip as control plane and Paperclip as execution platform
+
+### Integration risk
+
+- session semantics, log formatting, and failure behavior may degrade relative to current local adapters
+
+### Scope risk
+
+- a small runtime spike could expand into an adapter-system rewrite if not kept tightly bounded
+
+## Guardrails
+
+To keep this effort controlled:
+
+- keep all experiments behind a clearly experimental adapter or feature flag
+- do not change issue/comment/approval/budget semantics to suit the runtime
+- measure against current local adapters instead of judging in isolation
+- stop the `agent-os` runtime work after Phase 1 if the operational burden is already clearly too high
+
+## Proposed next action
+
+The next concrete action should be a small implementation spike issue:
+
+- title: `Prototype experimental agentos_local runtime for one local adapter`
+- target adapter: `opencode_local` unless `pi_local` is materially easier
+- expected output: code spike, short verification notes, and a continue/stop recommendation
+
+If leadership wants planning only and no spike yet, this document is the handoff artifact for that decision.