From 5758aba91e4c1b9b995400d7b7af0578d34f0237 Mon Sep 17 00:00:00 2001
From: dotta
Date: Wed, 8 Apr 2026 17:43:58 -0500
Subject: [PATCH] docs: add agent-os follow-up plan

---
 .../2026-04-08-agent-os-follow-up-plan.md | 261 ++++++++++++++++++
 1 file changed, 261 insertions(+)
 create mode 100644 doc/plans/2026-04-08-agent-os-follow-up-plan.md

diff --git a/doc/plans/2026-04-08-agent-os-follow-up-plan.md b/doc/plans/2026-04-08-agent-os-follow-up-plan.md
new file mode 100644
index 00000000..52029943
--- /dev/null
+++ b/doc/plans/2026-04-08-agent-os-follow-up-plan.md
@@ -0,0 +1,261 @@
+# PAP-1229 Agent OS Follow-up Plan
+
+Date: 2026-04-08
+Related issue: `PAP-1229`
+Companion analysis: `doc/plans/2026-04-08-agent-os-technical-report.md`
+
+## Goal
+
+Turn the `agent-os` research into a low-risk Paperclip execution plan that preserves Paperclip's control-plane model while testing the few runtime ideas that appear worth adopting.
+
+## Decision summary
+
+Paperclip should not absorb `agent-os` as a product model or orchestration layer.
+
+Paperclip should evaluate `agent-os` in three narrow areas:
+
+1. optional agent runtime for selected local adapters
+2. capability-based runtime permission vocabulary
+3. snapshot-backed disposable execution roots
+
+Everything else should stay out of scope unless those three experiments produce strong evidence.
+
+## Success condition
+
+This work is successful when Paperclip has:
+
+- a clear yes/no answer on whether `agent-os` is worth supporting as an execution substrate
+- a concrete adapter/runtime experiment with measurable results
+- a proposed runtime capability model that fits current Paperclip adapters
+- a clear decision on whether snapshot-backed execution roots are worth integrating
+
+## Non-goals
+
+Do not:
+
+- replace Paperclip heartbeats, issues, comments, approvals, or budgets with `agent-os` primitives
+- introduce Rust/sidecar requirements for all local execution paths
+- migrate all adapters at once
+- add runtime workflow/queue abstractions to Paperclip core
+
+## Existing Paperclip integration points
+
+The plan should stay anchored to these existing surfaces:
+
+- `packages/adapter-utils/src/types.ts`
+  - adapter contract, runtime service reporting, session metadata, and capability normalization targets
+- `server/src/services/heartbeat.ts`
+  - execution entry point, log capture, issue comment summaries, and cost reporting
+- `server/src/services/execution-workspaces.ts`
+  - current workspace lifecycle and git-oriented cleanup/readiness model
+- `server/src/services/plugin-loader.ts`
+  - typed host capability boundary and extension loading patterns
+- local adapter implementations in `packages/adapters/*/src/server/`
+  - current execution behavior to compare against an `agent-os`-backed path
+
+## Phase plan
+
+### Phase 0: constraints and experiment design
+
+Objective:
+
+- make the evaluation falsifiable before writing integration code
+
+Deliverables:
+
+- short experiment brief added to this document or a child issue
+- chosen first runtime target: `pi_local` or `opencode_local`
+- baseline metrics definition
+
+Questions to lock down:
+
+- what exact developer experience should improve
+- what security/isolation property we expect to gain
+- what failure modes are unacceptable
+- whether the prototype is adapter-only or a deeper internal
runtime abstraction spike
+
+Exit criteria:
+
+- a single first target chosen
+- measurable comparison criteria agreed on
+
+Recommended metrics:
+
+- cold start latency
+- session resume reliability across heartbeats
+- transcript/log quality
+- implementation complexity
+- operational complexity on local dev machines
+
+### Phase 1: `agentos_local` spike
+
+Objective:
+
+- prove that Paperclip can drive one local agent through an `agent-os` runtime without breaking heartbeat semantics
+
+Suggested scope:
+
+- implement a new experimental adapter, `agentos_local`, or a feature-flagged runtime path under one existing adapter
+- start with `pi_local` or `opencode_local`
+- keep Paperclip's existing heartbeat, issue, workspace, and comment flow authoritative
+
+Minimum implementation shape:
+
+- adapter accepts model/runtime config
+- `server/src/services/heartbeat.ts` still owns run lifecycle
+- execution result still maps into existing `AdapterExecutionResult`
+- session state still fits current `sessionParams` / `sessionDisplayId` flow
+
+What to verify:
+
+- checkout and heartbeat flow still work end to end
+- resume across multiple heartbeats works
+- logs/transcripts remain readable in the UI
+- failure paths surface cleanly in issue comments and run logs
+
+Exit criteria:
+
+- one agent type can run reliably through the new path
+- documented comparison against the existing local adapter path
+- explicit recommendation: continue, pause, or abandon
+
+### Phase 2: capability-based runtime permissions
+
+Objective:
+
+- introduce a Paperclip-native capability vocabulary without coupling the product to `agent-os`
+
+Suggested scope:
+
+- extend adapter config schema vocabulary for runtime permissions
+- prototype normalized capabilities such as:
+  - `fs.read`
+  - `fs.write`
+  - `network.fetch`
+  - `network.listen`
+  - `process.spawn`
+  - `env.read`
+
+Integration targets:
+
+- `packages/adapter-utils/src/types.ts`
+- adapter config-schema support
+- server-side
runtime config validation
+- future board-facing UI for permissions, if needed
+
+What to avoid:
+
+- building a full human policy UI before the vocabulary is proven useful
+- forcing every adapter to implement capability enforcement immediately
+
+Exit criteria:
+
+- documented capability schema
+- one adapter path using it meaningfully
+- clear compatibility story for non-`agent-os` adapters
+
+### Phase 3: snapshot-backed execution root experiment
+
+Objective:
+
+- determine whether a layered/snapshotted root model improves some Paperclip workloads
+
+Suggested scope:
+
+- evaluate it only for disposable or non-repo-heavy tasks first
+- keep git worktree-based repo editing as the default for codebase tasks
+
+Promising use cases:
+
+- routine-style runs
+- ephemeral preview/test environments
+- isolated document/artifact generation
+- tasks that do not need full git history or branch semantics
+
+Integration targets:
+
+- `server/src/services/execution-workspaces.ts`
+- workspace realization paths called from `server/src/services/heartbeat.ts`
+
+Exit criteria:
+
+- clear statement on which workload classes benefit
+- clear statement on which workloads should stay on worktrees
+- go/no-go decision for broader implementation
+
+### Phase 4: typed host tool evaluation
+
+Objective:
+
+- identify where Paperclip should prefer explicit typed tools over ambient shell access
+
+Suggested scope:
+
+- compare `agent-os` host-toolkit ideas with existing plugin and runtime-service surfaces
+- choose 1-2 sensitive operations that should become typed tools
+
+Good candidates:
+
+- git metadata/status inspection
+- runtime service inspection
+- deployment/preview status retrieval
+- generated artifact publishing
+
+Exit criteria:
+
+- one concrete proposal for typed-tool adoption in Paperclip
+- clear statement on whether this belongs in plugins, adapters, or core services
+
+## Recommended sequencing
+
+Recommended order:
+
+1. Phase 0
+2. Phase 1
+3. Phase 2
+4.
Phase 3
+5. Phase 4
+
+Reasoning:
+
+- Phase 1 is the fastest way to invalidate or validate the entire `agent-os` direction
+- Phase 2 is valuable even if Phase 1 is abandoned
+- Phase 3 should wait until there is confidence that the runtime approach is operationally worthwhile
+- Phase 4 is useful independently but should be informed by what Phase 1 and Phase 2 expose
+
+## Risks
+
+### Technical risk
+
+- `agent-os` introduces Rust sidecar and packaging complexity that may outweigh runtime benefits
+
+### Product risk
+
+- runtime experimentation could blur the boundary between Paperclip as control plane and Paperclip as execution platform
+
+### Integration risk
+
+- session semantics, log formatting, and failure behavior may degrade relative to current local adapters
+
+### Scope risk
+
+- a small runtime spike could expand into an adapter-system rewrite if not kept tightly bounded
+
+## Guardrails
+
+To keep this effort controlled:
+
+- keep all experiments behind a clearly experimental adapter or feature flag
+- do not change issue/comment/approval/budget semantics to suit the runtime
+- measure against current local adapters instead of judging in isolation
+- stop after Phase 1 if the operational burden is already clearly too high
+
+## Proposed next action
+
+The next concrete action should be a small implementation spike issue:
+
+- title: `Prototype experimental agentos_local runtime for one local adapter`
+- target adapter: `opencode_local` unless `pi_local` is materially easier
+- expected output: code spike, short verification notes, and a continue/stop recommendation
+
+If leadership wants planning only and no spike yet, this document is the handoff artifact for that decision.
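
Review note on Phase 2: the capability vocabulary could be prototyped roughly as follows. This is only a sketch under stated assumptions. The six capability strings come from the plan itself, while `RUNTIME_CAPABILITIES`, `RuntimeCapability`, `validateCapabilities`, and `isAllowed` are hypothetical names that do not exist in `packages/adapter-utils/src/types.ts` today. The trimming, de-duplication, and deny-by-default behavior are likewise assumptions, not decisions the plan has made.

```typescript
// Hypothetical sketch: none of these names exist in Paperclip today.
// The capability strings are the six listed under Phase 2 of the plan.
const RUNTIME_CAPABILITIES = [
  "fs.read",
  "fs.write",
  "network.fetch",
  "network.listen",
  "process.spawn",
  "env.read",
] as const;

type RuntimeCapability = (typeof RUNTIME_CAPABILITIES)[number];

interface CapabilityValidation {
  granted: RuntimeCapability[]; // recognized entries, de-duplicated, in input order
  unknown: string[]; // entries that are not part of the vocabulary
}

// Type guard that narrows an arbitrary string to a known capability.
function isRuntimeCapability(value: string): value is RuntimeCapability {
  return (RUNTIME_CAPABILITIES as readonly string[]).includes(value);
}

// Assumed normalization: trim whitespace, skip empty and repeated entries,
// and separate recognized capabilities from unknown strings.
function validateCapabilities(raw: readonly string[]): CapabilityValidation {
  const granted: RuntimeCapability[] = [];
  const unknown: string[] = [];
  const seen = new Set<string>();
  for (const entry of raw) {
    const name = entry.trim();
    if (name === "" || seen.has(name)) continue;
    seen.add(name);
    if (isRuntimeCapability(name)) granted.push(name);
    else unknown.push(name);
  }
  return { granted, unknown };
}

// Assumed deny-by-default check: an action is allowed only if its
// capability was explicitly granted.
function isAllowed(
  granted: readonly RuntimeCapability[],
  needed: RuntimeCapability,
): boolean {
  return granted.includes(needed);
}

// Example: one duplicate, one padded entry, and one unknown string.
const result = validateCapabilities(["fs.read", "fs.read", " network.fetch ", "gpu.use"]);
console.log(result.granted);
console.log(result.unknown);
console.log(isAllowed(result.granted, "process.spawn"));
```

Reporting unknown strings separately, instead of throwing, would let adapters that do not use `agent-os` ignore the field without failing validation, which seems to match the compatibility exit criterion for Phase 2.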