docs: add agent-os follow-up plan

This commit is contained in:
dotta
2026-04-08 17:43:58 -05:00
parent 482dac7097
commit 5758aba91e
@@ -0,0 +1,261 @@
# PAP-1229 Agent OS Follow-up Plan
Date: 2026-04-08
Related issue: `PAP-1229`
Companion analysis: `doc/plans/2026-04-08-agent-os-technical-report.md`
## Goal
Turn the `agent-os` research into a low-risk Paperclip execution plan that preserves Paperclip's control-plane model while testing the few runtime ideas that appear worth adopting.
## Decision summary
Paperclip should not absorb `agent-os` as a product model or orchestration layer.
Paperclip should evaluate `agent-os` in three narrow areas:
1. optional agent runtime for selected local adapters
2. capability-based runtime permission vocabulary
3. snapshot-backed disposable execution roots
Everything else should stay out of scope unless those three experiments produce strong evidence.
## Success condition
This work is successful when Paperclip has:
- a clear yes/no answer on whether `agent-os` is worth supporting as an execution substrate
- a concrete adapter/runtime experiment with measurable results
- a proposed runtime capability model that fits current Paperclip adapters
- a clear decision on whether snapshot-backed execution roots are worth integrating
## Non-goals
Do not:
- replace Paperclip heartbeats, issues, comments, approvals, or budgets with `agent-os` primitives
- introduce Rust/sidecar requirements for all local execution paths
- migrate all adapters at once
- add runtime workflow/queue abstractions to Paperclip core
## Existing Paperclip integration points
The plan should stay anchored to these existing surfaces:
- `packages/adapter-utils/src/types.ts`
- adapter contract, runtime service reporting, session metadata, and capability normalization targets
- `server/src/services/heartbeat.ts`
- execution entry point, log capture, issue comment summaries, and cost reporting
- `server/src/services/execution-workspaces.ts`
- current workspace lifecycle and git-oriented cleanup/readiness model
- `server/src/services/plugin-loader.ts`
- typed host capability boundary and extension loading patterns
- local adapter implementations in `packages/adapters/*/src/server/`
- current execution behavior to compare against an `agent-os`-backed path
## Phase plan
### Phase 0: constraints and experiment design
Objective:
- make the evaluation falsifiable before writing integration code
Deliverables:
- short experiment brief added to this document or a child issue
- chosen first runtime target: `pi_local` or `opencode_local`
- baseline metrics definition
Questions to lock down:
- what exact developer experience should improve
- what security/isolation property we expect to gain
- what failure modes are unacceptable
- whether the prototype is adapter-only or a deeper internal runtime abstraction spike
Exit criteria:
- a single first target chosen
- measurable comparison criteria agreed on
Recommended metrics:
- cold start latency
- session resume reliability across heartbeats
- transcript/log quality
- implementation complexity
- operational complexity on local dev machines
### Phase 1: `agentos_local` spike
Objective:
- prove that Paperclip can drive one local agent through an `agent-os` runtime without breaking heartbeat semantics
Suggested scope:
- implement a new experimental adapter, `agentos_local`, or a feature-flagged runtime path under one existing adapter
- start with `pi_local` or `opencode_local`
- keep Paperclip's existing heartbeat, issue, workspace, and comment flow authoritative
Minimum implementation shape:
- adapter accepts model/runtime config
- `server/src/services/heartbeat.ts` still owns run lifecycle
- execution result still maps into existing `AdapterExecutionResult`
- session state still fits current `sessionParams` / `sessionDisplayId` flow
What to verify:
- checkout and heartbeat flow still work end to end
- resume across multiple heartbeats works
- logs/transcripts remain readable in the UI
- failure paths surface cleanly in issue comments and run logs
Exit criteria:
- one agent type can run reliably through the new path
- documented comparison against the existing local adapter path
- explicit recommendation: continue, pause, or abandon
### Phase 2: capability-based runtime permissions
Objective:
- introduce a Paperclip-native capability vocabulary without coupling the product to `agent-os`
Suggested scope:
- extend adapter config schema vocabulary for runtime permissions
- prototype normalized capabilities such as:
- `fs.read`
- `fs.write`
- `network.fetch`
- `network.listen`
- `process.spawn`
- `env.read`
Integration targets:
- `packages/adapter-utils/src/types.ts`
- adapter config-schema support
- server-side runtime config validation
- future board-facing UI for permissions, if needed
What to avoid:
- building a full human policy UI before the vocabulary is proven useful
- forcing every adapter to implement capability enforcement immediately
Exit criteria:
- documented capability schema
- one adapter path using it meaningfully
- clear compatibility story for non-`agent-os` adapters
### Phase 3: snapshot-backed execution root experiment
Objective:
- determine whether a layered/snapshotted root model improves some Paperclip workloads
Suggested scope:
- evaluate it only for disposable or non-repo-heavy tasks first
- keep git worktree-based repo editing as the default for codebase tasks
Promising use cases:
- routine-style runs
- ephemeral preview/test environments
- isolated document/artifact generation
- tasks that do not need full git history or branch semantics
Integration targets:
- `server/src/services/execution-workspaces.ts`
- workspace realization paths called from `server/src/services/heartbeat.ts`
Exit criteria:
- clear statement on which workload classes benefit
- clear statement on which workloads should stay on worktrees
- go/no-go decision for broader implementation
### Phase 4: typed host tool evaluation
Objective:
- identify where Paperclip should prefer explicit typed tools over ambient shell access
Suggested scope:
- compare `agent-os` host-toolkit ideas with existing plugin and runtime-service surfaces
- choose 1-2 sensitive operations that should become typed tools
Good candidates:
- git metadata/status inspection
- runtime service inspection
- deployment/preview status retrieval
- generated artifact publishing
Exit criteria:
- one concrete proposal for typed-tool adoption in Paperclip
- clear statement on whether this belongs in plugins, adapters, or core services
## Recommended sequencing
Recommended order:
1. Phase 0
2. Phase 1
3. Phase 2
4. Phase 3
5. Phase 4
Reasoning:
- Phase 1 is the fastest way to invalidate or validate the entire `agent-os` direction
- Phase 2 is valuable even if Phase 1 is abandoned
- Phase 3 should wait until there is confidence that the runtime approach is operationally worthwhile
- Phase 4 is useful independently but should be informed by what Phase 1 and Phase 2 expose
## Risks
### Technical risk
- `agent-os` introduces Rust sidecar and packaging complexity that may outweigh runtime benefits
### Product risk
- runtime experimentation could blur the boundary between Paperclip as control plane and Paperclip as execution platform
### Integration risk
- session semantics, log formatting, and failure behavior may degrade relative to current local adapters
### Scope risk
- a small runtime spike could expand into an adapter-system rewrite if not kept tightly bounded
## Guardrails
To keep this effort controlled:
- keep all experiments behind a clearly experimental adapter or feature flag
- do not change issue/comment/approval/budget semantics to suit the runtime
- measure against current local adapters instead of judging in isolation
- stop after Phase 1 if the operational burden is already clearly too high
## Proposed next action
The next concrete action should be a small implementation spike issue:
- title: `Prototype experimental agentos_local runtime for one local adapter`
- target adapter: `opencode_local` unless `pi_local` is materially easier
- expected output: code spike, short verification notes, and a continue/stop recommendation
If leadership wants planning only and no spike yet, this document is the handoff artifact for that decision.