docs: add agent-os follow-up plan
@@ -0,0 +1,261 @@
# PAP-1229 Agent OS Follow-up Plan

Date: 2026-04-08
Related issue: `PAP-1229`
Companion analysis: `doc/plans/2026-04-08-agent-os-technical-report.md`

## Goal

Turn the `agent-os` research into a low-risk Paperclip execution plan that preserves Paperclip's control-plane model while testing the few runtime ideas that appear worth adopting.

## Decision summary

Paperclip should not absorb `agent-os` as a product model or orchestration layer.

Paperclip should evaluate `agent-os` in three narrow areas:

1. optional agent runtime for selected local adapters
2. capability-based runtime permission vocabulary
3. snapshot-backed disposable execution roots

Everything else should stay out of scope unless those three experiments produce strong evidence.

## Success condition

This work is successful when Paperclip has:

- a clear yes/no answer on whether `agent-os` is worth supporting as an execution substrate
- a concrete adapter/runtime experiment with measurable results
- a proposed runtime capability model that fits current Paperclip adapters
- a clear decision on whether snapshot-backed execution roots are worth integrating

## Non-goals

Do not:

- replace Paperclip heartbeats, issues, comments, approvals, or budgets with `agent-os` primitives
- introduce Rust/sidecar requirements for all local execution paths
- migrate all adapters at once
- add runtime workflow/queue abstractions to Paperclip core

## Existing Paperclip integration points

The plan should stay anchored to these existing surfaces:

- `packages/adapter-utils/src/types.ts`
  - adapter contract, runtime service reporting, session metadata, and capability normalization targets
- `server/src/services/heartbeat.ts`
  - execution entry point, log capture, issue comment summaries, and cost reporting
- `server/src/services/execution-workspaces.ts`
  - current workspace lifecycle and git-oriented cleanup/readiness model
- `server/src/services/plugin-loader.ts`
  - typed host capability boundary and extension loading patterns
- local adapter implementations in `packages/adapters/*/src/server/`
  - current execution behavior to compare against an `agent-os`-backed path

## Phase plan

### Phase 0: constraints and experiment design

Objective:

- make the evaluation falsifiable before writing integration code

Deliverables:

- short experiment brief added to this document or a child issue
- chosen first runtime target: `pi_local` or `opencode_local`
- baseline metrics definition

Questions to lock down:

- what exact developer experience should improve
- what security/isolation property we expect to gain
- what failure modes are unacceptable
- whether the prototype is adapter-only or a deeper internal runtime abstraction spike

Exit criteria:

- a single first target chosen
- measurable comparison criteria agreed on

Recommended metrics:

- cold start latency
- session resume reliability across heartbeats
- transcript/log quality
- implementation complexity
- operational complexity on local dev machines

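To make the two quantitative metrics above concrete, the baseline comparison could be recorded along these lines. This is only a sketch: `RunSample`, `PathSummary`, `Limits`, and the threshold fields are hypothetical names invented for illustration, not existing Paperclip types, and the qualitative metrics (log quality, complexity) are deliberately left out.

```typescript
// Hypothetical measurement record for one adapter path; field names are
// illustrative and not part of Paperclip's current types.
interface RunSample {
  coldStartMs: number;      // time from heartbeat trigger to first agent output
  resumeAttempted: boolean; // whether this run tried to resume a prior session
  resumeSucceeded: boolean; // only meaningful when resumeAttempted is true
}

interface PathSummary {
  medianColdStartMs: number;
  resumeSuccessRate: number; // 0..1, reported as 1 when no resume was attempted
}

function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty sample set");
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

function summarize(samples: RunSample[]): PathSummary {
  const resumes = samples.filter((s) => s.resumeAttempted);
  const succeeded = resumes.filter((s) => s.resumeSucceeded).length;
  return {
    medianColdStartMs: median(samples.map((s) => s.coldStartMs)),
    resumeSuccessRate: resumes.length === 0 ? 1 : succeeded / resumes.length,
  };
}

// Placeholder thresholds: the real limits are what Phase 0 is meant to agree on.
interface Limits {
  maxColdStartRegressionMs: number;
  minResumeSuccessRate: number;
}

function meetsBaseline(baseline: PathSummary, candidate: PathSummary, limits: Limits): boolean {
  const regression = candidate.medianColdStartMs - baseline.medianColdStartMs;
  return (
    regression <= limits.maxColdStartRegressionMs &&
    candidate.resumeSuccessRate >= limits.minResumeSuccessRate
  );
}
```

Agreeing on the `Limits` values before the spike starts is what keeps the comparison falsifiable: the candidate path either meets them or it does not.
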
### Phase 1: `agentos_local` spike

Objective:

- prove that Paperclip can drive one local agent through an `agent-os` runtime without breaking heartbeat semantics

Suggested scope:

- implement a new experimental adapter, `agentos_local`, or a feature-flagged runtime path under one existing adapter
- start with `pi_local` or `opencode_local`
- keep Paperclip's existing heartbeat, issue, workspace, and comment flow authoritative

Minimum implementation shape:

- adapter accepts model/runtime config
- `server/src/services/heartbeat.ts` still owns run lifecycle
- execution result still maps into existing `AdapterExecutionResult`
- session state still fits current `sessionParams` / `sessionDisplayId` flow

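One way to picture the last two points is a pure mapping from whatever the runtime returns into the fields the heartbeat already consumes. The sketch below is an assumption-heavy illustration: `SketchRuntimeOutcome`, `SketchSessionParams`, and `SketchExecutionResult` are hypothetical stand-ins, and only the `sessionParams` / `sessionDisplayId` field names are borrowed from this plan. The real `AdapterExecutionResult` in `packages/adapter-utils/src/types.ts` may look quite different.

```typescript
// Hypothetical stand-ins; none of these are Paperclip's or agent-os's real types.
interface SketchSessionParams {
  sessionId: string;
}

type SketchRuntimeOutcome =
  | { kind: "completed"; sessionId: string; transcript: string[] }
  | { kind: "failed"; error: string; sessionId?: string };

interface SketchExecutionResult {
  ok: boolean;
  summary: string;    // what an issue comment summary could be built from
  logLines: string[]; // what run log capture could store
  sessionParams: SketchSessionParams | null;
  sessionDisplayId: string | null;
}

// Maps a runtime outcome into the result shape; the heartbeat service would
// still decide what to do with it, so run lifecycle ownership does not move.
function toExecutionResult(
  outcome: SketchRuntimeOutcome,
  previous: SketchSessionParams | null,
): SketchExecutionResult {
  if (outcome.kind === "completed") {
    const last = outcome.transcript[outcome.transcript.length - 1];
    return {
      ok: true,
      summary: last === undefined ? "(no output)" : last,
      logLines: outcome.transcript,
      sessionParams: { sessionId: outcome.sessionId },
      sessionDisplayId: outcome.sessionId,
    };
  }
  // On failure, keep the last known session so a later heartbeat can still
  // attempt a resume instead of silently starting over.
  const sessionId =
    outcome.sessionId !== undefined
      ? outcome.sessionId
      : previous !== null
        ? previous.sessionId
        : null;
  return {
    ok: false,
    summary: "agent-os runtime failed: " + outcome.error,
    logLines: [outcome.error],
    sessionParams: sessionId === null ? null : { sessionId },
    sessionDisplayId: sessionId,
  };
}
```

Keeping the mapping free of side effects makes the failure path easy to test in isolation before any heartbeat integration exists.
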
What to verify:

- checkout and heartbeat flow still work end to end
- resume across multiple heartbeats works
- logs/transcripts remain readable in the UI
- failure paths surface cleanly in issue comments and run logs

Exit criteria:

- one agent type can run reliably through the new path
- documented comparison against the existing local adapter path
- explicit recommendation: continue, pause, or abandon

### Phase 2: capability-based runtime permissions

Objective:

- introduce a Paperclip-native capability vocabulary without coupling the product to `agent-os`

Suggested scope:

- extend adapter config schema vocabulary for runtime permissions
- prototype normalized capabilities such as:
  - `fs.read`
  - `fs.write`
  - `network.fetch`
  - `network.listen`
  - `process.spawn`
  - `env.read`

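As a rough illustration of what such a vocabulary could look like in adapter config, the sketch below models the six capability names listed above as a closed string-literal union with default-deny checks. The helper names (`parseCapabilities`, `isAllowed`) and the choice to report unknown names rather than throw are assumptions made for this example, not an existing Paperclip API.

```typescript
// The six capability names come from the list above; everything else here
// (type names, function names, rejection behavior) is hypothetical.
const CAPABILITIES = [
  "fs.read",
  "fs.write",
  "network.fetch",
  "network.listen",
  "process.spawn",
  "env.read",
] as const;

type Capability = (typeof CAPABILITIES)[number];

function isCapability(value: string): value is Capability {
  return (CAPABILITIES as readonly string[]).indexOf(value) !== -1;
}

interface ParsedCapabilities {
  granted: Capability[]; // deduplicated, in first-seen order
  rejected: string[];    // names outside the vocabulary, surfaced for validation errors
}

// Normalizes an untrusted config value into known and unknown capability names.
function parseCapabilities(raw: unknown): ParsedCapabilities {
  if (!Array.isArray(raw)) {
    throw new Error("capabilities must be an array of strings");
  }
  const granted: Capability[] = [];
  const rejected: string[] = [];
  for (const entry of raw) {
    const name = String(entry);
    if (isCapability(name)) {
      if (granted.indexOf(name) === -1) granted.push(name);
    } else {
      rejected.push(name);
    }
  }
  return { granted, rejected };
}

// Default-deny: anything not explicitly granted is refused.
function isAllowed(granted: readonly Capability[], needed: Capability): boolean {
  return granted.indexOf(needed) !== -1;
}
```

Adapters that cannot enforce a capability could still accept and echo the parsed list, which would keep the vocabulary usable for non-`agent-os` adapters.
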
Integration targets:

- `packages/adapter-utils/src/types.ts`
- adapter config-schema support
- server-side runtime config validation
- future board-facing UI for permissions, if needed

What to avoid:

- building a full human policy UI before the vocabulary is proven useful
- forcing every adapter to implement capability enforcement immediately

Exit criteria:

- documented capability schema
- one adapter path using it meaningfully
- clear compatibility story for non-`agent-os` adapters

### Phase 3: snapshot-backed execution root experiment

Objective:

- determine whether a layered/snapshotted root model improves some Paperclip workloads

Suggested scope:

- evaluate it only for disposable or non-repo-heavy tasks first
- keep git worktree-based repo editing as the default for codebase tasks

Promising use cases:

- routine-style runs
- ephemeral preview/test environments
- isolated document/artifact generation
- tasks that do not need full git history or branch semantics

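The scoping rule above (worktrees by default, snapshot roots only for disposable work) could be expressed as a small selection function. This is a sketch under assumed inputs: `TaskProfile`, its fields, and the strategy names are hypothetical and do not describe how `execution-workspaces.ts` classifies tasks today.

```typescript
// Hypothetical strategy names and task attributes, for illustration only.
type WorkspaceStrategy = "git_worktree" | "snapshot_root";

interface TaskProfile {
  editsRepository: boolean; // the task is expected to change tracked source files
  needsGitHistory: boolean; // the task relies on branches, blame, or log
  disposable: boolean;      // the execution root can be thrown away afterwards
}

// Worktrees stay the default; a snapshot root is chosen only when the
// experiment flag is on and nothing about the task depends on git semantics.
function chooseWorkspaceStrategy(
  task: TaskProfile,
  snapshotRootsEnabled: boolean,
): WorkspaceStrategy {
  if (!snapshotRootsEnabled) return "git_worktree";
  if (task.editsRepository || task.needsGitHistory) return "git_worktree";
  return task.disposable ? "snapshot_root" : "git_worktree";
}
```

For example, an artifact-generation run with all three flags set to `false`, `false`, `true` would select `snapshot_root` when the experiment is enabled, while any codebase task keeps its worktree.
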
Integration targets:

- `server/src/services/execution-workspaces.ts`
- workspace realization paths called from `server/src/services/heartbeat.ts`

Exit criteria:

- clear statement on which workload classes benefit
- clear statement on which workloads should stay on worktrees
- go/no-go decision for broader implementation

### Phase 4: typed host tool evaluation

Objective:

- identify where Paperclip should prefer explicit typed tools over ambient shell access

Suggested scope:

- compare `agent-os` host-toolkit ideas with existing plugin and runtime-service surfaces
- choose 1-2 sensitive operations that should become typed tools

Good candidates:

- git metadata/status inspection
- runtime service inspection
- deployment/preview status retrieval
- generated artifact publishing

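To show the difference between a typed tool and ambient shell access, here is a minimal sketch of the first candidate, git status inspection. The agent supplies no command text: the host fixes the arguments (`git status --porcelain`) and returns structured data. `GitRunner`, `GitStatusEntry`, and `gitStatusTool` are hypothetical names, the runner is injected so the sketch does not assume how Paperclip spawns processes, and rename entries or quoted paths are passed through unparsed.

```typescript
// Hypothetical typed-tool shape; not an existing Paperclip or agent-os API.
interface GitStatusEntry {
  index: string;    // staged status code, first column of porcelain v1 output
  worktree: string; // unstaged status code, second column
  path: string;     // remainder of the line, left as git printed it
}

interface GitStatusResult {
  clean: boolean;
  entries: GitStatusEntry[];
}

// Injected by the host; receives only the fixed argument list below.
type GitRunner = (args: readonly string[]) => string;

function gitStatusTool(runGit: GitRunner): GitStatusResult {
  // The argument list is hard-coded, so the caller cannot widen it into
  // arbitrary git or shell commands.
  const output = runGit(["status", "--porcelain"]);
  const entries = output
    .split("\n")
    .filter((line) => line.length >= 4) // two status columns, a space, a path
    .map((line) => ({
      index: line.charAt(0),
      worktree: line.charAt(1),
      path: line.slice(3),
    }));
  return { clean: entries.length === 0, entries };
}
```

Whether a tool like this belongs in a plugin, an adapter, or a core service is exactly the question the exit criteria for this phase ask.
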
Exit criteria:

- one concrete proposal for typed-tool adoption in Paperclip
- clear statement on whether this belongs in plugins, adapters, or core services

## Recommended sequencing

Recommended order:

1. Phase 0
2. Phase 1
3. Phase 2
4. Phase 3
5. Phase 4

Reasoning:

- Phase 1 is the fastest way to invalidate or validate the entire `agent-os` direction
- Phase 2 is valuable even if Phase 1 is abandoned
- Phase 3 should wait until there is confidence that the runtime approach is operationally worthwhile
- Phase 4 is useful independently but should be informed by what Phase 1 and Phase 2 expose

## Risks

### Technical risk

- `agent-os` introduces Rust sidecar and packaging complexity that may outweigh runtime benefits

### Product risk

- runtime experimentation could blur the boundary between Paperclip as control plane and Paperclip as execution platform

### Integration risk

- session semantics, log formatting, and failure behavior may degrade relative to current local adapters

### Scope risk

- a small runtime spike could expand into an adapter-system rewrite if not kept tightly bounded

## Guardrails

To keep this effort controlled:

- keep all experiments behind a clearly experimental adapter or feature flag
- do not change issue/comment/approval/budget semantics to suit the runtime
- measure against current local adapters instead of judging in isolation
- stop after Phase 1 if the operational burden is already clearly too high

## Proposed next action

The next concrete action should be a small implementation spike issue:

- title: `Prototype experimental agentos_local runtime for one local adapter`
- target adapter: `opencode_local` unless `pi_local` is materially easier
- expected output: code spike, short verification notes, and a continue/stop recommendation

If leadership wants planning only and no spike yet, this document is the handoff artifact for that decision.