Dotta 38c185fb8b [codex] Add agent permissions and controls plan (#6386)
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies by keeping
task ownership, approvals, and operator control inside one control
plane.
> - Agent permissions and plugin-hosted company settings sit on the
boundary between autonomy and governance.
> - V1 needs scoped task assignment rules, plugin extension points, and
clearer company access surfaces without weakening company boundaries.
> - The branch builds the core authorization service, plugin SDK/host
APIs, and UI simplifications needed to support those controls.
> - Paperclip EE plugin surfaces were intentionally moved out of this
core PR per review direction, so this PR now carries only the public
core/plugin infrastructure work.
> - The latest updates preserve the PAP-9937 branch changes that belong
in this PR, remove the `design/` artifacts, and exclude the experimental
`plugin-briefs` package.
> - Greptile feedback was applied through the authorization/audit paths
and the final cleanup commit was re-reviewed at 5/5 with no unresolved
Greptile threads.
> - The benefit is safer assignment control with extension hooks for
richer permission products while preserving simple defaults for normal
operators.

## What Changed

- Added scoped task-assignment authorization decisions and routed
issue/agent assignment mutations through the authorization service.
- Added plugin SDK and host APIs for company settings slots,
authorization policy/grant management, assignment previews, and bridge
invocation scope propagation.
- Simplified core company access UI and moved advanced controls behind
plugin-provided settings surfaces.
- Added retry-now affordances for blocked issue next-step notices.
- Added protected-assignment enforcement for persisted
agent/project/issue policies, including explicit-grant fallback
behavior.
- Added incremental principal-access compatibility backfill for active
agent memberships and role-default human permission grants.
- Added the Markdown code block wrap action fix from the latest branch
changes.
- Removed `design/` artifacts from the PR and removed
`packages/plugins/plugin-briefs` from the final diff.
- Addressed Greptile feedback for plugin actor sanitization, legacy
membership handling, audit pagination, unknown grant-scope metadata, and
startup test mocks.

## Verification

- `pnpm exec vitest run server/src/__tests__/access-service.test.ts
server/src/__tests__/company-portability.test.ts` -> 2 files passed, 54
tests passed.
- `pnpm exec vitest run
server/src/__tests__/server-startup-feedback-export.test.ts
server/src/__tests__/access-service.test.ts
server/src/__tests__/company-portability.test.ts` -> 3 files passed, 62
tests passed.
- `pnpm exec vitest run
server/src/__tests__/authorization-service.test.ts
server/src/__tests__/plugin-access-authorization-host-services.test.ts
server/src/__tests__/server-startup-feedback-export.test.ts` -> 3 files
passed, 28 tests passed.
- `pnpm --filter @paperclipai/server typecheck` -> passed.
- `git diff --check` -> passed.
- `node ./scripts/check-docker-deps-stage.mjs` -> passed.
- `CI=true pnpm install --frozen-lockfile --ignore-scripts` -> passed
with no lockfile update.
- `pnpm exec vitest run
ui/src/components/MarkdownBody.interaction.test.tsx` -> 1 test passed.
- `git ls-files design packages/plugins/plugin-briefs | wc -l` -> 0.
- GitHub CI on `40cd83b53` -> all checks passed, merge state `CLEAN`.
- Greptile on `40cd83b53` -> 5/5, 102 files reviewed, 0
comments/annotations added, 0 unresolved review threads.
- Confirmed the PR diff contains no `design/`,
`packages/plugins/plugin-briefs`, `pnpm-lock.yaml`, or
`.github/workflows` changes.

## Risks

- Medium: task assignment authorization paths are behaviorally stricter
for protected/private policy data, so existing plugin-authored policies
may block assignment until explicit grants or approval flows are
configured.
- Medium: plugin-host authorization APIs expand the surface area
available to trusted plugins and need careful review for company
scoping.
- Low: startup now performs a principal-access compatibility backfill,
but the migration and runtime backfill use conflict-tolerant inserts.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool-enabled workflow with shell,
git, and GitHub CLI access.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
2026-05-22 08:12:52 -05:00


Paperclip — Product Definition

What It Is

Paperclip is the control plane for autonomous AI companies. One instance of Paperclip can run multiple companies. A company is a first-order object.

Core Concepts

Company

A company has:

  • A goal — the reason it exists ("Create the #1 AI note-taking app that does $1M MRR within 3 months")
  • Employees — every employee is an AI agent
  • Org structure — who reports to whom
  • Revenue & expenses — tracked at the company level
  • Task hierarchy — all work traces back to the company goal

Employees & Agents

Every employee is an agent. When you create a company, you start by defining the CEO, then build out from there.

Each employee has:

  • Adapter type + config — how this agent runs and what defines its identity/behavior. This is adapter-specific (e.g., an OpenClaw agent might use SOUL.md and HEARTBEAT.md files; a Claude Code agent might use CLAUDE.md; a bare script might use CLI args). Paperclip doesn't prescribe the format — the adapter does.
  • Role & reporting — their title, who they report to, who reports to them
  • Capabilities description — a short paragraph on what this agent does and when they're relevant (helps other agents discover who can help with what)

Example: A CEO agent's adapter config tells it to "review what your executives are doing, check company metrics, reprioritize if needed, assign new strategic initiatives" on each heartbeat. An engineer's config tells it to "check assigned tasks, pick the highest priority, and work it."

Then you define who reports to the CEO: a CTO managing programmers, a CMO managing the marketing team, and so on. Every agent in the tree gets their own adapter configuration.
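The reporting lines described above form a tree rooted at the CEO. As a minimal TypeScript sketch, assuming a hypothetical `Agent` record (the names `reportsTo`, `adapterType`, `adapterConfig`, and `capabilities` are illustrative, not Paperclip's schema), these are the two lookups a board view or a delegating manager would need: direct reports, and the chain of command up to the CEO.

```typescript
// Hypothetical agent record; field names are illustrative, not Paperclip's schema.
interface Agent {
  id: string;
  title: string;
  reportsTo: string | null; // null only for the CEO at the root
  adapterType: string; // e.g. "process" or "http"
  adapterConfig: Record<string, unknown>; // adapter-specific, not prescribed by Paperclip
  capabilities: string; // short description other agents can use to find help
}

// Agents whose manager is `managerId`, in input order.
function directReports(agents: Agent[], managerId: string): Agent[] {
  return agents.filter((agent) => agent.reportsTo === managerId);
}

// Titles from `agentId` up to the CEO. Throws on an unknown id or a
// reporting cycle instead of looping forever.
function chainOfCommand(agents: Agent[], agentId: string): string[] {
  const byId = new Map<string, Agent>();
  for (const agent of agents) byId.set(agent.id, agent);

  const chain: string[] = [];
  const seen = new Set<string>();
  let current = byId.get(agentId);
  while (current !== undefined) {
    if (seen.has(current.id)) throw new Error(`reporting cycle at ${current.id}`);
    seen.add(current.id);
    chain.push(current.title);
    if (current.reportsTo === null) return chain;
    current = byId.get(current.reportsTo);
  }
  throw new Error(`chain from ${agentId} reaches an unknown agent`);
}

const org: Agent[] = [
  { id: "ceo", title: "CEO", reportsTo: null, adapterType: "process", adapterConfig: {}, capabilities: "Sets strategy" },
  { id: "cto", title: "CTO", reportsTo: "ceo", adapterType: "process", adapterConfig: {}, capabilities: "Runs engineering" },
  { id: "cmo", title: "CMO", reportsTo: "ceo", adapterType: "http", adapterConfig: {}, capabilities: "Runs marketing" },
  { id: "eng1", title: "Engineer", reportsTo: "cto", adapterType: "process", adapterConfig: {}, capabilities: "Ships features" },
];

const ceoReports = directReports(org, "ceo").map((agent) => agent.title); // ["CTO", "CMO"]
const engineerChain = chainOfCommand(org, "eng1"); // ["Engineer", "CTO", "CEO"]
```

Detecting cycles is a deliberate choice in this sketch: a bad `reportsTo` edit fails loudly instead of hanging a traversal.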

Agent Execution

Paperclip supports several ways to run an agent's heartbeat:

  1. Local CLI/session adapters — Paperclip starts or resumes local coding-tool sessions such as Claude Code, Codex, Gemini, OpenCode, Pi, and Cursor, then tracks the run.
  2. Run a command — Paperclip kicks off a process (shell command, Python script, etc.) and tracks it. The heartbeat is "execute this and monitor it."
  3. Fire and forget a request — Paperclip sends a webhook/API call to an externally running agent. The heartbeat is "notify this agent to wake up." OpenClaw-style hooks work this way.
  4. External adapter plugins — Paperclip loads adapter packages through the plugin/adapter flow so self-hosted installs can add runtimes without hardcoding them in core.

Agent runs can use project and execution workspaces, managed runtime services such as preview/dev servers, adapter-specific session state, and HTTP/webhook-style execution. We provide sensible defaults, but the adapter is still the boundary: if a runtime can be invoked, observed, and authorized, Paperclip can coordinate it.
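Whatever the execution style, the control plane only needs to invoke the adapter and record the outcome. The sketch below illustrates that boundary for the "run a command" and "fire and forget" styles. The `Adapter` interface, the `processAdapter`, `webhookAdapter`, and `runHeartbeat` helpers, and the injected `launch` and `post` functions are all hypothetical, not Paperclip's adapter API; injecting the process launcher and HTTP client keeps the sketch independent of any particular runtime.

```typescript
// Hypothetical types illustrating the adapter boundary; not Paperclip's adapter API.
interface HeartbeatContext {
  companyId: string;
  agentId: string;
  runId: string;
}

interface InvokeResult {
  ok: boolean;
  detail: string;
}

// The minimal contract: an adapter only has to be callable.
interface Adapter {
  type: string;
  invoke(ctx: HeartbeatContext): Promise<InvokeResult>;
}

// Injected runtimes (assumptions): a process launcher resolving to an exit code,
// and an HTTP poster resolving to a status code.
type Launch = (command: string, env: Record<string, string>) => Promise<number>;
type Post = (url: string, body: unknown) => Promise<number>;

// "Run a command": start a process and treat exit code 0 as success.
function processAdapter(command: string, launch: Launch): Adapter {
  return {
    type: "process",
    async invoke(ctx) {
      const exitCode = await launch(command, { AGENT_ID: ctx.agentId, RUN_ID: ctx.runId });
      return { ok: exitCode === 0, detail: `exit code ${exitCode}` };
    },
  };
}

// "Fire and forget": success only means the wake-up was delivered.
function webhookAdapter(url: string, post: Post): Adapter {
  return {
    type: "http",
    async invoke(ctx) {
      const status = await post(url, { event: "heartbeat", ...ctx });
      return { ok: status >= 200 && status < 300, detail: `HTTP ${status}` };
    },
  };
}

interface RunRecord {
  runId: string;
  adapterType: string;
  status: "succeeded" | "failed";
  detail: string;
}

// Control-plane side: invoke any adapter and record the outcome, including thrown errors.
async function runHeartbeat(adapter: Adapter, ctx: HeartbeatContext): Promise<RunRecord> {
  try {
    const result = await adapter.invoke(ctx);
    return {
      runId: ctx.runId,
      adapterType: adapter.type,
      status: result.ok ? "succeeded" : "failed",
      detail: result.detail,
    };
  } catch (err) {
    return { runId: ctx.runId, adapterType: adapter.type, status: "failed", detail: String(err) };
  }
}
```

For the webhook style, a 2xx response in this sketch only confirms delivery; the external agent is expected to report its real progress back later, in line with "agents run wherever they run and phone home."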

Task Management

Task management is hierarchical. At any moment, every piece of work must trace back to the company's top-level goal through a chain of parent tasks:

I am researching the Facebook ads Granola uses (current task)
  because → I need to create Facebook ads for our software (parent)
    because → I need to grow new signups by 100 users (parent)
      because → I need to get revenue to $2,000 this week (parent)
        because → ...
          because → We're building the #1 AI note-taking app to $1M MRR in 3 months

Tasks have parentage. Every task exists in service of a parent task, all the way up to the company goal. This is what keeps autonomous agents aligned — they can always answer "why am I doing this?"
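The "why am I doing this?" question can be answered mechanically by walking parent links until the company goal is reached. This sketch assumes a hypothetical `WorkItem` shape in which the goal is the root node, and reuses a shortened version of the chain above. A task whose walk never reaches the goal returns `null`, which is exactly the kind of task that should not exist.

```typescript
// Hypothetical work-item shape; Paperclip's real issue model carries more fields.
interface WorkItem {
  id: string;
  title: string;
  parentId: string | null;
  kind: "goal" | "task";
}

// Walks parent links from a task up to the company goal and returns the
// "because" chain. Returns null on a missing parent, a root that is not
// the goal, or a cycle.
function whyChain(items: Map<string, WorkItem>, taskId: string): string[] | null {
  const chain: string[] = [];
  const seen = new Set<string>();
  let current = items.get(taskId);
  while (current !== undefined) {
    if (seen.has(current.id)) return null; // cycle in parent links
    seen.add(current.id);
    chain.push(current.title);
    if (current.kind === "goal") return chain;
    if (current.parentId === null) return null; // orphan task with no goal above it
    current = items.get(current.parentId);
  }
  return null; // unknown task id or dangling parent reference
}

const seed: WorkItem[] = [
  { id: "goal", title: "We're building the #1 AI note-taking app to $1M MRR in 3 months", parentId: null, kind: "goal" },
  { id: "revenue", title: "Get revenue to $2,000 this week", parentId: "goal", kind: "task" },
  { id: "signups", title: "Grow new signups by 100 users", parentId: "revenue", kind: "task" },
  { id: "ads", title: "Create Facebook ads for our software", parentId: "signups", kind: "task" },
  { id: "research", title: "Research the Facebook ads Granola uses", parentId: "ads", kind: "task" },
  { id: "stray", title: "Redesign the logo", parentId: null, kind: "task" },
];
const workItems = new Map(seed.map((item): [string, WorkItem] => [item.id, item]));

const researchWhy = whyChain(workItems, "research"); // 5 titles, ending with the company goal
const strayWhy = whyChain(workItems, "stray"); // null: no path to the goal
```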

The current issue model includes stable issue identifiers, parent/sub-issues, blockers, a single assignee, comments, issue documents, attachments and work products, and review/approval handoffs. That structure keeps work inspectable by both the board and agents while still allowing agents to decompose work into smaller tasks.
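The engineer heartbeat quoted earlier ("check assigned tasks, pick the highest priority, and work it") can be expressed against this issue model. The sketch below assumes a hypothetical `Issue` shape with a single `assigneeAgentId`, a numeric `priority` where a lower number means more urgent, and a `blockedBy` list. It skips blocked issues and treats an unknown blocker as unresolved; none of these names come from Paperclip's actual schema.

```typescript
// Hypothetical issue shape; field names and the priority scale are assumptions.
type IssueStatus = "todo" | "in_progress" | "in_review" | "done";

interface Issue {
  identifier: string; // stable identifier such as "ENG-12"
  parentIdentifier: string | null; // parent issue, if this is a sub-issue
  assigneeAgentId: string | null; // a single assignee, never a list
  status: IssueStatus;
  priority: number; // lower number = more urgent (assumption)
  blockedBy: string[]; // identifiers of blocking issues
}

// Workable only when every blocker is done; an unknown blocker counts as unresolved.
function isUnblocked(issue: Issue, byId: Map<string, Issue>): boolean {
  return issue.blockedBy.every((id) => byId.get(id)?.status === "done");
}

// "Check assigned tasks, pick the highest priority, and work it": the agent's
// open, unblocked issues, most urgent first, ties broken by identifier.
function nextIssueFor(agentId: string, issues: Issue[]): Issue | undefined {
  const byId = new Map(issues.map((issue): [string, Issue] => [issue.identifier, issue]));
  const candidates = issues.filter(
    (issue) =>
      issue.assigneeAgentId === agentId &&
      (issue.status === "todo" || issue.status === "in_progress") &&
      isUnblocked(issue, byId),
  );
  // filter() returned a new array, so sorting does not reorder the caller's list.
  candidates.sort((a, b) => a.priority - b.priority || a.identifier.localeCompare(b.identifier));
  return candidates[0];
}
```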

Principles

  1. Unopinionated about how you run your agents. Your agents could be OpenClaw bots, Python scripts, Node scripts, Claude Code sessions, Codex instances — we don't care. Paperclip defines the control plane for communication and provides utility infrastructure for heartbeats. It does not mandate an agent runtime.

  2. Company is the unit of organization. Everything lives under a company. One Paperclip instance, many companies.

  3. Adapter config defines the agent. Every agent has an adapter type and configuration that controls its identity and behavior. The minimum contract is just "be callable."

  4. All work traces to the goal. Hierarchical task management means nothing exists in isolation. If you can't explain why a task matters to the company goal, it shouldn't exist.

  5. Control plane, not execution plane. Paperclip orchestrates. Agents run wherever they run and phone home.
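One way to honor principle 2 in code is to make every read take the caller's company id, so no query can cross a company boundary by accident. A minimal sketch, assuming a hypothetical in-memory `CompanyScopedStore` (not Paperclip's actual storage layer):

```typescript
// Hypothetical company-scoped repository; not Paperclip's actual data layer.
interface CompanyOwned {
  id: string;
  companyId: string;
}

class CompanyScopedStore<T extends CompanyOwned> {
  private readonly rows = new Map<string, T>();

  insert(row: T): void {
    this.rows.set(row.id, row);
  }

  // A row owned by another company is indistinguishable from a missing row.
  get(companyId: string, id: string): T | undefined {
    const row = this.rows.get(id);
    return row !== undefined && row.companyId === companyId ? row : undefined;
  }

  // Listing is always filtered to a single company.
  list(companyId: string): T[] {
    return Array.from(this.rows.values()).filter((row) => row.companyId === companyId);
  }
}
```

Returning `undefined` for another company's row, rather than an authorization error, avoids confirming that the id exists elsewhere. That is a design choice of this sketch, not documented Paperclip behavior.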

User Flow (Dream Scenario)

  1. Open Paperclip, create a new company
  2. Define the company's goal: "Create the #1 AI note-taking app, $1M MRR in 3 months"
  3. Create the CEO
    • Choose an adapter (e.g., process adapter for Claude Code, HTTP adapter for OpenClaw)
    • Configure the adapter (agent identity, loop behavior, execution settings)
    • CEO proposes strategic breakdown → board approves
  4. Define the CEO's reports: CTO, CMO, CFO, etc.
    • Each gets their own adapter config and role definition
  5. Define their reports: engineers under CTO, marketers under CMO, etc.
  6. Set budgets, define initial strategic tasks
  7. Hit go — agents start their heartbeats and the company runs
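Step 3's "CEO proposes strategic breakdown → board approves" is an approval gate. As a hedged sketch with hypothetical `Approval` states and helper names, the gate needs two properties: a proposal can be decided only once, and the proposed work cannot start until the board approves it.

```typescript
// Hypothetical approval record; the states and helper names are illustrative.
type ApprovalStatus = "pending" | "approved" | "rejected";

interface Approval {
  id: string;
  requestedByAgentId: string;
  summary: string; // e.g. the CEO's proposed strategic breakdown
  status: ApprovalStatus;
  decidedBy?: string;
}

// Only a pending approval can be decided; returns a new record instead of mutating.
function decide(approval: Approval, decision: "approved" | "rejected", boardUserId: string): Approval {
  if (approval.status !== "pending") {
    throw new Error(`approval ${approval.id} is already ${approval.status}`);
  }
  return { ...approval, status: decision, decidedBy: boardUserId };
}

// Follow-up work stays gated until the board has approved the proposal.
function canStartProposedWork(approval: Approval): boolean {
  return approval.status === "approved";
}
```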

Guidelines

There are two runtime modes Paperclip must support:

  • local_trusted (default): single-user local trusted deployment with no login friction
  • authenticated: login-required mode that supports both private-network and public deployment exposure policies

Canonical mode design and command expectations live in doc/DEPLOYMENT-MODES.md.
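As an illustration only (doc/DEPLOYMENT-MODES.md remains the canonical reference), the two modes differ mainly in how the actor behind a request is resolved. The `DeploymentConfig` shape, the `resolveActor` helper, and the `local-board` user id below are hypothetical, and in this sketch the exposure policy does not change the login requirement.

```typescript
// Hypothetical config shape; the real contract lives in doc/DEPLOYMENT-MODES.md.
type DeploymentConfig =
  | { mode: "local_trusted" }
  | { mode: "authenticated"; exposure: "private" | "public" };

interface Actor {
  kind: "board_user";
  userId: string;
  implicit: boolean; // true when the operator was trusted without logging in
}

function resolveActor(config: DeploymentConfig, sessionUserId: string | null): Actor | null {
  if (config.mode === "local_trusted") {
    // Single-user local deployment: no login friction, fall back to the local operator.
    return { kind: "board_user", userId: sessionUserId ?? "local-board", implicit: sessionUserId === null };
  }
  // Authenticated mode: a verified session is required for both exposure policies.
  return sessionUserId === null ? null : { kind: "board_user", userId: sessionUserId, implicit: false };
}
```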

Further Detail

See SPEC.md for the full technical specification and TASKS.md for the task management data model.


Paperclip's core identity is a control plane for autonomous AI companies, centered on companies, org charts, goals, issues/comments, heartbeats, budgets, approvals, and board governance. The public docs are also explicit about the current boundaries: tasks/comments are the built-in communication model, Paperclip is not a chatbot, and it is not a code review tool. The roadmap already points toward easier onboarding, cloud agents, easier agent configuration, plugins, better docs, and ClipMart/ClipHub-style reusable companies/templates.

What Paperclip should do vs. not do

Do

  • Stay board-level and company-level. Users should manage goals, orgs, budgets, approvals, and outputs.
  • Make the first five minutes feel magical: install, answer a few questions, see a CEO do something real.
  • Keep work anchored to issues/comments/projects/goals, even if the surface feels conversational.
  • Treat agency / internal team / startup as the same underlying abstraction with different templates and labels.
  • Make outputs first-class: files, docs, reports, previews, links, screenshots.
  • Provide hooks into engineering workflows: worktrees, preview servers, PR links, external review tools.
  • Use plugins for edge cases like rich chat, knowledge bases, doc editors, custom tracing.

Do not

  • Do not make the core product a general chat app. The current product definition is explicitly task/comment-centric and “not a chatbot,” and that boundary is valuable.
  • Do not build a complete Jira/GitHub replacement. The repo/docs already position Paperclip as organization orchestration rather than pull-request review.
  • Do not build enterprise-grade RBAC first. Paperclip now has authenticated mode, company memberships, instance roles, and permission grants, but fine-grained enterprise governance should remain secondary to the core company control plane.
  • Do not interpret agent-level privacy flags as a project/issue privacy feature in V1; work visibility stays company-scoped.
  • Do not lead with raw bash logs and transcripts. Default view should be human-readable intent/progress, with raw detail beneath.
  • Do not force users to understand provider/API-key plumbing unless absolutely necessary. There are active onboarding/auth issues already; friction here is clearly real.

Specific design goals

  1. Time-to-first-success under 5 minutes. A fresh user should go from install to “my CEO completed a first task” in one sitting.

  2. Board-level abstraction always wins. The default UI should answer: what is the company doing, who is doing it, why does it matter, what did it cost, and what needs my approval.

  3. Conversation stays attached to work objects. “Chat with CEO” should still resolve to strategy threads, decisions, tasks, or approvals.

  4. Progressive disclosure. Top layer: human-readable summary. Middle layer: checklist/steps/artifacts. Bottom layer: raw logs/tool calls/transcript.

  5. Output-first. Work is not done until the user can see the result: file, document, preview link, screenshot, plan, or PR.

  6. Execution visibility without log worship. Active runs, recovery issues, productivity review states, blockers, and work products should be first-class surfaces. Raw transcripts are available when needed, but they are not the primary product surface.

  7. Local-first, cloud-ready. The mental model should not change between local solo use and shared/private or public/cloud deployment.

  8. Safe autonomy. Auto mode is allowed; hidden token burn is not.

  9. Thin core, rich edges. Put optional chat, knowledge, and special surfaces into plugins/extensions rather than bloating the control plane.
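Design goal 8 pairs with the budgets set during company setup. A minimal sketch, assuming spend is tracked in integer cents per agent or company and using a hypothetical 80% warning threshold (neither is specified above): auto mode keeps running under budget, surfaces a warning near the limit, and pauses instead of spending past it. The names `checkBudget` and `recordSpend` are illustrative only.

```typescript
// Hypothetical budget guard; thresholds and field names are assumptions.
interface Budget {
  limitCents: number; // budget for an agent or company over some period
  spentCents: number; // spend recorded so far, visible to the board
}

type BudgetDecision = "run" | "run_with_warning" | "pause";

// Decide before a run, using the projected total after the run's estimated cost.
function checkBudget(budget: Budget, estimatedRunCents: number, warnRatio = 0.8): BudgetDecision {
  const projected = budget.spentCents + estimatedRunCents;
  if (projected > budget.limitCents) return "pause"; // never spend past the limit silently
  if (projected >= budget.limitCents * warnRatio) return "run_with_warning";
  return "run";
}

// Record actual spend after each run so cost stays visible rather than hidden.
function recordSpend(budget: Budget, actualCents: number): Budget {
  return { ...budget, spentCents: budget.spentCents + actualCents };
}
```

Using integer cents avoids floating-point drift when many small run costs are summed.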