
Dotta 9eac727cf1 [codex] Add skills CLI and catalog management (#6782)

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies through
company-scoped control-plane workflows.
> - Agents need reusable, inspectable skills that can be installed,
reset, audited, exported, and assigned without bespoke local setup.
> - The existing skill truth model needed cleanup so bundled skills,
optional catalog skills, runtime skills, and adapter-provided skills
have clear provenance.
> - Operators also need a practical CLI and board UI for discovering and
managing company skills.
> - This pull request adds the skills CLI, packaged skills catalog,
company skills APIs, and catalog-aware board UI.
> - The benefit is a more reusable Paperclip company setup where skills
are portable, auditable, and easier for operators and agents to manage.

## What Changed

- Added `paperclipai skills` CLI commands and coverage for catalog
listing, installing, resetting, and inspecting company skills.
- Added a packaged `@paperclipai/skills-catalog` workspace with bundled
and optional skill content plus validation/build tests.
- Added shared company-skill types and validators used across CLI,
server, and UI contracts.
- Added server catalog APIs/services for company skill catalog
operations, reset semantics, audit behavior, and portability provenance.
- Updated adapter skill handling so runtime/catalog provenance remains
explicit across local adapters.
- Added board UI support for browsing and managing catalog-backed
company skills.
- Updated docs for the skills CLI/catalog flow and the company skills
Paperclip skill reference.
- Rebased the branch onto current `paperclipai/paperclip:master`; no
`pnpm-lock.yaml`, `.github/workflows`, or migration files are included
in the final PR diff.

## Verification

- Passed: `pnpm run preflight:workspace-links && pnpm exec vitest run
cli/src/__tests__/skills.test.ts
packages/skills-catalog/src/catalog-builder.test.ts
packages/skills-catalog/src/shipped-catalog.test.ts
packages/shared/src/validators/company-skill.test.ts
packages/adapter-utils/src/server-utils.test.ts
packages/plugins/create-paperclip-plugin/src/entrypoints.test.ts
server/src/__tests__/company-skills-catalog-service.test.ts
server/src/__tests__/company-skills-routes.test.ts
server/src/__tests__/company-portability.test.ts`.
- Passed: `pnpm exec vitest run
server/src/__tests__/workspace-runtime.test.ts -t "default
branch|origin/master|symbolic-ref"`.
- Attempted: full `server/src/__tests__/workspace-runtime.test.ts`. Four
provisioning tests failed while seeding an isolated worktree database
from the local Paperclip instance because the local plugin schema dump
contains a duplicate-column foreign key
(`plugin_content_machine_18a7bc327b.content_case_signals`). The
default-branch tests touched by the rebase conflict passed in the
focused run above.
- Checked final diff: no `pnpm-lock.yaml`, no `.github/workflows`, and
no migration-file changes relative to `master`.

## Risks

- Medium: this is a broad skills/catalog change touching CLI, server
APIs, shared contracts, adapter skill sync, and UI.
- Catalog validation and reset semantics need careful reviewer attention
because they affect reusable company setup and portability.
- No database migrations are included in this PR, so there is no
migration ordering/idempotency risk in the final diff.
- No lockfile is included by design; dependency resolution will be
handled by the repository lockfile workflow.

## Model Used

- OpenAI Codex coding agent based on GPT-5, running in Paperclip via the
`codex_local` adapter with shell, git, GitHub CLI, and code-editing tool
access. Exact hosted model build/context-window metadata is not exposed
in this runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run targeted tests locally and documented the local
workspace-runtime seed failure above
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, screenshots were intentionally
omitted per PAP-10124 instructions; UI behavior is covered by tests and
reviewer inspection
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>

2026-05-28 07:33:51 -10:00

3.8 KiB


Frontmatter fields: `name`, `description`, `key`, `recommendedForRoles`, `tags`

# QA Acceptance

Write acceptance criteria that a reviewer can run against the running app to decide pass or fail without asking the author. The criteria are the contract: automated tests cover correctness, and QA covers feature-level behavior.

## When to use

- A feature change is heading to QA and needs a written validation plan.
- A reviewer is asked to verify a PR that touches user-visible behavior.
- An incident postmortem requires a regression check to confirm the fix prevents the issue from reopening.
- A release candidate needs a pre-cut smoke pass.

## When not to use

- The change only needs unit tests (utility refactor, internal naming). Acceptance criteria would be unnecessary churn.
- You are asked to write tests against API contracts. Use contract testing, not feature QA.

## Acceptance criteria format

Each criterion is a single, independently-verifiable statement:

- **Given** <starting state>, **when** <action>, **then** <observable outcome>.

Example:

- **Given** a CSV export with 0 rows, **when** the user clicks Export, **then** the file downloads with only the header row and the UI shows "Exported 0 rows".

Avoid criteria that combine multiple whens or thens. Split them.
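One way to keep each criterion to one action and one outcome is to store criteria as structured records and split multi-outcome drafts mechanically. The TypeScript below is a minimal sketch under that assumption. `Criterion`, `DraftCriterion`, `splitDraft`, and `formatCriterion` are hypothetical names, not part of Paperclip or any testing library.

```typescript
// Hypothetical shapes for illustration; not a Paperclip type.
interface Criterion {
  id: string;
  given: string; // starting state
  when: string; // exactly one action
  then: string; // exactly one observable outcome
}

// A draft as an author might first write it, possibly with several whens or thens.
interface DraftCriterion {
  id: string;
  given: string;
  whens: string[];
  thens: string[];
}

// Split a draft into one criterion per outcome. Several actions usually mean
// several scenarios, so that case is rejected for the author to rewrite by hand.
function splitDraft(draft: DraftCriterion): Criterion[] {
  const when = draft.whens[0];
  if (when === undefined || draft.whens.length !== 1) {
    throw new Error(`${draft.id}: expected exactly one "when", got ${draft.whens.length}`);
  }
  return draft.thens.map((then, i) => ({
    id: `${draft.id}.${i + 1}`,
    given: draft.given,
    when,
    then,
  }));
}

// Render a criterion in the Given/when/then sentence form.
function formatCriterion(c: Criterion): string {
  return `**Given** ${c.given}, **when** ${c.when}, **then** ${c.then}.`;
}

const parts = splitDraft({
  id: "draft-delete",
  given: "a saved draft",
  whens: ["the user clicks Delete"],
  thens: ["the draft disappears from the list", "the activity log records a delete entry"],
});
for (const p of parts) console.log(`${p.id}: ${formatCriterion(p)}`);
```

Multiple outcomes are split automatically because they share one starting state and one action. Multiple actions are rejected instead, since they usually describe separate scenarios.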

## What every plan must cover

- Golden path. The most common successful flow, end to end.
- Empty and minimum states. Zero items, one item, missing optional inputs.
- Boundary inputs. Max-length strings, max numeric values, Unicode, RTL text where applicable.
- Error states. Network failure, permission denied, validation failures, conflict (409), not found (404).
- Concurrency and ordering. Two users acting at once, races against background jobs, refresh during mutation.
- Performance envelope. The largest realistic input the change must handle without UI hangs or timeouts.
- Backward compatibility. Existing data, existing URLs, and persisted user preferences continue to work.
- Telemetry and audit. Events, logs, or activity entries the change is supposed to emit.

If a section is genuinely not applicable, write "N/A: <reason>" instead of silently omitting it.
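The coverage rule can be checked before a pass starts: every area must either list criteria or carry an explicit `N/A` reason. This sketch assumes a plan is keyed by the eight areas above. The kebab-case area keys and `findCoverageGaps` are illustrative, not an existing schema.

```typescript
// Hypothetical keys for the eight coverage areas.
const REQUIRED_AREAS = [
  "golden-path",
  "empty-and-minimum",
  "boundary-inputs",
  "error-states",
  "concurrency-and-ordering",
  "performance-envelope",
  "backward-compatibility",
  "telemetry-and-audit",
] as const;

type Area = (typeof REQUIRED_AREAS)[number];

// An area either lists criterion ids or states why it does not apply.
type AreaEntry = { criteria: string[] } | { notApplicable: string };

// Returns one message per area that is missing, empty, or marked N/A without a reason.
function findCoverageGaps(plan: Partial<Record<Area, AreaEntry>>): string[] {
  const gaps: string[] = [];
  for (const area of REQUIRED_AREAS) {
    const entry = plan[area];
    if (entry === undefined) {
      gaps.push(`${area}: missing; write "N/A: <reason>" instead of omitting it`);
    } else if ("notApplicable" in entry) {
      if (entry.notApplicable.trim() === "") gaps.push(`${area}: N/A without a reason`);
    } else if (entry.criteria.length === 0) {
      gaps.push(`${area}: no criteria listed`);
    }
  }
  return gaps;
}
```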

## Evidence

Each criterion needs evidence recorded during the verification pass:

- Screenshot or short clip for UI behavior.
- Copied console or network output for API behavior.
- Log snippet or activity row for telemetry.
- Timing measurement for performance criteria.

"Looks good to me" without evidence is not a pass.
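A reviewer, or a tool, can refuse a pass that has no matching evidence. The mapping below from criterion kind to accepted evidence restates the list above. The kind names and `isAuditablePass` are hypothetical, and the check only confirms that evidence exists, not that it is convincing.

```typescript
// Hypothetical criterion and evidence kinds mirroring the evidence list.
type CriterionKind = "ui" | "api" | "telemetry" | "performance";
type EvidenceKind =
  | "screenshot"
  | "clip"
  | "console-output"
  | "network-output"
  | "log-snippet"
  | "activity-row"
  | "timing";

const ACCEPTED_EVIDENCE: Record<CriterionKind, readonly EvidenceKind[]> = {
  ui: ["screenshot", "clip"],
  api: ["console-output", "network-output"],
  telemetry: ["log-snippet", "activity-row"],
  performance: ["timing"],
};

interface Evidence {
  kind: EvidenceKind;
  content: string; // file path, pasted output, or measurement
}

// A pass is auditable only if it carries non-empty evidence of an accepted kind,
// so a bare "looks good to me" never satisfies this check.
function isAuditablePass(kind: CriterionKind, evidence: Evidence[]): boolean {
  return evidence.some(
    (e) => ACCEPTED_EVIDENCE[kind].includes(e.kind) && e.content.trim() !== "",
  );
}
```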

## Quarantine and follow-up

- A failing criterion blocks acceptance unless explicitly waived by the owner with a tracked follow-up issue.
- "Known issue" without a linked follow-up is not a waiver.
- If you add a new criterion mid-pass, restart the pass: partial coverage hides regressions.
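The waiver and restart rules are mechanical enough to check in code. In this sketch, a "tracked follow-up" is assumed to be any non-empty issue reference. `Waiver`, `blockingFailures`, and `mustRestartPass` are illustrative names.

```typescript
// Hypothetical types; a tracked follow-up is modeled as a non-empty issue reference.
interface Waiver {
  approvedBy: string;
  followUpIssue?: string;
}

interface FailedCriterion {
  id: string;
  waiver?: Waiver;
}

// A waiver counts only if the owner granted it and it links a follow-up issue;
// "known issue" with no link does not qualify.
function isValidWaiver(waiver: Waiver | undefined, owner: string): boolean {
  return (
    waiver !== undefined &&
    waiver.approvedBy === owner &&
    (waiver.followUpIssue ?? "").trim() !== ""
  );
}

// Ids of failing criteria that still block acceptance.
function blockingFailures(failures: FailedCriterion[], owner: string): string[] {
  return failures.filter((f) => !isValidWaiver(f.waiver, owner)).map((f) => f.id);
}

// A pass records the criterion ids it started with; any id added since then
// means the pass must restart from the beginning.
function mustRestartPass(startedWith: string[], current: string[]): boolean {
  const started = new Set(startedWith);
  return current.some((id) => !started.has(id));
}
```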

## Handoff back to the author

Return the validation plan with three sections:

- Pass. Criteria that passed, with one-line evidence summaries.
- Fail. Criteria that failed, with the exact reproduction.
- Blocked. Criteria you could not run, with the reason.

The author owns turning failures into either fixes or accepted deferrals.
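The three-section handoff can be produced directly from per-criterion results. This is a sketch with a hypothetical `Result` type. It renders plain markdown bullets and writes "(none)" for an empty section so the author can see that the section was considered.

```typescript
// Hypothetical per-criterion result; each status carries what its section needs.
type Result =
  | { id: string; status: "pass"; evidenceSummary: string }
  | { id: string; status: "fail"; reproduction: string }
  | { id: string; status: "blocked"; why: string };

// Render the Pass / Fail / Blocked handoff as markdown bullets.
function renderHandoff(results: Result[]): string {
  const pass: string[] = [];
  const fail: string[] = [];
  const blocked: string[] = [];
  for (const r of results) {
    switch (r.status) {
      case "pass":
        pass.push(`- ${r.id}: ${r.evidenceSummary}`);
        break;
      case "fail":
        fail.push(`- ${r.id}: ${r.reproduction}`);
        break;
      case "blocked":
        blocked.push(`- ${r.id}: ${r.why}`);
        break;
    }
  }
  // An empty section still appears, marked "(none)".
  const section = (title: string, lines: string[]): string =>
    `${title}\n${lines.length > 0 ? lines.join("\n") : "- (none)"}`;
  return [section("Pass", pass), section("Fail", fail), section("Blocked", blocked)].join("\n\n");
}
```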

## Anti-patterns

- Acceptance phrased as a test plan ("write a Cypress test for X"). Acceptance is what is true after the change ships; tests are how you check it.
- Criteria that depend on inspecting implementation details (selectors, query plans). Stay observable.
- Long checklists with no priority. Mark must-pass criteria distinctly from nice-to-have ones.
- Validation reports that say "passed" with no evidence. Reviewers cannot audit those.
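Some of these anti-patterns can be flagged automatically before review. The patterns below are rough heuristics chosen for illustration, such as a phrase like "write a ... test" or words like "selector" or "query plan". They produce notes for a human to judge, not verdicts, and `reviewPlan` is a hypothetical helper.

```typescript
// Heuristic patterns for illustration only; they flag candidates for human review.
const ANTI_PATTERNS: Array<[RegExp, string]> = [
  [/\bwrite an? .*\btest\b/i, "phrased as a test plan; state what is true after the change ships"],
  [/\b(selector|data-testid|query plan)\b/i, "depends on implementation details; describe observable behavior"],
];

// Hypothetical planned criterion with an optional priority marker.
interface PlannedCriterion {
  id: string;
  text: string;
  priority?: "must-pass" | "nice-to-have";
}

// Collect review notes: missing priority plus any heuristic anti-pattern hits.
function reviewPlan(criteria: PlannedCriterion[]): string[] {
  const notes: string[] = [];
  for (const c of criteria) {
    if (c.priority === undefined) notes.push(`${c.id}: no priority; mark must-pass or nice-to-have`);
    for (const [pattern, message] of ANTI_PATTERNS) {
      if (pattern.test(c.text)) notes.push(`${c.id}: ${message}`);
    }
  }
  return notes;
}
```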
