Dotta 9eac727cf1 [codex] Add skills CLI and catalog management (#6782)

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies through
company-scoped control-plane workflows.
> - Agents need reusable, inspectable skills that can be installed,
reset, audited, exported, and assigned without bespoke local setup.
> - The existing skill truth model needed cleanup so bundled skills,
optional catalog skills, runtime skills, and adapter-provided skills
have clear provenance.
> - Operators also need a practical CLI and board UI for discovering and
managing company skills.
> - This pull request adds the skills CLI, packaged skills catalog,
company skills APIs, and catalog-aware board UI.
> - The benefit is a more reusable Paperclip company setup where skills
are portable, auditable, and easier for operators and agents to manage.

## What Changed

- Added `paperclipai skills` CLI commands and coverage for catalog
listing, installing, resetting, and inspecting company skills.
- Added a packaged `@paperclipai/skills-catalog` workspace with bundled
and optional skill content plus validation/build tests.
- Added shared company-skill types and validators used across CLI,
server, and UI contracts.
- Added server catalog APIs/services for company skill catalog
operations, reset semantics, audit behavior, and portability provenance.
- Updated adapter skill handling so runtime/catalog provenance remains
explicit across local adapters.
- Added board UI support for browsing and managing catalog-backed
company skills.
- Updated docs for the skills CLI/catalog flow and the company skills
Paperclip skill reference.
- Rebased the branch onto current `paperclipai/paperclip:master`; no
`pnpm-lock.yaml`, `.github/workflows`, or migration files are included
in the final PR diff.

## Verification

- Passed: `pnpm run preflight:workspace-links && pnpm exec vitest run
cli/src/__tests__/skills.test.ts
packages/skills-catalog/src/catalog-builder.test.ts
packages/skills-catalog/src/shipped-catalog.test.ts
packages/shared/src/validators/company-skill.test.ts
packages/adapter-utils/src/server-utils.test.ts
packages/plugins/create-paperclip-plugin/src/entrypoints.test.ts
server/src/__tests__/company-skills-catalog-service.test.ts
server/src/__tests__/company-skills-routes.test.ts
server/src/__tests__/company-portability.test.ts`.
- Passed: `pnpm exec vitest run
server/src/__tests__/workspace-runtime.test.ts -t "default
branch|origin/master|symbolic-ref"`.
- Attempted: full `server/src/__tests__/workspace-runtime.test.ts`. Four
provisioning tests failed while seeding an isolated worktree database
from the local Paperclip instance because the local plugin schema dump
contains a duplicate-column foreign key
(`plugin_content_machine_18a7bc327b.content_case_signals`). The
default-branch tests touched by the rebase conflict passed in the
focused run above.
- Checked final diff: no `pnpm-lock.yaml`, no `.github/workflows`, and
no migration-file changes relative to `master`.

## Risks

- Medium: this is a broad skills/catalog change touching CLI, server
APIs, shared contracts, adapter skill sync, and UI.
- Catalog validation and reset semantics need careful reviewer attention
because they affect reusable company setup and portability.
- No database migrations are included in this PR, so there is no
migration ordering/idempotency risk in the final diff.
- No lockfile is included by design; dependency resolution will be
handled by the repository lockfile workflow.

## Model Used

- OpenAI Codex coding agent based on GPT-5, running in Paperclip via the
`codex_local` adapter with shell, git, GitHub CLI, and code-editing tool
access. Exact hosted model build/context-window metadata is not exposed
in this runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run targeted tests locally and documented the local
workspace-runtime seed failure above
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, screenshots were intentionally
omitted per PAP-10124 instructions; UI behavior is covered by tests and
reviewer inspection
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>

2026-05-28 07:33:51 -10:00



Agent Browser

Use a controlled browser to verify behavior, capture evidence, or extract information from web pages that a static fetch cannot reach (SPAs, login-gated pages, dynamic content). This skill is about supervised verification, not unattended scraping.

When to use

You need a screenshot of a deployed page or a local dev server to confirm a UI change.
You need to read JavaScript-rendered content that curl/wget will not see.
A user reports a UI bug and you need to reproduce it interactively to capture console errors, network requests, or layout state.
You need to walk through a short flow (load page, click, observe) to verify acceptance criteria.

When not to use

The page is reachable as static HTML. Use curl/HTTP fetch — it is cheaper, faster, and more reliable.
The task is unattended large-scale scraping. That belongs to a dedicated scraper with rate limits, robots.txt handling, and a real user agent policy — not this skill.
The site is behind authentication you do not own credentials for, or whose terms of service prohibit automation.
The site involves sensitive accounts (banking, healthcare, government) where automation risks lockout or compliance issues.
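One way to apply the "static HTML first" rule is to fetch the page with a plain HTTP client and check whether the text you care about is already present. The sketch below covers only that check; `visible_text` and `needs_browser` are hypothetical helper names, not part of this skill or of any Paperclip API, and the HTML strings stand in for a response body you would fetch yourself.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text a reader would see without running any JavaScript."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def needs_browser(static_html: str, expected_text: str) -> bool:
    """True when the expected text is absent from the static markup."""
    return expected_text not in visible_text(static_html)
```

If `needs_browser` returns `False`, the static response already carries the content and the browser can stay closed. A `True` result is only a hint that the content may be rendered client-side; it can also mean the expected text was wrong.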

Before launching the browser

Confirm the URL and what state should be true after navigation.
Decide what evidence is needed: full-page screenshot, viewport screenshot, console log, network trace, HTML snapshot, extracted text.
Decide the viewport size that matters for the task (mobile vs desktop). Default to a desktop size unless the task is mobile-specific.
For local dev servers, confirm the server is running and the port is what you expect.

Driving the browser

A typical verification session:

Launch with a real-looking user agent when the target is the public internet; an unrealistic UA gets the traffic flagged as automation.
Set a sane viewport (e.g., 1366×768 desktop, 390×844 iPhone-ish).
Navigate and wait for the right signal. Prefer waiting for a specific selector or network-idle over arbitrary sleeps.
Capture evidence immediately after the wait condition succeeds, before any interaction perturbs the state.
Interact deliberately. One click at a time, with a wait between actions; re-screenshot after each meaningful state change.
Read the console and network panels for unexpected errors, 4xx/5xx responses, or slow requests.
Close the browser cleanly when done. Long-running browser sessions leak memory and hold ports.
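The "wait for the right signal" step can be sketched without committing to a browser driver. The helper below polls a caller-supplied probe until it returns a value or a deadline passes, instead of sleeping for a fixed time. `wait_for` and `probe` are hypothetical names; most browser automation libraries ship their own selector and network-idle waits, which are usually the better choice when available.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(
    probe: Callable[[], Optional[T]],
    timeout_s: float = 30.0,
    interval_s: float = 0.25,
) -> T:
    """Poll `probe` until it returns a non-None value or the timeout elapses.

    `probe` is assumed to be a cheap check such as "does this selector
    exist yet"; it returns None while the condition is not met.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        time.sleep(interval_s)
```

Capturing the screenshot right after `wait_for` returns keeps the evidence tied to the state the probe confirmed.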

What evidence to record

For a verification task, deliver:

A full-page or viewport screenshot of each meaningful state.
The console log, filtered to warnings/errors.
Any non-2xx network response with the URL, status, and a short response body excerpt.
A short narration: "Navigated to X, observed Y, clicked Z, observed W."

For a UI bug repro, also record:

The exact reproduction steps the user can follow.
Viewport size and (where relevant) device pixel ratio.
Whether the bug reproduces on first load vs after interaction.
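The evidence list above maps naturally onto a small record type. The following is a minimal sketch, assuming the console entries and network responses have already been collected from whatever browser driver is in use; every class and field name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

EXCERPT_LIMIT = 200  # assumed cap for "a short response body excerpt"


@dataclass
class ConsoleEntry:
    level: str  # e.g. "log", "warning", "error"
    text: str


@dataclass
class NetworkResponse:
    url: str
    status: int
    body_excerpt: str = ""

    def __post_init__(self) -> None:
        # Keep only a short excerpt so reports stay readable.
        self.body_excerpt = self.body_excerpt[:EXCERPT_LIMIT]


@dataclass
class EvidenceReport:
    steps: List[Tuple[str, str]] = field(default_factory=list)  # (action, observation)
    console: List[ConsoleEntry] = field(default_factory=list)
    responses: List[NetworkResponse] = field(default_factory=list)
    viewport: Tuple[int, int] = (1366, 768)
    device_pixel_ratio: float = 1.0
    reproduces_on_first_load: bool = False

    def console_problems(self) -> List[ConsoleEntry]:
        """Console log filtered to warnings and errors."""
        return [e for e in self.console if e.level in ("warning", "error")]

    def failed_responses(self) -> List[NetworkResponse]:
        """Every response whose status is outside the 2xx range."""
        return [r for r in self.responses if not 200 <= r.status < 300]

    def narration(self) -> str:
        """One sentence per step: '<action>, observed <observation>.'"""
        return " ".join(f"{action}, observed {obs}." for action, obs in self.steps)
```

Screenshots are left out of the sketch on purpose; in practice each step would also carry a path to the image captured for that state.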

Authentication and credentials

Prefer programmatic auth (API token, magic link) over UI login.
If UI login is the only path, the user must provide credentials explicitly for this run. Never reuse credentials outside the session.
Do not store credentials in the session log, screenshot, or returned output.
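To keep credentials out of session logs and returned output, one option is to mask every known secret before any text leaves the session. This is a minimal sketch; `redact` is a hypothetical helper, and it only catches secrets that appear verbatim, so it does not replace keeping credentials out of screenshots in the first place.

```python
from typing import Iterable


def redact(text: str, secrets: Iterable[str], mask: str = "[REDACTED]") -> str:
    """Replace every occurrence of each secret in `text` with `mask`.

    Longer secrets are handled first so that a secret containing a
    shorter one is masked as a whole. Empty strings are ignored.
    """
    for secret in sorted((s for s in secrets if s), key=len, reverse=True):
        text = text.replace(secret, mask)
    return text
```

Running log lines, narration, and response excerpts through the same function before they are stored keeps the rule in one place.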

Performance and politeness

Throttle to one navigation every few seconds when touching shared infrastructure.
Respect robots.txt for public sites you are inspecting at any volume.
Cancel navigations if a page exceeds a reasonable timeout (e.g., 30s); the page is broken or rate-limiting you.
Do not retry forever on failure. Retry once with a longer timeout, then escalate.
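The "retry once with a longer timeout, then escalate" rule can be sketched as a thin wrapper around whatever navigation call the browser driver exposes. Here `navigate` is a hypothetical callable that takes a URL and a timeout in seconds and raises `TimeoutError` when the page does not load in time; `Escalate` and the doubling factor are likewise assumptions made for illustration.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class Escalate(Exception):
    """Raised when a navigation fails twice and a human should look at it."""


def navigate_with_one_retry(
    navigate: Callable[[str, float], T],
    url: str,
    timeout_s: float = 30.0,
    retry_factor: float = 2.0,
) -> T:
    """Try once, retry once with a longer timeout, then give up."""
    try:
        return navigate(url, timeout_s)
    except TimeoutError:
        pass  # fall through to the single retry
    try:
        return navigate(url, timeout_s * retry_factor)
    except TimeoutError as exc:
        raise Escalate(f"{url} did not load after one retry") from exc
```

Keeping the retry count fixed at one makes a rate-limited or broken page surface quickly instead of being hammered in a loop.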

Common failure modes

Selector not found. The page changed, or you are checking before the element has rendered. Take a screenshot to see the actual state, then adjust the selector or the wait condition.
Click does nothing. The element is offscreen, covered by a modal, or in a shadow DOM. Scroll into view or pierce the shadow root.
Headless detection. Some sites detect headless Chrome and serve a different page. Use a non-headless mode or a fingerprint-realistic configuration only when authorized.
Cross-origin iframe blocking. Iframes you do not own cannot be inspected; the page must offer the data outside the iframe or the task is infeasible.

Anti-patterns

Long unsupervised browser sessions that drift from the original task.
Scraping behind authentication you do not own.
Captioning a screenshot with "looks good" without saying what state was loaded and what selectors confirmed it.
Treating a passing screenshot as proof of correctness across viewports you did not actually test.
