
Dotta 7f893ac4ec [codex] Harden execution reliability and heartbeat tooling (#3679)

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies
> - Reliable execution depends on heartbeat routing, issue lifecycle
semantics, telemetry, and a fast enough local verification loop to keep
regressions visible
> - The remaining commits on this branch were mostly server/runtime
correctness fixes plus test and documentation follow-ups in that area
> - Those changes are logically separate from the UI-focused
issue-detail and workspace/navigation branches even when they touch
overlapping issue APIs
> - This pull request groups the execution reliability, heartbeat,
telemetry, and tooling changes into one standalone branch
> - The benefit is a focused review of the control-plane correctness
work, including the follow-up fix that restored the implicit
comment-reopen helpers after branch splitting

## What Changed

- Hardened issue/heartbeat execution behavior, including self-review
stage skipping, deferred mention wakes during active execution, stranded
execution recovery, active-run scoping, assignee resolution, and
blocked-to-todo wake resumption
- Reduced noisy polling/logging overhead by trimming issue run payloads,
compacting persisted run logs, silencing high-volume request logs, and
capping heartbeat-run queries in dashboard/inbox surfaces
- Expanded telemetry and status semantics with adapter/model fields on
task completion plus clearer status guidance in docs/onboarding material
- Updated test infrastructure and verification defaults with faster
route-test module isolation, cheaper default `pnpm test`, e2e isolation
from local state, and repo verification follow-ups
- Included docs/release housekeeping from the branch and added a small
follow-up commit restoring the implicit comment-reopen helpers that were
dropped during branch reconstruction

## Verification

- `pnpm vitest run
server/src/__tests__/issue-comment-reopen-routes.test.ts
server/src/__tests__/issue-telemetry-routes.test.ts`
- `pnpm vitest run server/src/__tests__/http-log-policy.test.ts
server/src/__tests__/heartbeat-run-log.test.ts
server/src/__tests__/health.test.ts`
- `server/src/__tests__/activity-service.test.ts`,
`server/src/__tests__/heartbeat-comment-wake-batching.test.ts`, and
`server/src/__tests__/heartbeat-process-recovery.test.ts` were attempted
on this host but the embedded Postgres harness reported
init-script/data-dir problems and skipped or failed to start, so they
are noted as environment-limited

## Risks

- Medium: this branch changes core issue/heartbeat routing and
reopen/wakeup behavior, so regressions would affect agent execution flow
rather than isolated UI polish
- Because it also updates verification infrastructure, reviewers should
pay attention to whether the new tests are asserting the right failure
modes and not just reshaping harness behavior

## Model Used

- OpenAI Codex coding agent (GPT-5-class runtime in Codex CLI; exact
deployed model ID is not exposed in this environment), reasoning
enabled, tool use and local code execution enabled

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [ ] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [ ] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>

2026-04-14 13:34:52 -05:00


VS Code Task Interoperability Plan

Status: planning only, no code changes
Date: 2026-04-12
Related issue: PAP-1377

Summary

Paperclip should not replace its workspace runtime service model with VS Code tasks. It should add a narrow interoperability layer that can discover and adopt supported entries from .vscode/tasks.json.

The core product model should stay:

Paperclip owns long-running workspace services and their desired state
Paperclip shows operators exactly which named thing they are starting or stopping
Paperclip distinguishes long-running services from one-shot jobs

VS Code tasks should be treated as:

an import/discovery format for workspace commands
a convenience for repos that already maintain tasks.json
a partial compatibility layer, not a full execution model

Current State

The current implementation is already service-oriented:

project workspaces and execution workspaces can store `workspaceRuntime` config plus `desiredState` and per-service `serviceStates`
the UI renders one control row per configured service and persists start/stop intent
the backend supervises long-running local processes, reuses eligible services, and restores desired services on startup

Relevant files:

packages/shared/src/types/workspace-runtime.ts
server/src/services/workspace-runtime.ts
server/src/services/project-workspace-runtime-config.ts
ui/src/components/WorkspaceRuntimeControls.tsx
ui/src/pages/ProjectWorkspaceDetail.tsx
ui/src/pages/ExecutionWorkspaceDetail.tsx

This is directionally correct for Paperclip because it gives the control plane an explicit model for service lifecycle, health, reuse, and restart behavior.

Problem To Solve

The current UX is still too raw:

operators have to hand-author runtime JSON
a workspace can have multiple attached services, but the higher-level intent is not obvious
start/stop controls are visible in multiple places, which makes it easy to lose track of what is being controlled
there is no interoperability with repos that already define useful local workflows in .vscode/tasks.json

The issue is not that services are the wrong abstraction. The issue is that the configuration surface is too low-level and Paperclip does not yet leverage existing workspace metadata.

Recommendation

Keep Paperclip runtime services as the source of truth for service supervision. Add a new workspace command model above the raw JSON layer, with VS Code task discovery as one input.

The product model should become:

Workspace command: a named runnable thing attached to a workspace.
Workspace service: a workspace command that is expected to stay alive and be supervised.
Workspace job: a workspace command that runs once and exits.
Runtime service instance: the live process record that already exists today in Paperclip.

In that model, VS Code tasks are a way to populate workspace commands. Only commands that map cleanly to Paperclip service or job semantics should become runnable in Paperclip.

Why Not Fully Adopt VS Code Tasks

VS Code tasks are broader than Paperclip runtime services. They include shell/process tasks, compound tasks, background/watch tasks, presentation settings, extension/task-provider types, variable substitution, and problem-matcher-driven lifecycle.

That creates a bad fit if Paperclip tries to use tasks.json as its only runtime model:

many tasks are one-shot jobs, not long-running services
some tasks depend on VS Code task providers or editor-only variable resolution
compound task graphs are useful, but they are not the same thing as a supervised service
problem matcher readiness is useful metadata, but it is not enough to replace Paperclip's persisted service lifecycle model

The right boundary is interoperability, not replacement.

Interoperability Contract

Paperclip should support a conservative subset of VS Code tasks and clearly mark unsupported entries.

Supported in phase 1

shell and process tasks with a concrete command Paperclip can resolve
optional task options.cwd
optional task environment values that can be flattened safely
task labels and detail text for naming and display
dependsOn for import-time expansion or display-only dependency hints
background/watch-oriented tasks that can reasonably be treated as long-running services

Maybe supported in later phases

grouping and default task metadata for better UX
selected variable substitution when Paperclip can resolve it safely from workspace context
mapping task metadata into Paperclip readiness/expose hints
limited compound-task launch flows

Not supported initially

extension-provided task types Paperclip cannot execute directly
arbitrary VS Code variable substitution semantics
problem matcher parsing as the main source of service health
full parity with VS Code task execution behavior
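
The phase 1 contract above can be sketched as a support check that returns a reason for every rejected entry, so unsupported tasks can be displayed rather than silently dropped. This is a minimal sketch, not Paperclip code: `VsCodeTask`, `SupportResult`, and `checkTaskSupport` are hypothetical names, and treating any `${...}` reference as unsupported is an assumed reading of "arbitrary VS Code variable substitution".

```typescript
// Hypothetical subset of a tasks.json entry; real entries carry more fields.
type VsCodeTask = {
  label?: string;
  type?: string;
  command?: string;
  options?: { cwd?: string; env?: Record<string, string> };
};

type SupportResult =
  | { supported: true }
  | { supported: false; reason: string };

// Matches VS Code-style ${...} variable references.
const VARIABLE_PATTERN = /\$\{[^}]+\}/;

function checkTaskSupport(task: VsCodeTask): SupportResult {
  // Only shell and process tasks can be executed without a VS Code task provider.
  if (task.type !== "shell" && task.type !== "process") {
    return {
      supported: false,
      reason: `task type "${task.type ?? "unknown"}" is not executable by Paperclip`,
    };
  }
  if (!task.command || task.command.trim() === "") {
    return { supported: false, reason: "task has no concrete command" };
  }
  // Assumption: phase 1 rejects every variable reference instead of resolving some.
  const fields = [
    task.command,
    task.options?.cwd ?? "",
    ...Object.values(task.options?.env ?? {}),
  ];
  if (fields.some((value) => VARIABLE_PATTERN.test(value))) {
    return { supported: false, reason: "task relies on VS Code variable substitution" };
  }
  return { supported: true };
}
```

Returning a reason string keeps the "clearly mark unsupported entries" requirement cheap to satisfy in the UI.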

Long-Running Service Detection

Paperclip needs an explicit classification layer instead of assuming every VS Code task is a service.

Recommended classification:

service: explicitly marked by Paperclip metadata, or confidently inferred from background/watch task semantics
job: one-shot command expected to exit
unsupported: present in tasks.json, but not safely runnable by Paperclip

The important product decision is that service classification must be visible and editable by the operator. Inference can help, but it should not be the only source of truth.
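
One way to keep inference from becoming the only source of truth is to record where each classification came from. The sketch below assumes a hypothetical `classifyTask` helper and an `origin` field that are not part of the current codebase; `isBackground` is the VS Code flag for background/watch tasks.

```typescript
type CommandClassification = "service" | "job" | "unsupported";

// Hypothetical input: the parts of an imported task that matter for classification.
type TaskInput = {
  label: string;
  supported: boolean; // outcome of an earlier support check
  isBackground?: boolean; // VS Code background/watch flag
  paperclipKind?: "service" | "job"; // explicit Paperclip metadata, if present
};

type ClassificationResult = {
  classification: CommandClassification;
  origin: "operator" | "metadata" | "inferred" | "default";
  needsConfirmation: boolean;
};

function classifyTask(
  task: TaskInput,
  operatorOverride?: "service" | "job",
): ClassificationResult {
  if (!task.supported) {
    return { classification: "unsupported", origin: "default", needsConfirmation: false };
  }
  // An operator edit always wins over metadata and inference.
  if (operatorOverride) {
    return { classification: operatorOverride, origin: "operator", needsConfirmation: false };
  }
  if (task.paperclipKind) {
    return { classification: task.paperclipKind, origin: "metadata", needsConfirmation: false };
  }
  // Inference is only a suggestion: flag it so the UI can ask for confirmation.
  if (task.isBackground) {
    return { classification: "service", origin: "inferred", needsConfirmation: true };
  }
  return { classification: "job", origin: "default", needsConfirmation: false };
}
```

Surfacing `origin` and `needsConfirmation` lets the UI show why a command was classified the way it was and where an operator decision is still pending.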

Proposed Product Shape

1. Replace raw-first editing with command-first editing

Project and execution workspace pages should stop making raw runtime JSON the primary editing surface.

Default UI should show:

workspace commands
command type: service or job
source: Paperclip or VS Code
exact command and cwd
current state for services
explicit start, stop, restart, and run-now actions

Raw JSON should remain available behind an advanced section.

2. Add VS Code task discovery on workspaces

For a workspace with cwd, Paperclip should look for .vscode/tasks.json.

The workspace UI should show:

whether a tasks.json file was found
last parse time
supported commands discovered
unsupported tasks with reasons
whether commands are inherited into execution workspaces

3. Make the controlled thing explicit

Start and stop UI should always name the exact entry being controlled.

Examples:

Start web
Stop api
Run db:migrate

Avoid generic workspace-level labels when multiple commands exist.

4. Separate services from jobs in the UI

Do not mix one-shot jobs and long-running services into one undifferentiated list.

Recommended sections:

Services
Jobs
Unsupported imported tasks

That resolves the ambiguity called out in the issue.
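
As a rough illustration of the sectioning and labeling rules above, the sketch below groups commands into the three recommended sections and builds action labels that always include the command name. `ListedCommand`, `groupCommands`, and `actionLabel` are hypothetical names chosen for this example.

```typescript
// Hypothetical view model for one row in the workspace command list.
type ListedCommand = {
  name: string;
  kind: "service" | "job";
  disabledReason?: string | null;
};

type CommandSections = {
  services: ListedCommand[];
  jobs: ListedCommand[];
  unsupported: ListedCommand[];
};

function groupCommands(commands: ListedCommand[]): CommandSections {
  const sections: CommandSections = { services: [], jobs: [], unsupported: [] };
  for (const command of commands) {
    if (command.disabledReason) {
      // Unsupported imports stay visible in their own section.
      sections.unsupported.push(command);
    } else if (command.kind === "service") {
      sections.services.push(command);
    } else {
      sections.jobs.push(command);
    }
  }
  return sections;
}

// Labels always name the exact entry being controlled, never the workspace.
function actionLabel(action: "Start" | "Stop" | "Restart" | "Run", name: string): string {
  return `${action} ${name}`;
}
```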

Data Model Direction

Do not replace workspaceRuntime immediately. Instead add a higher-level representation that can compile down to the existing runtime-service machinery.

Suggested workspace metadata shape:

type WorkspaceCommandSource =
  | { type: "paperclip" }
  | { type: "vscode_task"; taskLabel: string; taskPath: ".vscode/tasks.json" };

type WorkspaceCommandKind = "service" | "job";

type WorkspaceCommandDefinition = {
  id: string;
  name: string;
  kind: WorkspaceCommandKind;
  source: WorkspaceCommandSource;
  command: string | null;
  cwd: string | null;
  env?: Record<string, string> | null;
  autoStart?: boolean;
  serviceConfig?: {
    lifecycle?: "shared" | "ephemeral";
    reuseScope?: "project_workspace" | "execution_workspace" | "run";
    readiness?: Record<string, unknown> | null;
    expose?: Record<string, unknown> | null;
  } | null;
  importWarnings?: string[];
  disabledReason?: string | null;
};

workspaceRuntime can then become a derived or advanced representation for service-type commands until the rest of the system is migrated.
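
To illustrate "compile down", the sketch below derives a list of service entries from command definitions. The `DerivedRuntimeService` shape is hypothetical: the real `workspaceRuntime` schema lives in packages/shared/src/types/workspace-runtime.ts and may differ, and the `"shared"` lifecycle default is an assumption made only for this example.

```typescript
// Trimmed copy of the command definition suggested above.
type CommandDefinition = {
  id: string;
  name: string;
  kind: "service" | "job";
  command: string | null;
  cwd: string | null;
  env?: Record<string, string> | null;
  serviceConfig?: { lifecycle?: "shared" | "ephemeral" } | null;
  disabledReason?: string | null;
};

// Hypothetical derived entry; not the actual workspaceRuntime schema.
type DerivedRuntimeService = {
  name: string;
  command: string;
  cwd: string | null;
  env: Record<string, string>;
  lifecycle: "shared" | "ephemeral";
};

function deriveRuntimeServices(commands: CommandDefinition[]): DerivedRuntimeService[] {
  const services: DerivedRuntimeService[] = [];
  for (const definition of commands) {
    // Jobs, disabled imports, and commands without a resolved command line
    // never reach the runtime supervisor.
    if (definition.kind !== "service") continue;
    if (definition.disabledReason) continue;
    if (definition.command === null) continue;
    services.push({
      name: definition.name,
      command: definition.command,
      cwd: definition.cwd,
      env: definition.env ?? {},
      lifecycle: definition.serviceConfig?.lifecycle ?? "shared", // assumed default
    });
  }
  return services;
}
```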

VS Code Mapping Rules

Paperclip should map imported tasks with explicit, documented rules.

Recommended rules:

A task becomes a job by default.
A task becomes a service only when:
- Paperclip metadata marks it as a service, or
- the task clearly represents a background/watch process and the operator confirms the classification.
Unsupported tasks stay visible but disabled.
Task labels become default command names.
dependsOn is preserved as metadata, not silently flattened into hidden behavior.

Paperclip-specific metadata can live in a namespaced field on the imported task definition, for example:

{
  "label": "web",
  "type": "shell",
  "command": "pnpm dev",
  "isBackground": true,
  "paperclip": {
    "kind": "service",
    "readiness": {
      "type": "http",
      "urlTemplate": "http://127.0.0.1:${port}"
    },
    "expose": {
      "type": "url",
      "urlTemplate": "http://127.0.0.1:${port}"
    }
  }
}

That gives us interoperability without depending on VS Code-only semantics for service readiness and exposure.
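
The mapping rules above could be applied roughly as follows. This is a sketch under assumptions: `mapImportedTask`, the `vscode_task:` id prefix, and the extra `dependsOn` field on the result are hypothetical, and only string or string-array `dependsOn` values are handled.

```typescript
// Hypothetical subset of an imported tasks.json entry.
type ImportedTask = {
  label: string;
  type: string;
  command?: string;
  isBackground?: boolean;
  dependsOn?: string | string[];
  options?: { cwd?: string };
  paperclip?: { kind?: "service" | "job" };
};

type MappedCommand = {
  id: string;
  name: string;
  kind: "service" | "job";
  source: { type: "vscode_task"; taskLabel: string; taskPath: ".vscode/tasks.json" };
  command: string | null;
  cwd: string | null;
  dependsOn: string[]; // preserved as metadata only
  importWarnings: string[];
  disabledReason: string | null;
};

function mapImportedTask(task: ImportedTask, operatorConfirmedService = false): MappedCommand {
  const importWarnings: string[] = [];
  let kind: "service" | "job" = "job"; // a task becomes a job by default
  if (task.paperclip?.kind) {
    kind = task.paperclip.kind; // explicit Paperclip metadata wins
  } else if (task.isBackground && operatorConfirmedService) {
    kind = "service";
  } else if (task.isBackground) {
    importWarnings.push("background task stays a job until an operator confirms it as a service");
  }
  const runnable =
    (task.type === "shell" || task.type === "process") && typeof task.command === "string";
  const dependsOn =
    task.dependsOn === undefined
      ? []
      : Array.isArray(task.dependsOn)
        ? task.dependsOn
        : [task.dependsOn];
  return {
    id: `vscode_task:${task.label}`, // hypothetical id scheme
    name: task.label, // task labels become default command names
    kind,
    source: { type: "vscode_task", taskLabel: task.label, taskPath: ".vscode/tasks.json" },
    command: runnable ? (task.command ?? null) : null,
    cwd: task.options?.cwd ?? null,
    dependsOn,
    importWarnings,
    // Unsupported tasks stay visible but disabled.
    disabledReason: runnable ? null : "no shell or process command Paperclip can run",
  };
}
```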

Execution Policy

Project workspaces should be the main place where imported commands are discovered and curated. Execution workspaces should inherit that curated command set by default, with optional issue-level overrides.

Recommended precedence:

1. execution workspace override
2. project workspace command set
3. imported VS Code tasks from the linked workspace
4. advanced raw runtime fallback

This matches the existing direction in doc/plans/2026-03-10-workspace-strategy-and-git-worktrees.md.
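
The precedence order can be sketched as a layered lookup. The plan does not say whether precedence applies to whole command sets or to individual commands; this example assumes per-command resolution by name, and the layer identifiers and `resolveCommands` are hypothetical.

```typescript
type LayerName =
  | "execution_override"
  | "project_workspace"
  | "vscode_import"
  | "raw_runtime";

type LayeredCommand = { name: string; command: string };
type ResolvedCommand = LayeredCommand & { resolvedFrom: LayerName };

// Highest precedence first, mirroring the recommended order.
const PRECEDENCE: LayerName[] = [
  "execution_override",
  "project_workspace",
  "vscode_import",
  "raw_runtime",
];

function resolveCommands(
  layers: Partial<Record<LayerName, LayeredCommand[]>>,
): ResolvedCommand[] {
  const resolved = new Map<string, ResolvedCommand>();
  for (const layer of PRECEDENCE) {
    for (const command of layers[layer] ?? []) {
      // Assumption: the first layer that defines a name wins for that name.
      if (!resolved.has(command.name)) {
        resolved.set(command.name, { ...command, resolvedFrom: layer });
      }
    }
  }
  return Array.from(resolved.values());
}
```

Recording `resolvedFrom` keeps the winning layer visible, which fits the goal of avoiding implicit behavior.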

Implementation Plan

Phase 1: Discovery and read-only visibility

Goal: show imported VS Code tasks in the workspace UI without changing runtime behavior.

Work:

parse .vscode/tasks.json for project workspaces with local cwd
derive a list of candidate commands plus unsupported items
show source, label, command, cwd, and classification
show parse warnings and unsupported reasons

Success condition: an operator can see what Paperclip would import and why.
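
A read-only discovery pass might produce a report like the one sketched below. `DiscoveryReport` and `buildDiscoveryReport` are hypothetical names. The sketch uses strict `JSON.parse` and takes the file text as an argument; real tasks.json files may contain comments, so an actual implementation would likely need a JSONC-aware parser.

```typescript
type DiscoveryReport = {
  found: boolean;
  parsedAt: string | null;
  candidates: { label: string; type: string; command: string; cwd: string | null }[];
  unsupported: { label: string; reason: string }[];
  warnings: string[];
};

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// tasksJsonText is null when no .vscode/tasks.json exists in the workspace cwd.
function buildDiscoveryReport(tasksJsonText: string | null, now: Date = new Date()): DiscoveryReport {
  const report: DiscoveryReport = {
    found: tasksJsonText !== null,
    parsedAt: null,
    candidates: [],
    unsupported: [],
    warnings: [],
  };
  if (tasksJsonText === null) return report;

  let parsed: unknown;
  try {
    parsed = JSON.parse(tasksJsonText);
  } catch {
    report.warnings.push("tasks.json is not strict JSON; comments need a JSONC parser");
    return report;
  }
  report.parsedAt = now.toISOString();

  const rawTasks = isRecord(parsed) ? parsed.tasks : undefined;
  if (!Array.isArray(rawTasks)) {
    report.warnings.push("tasks.json has no tasks array");
    return report;
  }
  rawTasks.forEach((entry: unknown, index: number) => {
    const label =
      isRecord(entry) && typeof entry.label === "string" ? entry.label : `task #${index + 1}`;
    if (!isRecord(entry)) {
      report.unsupported.push({ label, reason: "entry is not an object" });
      return;
    }
    const type = entry.type;
    const command = entry.command;
    if (type !== "shell" && type !== "process") {
      report.unsupported.push({ label, reason: "task type is not shell or process" });
      return;
    }
    if (typeof command !== "string") {
      report.unsupported.push({ label, reason: "task has no concrete command" });
      return;
    }
    const options = isRecord(entry.options) ? entry.options : {};
    const cwd = typeof options.cwd === "string" ? options.cwd : null;
    report.candidates.push({ label, type, command, cwd });
  });
  return report;
}
```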

Phase 2: Command model and explicit classification

Goal: introduce a first-class workspace command layer above raw runtime JSON.

Work:

add a persisted command definition model in workspace metadata or a dedicated table
allow operator edits to imported command classification
separate service and job in UI
keep existing runtime-service storage for live supervised processes

Success condition: the workspace UI is command-first, and raw runtime JSON is advanced-only.

Phase 3: Service execution backed by existing runtime supervisor

Goal: run supported imported service commands through the current Paperclip supervisor.

Work:

compile service commands into the existing runtime service start/stop path
persist desired state per named command
keep startup restoration behavior for service commands
make the active command name explicit everywhere control actions appear

Success condition: imported service commands behave like native Paperclip services once adopted.
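
Persisting desired state per named command makes startup restoration a matter of comparing intent against what is actually running. A minimal sketch, assuming a hypothetical `planStartupActions` helper and `"running"`/`"stopped"` values that may not match the existing `desiredState` encoding:

```typescript
type DesiredState = "running" | "stopped"; // assumed encoding
type ReconcileAction = { action: "start" | "stop"; name: string };

function planStartupActions(
  desired: Record<string, DesiredState>,
  runningNames: string[],
): ReconcileAction[] {
  const running = new Set(runningNames);
  const actions: ReconcileAction[] = [];
  // Sort names so the plan is deterministic and easy to log.
  for (const name of Object.keys(desired).sort()) {
    const wanted = desired[name];
    if (wanted === "running" && !running.has(name)) {
      actions.push({ action: "start", name });
    } else if (wanted === "stopped" && running.has(name)) {
      actions.push({ action: "stop", name });
    }
  }
  // Assumption: processes with no recorded desired state are left untouched.
  return actions;
}
```

Because every action carries the command name, the same plan can feed the "Start web" and "Stop api" style labels in the UI.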

Phase 4: Job execution and optional dependency handling

Goal: support one-shot imported commands without pretending they are services.

Work:

add Run actions for jobs
record output in workspace operations
optionally support simple dependsOn execution for jobs with clear logging

Success condition: one-shot tasks are runnable, but they are not mixed into the service lifecycle model.
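
If simple dependsOn execution is added for jobs, the run order can be computed with a depth-first traversal that rejects cycles and unknown names. The sketch assumes dependencies run sequentially before the target; `JobNode` and `orderJobs` are hypothetical names.

```typescript
type JobNode = { name: string; dependsOn: string[] };

// Returns the jobs to run, dependencies first, ending with the target.
function orderJobs(jobs: JobNode[], target: string): string[] {
  const byName = new Map<string, JobNode>();
  for (const job of jobs) byName.set(job.name, job);

  const state = new Map<string, "visiting" | "done">();
  const order: string[] = [];

  const visit = (name: string, path: string[]): void => {
    if (state.get(name) === "done") return;
    if (state.get(name) === "visiting") {
      // Surface the cycle explicitly instead of looping or guessing.
      throw new Error(`dependsOn cycle: ${path.concat(name).join(" -> ")}`);
    }
    const job = byName.get(name);
    if (!job) throw new Error(`unknown job "${name}"`);
    state.set(name, "visiting");
    for (const dependency of job.dependsOn) {
      visit(dependency, path.concat(name));
    }
    state.set(name, "done");
    order.push(name);
  };

  visit(target, []);
  return order;
}
```

Logging the returned order before execution gives operators the "clear logging" the plan asks for.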

Phase 5: Adapter and execution workspace integration

Goal: let agents and issue-scoped workspaces consume the curated command model consistently.

Work:

expose inherited workspace commands to execution workspaces
allow issue-level selection of a default service command when relevant
make service selection explicit in issue and workspace views

Success condition: agents, operators, and workspaces all refer to the same named commands.

Non-Goals

full VS Code task-runner parity
support for every VS Code task type
removal of Paperclip's own runtime supervision model
editor-dependent execution semantics inside the control plane

Risks

overfitting Paperclip to VS Code and making the model worse for non-VS-Code repos
misclassifying watch tasks as durable services
hiding too much detail and making debugging harder
allowing imported task graphs to become implicit magic

These risks are manageable if the import layer stays explicit, conservative, and operator-editable.

Decision

Paperclip should adopt VS Code tasks as an optional workspace command source, not as the canonical runtime model.

The main UX change should be:

move from raw runtime JSON to named workspace commands
separate services from jobs
make the exact controlled command explicit
let .vscode/tasks.json pre-populate those commands when available

External References

VS Code tasks documentation: https://code.visualstudio.com/docs/debugtest/tasks
Existing Paperclip workspace plan: doc/plans/2026-03-10-workspace-strategy-and-git-worktrees.md
