[codex] Add source-scoped recovery actions (#5599)

## Thinking Path

> - Paperclip is a control plane for autonomous AI companies, where work
must end with a clear disposition rather than ambiguous agent liveness.
> - Recovery currently detects stalled or missing-next-step issues, but
source issue recovery can become split across child recovery issues,
blockers, and comments.
> - That makes it harder for operators and agents to see who owns
recovery and what exact action is needed on the original issue.
> - Source-scoped recovery actions give the original issue a first-class
active recovery state with owner, evidence, wake policy, and resolution
outcome.
> - This pull request adds the recovery-action data model, backend
reconciliation and resolution APIs, and board UI indicators/actions.
> - The benefit is clearer stalled-work recovery without losing source
issue context or relying on comments as the liveness path.

## What Changed

- Added the `issue_recovery_actions` schema, shared
types/constants/validators, and an idempotent
`0084_issue_recovery_actions` migration ordered after current `master`
migrations.
- Updated stranded/missing-disposition recovery to create source-scoped
recovery actions, wake the recovery owner on the source issue, and avoid
locking the source issue for recovery-action wakes.
- Added API support for reading active recovery actions on issue
detail/list surfaces and resolving them with restored, blocked,
cancelled, or false-positive outcomes.
- Require blocked recovery resolutions to have an unresolved first-class
blocker, and removed the UI shortcut that could mark recovery blocked
without a blocker selection path.
- Surfaced recovery indicators/actions in the issue UI, blocker notices,
active run panels, issue rows, and Storybook coverage.
- Updated docs and focused tests for recovery semantics, ownership,
races, stale comments, and UI behavior.

## Verification

- `pnpm exec vitest run
server/src/__tests__/issue-recovery-actions.test.ts
server/src/__tests__/heartbeat-process-recovery.test.ts
ui/src/components/IssueRecoveryActionCard.test.tsx
ui/src/components/IssueBlockedNotice.test.tsx ui/src/api/issues.test.ts`
— 5 files, 72 tests passed.
- `pnpm --filter @paperclipai/shared typecheck` — passed.
- `pnpm --filter @paperclipai/db typecheck` — passed, including
migration numbering check.
- `pnpm --filter @paperclipai/server typecheck` — passed.
- `pnpm --filter @paperclipai/ui typecheck` — passed.
- Follow-up verification after blocker-resolution guard: `pnpm exec
vitest run server/src/__tests__/issue-recovery-actions.test.ts
ui/src/components/IssueRecoveryActionCard.test.tsx
ui/src/api/issues.test.ts` — 3 files, 27 tests passed.
- Follow-up `pnpm --filter @paperclipai/server typecheck` — passed.
- Follow-up `pnpm --filter @paperclipai/ui typecheck` — passed.
- UI states are available in
`ui/storybook/stories/source-issue-recovery.stories.tsx`; screenshot
capture helper is `scripts/screenshot-recovery-card.cjs`.

## Risks

- Medium: recovery behavior changes from child recovery issue ownership
toward source-scoped actions, so operators may see stalled-work state in
new places.
- Migration risk is mitigated by using the next migration slot after
`master` and making the table/constraints/index creation idempotent for
anyone who previously applied the old branch-local
`0082_dizzy_master_mold` migration.
- Existing child recovery issue paths are still guarded for
already-created recovery issues, but new source-scoped flows should be
watched in CI and Greptile review.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5 coding agent, tool use enabled for shell, Git,
GitHub, and local test execution. Context window not exposed by the
runtime.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
This commit is contained in:
Dotta
2026-05-12 09:37:15 -05:00
committed by GitHub
parent c445e59256
commit 0808b388ee
57 changed files with 3947 additions and 224 deletions
@@ -20,6 +20,7 @@ import {
heartbeatRuns,
issueComments,
issueDocuments,
issueRecoveryActions,
issueRelations,
issueTreeHoldMembers,
issueTreeHolds,
@@ -328,6 +329,7 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
await db.delete(documentRevisions);
await db.delete(documents);
await db.delete(issueRelations);
await db.delete(issueRecoveryActions);
await db.delete(issueTreeHoldMembers);
await db.delete(issueTreeHolds);
for (let attempt = 0; attempt < 5; attempt += 1) {
@@ -692,67 +694,76 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
return { companyId, agentId, issueId };
}
async function expectStrandedRecoveryArtifacts(input: {
async function expectSourceScopedStrandedRecoveryAction(input: {
companyId: string;
agentId: string;
issueId: string;
runId: string;
previousStatus: "todo" | "in_progress";
retryReason: "assignment_recovery" | "issue_continuation_needed";
retryReason?: "assignment_recovery" | "issue_continuation_needed" | null;
cause?: string;
kind?: string;
}) {
const recovery = await waitForValue(async () =>
db.select().from(issues).where(
const action = await waitForValue(async () =>
db.select().from(issueRecoveryActions).where(
and(
eq(issues.companyId, input.companyId),
eq(issues.originKind, "stranded_issue_recovery"),
eq(issues.originId, input.issueId),
eq(issueRecoveryActions.companyId, input.companyId),
eq(issueRecoveryActions.sourceIssueId, input.issueId),
),
).then((rows) => rows[0] ?? null),
);
if (!recovery) throw new Error("Expected stranded issue recovery issue to be created");
if (!action) throw new Error("Expected source-scoped stranded recovery action to be created");
expect(recovery).toMatchObject({
expect(action).toMatchObject({
companyId: input.companyId,
parentId: input.issueId,
assigneeAgentId: input.agentId,
originKind: "stranded_issue_recovery",
originId: input.issueId,
originRunId: input.runId,
priority: "medium",
assigneeAdapterOverrides: { modelProfile: "cheap" },
sourceIssueId: input.issueId,
recoveryIssueId: null,
kind: input.kind ?? "stranded_assigned_issue",
status: "active",
ownerType: "agent",
ownerAgentId: input.agentId,
previousOwnerAgentId: input.agentId,
returnOwnerAgentId: input.agentId,
cause: input.cause ?? "stranded_assigned_issue",
attemptCount: 1,
maxAttempts: null,
});
expect(recovery.title).toContain("Recover stalled issue");
expect(recovery.description).toContain(`Previous source status: \`${input.previousStatus}\``);
expect(recovery.description).toContain(`Retry reason: \`${input.retryReason}\``);
expect(recovery.description).toContain("Fix the runtime/adapter problem");
expect(action.evidence).toMatchObject({
sourceIssueId: input.issueId,
previousStatus: input.previousStatus,
latestRunId: input.runId,
retryReason: input.retryReason ?? null,
});
expect(action.nextAction).toContain(
input.kind === "missing_disposition" ? "valid issue disposition" : "Restore a live execution path",
);
const relation = await db
const recoveryIssues = await db
.select()
.from(issueRelations)
.where(
and(
eq(issueRelations.companyId, input.companyId),
eq(issueRelations.issueId, recovery.id),
eq(issueRelations.relatedIssueId, input.issueId),
eq(issueRelations.type, "blocks"),
),
)
.then((rows) => rows[0] ?? null);
expect(relation).toBeTruthy();
.from(issues)
.where(and(
eq(issues.companyId, input.companyId),
eq(issues.originKind, "stranded_issue_recovery"),
eq(issues.originId, input.issueId),
));
expect(recoveryIssues).toHaveLength(0);
const wakeups = await db
.select()
.from(agentWakeupRequests)
.where(eq(agentWakeupRequests.agentId, input.agentId));
const recoveryWakeup = wakeups.find((wakeup) => {
const payload = wakeup.payload as Record<string, unknown> | null;
return payload?.issueId === recovery.id &&
payload?.sourceIssueId === input.issueId &&
payload?.strandedRunId === input.runId;
const recoveryWakeup = await waitForValue(async () => {
const wakeups = await db
.select()
.from(agentWakeupRequests)
.where(eq(agentWakeupRequests.agentId, input.agentId));
return wakeups.find((wakeup) => {
const payload = wakeup.payload as Record<string, unknown> | null;
return payload?.issueId === input.issueId &&
payload?.sourceIssueId === input.issueId &&
payload?.recoveryActionId === action.id &&
payload?.strandedRunId === input.runId;
}) ?? null;
});
expect(recoveryWakeup).toMatchObject({
companyId: input.companyId,
reason: "issue_assigned",
reason: "source_scoped_recovery_action",
source: "assignment",
payload: expect.objectContaining({ modelProfile: "cheap" }),
});
@@ -765,15 +776,23 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
.then((rows) => rows[0] ?? null)
: null;
expect(recoveryRun?.contextSnapshot).toMatchObject({
issueId: recovery.id,
taskId: recovery.id,
source: "stranded_issue_recovery",
issueId: input.issueId,
taskId: input.issueId,
source: "issue_recovery_action",
recoveryActionId: action.id,
sourceIssueId: input.issueId,
strandedRunId: input.runId,
modelProfile: "cheap",
});
await waitForHeartbeatIdle(db);
const sourceIssue = await db
.select()
.from(issues)
.where(eq(issues.id, input.issueId))
.then((rows) => rows[0] ?? null);
expect(sourceIssue?.status).toBe("blocked");
return recovery;
return action;
}
async function sourceBlockerIssueIds(companyId: string, sourceIssueId: string) {
@@ -1056,7 +1075,7 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
expect(blockedIssue?.checkoutRunId).toBeNull();
if (!continuationRun?.id) throw new Error("Expected continuation recovery run to exist");
const recovery = await expectStrandedRecoveryArtifacts({
const recoveryAction = await expectSourceScopedStrandedRecoveryAction({
companyId,
agentId,
issueId,
@@ -1065,17 +1084,7 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
retryReason: "issue_continuation_needed",
});
const blockerRelations = await db
.select()
.from(issueRelations)
.where(
and(
eq(issueRelations.companyId, companyId),
eq(issueRelations.relatedIssueId, issueId),
eq(issueRelations.type, "blocks"),
),
);
expect(blockerRelations.map((relation) => relation.issueId)).toEqual([recovery.id]);
await expect(sourceBlockerIssueIds(companyId, issueId)).resolves.toEqual([]);
const comments = await waitForValue(async () => {
const rows = await db.select().from(issueComments).where(eq(issueComments.issueId, issueId));
@@ -1083,7 +1092,8 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
});
expect(comments).toHaveLength(1);
expect(comments[0]?.body).toContain("retried continuation");
expect(comments[0]?.body).toContain(`Recovery issue: [${recovery.identifier}]`);
expect(comments[0]?.body).toContain(`Recovery action: \`${recoveryAction.id}\``);
expect(comments[0]?.body).toContain("Recovery owner: [CodexCoder]");
});
it("blocks failed recovery work in place during immediate terminal-run cleanup", async () => {
@@ -1600,27 +1610,28 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
expect(result.successfulRunHandoffEscalated).toBe(1);
expect(result.issueIds).toEqual([issueId]);
const recovery = await waitForValue(async () =>
db.select().from(issues).where(
and(
eq(issues.companyId, companyId),
eq(issues.originKind, "stranded_issue_recovery"),
eq(issues.originId, issueId),
),
).then((rows) => rows[0] ?? null),
);
expect(recovery?.assigneeAgentId).toBe(agentId);
expect(recovery?.title).toContain("Recover missing next step");
expect(recovery?.description).toContain("Normalized cause: `successful_run_missing_state`");
expect(recovery?.description).toContain("not a runtime/adapter crash report");
expect(recovery?.description).toContain(`Source run: [\`${sourceRunId}\`]`);
expect(recovery?.description).toContain("Missing disposition: `clear_next_step`");
expect(recovery?.description).toContain("Source assignee: [CodexCoder]");
expect(recovery?.description).not.toContain("sk-test-successful-handoff-secret");
const recoveryAction = await expectSourceScopedStrandedRecoveryAction({
companyId,
agentId,
issueId,
runId,
previousStatus: "in_progress",
retryReason: null,
cause: SUCCESSFUL_RUN_MISSING_STATE_REASON,
kind: "missing_disposition",
});
expect(recoveryAction.evidence).toMatchObject({
sourceRunId,
missingDisposition: "clear_next_step",
latestRunStatus: "failed",
latestRunErrorCode: "adapter_failed",
recoveryCause: SUCCESSFUL_RUN_MISSING_STATE_REASON,
});
expect(JSON.stringify(recoveryAction.evidence)).not.toContain("sk-test-successful-handoff-secret");
const sourceIssue = await db.select().from(issues).where(eq(issues.id, issueId)).then((rows) => rows[0] ?? null);
expect(sourceIssue?.status).toBe("blocked");
await expect(sourceBlockerIssueIds(companyId, issueId)).resolves.toEqual([recovery?.id]);
await expect(sourceBlockerIssueIds(companyId, issueId)).resolves.toEqual([]);
const comments = await db.select().from(issueComments).where(eq(issueComments.issueId, issueId));
expect(comments[0]?.body).toBe(SUCCESSFUL_RUN_HANDOFF_EXHAUSTED_NOTICE_BODY);
@@ -1636,7 +1647,7 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
expect.objectContaining({
title: "Recovery owner",
rows: expect.arrayContaining([
expect.objectContaining({ type: "issue_link", identifier: recovery?.identifier }),
expect.objectContaining({ type: "key_value", label: "Recovery action", value: recoveryAction.id }),
expect.objectContaining({ type: "agent_link", label: "Recovery owner", name: "CodexCoder" }),
]),
}),
@@ -1657,7 +1668,7 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
});
it("escalates an exhausted successful handoff run that still leaves no disposition", async () => {
const { companyId, runId, issueId } = await seedStrandedIssueFixture({
const { companyId, agentId, runId, issueId } = await seedStrandedIssueFixture({
status: "in_progress",
runStatus: "succeeded",
livenessState: "advanced",
@@ -1687,17 +1698,21 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
expect(result.successfulContinuationObserved).toBe(0);
expect(result.successfulRunHandoffEscalated).toBe(1);
const recovery = await waitForValue(async () =>
db.select().from(issues).where(
and(
eq(issues.companyId, companyId),
eq(issues.originKind, "stranded_issue_recovery"),
eq(issues.originId, issueId),
),
).then((rows) => rows[0] ?? null),
);
expect(recovery?.description).toContain("Latest handoff run status: `succeeded`");
expect(recovery?.description).toContain("Suggested");
const recoveryAction = await expectSourceScopedStrandedRecoveryAction({
companyId,
agentId,
issueId,
runId,
previousStatus: "in_progress",
retryReason: null,
cause: SUCCESSFUL_RUN_MISSING_STATE_REASON,
kind: "missing_disposition",
});
expect(recoveryAction.evidence).toMatchObject({
sourceRunId,
latestRunStatus: "succeeded",
missingDisposition: "clear_next_step",
});
});
it("clears the detached warning when the run reports activity again", async () => {
@@ -2063,7 +2078,7 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
const issue = await db.select().from(issues).where(eq(issues.id, issueId)).then((rows) => rows[0] ?? null);
expect(issue?.status).toBe("blocked");
const recovery = await expectStrandedRecoveryArtifacts({
const recoveryAction = await expectSourceScopedStrandedRecoveryAction({
companyId,
agentId,
issueId,
@@ -2071,13 +2086,14 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
previousStatus: "todo",
retryReason: "assignment_recovery",
});
expect(recovery.description ?? "").not.toContain("sk-test-recovery-secret");
expect(JSON.stringify(recoveryAction.evidence)).not.toContain("sk-test-recovery-secret");
const comments = await db.select().from(issueComments).where(eq(issueComments.issueId, issueId));
expect(comments).toHaveLength(1);
expect(comments[0]?.body).toContain("retried dispatch");
expect(comments[0]?.body).toContain("Latest retry failure details were withheld from the issue thread");
expect(comments[0]?.body).toContain(`Recovery issue: [${recovery.identifier}]`);
expect(comments[0]?.body).toContain(`Recovery action: \`${recoveryAction.id}\``);
expect(comments[0]?.body).toContain("Recovery owner: [CodexCoder]");
});
it("blocks an already stranded recovery issue without creating a recovery child", async () => {
@@ -2457,7 +2473,7 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
const issue = await db.select().from(issues).where(eq(issues.id, issueId)).then((rows) => rows[0] ?? null);
expect(issue?.status).toBe("blocked");
const recovery = await expectStrandedRecoveryArtifacts({
const recoveryAction = await expectSourceScopedStrandedRecoveryAction({
companyId,
agentId,
issueId,
@@ -2470,7 +2486,8 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
expect(comments).toHaveLength(1);
expect(comments[0]?.body).toContain("retried continuation");
expect(comments[0]?.body).toContain("Latest retry failure details were withheld from the issue thread");
expect(comments[0]?.body).toContain(`Recovery issue: [${recovery.identifier}]`);
expect(comments[0]?.body).toContain(`Recovery action: \`${recoveryAction.id}\``);
expect(comments[0]?.body).toContain("Recovery owner: [CodexCoder]");
});
it("redacts error-code-only stranded recovery failures in issue copy", async () => {
@@ -2486,7 +2503,7 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
const result = await heartbeat.reconcileStrandedAssignedIssues();
expect(result.escalated).toBe(1);
const recovery = await expectStrandedRecoveryArtifacts({
const recoveryAction = await expectSourceScopedStrandedRecoveryAction({
companyId,
agentId,
issueId,
@@ -2494,8 +2511,10 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
previousStatus: "in_progress",
retryReason: "issue_continuation_needed",
});
expect(recovery.description).toContain("Latest retry failure details were withheld from the issue thread");
expect(recovery.description).not.toContain("- Failure: none recorded");
expect(recoveryAction.evidence).toMatchObject({
latestRunErrorCode: "adapter_exit_code",
});
expect(JSON.stringify(recoveryAction.evidence)).not.toContain("- Failure: none recorded");
const comments = await db.select().from(issueComments).where(eq(issueComments.issueId, issueId));
expect(comments).toHaveLength(1);
@@ -2516,6 +2535,15 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
);
expect(results.every((result) => result.status === "fulfilled")).toBe(true);
const actions = await db
.select()
.from(issueRecoveryActions)
.where(and(
eq(issueRecoveryActions.companyId, companyId),
eq(issueRecoveryActions.sourceIssueId, issueId),
));
expect(actions).toHaveLength(1);
expect(actions[0]?.attemptCount).toBeGreaterThanOrEqual(1);
const recoveries = await db
.select()
.from(issues)
@@ -2524,8 +2552,8 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
eq(issues.originKind, "stranded_issue_recovery"),
eq(issues.originId, issueId),
));
expect(recoveries).toHaveLength(1);
await expect(sourceBlockerIssueIds(companyId, issueId)).resolves.toEqual([recoveries[0]?.id]);
expect(recoveries).toHaveLength(0);
await expect(sourceBlockerIssueIds(companyId, issueId)).resolves.toEqual([]);
});
it("blocks stranded recovery issues in place instead of creating nested recovery issues", async () => {
@@ -2783,7 +2811,7 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
const issue = await db.select().from(issues).where(eq(issues.id, issueId)).then((rows) => rows[0] ?? null);
expect(issue?.status).toBe("blocked");
const recovery = await expectStrandedRecoveryArtifacts({
const recoveryAction = await expectSourceScopedStrandedRecoveryAction({
companyId,
agentId,
issueId,
@@ -2796,7 +2824,8 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
expect(comments).toHaveLength(1);
expect(comments[0]?.body).toContain("automatically retried continuation");
expect(comments[0]?.body).toContain("still has no live execution path");
expect(comments[0]?.body).toContain(`Recovery issue: [${recovery.identifier}]`);
expect(comments[0]?.body).toContain(`Recovery action: \`${recoveryAction.id}\``);
expect(comments[0]?.body).toContain("Recovery owner: [CodexCoder]");
});
it("allows one productive-terminal recovery after regular continuation recovery made progress", async () => {