forked from farhoodlabs/paperclip
[codex] Harden issue recovery reliability (#4875)
## Thinking Path

> - Paperclip is the control plane for autonomous agent companies, so non-terminal issue state must always have a clear live, waiting, or recovery owner.
> - This change stays inside the server reliability and liveness subsystem for assigned issue recovery, blocker attention, and live-run polling.
> - Closed PR #4860 mixed this reliability work with separate mutation-boundary policy changes, which made review and merge risk too broad.
> - [PAP-2981](/PAP/issues/PAP-2981) asked for a replacement PR containing only the remaining reliability slice and explicitly excluding user-assignment and execution-policy restrictions.
> - Follow-up review also split `advanced` run-liveness continuation behavior out of this PR so it can be reviewed separately.
> - The implementation hardens repeated recovery escalation, expands blocker-attention coverage for explicit waiting and recovery paths, and caps company live-run polling defaults.
> - The benefit is a smaller reliability PR that improves liveness behavior without changing agent/user mutation authorization boundaries or `advanced` continuation semantics.

## What Changed

- Avoid repeated liveness escalation updates when the source issue is already blocked by the same open escalation.
- Treat open liveness escalation recovery issues, their source issues, and their leaf blockers as covered waiting paths in blocker attention.
- Cap default company live-run polling at 50 rows for both `minCount` and `limit`, including explicit zero values, to avoid unbounded responses.
- Preserve the existing behavior where succeeded `advanced` runs are considered productive/healthy for stranded-work recovery and are not actionable bounded run-liveness continuations.
- Added focused server coverage for recovery dedupe, blocker attention, liveness escalation, run continuations, and live-run polling.
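The polling default described above can be illustrated with a minimal sketch. The helper name `normalizeRowCount` and the constant name are hypothetical and do not come from the PR; only the value 50 and the rule that omitted, invalid, or non-positive inputs fall back to it are taken from the description. The description does not say how values above 50 are handled, so this sketch simply passes positive values through.

```typescript
// Cap taken from the PR description; the constant name is an assumption.
const DEFAULT_LIVE_RUN_ROWS = 50;

// Hypothetical helper: normalizes a raw `limit` or `minCount` query value.
// Omitted, non-numeric, non-finite, zero, and negative inputs all fall back
// to the default. Positive values are floored and returned unchanged, since
// the PR text does not state how values above the default are treated.
function normalizeRowCount(
  raw: unknown,
  fallback: number = DEFAULT_LIVE_RUN_ROWS,
): number {
  const parsed = typeof raw === "string" ? Number(raw) : raw;
  if (typeof parsed !== "number" || !Number.isFinite(parsed)) {
    return fallback;
  }
  const whole = Math.floor(parsed);
  return whole > 0 ? whole : fallback;
}
```

With this shape, a request that sends `limit=0` gets the same bounded default as one that omits `limit` entirely.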
## Verification

- `pnpm install --frozen-lockfile`
- `pnpm exec vitest run server/src/__tests__/heartbeat-process-recovery.test.ts server/src/__tests__/heartbeat-issue-liveness-escalation.test.ts server/src/__tests__/issue-blocker-attention.test.ts server/src/__tests__/run-continuations.test.ts server/src/__tests__/agent-live-run-routes.test.ts`
- Result: 5 files passed, 63 tests passed.
- `pnpm --filter @paperclipai/server typecheck`
- Result: passed.
- No UI changes; screenshots are not applicable.

## Risks

- Recovery and blocker-attention classification changes can affect which blocked chains are shown as covered versus needing attention.
- Live-run polling now treats omitted, invalid, or non-positive `limit` / `minCount` values as the capped default of 50.
- `advanced` run-liveness continuation behavior is intentionally excluded from this PR and split for separate review.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected; check the roadmap first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5, code execution and GitHub CLI tool use, medium reasoning effort.

## Checklist

- [x] I have included a thinking path that traces from project context to this change
- [x] I have specified the model used (with version and capability details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before requesting merge

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
@@ -76,6 +76,9 @@ describeEmbeddedPostgres("issue blocker attention", () => {
     status: string;
     parentId?: string | null;
     assigneeAgentId?: string | null;
+    originKind?: string | null;
+    originId?: string | null;
+    originFingerprint?: string | null;
   }) {
     const id = input.id ?? randomUUID();
     await db.insert(issues).values({
@@ -87,6 +90,9 @@ describeEmbeddedPostgres("issue blocker attention", () => {
       priority: "medium",
       parentId: input.parentId ?? null,
       assigneeAgentId: input.assigneeAgentId ?? null,
+      originKind: input.originKind ?? "manual",
+      originId: input.originId ?? null,
+      originFingerprint: input.originFingerprint ?? "default",
     });
     return id;
   }
@@ -356,6 +362,52 @@ describeEmbeddedPostgres("issue blocker attention", () => {
     });
   });
 
+  it("treats open liveness escalation blockers as covered waiting paths", async () => {
+    const { companyId, agentId } = await createCompany("PBL");
+    const parentId = await insertIssue({ companyId, identifier: "PBL-1", title: "Parent", status: "blocked" });
+    const cancelledLeafId = await insertIssue({
+      companyId,
+      identifier: "PBL-2",
+      title: "Cancelled blocker",
+      status: "cancelled",
+      assigneeAgentId: agentId,
+    });
+    const incidentKey = [
+      "harness_liveness",
+      companyId,
+      parentId,
+      "blocked_by_cancelled_issue",
+      cancelledLeafId,
+    ].join(":");
+    const escalationId = await insertIssue({
+      companyId,
+      identifier: "PBL-3",
+      title: "Liveness escalation",
+      status: "todo",
+      assigneeAgentId: agentId,
+      originKind: "harness_liveness_escalation",
+      originId: incidentKey,
+      originFingerprint: [
+        "harness_liveness_leaf",
+        companyId,
+        "blocked_by_cancelled_issue",
+        cancelledLeafId,
+      ].join(":"),
+    });
+    await block({ companyId, blockerIssueId: cancelledLeafId, blockedIssueId: parentId });
+    await block({ companyId, blockerIssueId: escalationId, blockedIssueId: parentId });
+
+    const parent = (await svc.list(companyId, { status: "blocked,todo" })).find((issue) => issue.id === parentId);
+
+    expect(parent?.blockerAttention).toMatchObject({
+      state: "covered",
+      reason: "active_dependency",
+      unresolvedBlockerCount: 2,
+      coveredBlockerCount: 2,
+      attentionBlockerCount: 0,
+    });
+  });
+
   it("does not treat a scheduled retry as actively covered work", async () => {
     const { companyId, agentId } = await createCompany("PBY");
     const parentId = await insertIssue({ companyId, identifier: "PBY-1", title: "Parent", status: "blocked" });
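The key layout and the expected counts in the added test can be illustrated with a small standalone sketch. This is not the service implementation, which this diff does not show. The two key builders mirror the arrays joined in the test. Everything else is an assumption made for illustration: the `BlockerIssue` shape, the set of "open" statuses, the `"needs_attention"` state name, and the rule that a cancelled leaf counts as covered when an open escalation carrying its leaf fingerprint blocks the same issue. The `reason` field asserted in the test is left out because the diff does not show how it is derived.

```typescript
// Hypothetical shape for illustration only.
type BlockerIssue = {
  id: string;
  status: string;
  originKind?: string | null;
  originFingerprint?: string | null;
};

const ESCALATION_KIND = "harness_liveness_escalation";
// Assumption: which statuses count as "open" is not shown in the diff.
const OPEN_STATUSES = new Set(["todo", "in_progress"]);

// Same segment order as the `incidentKey` array in the test.
function incidentKey(companyId: string, sourceIssueId: string, leafId: string): string {
  return ["harness_liveness", companyId, sourceIssueId, "blocked_by_cancelled_issue", leafId].join(":");
}

// Same segment order as the `originFingerprint` array in the test.
function leafFingerprint(companyId: string, leafId: string): string {
  return ["harness_liveness_leaf", companyId, "blocked_by_cancelled_issue", leafId].join(":");
}

function isOpenEscalation(blocker: BlockerIssue): boolean {
  return blocker.originKind === ESCALATION_KIND && OPEN_STATUSES.has(blocker.status);
}

// Assumes the caller passes only the unresolved blockers of one blocked issue.
function summarizeBlockers(companyId: string, blockers: BlockerIssue[]) {
  const escalatedFingerprints = new Set(
    blockers.filter(isOpenEscalation).map((b) => b.originFingerprint ?? ""),
  );
  let coveredBlockerCount = 0;
  for (const blocker of blockers) {
    const escalatedLeaf =
      blocker.status === "cancelled" &&
      escalatedFingerprints.has(leafFingerprint(companyId, blocker.id));
    if (isOpenEscalation(blocker) || escalatedLeaf) {
      coveredBlockerCount += 1;
    }
  }
  const unresolvedBlockerCount = blockers.length;
  const attentionBlockerCount = unresolvedBlockerCount - coveredBlockerCount;
  const state: "covered" | "needs_attention" =
    unresolvedBlockerCount > 0 && attentionBlockerCount === 0 ? "covered" : "needs_attention";
  return { state, unresolvedBlockerCount, coveredBlockerCount, attentionBlockerCount };
}
```

Given a cancelled leaf and an open escalation whose fingerprint matches that leaf, the sketch reports two unresolved blockers, both covered, which matches the counts the test asserts. With the cancelled leaf alone it reports one blocker needing attention.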