forked from farhoodlabs/paperclip
[codex] Harden issue recovery reliability (#4875)
## Thinking Path > - Paperclip is the control plane for autonomous agent companies, so non-terminal issue state must always have a clear live, waiting, or recovery owner. > - This change stays inside the server reliability and liveness subsystem for assigned issue recovery, blocker attention, and live-run polling. > - Closed PR #4860 mixed this reliability work with separate mutation-boundary policy changes, which made review and merge risk too broad. > - [PAP-2981](/PAP/issues/PAP-2981) asked for a replacement PR containing only the remaining reliability slice and explicitly excluding user-assignment and execution-policy restrictions. > - Follow-up review also split `advanced` run-liveness continuation behavior out of this PR so it can be reviewed separately. > - The implementation hardens repeated recovery escalation, expands blocker-attention coverage for explicit waiting and recovery paths, and caps company live-run polling defaults. > - The benefit is a smaller reliability PR that improves liveness behavior without changing agent/user mutation authorization boundaries or `advanced` continuation semantics. ## What Changed - Avoid repeated liveness escalation updates when the source issue is already blocked by the same open escalation. - Treat open liveness escalation recovery issues, their source issues, and their leaf blockers as covered waiting paths in blocker attention. - Cap default company live-run polling at 50 rows for both `minCount` and `limit`, including explicit zero values, to avoid unbounded responses. - Preserve the existing behavior where succeeded `advanced` runs are considered productive/healthy for stranded-work recovery and are not actionable bounded run-liveness continuations. - Added focused server coverage for recovery dedupe, blocker attention, liveness escalation, run continuations, and live-run polling. ## Verification - `pnpm install --frozen-lockfile` - `pnpm exec vitest run server/src/__tests__/heartbeat-process-recovery.test.ts server/src/__tests__/heartbeat-issue-liveness-escalation.test.ts server/src/__tests__/issue-blocker-attention.test.ts server/src/__tests__/run-continuations.test.ts server/src/__tests__/agent-live-run-routes.test.ts` - Result: 5 files passed, 63 tests passed. - `pnpm --filter @paperclipai/server typecheck` - Result: passed. - No UI changes; screenshots are not applicable. ## Risks - Recovery and blocker-attention classification changes can affect which blocked chains are shown as covered versus needing attention. - Live-run polling now treats omitted, invalid, or non-positive `limit` / `minCount` values as the capped default of 50. - `advanced` run-liveness continuation behavior is intentionally excluded from this PR and split for separate review. > For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected — check the roadmap first. See `CONTRIBUTING.md`. ## Model Used - OpenAI Codex, GPT-5, code execution and GitHub CLI tool use, medium reasoning effort. ## Checklist - [x] I have included a thinking path that traces from project context to this change - [x] I have specified the model used (with version and capability details) - [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work - [x] I have run tests locally and they pass - [x] I have added or updated tests where applicable - [x] If this change affects the UI, I have included before/after screenshots - [x] I have updated relevant documentation to reflect my changes - [x] I have considered and documented any risks above - [x] I will address all Greptile and reviewer comments before requesting merge --------- Co-authored-by: Paperclip <noreply@paperclip.ing>
This commit is contained in:
@@ -289,10 +289,23 @@ describeEmbeddedPostgres("heartbeat issue graph liveness escalation", () => {
|
||||
const heartbeat = heartbeatService(db);
|
||||
|
||||
const first = await heartbeat.reconcileIssueGraphLiveness();
|
||||
const second = await heartbeat.reconcileIssueGraphLiveness();
|
||||
|
||||
expect(first.escalationsCreated).toBe(1);
|
||||
const [sourceAfterFirst] = await db
|
||||
.select({ updatedAt: issues.updatedAt })
|
||||
.from(issues)
|
||||
.where(eq(issues.id, blockedIssueId));
|
||||
const eventsAfterFirst = await db.select().from(activityLog).where(eq(activityLog.companyId, companyId));
|
||||
expect(eventsAfterFirst.filter((event) => event.action === "issue.blockers.updated")).toHaveLength(1);
|
||||
|
||||
const second = await heartbeat.reconcileIssueGraphLiveness();
|
||||
|
||||
expect(second.escalationsCreated).toBe(0);
|
||||
const [sourceAfterSecond] = await db
|
||||
.select({ updatedAt: issues.updatedAt })
|
||||
.from(issues)
|
||||
.where(eq(issues.id, blockedIssueId));
|
||||
expect(sourceAfterSecond?.updatedAt.getTime()).toBe(sourceAfterFirst?.updatedAt.getTime());
|
||||
|
||||
const escalations = await db
|
||||
.select()
|
||||
@@ -345,7 +358,7 @@ describeEmbeddedPostgres("heartbeat issue graph liveness escalation", () => {
|
||||
projectWorkspaceSourceIssueId: blockerIssueId,
|
||||
},
|
||||
});
|
||||
expect(events.some((event) => event.action === "issue.blockers.updated")).toBe(true);
|
||||
expect(events.filter((event) => event.action === "issue.blockers.updated")).toHaveLength(1);
|
||||
});
|
||||
|
||||
it("skips budget-blocked direct owners and assigns recovery to the manager fallback", async () => {
|
||||
|
||||
Reference in New Issue
Block a user