Fix flaky heartbeat recovery teardown CI failure (#4559)

## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - The linked CI job is in the server test/recovery path, where
heartbeat runs and issue cleanup need to leave the control plane in a
consistent state even when retries fail.
> - In this case the failure was not runtime product behavior but test
teardown behavior inside `heartbeat-process-recovery.test.ts`.
> - The failing GitHub Actions job showed a foreign-key race on
`company_skills_company_id_companies_id_fk` while the test tried to
delete the parent company record.
> - The surrounding teardown code already uses bounded retry cleanup for
other dependent tables (`issues`, `heartbeatRuns`, and `agents`) because
this test file intentionally exercises asynchronous recovery flows.
> - This pull request applies that same retry pattern to the final
`db.delete(companies)` step, re-clearing `companySkills` before each
retry.
> - The benefit is a targeted fix for the CI flake without changing
runtime behavior or expanding the scope beyond the failing teardown
path.

## What Changed

- Wrapped the final `db.delete(companies)` call in
`server/src/__tests__/heartbeat-process-recovery.test.ts` with the same
5-attempt retry pattern already used elsewhere in that teardown.
- Re-cleared `companySkills` before each company-delete retry so
late-arriving FK-dependent rows do not mask the real test result.
- Verified the fix against the originally failing
`heartbeat-process-recovery` test file and the broader `pnpm test:run`
command under CI-like env conditions.

## Verification

- `pnpm exec vitest run
server/src/__tests__/heartbeat-process-recovery.test.ts`
- Re-ran `pnpm exec vitest run
server/src/__tests__/heartbeat-process-recovery.test.ts` multiple times
locally; the previously failing teardown stayed green.
- `env -u PAPERCLIP_API_URL -u PAPERCLIP_RUNTIME_API_URL -u
PAPERCLIP_RUN_ID -u PAPERCLIP_TASK_ID -u PAPERCLIP_AGENT_ID -u
PAPERCLIP_COMPANY_ID -u PAPERCLIP_API_KEY -u PAPERCLIP_WAKE_REASON -u
PAPERCLIP_WAKE_COMMENT_ID -u PAPERCLIP_WAKE_PAYLOAD_JSON -u
PAPERCLIP_APPROVAL_ID -u PAPERCLIP_APPROVAL_STATUS pnpm test:run`

## Risks

- Low risk. The change is test-only and scoped to teardown retry
behavior in a single server test file.
- If the underlying async cleanup behavior changes again, this test
could still become flaky in a different way, but this PR addresses the
specific FK race seen in the linked CI job.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and
discuss it in `#dev` before opening the PR. Feature PRs that overlap
with planned core work may need to be redirected — check the roadmap
first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI `gpt-5.4` via Paperclip `codex_local`, high reasoning mode,
with tool use for shell, git, HTTP API calls, and patch application.

## Checklist

- [x] I have included a thinking path that traces from project context
to this change
- [x] I have specified the model used (with version and capability
details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate
planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after
screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before
requesting merge
This commit is contained in:
Devin Foley
2026-04-26 17:30:20 -07:00
committed by GitHub
parent 8145141c55
commit 1d9f7a5149
@@ -346,7 +346,16 @@ describeEmbeddedPostgres("heartbeat orphaned process recovery", () => {
await new Promise((resolve) => setTimeout(resolve, 50));
}
}
await db.delete(companies);
for (let attempt = 0; attempt < 5; attempt += 1) {
await db.delete(companySkills);
try {
await db.delete(companies);
break;
} catch (error) {
if (attempt === 4) throw error;
await new Promise((resolve) => setTimeout(resolve, 50));
}
}
});
afterAll(async () => {