236d11d36f
## Thinking Path

> - Paperclip orchestrates AI agents for zero-human companies.
> - Heartbeat runs are the control-plane record of each agent execution window.
> - Long-running local agents can exhaust context or stop while still holding useful next-step state.
> - Operators need that stop reason, next action, and continuation path to be durable and visible.
> - This pull request adds run liveness metadata, continuation summaries, and UI surfaces for issue run ledgers.
> - The benefit is that interrupted or long-running work can resume with clearer context instead of losing the agent's last useful handoff.

## What Changed

- Added heartbeat-run liveness fields, continuation attempt tracking, and an idempotent `0058` migration.
- Added server services and tests for run liveness, continuation summaries, stop metadata, and activity backfill.
- Wired local and HTTP adapters to surface continuation/liveness context through shared adapter utilities.
- Added shared constants, validators, and heartbeat types for liveness continuation state.
- Added issue-detail UI surfaces for continuation handoffs and the run ledger, with component tests.
- Updated agent runtime docs, heartbeat protocol docs, prompt guidance, onboarding assets, and skills instructions to explain continuation behavior.
- Addressed Greptile feedback by scoping document evidence by run, excluding system continuation-summary documents from liveness evidence, importing shared liveness types, surfacing hidden ledger run counts, documenting bounded retry behavior, and moving run-ledger liveness backfill off the request path.

## Verification

- `pnpm exec vitest run packages/adapter-utils/src/server-utils.test.ts server/src/__tests__/run-continuations.test.ts server/src/__tests__/run-liveness.test.ts server/src/__tests__/activity-service.test.ts server/src/__tests__/documents-service.test.ts server/src/__tests__/issue-continuation-summary.test.ts server/src/services/heartbeat-stop-metadata.test.ts ui/src/components/IssueRunLedger.test.tsx ui/src/components/IssueContinuationHandoff.test.tsx ui/src/components/IssueDocumentsSection.test.tsx`
- `pnpm --filter @paperclipai/db build`
- `pnpm exec vitest run server/src/__tests__/activity-service.test.ts ui/src/components/IssueRunLedger.test.tsx`
- `pnpm --filter @paperclipai/ui typecheck`
- `pnpm --filter @paperclipai/server typecheck`
- `pnpm exec vitest run server/src/__tests__/activity-service.test.ts server/src/__tests__/run-continuations.test.ts ui/src/components/IssueRunLedger.test.tsx`
- `pnpm exec vitest run server/src/__tests__/heartbeat-process-recovery.test.ts -t "treats a plan document update"`
- `pnpm exec vitest run server/src/__tests__/activity-service.test.ts server/src/__tests__/heartbeat-process-recovery.test.ts -t "activity service|treats a plan document update"`
- Remote PR checks on head `e53b1a1d`: `verify`, `e2e`, `policy`, and Snyk all passed.
- Confirmed `public-gh/master` is an ancestor of this branch after fetching `public-gh master`.
- Confirmed `pnpm-lock.yaml` is not included in the branch diff.
- Confirmed migration `0058_wealthy_starbolt.sql` is ordered after `0057` and uses `IF NOT EXISTS` guards for repeat application.
- Greptile inline review threads are resolved.

## Risks

- Medium risk: this touches heartbeat execution, liveness recovery, activity rendering, issue routes, shared contracts, docs, and UI.
- Migration risk is mitigated by additive columns/indexes and idempotent guards.
- Run-ledger liveness backfill is now asynchronous, so the first ledger response can briefly show historical missing liveness until the background backfill completes.
- UI screenshot coverage is not included in this packaging pass; validation is currently through focused component tests.

> For core feature work, check [`ROADMAP.md`](ROADMAP.md) first and discuss it in `#dev` before opening the PR. Feature PRs that overlap with planned core work may need to be redirected; check the roadmap first. See `CONTRIBUTING.md`.

## Model Used

- OpenAI Codex, GPT-5.4, local tool-use coding agent with terminal, git, GitHub connector, GitHub CLI, and Paperclip API access.

## Checklist

- [x] I have included a thinking path that traces from project context to this change
- [x] I have specified the model used (with version and capability details)
- [x] I have checked ROADMAP.md and confirmed this PR does not duplicate planned core work
- [x] I have run tests locally and they pass
- [x] I have added or updated tests where applicable
- [x] If this change affects the UI, I have included before/after screenshots
- [x] I have updated relevant documentation to reflect my changes
- [x] I have considered and documented any risks above
- [x] I will address all Greptile and reviewer comments before requesting merge

Screenshot note: no before/after screenshots were captured in this PR packaging pass; the UI changes are covered by focused component tests listed above.

---------

Co-authored-by: Paperclip <noreply@paperclip.ing>
189 lines
6.2 KiB
TypeScript
import { and, eq, inArray } from "drizzle-orm";
import type { Db } from "@paperclipai/db";
import { agentWakeupRequests, agents, heartbeatRuns, issues } from "@paperclipai/db";
import type { RunLivenessState } from "@paperclipai/shared";

export const RUN_LIVENESS_CONTINUATION_REASON = "run_liveness_continuation";
export const DEFAULT_MAX_LIVENESS_CONTINUATION_ATTEMPTS = 2;

const ACTIONABLE_LIVENESS_STATES = new Set<RunLivenessState>(["plan_only", "empty_response"]);
const CONTINUATION_ACTIVE_ISSUE_STATUSES = new Set(["todo", "in_progress"]);
// A prior adapter error should not permanently suppress bounded liveness
// continuations; the max-attempt/idempotency guards prevent unbounded retries.
const CONTINUATION_AGENT_STATUSES = new Set(["active", "idle", "running", "error"]);
const IDEMPOTENT_WAKE_STATUSES = ["queued", "deferred_issue_execution", "completed"];

type HeartbeatRunRow = typeof heartbeatRuns.$inferSelect;
type IssueRow = Pick<
  typeof issues.$inferSelect,
  "id" | "companyId" | "identifier" | "title" | "status" | "assigneeAgentId" | "executionState" | "projectId"
>;
type AgentRow = Pick<typeof agents.$inferSelect, "id" | "companyId" | "status">;

export type RunContinuationDecision =
  | {
      kind: "enqueue";
      nextAttempt: number;
      idempotencyKey: string;
      payload: Record<string, unknown>;
      contextSnapshot: Record<string, unknown>;
    }
  | {
      kind: "exhausted";
      attempt: number;
      maxAttempts: number;
      comment: string;
    }
  | {
      kind: "skip";
      reason: string;
    };

export function readContinuationAttempt(value: unknown): number {
  const numeric = typeof value === "number" ? value : Number.parseInt(String(value ?? ""), 10);
  return Number.isFinite(numeric) && numeric > 0 ? Math.floor(numeric) : 0;
}

export function buildRunLivenessContinuationIdempotencyKey(input: {
  issueId: string;
  sourceRunId: string;
  livenessState: RunLivenessState;
  nextAttempt: number;
}) {
  return [
    "run_liveness_continuation",
    input.issueId,
    input.sourceRunId,
    input.livenessState,
    String(input.nextAttempt),
  ].join(":");
}

export async function findExistingRunLivenessContinuationWake(
  db: Db,
  input: {
    companyId: string;
    idempotencyKey: string;
  },
) {
  return db
    .select({ id: agentWakeupRequests.id, status: agentWakeupRequests.status })
    .from(agentWakeupRequests)
    .where(
      and(
        eq(agentWakeupRequests.companyId, input.companyId),
        eq(agentWakeupRequests.idempotencyKey, input.idempotencyKey),
        inArray(agentWakeupRequests.status, IDEMPOTENT_WAKE_STATUSES),
      ),
    )
    .limit(1)
    .then((rows) => rows[0] ?? null);
}

export function decideRunLivenessContinuation(input: {
  run: HeartbeatRunRow;
  issue: IssueRow | null;
  agent: AgentRow | null;
  livenessState: RunLivenessState | null;
  livenessReason: string | null;
  nextAction: string | null;
  budgetBlocked: boolean;
  idempotentWakeExists: boolean;
  maxAttempts?: number;
}): RunContinuationDecision {
  const {
    run,
    issue,
    agent,
    livenessState,
    livenessReason,
    nextAction,
    budgetBlocked,
    idempotentWakeExists,
  } = input;
  const maxAttempts = input.maxAttempts ?? DEFAULT_MAX_LIVENESS_CONTINUATION_ATTEMPTS;

  if (!livenessState || !ACTIONABLE_LIVENESS_STATES.has(livenessState)) {
    return { kind: "skip", reason: "liveness state is not actionable for continuation" };
  }
  if (!issue) return { kind: "skip", reason: "issue not found" };
  if (!agent) return { kind: "skip", reason: "agent not found" };
  if (issue.companyId !== run.companyId || agent.companyId !== run.companyId) {
    return { kind: "skip", reason: "company scope mismatch" };
  }
  if (issue.assigneeAgentId !== run.agentId) {
    return { kind: "skip", reason: "issue is no longer assigned to the source run agent" };
  }
  if (!CONTINUATION_ACTIVE_ISSUE_STATUSES.has(issue.status)) {
    return { kind: "skip", reason: `issue status ${issue.status} is not continuable` };
  }
  if (issue.executionState) {
    return { kind: "skip", reason: "issue is blocked by execution policy state" };
  }
  if (!CONTINUATION_AGENT_STATUSES.has(agent.status)) {
    return { kind: "skip", reason: `agent status ${agent.status} is not invokable` };
  }
  if (budgetBlocked) {
    return { kind: "skip", reason: "budget hard stop blocks continuation" };
  }

  const currentAttempt = readContinuationAttempt(run.continuationAttempt);
  if (currentAttempt >= maxAttempts) {
    return {
      kind: "exhausted",
      attempt: currentAttempt,
      maxAttempts,
      comment: [
        "Bounded liveness continuation exhausted",
        "",
        `- Last liveness state: \`${livenessState}\``,
        `- Attempts used: ${currentAttempt}/${maxAttempts}`,
        `- Reason: ${livenessReason ?? "Run ended without concrete progress"}`,
        "- Next action: a human or manager should inspect the run and either clarify the task, mark it blocked, or assign a concrete follow-up.",
      ].join("\n"),
    };
  }

  const nextAttempt = currentAttempt + 1;
  const idempotencyKey = buildRunLivenessContinuationIdempotencyKey({
    issueId: issue.id,
    sourceRunId: run.id,
    livenessState,
    nextAttempt,
  });
  if (idempotentWakeExists) {
    return { kind: "skip", reason: "continuation wake already exists for this source run and attempt" };
  }

  const payload = {
    issueId: issue.id,
    sourceRunId: run.id,
    livenessState,
    livenessReason,
    continuationAttempt: nextAttempt,
    maxContinuationAttempts: maxAttempts,
    instruction:
      nextAction ??
      "The previous run ended without concrete progress. Take the first concrete action now or mark the issue blocked with a specific unblock request.",
  };

  return {
    kind: "enqueue",
    nextAttempt,
    idempotencyKey,
    payload,
    contextSnapshot: {
      issueId: issue.id,
      taskId: issue.id,
      taskKey: issue.id,
      wakeReason: RUN_LIVENESS_CONTINUATION_REASON,
      livenessContinuationAttempt: nextAttempt,
      livenessContinuationMaxAttempts: maxAttempts,
      livenessContinuationSourceRunId: run.id,
      livenessContinuationState: livenessState,
      livenessContinuationReason: livenessReason,
      livenessContinuationInstruction: payload.instruction,
    },
  };
}
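The bounded retry behavior described in the commit message can be illustrated with a small standalone sketch. It copies the attempt-parsing rule and the colon-joined key layout from the module above, but everything else is a simplification: `readAttempt`, `buildKey`, `decideAttempt`, and the `Decision` type are hypothetical stand-ins that ignore the issue, agent, budget, and idempotent-wake checks. The loop also assumes that each continuation run records the `nextAttempt` it was enqueued with as its own continuation attempt, which the file above implies but does not show.

```typescript
// Illustrative sketch only: simplified stand-ins, not the module's real types.
type LivenessState = "plan_only" | "empty_response";

type Decision =
  | { kind: "enqueue"; nextAttempt: number; idempotencyKey: string }
  | { kind: "exhausted"; attempt: number; maxAttempts: number };

// Same parsing rule as readContinuationAttempt: non-numeric, non-finite,
// zero, or negative input collapses to 0; fractional values are floored.
function readAttempt(value: unknown): number {
  const numeric = typeof value === "number" ? value : Number.parseInt(String(value ?? ""), 10);
  return Number.isFinite(numeric) && numeric > 0 ? Math.floor(numeric) : 0;
}

// Same colon-joined layout as buildRunLivenessContinuationIdempotencyKey.
function buildKey(issueId: string, sourceRunId: string, state: LivenessState, nextAttempt: number): string {
  return ["run_liveness_continuation", issueId, sourceRunId, state, String(nextAttempt)].join(":");
}

// Hypothetical reduction of the decision to the attempt bound alone.
function decideAttempt(input: {
  issueId: string;
  sourceRunId: string;
  state: LivenessState;
  continuationAttempt: unknown;
  maxAttempts?: number;
}): Decision {
  const maxAttempts = input.maxAttempts ?? 2; // mirrors DEFAULT_MAX_LIVENESS_CONTINUATION_ATTEMPTS
  const currentAttempt = readAttempt(input.continuationAttempt);
  if (currentAttempt >= maxAttempts) {
    return { kind: "exhausted", attempt: currentAttempt, maxAttempts };
  }
  const nextAttempt = currentAttempt + 1;
  return {
    kind: "enqueue",
    nextAttempt,
    idempotencyKey: buildKey(input.issueId, input.sourceRunId, input.state, nextAttempt),
  };
}

// Simulate a chain of runs that each end in "plan_only".
// ASSUMPTION: a continuation run carries the attempt number it was enqueued with.
const decisions: Decision[] = [];
let attempt: unknown = null; // the original run has no attempt recorded
for (let i = 0; i < 3; i += 1) {
  const decision = decideAttempt({
    issueId: "issue-1",
    sourceRunId: `run-${i}`,
    state: "plan_only",
    continuationAttempt: attempt,
  });
  decisions.push(decision);
  if (decision.kind !== "enqueue") break;
  attempt = decision.nextAttempt;
}

console.log(decisions.map((d) => d.kind)); // [ 'enqueue', 'enqueue', 'exhausted' ]
```

Because the source run id and the next attempt are both part of the key, a repeated evaluation of the same run yields the same key, which is what lets a lookup like `findExistingRunLivenessContinuationWake` detect an already-enqueued wake.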