forked from farhoodlabs/paperclip
Compare commits: 1 commit (`607c57edab`)
@@ -376,9 +376,10 @@ Example:
Recovery rule:

- if the latest issue-linked run failed/timed out/cancelled and no live execution path remains, Paperclip queues one automatic assignment recovery wake
- if the issue has **no prior issue-linked run at all** — it is assigned and `todo`, no run was ever dispatched, no wake remains queued or running, and no recovery action is open — and its age exceeds the dispatch timeout (default 5 min), Paperclip treats it as **stalled-at-dispatch** and queues one automatic assignment recovery wake. Stalled-at-dispatch recovery does **not** require a prior failed, timed-out, or cancelled run; a never-dispatched assignment is a recoverable stall, not intentional rest.
- if that recovery wake also finishes and the issue is still stranded, Paperclip moves the issue to `blocked` and opens or updates an explicit recovery action when a bounded owner/action is known; the visible comment is evidence, not the recovery path by itself

This is a dispatch recovery, not a continuation recovery. It covers both the post-crash stranded-run case and the zero-prior-run case where dispatch never produced a run.
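
The recovery rule above can be read as a small decision function. The following is a minimal sketch, not Paperclip's implementation: `IssueSnapshot`, `Decision`, `decide`, and every field name are hypothetical, and using the assignment time as the issue's "age" is an assumption. Only the 5-minute default and the three outcomes come from the rule itself.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional

# Hypothetical names throughout; only the rule itself comes from the spec text.
DISPATCH_TIMEOUT = timedelta(minutes=5)  # "default 5 min" in the rule
FAILED_OUTCOMES = {"failed", "timed_out", "cancelled"}


class Decision(Enum):
    NONE = "none"
    QUEUE_RECOVERY_WAKE = "queue_recovery_wake"
    # Move the issue to `blocked`; a recovery action is opened or updated
    # only when a bounded owner/action is known (not modelled here).
    BLOCK_ISSUE = "block_issue"


@dataclass
class IssueSnapshot:
    status: str                         # e.g. "todo"
    assigned: bool
    assigned_at: datetime               # stands in for the issue's "age" (assumption)
    latest_run_outcome: Optional[str]   # None: no issue-linked run ever existed
    has_live_path: bool                 # a wake or run is still queued or running
    has_open_recovery_action: bool
    recovery_wake_finished: bool        # the one automatic recovery wake already ran


def decide(issue: IssueSnapshot, now: datetime) -> Decision:
    # Restricting to assigned `todo` issues is an assumption about this section's scope.
    if not issue.assigned or issue.status != "todo" or issue.has_live_path:
        return Decision.NONE
    if issue.recovery_wake_finished:
        # The single automatic recovery wake is spent and the issue is still stranded.
        return Decision.BLOCK_ISSUE
    if issue.latest_run_outcome is None:
        # Stalled-at-dispatch: no prior run is required, only age past the timeout.
        stalled = (not issue.has_open_recovery_action
                   and now - issue.assigned_at > DISPATCH_TIMEOUT)
        return Decision.QUEUE_RECOVERY_WAKE if stalled else Decision.NONE
    if issue.latest_run_outcome in FAILED_OUTCOMES:
        return Decision.QUEUE_RECOVERY_WAKE
    return Decision.NONE


# Usage: an assigned `todo` issue that never produced a run, six minutes old.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
never_dispatched = IssueSnapshot(
    status="todo", assigned=True, assigned_at=now - timedelta(minutes=6),
    latest_run_outcome=None, has_live_path=False,
    has_open_recovery_action=False, recovery_wake_finished=False,
)
print(decide(never_dispatched, now))
```

The sketch shows that the zero-prior-run branch and the failed-run branch lead to the same single recovery wake, which is the point of the added bullet.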
### 9.2 Stranded assigned `in_progress`
@@ -410,11 +411,11 @@ On startup and on the periodic recovery loop, Paperclip now does five things in
1. reap orphaned `running` runs
2. resume persisted `queued` runs
3. reconcile stranded assigned work, including assigned `todo`/`in_progress` issues that have **never produced a linked run**; stalled-at-dispatch detection does not require a prior linked run
4. scan silent active runs, revalidate their source issues, and either fold source-resolved watchdogs or create/update explicit watchdog recovery actions
5. reconcile productivity reviews

The stranded-work pass closes the gap where issue state survives a crash but the wake/run path does not. It also covers the never-dispatched case: an assigned `todo` whose dispatch never started a run, has no queued wake, and has exceeded the dispatch timeout is reconciled as stalled-at-dispatch even though no prior run exists. The silent-run scan covers the separate case where a live process exists but has stopped producing observable output. The productivity-review pass is later and separate; it reviews unusual progression patterns on assigned source issues, not stale run handles after a source issue already has a valid disposition.
## 11. Silent Active-Run Watchdog
@@ -441,6 +442,18 @@ Operators should prefer `snooze` for known time-bounded quiet periods. `continue
The board can record watchdog decisions. The assigned owner of an issue-backed watchdog evaluation can also record them. Other agents cannot.
### Adapter heartbeat staleness (pre-run)
The silent active-run watchdog above covers a run that is `running` but has stopped producing output. It does **not** cover an agent adapter that is wedged *before* any run is linked to an assigned issue. A wedged adapter can report `status: running` while its `lastHeartbeatAt` stops advancing, so dispatch triggers (assignment, @-mention, blocker-resolved wakes) fire without ever starting a run. `status: running` is therefore not, by itself, evidence of liveness; `lastHeartbeatAt` advancement is.

For every agent adapter assigned to a non-terminal issue, if `lastHeartbeatAt` has not advanced beyond a configured staleness threshold (default 15 min), Paperclip MUST, independent of whether any run is linked to the issue:

- open an explicit recovery action on the stalled issue that names the wedged adapter, the heartbeat-staleness evidence (last `lastHeartbeatAt`, staleness duration), the recovery owner, and the next action
- alert/escalate to the assignee's manager
- surface the stall visibly in activity and UI so operators can distinguish a wedged adapter from healthy idle work

This extends the watchdog contract from run-output silence to adapter-level silence that predates any linked run. Bounds mirror the active-run watchdog: at most one open adapter-staleness recovery action per adapter per staleness window, and the action folds through the normal explicit-recovery lifecycle once `lastHeartbeatAt` resumes advancing (the adapter self-recovered) or the issue otherwise reaches a valid live/waiting/terminal path.
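
A minimal sketch of the adapter-staleness check described above, under stated assumptions: `AdapterAssignment`, `RecoveryAction`, `check_adapter`, the terminal status names, and the choice of the manager as recovery owner are all hypothetical. The "one open action per adapter per staleness window" bound is simplified to a set of adapter IDs that currently have an open action. Only the 15-minute default and the use of `lastHeartbeatAt` rather than `status` come from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Set

STALENESS_THRESHOLD = timedelta(minutes=15)  # "default 15 min" in the text
TERMINAL_STATUSES = {"done", "cancelled"}    # assumed terminal issue statuses


@dataclass
class AdapterAssignment:
    adapter_id: str
    issue_id: str
    issue_status: str
    last_heartbeat_at: datetime  # mirrors `lastHeartbeatAt`
    manager_id: str              # hypothetical escalation target


@dataclass
class RecoveryAction:
    issue_id: str
    adapter_id: str
    last_heartbeat_at: datetime  # staleness evidence
    stale_for: timedelta         # staleness evidence
    owner: str
    next_action: str


def check_adapter(a: AdapterAssignment, now: datetime,
                  open_actions: Set[str]) -> Optional[RecoveryAction]:
    """Return a new recovery action if the adapter is stale, else None.

    Deliberately ignores the adapter's reported status and any linked run:
    only heartbeat advancement counts as evidence of liveness.
    """
    stale_for = now - a.last_heartbeat_at
    if a.issue_status in TERMINAL_STATUSES or stale_for <= STALENESS_THRESHOLD:
        # Heartbeat resumed or the issue is terminal: fold any open action.
        open_actions.discard(a.adapter_id)
        return None
    if a.adapter_id in open_actions:
        return None  # at most one open staleness action per adapter
    open_actions.add(a.adapter_id)
    return RecoveryAction(
        issue_id=a.issue_id,
        adapter_id=a.adapter_id,
        last_heartbeat_at=a.last_heartbeat_at,
        stale_for=stale_for,
        owner=a.manager_id,  # assumption: the manager owns the recovery
        next_action="restart or reassign the wedged adapter",
    )


# Usage: an adapter whose last heartbeat was 20 minutes ago.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
open_actions: Set[str] = set()
wedged = AdapterAssignment("adapter-1", "ISSUE-1", "todo",
                           now - timedelta(minutes=20), "manager-1")
action = check_adapter(wedged, now, open_actions)
```

Calling `check_adapter` again for the same stale adapter returns `None` until the heartbeat advances, which models the one-open-action bound; alerting the manager and surfacing the stall in the UI are left out of the sketch.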
### Source-aware watchdog folding

Active-run watchdog work is source-aware. Before the watchdog creates, refreshes, escalates, or blocks on reviewer work, it must re-read the linked source issue and decide whether the watchdog signal is still about productive source work or only about stale run/process bookkeeping.