Commit Graph

13 Commits

Author SHA1 Message Date
Chris Farhood 5fa9e1396e fix: poll issue status instead of heartbeat-run for cancel detection (FAR-60)
The cancel poller was calling GET /api/heartbeat-runs/{runId} which
returned 401 because the adapter key lacks access to the internal
heartbeat-runs endpoint. Switch to GET /api/issues/{issueId}, which
the adapter key can read. Also tighten the trigger condition from
status !== "running" to status === "cancelled" so that other terminal
states (done, blocked, etc.) do not abort the K8s job.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-25 12:25:41 +00:00
Chris Farhood 80d18005f9 fix: wait for concurrent job to finish instead of returning permanent blocked error (FAR-61)
When multiple tasks are assigned simultaneously, only one K8s job can run
at a time (shared PVC/session guard). Previously, all other tasks received
k8s_concurrent_run_blocked immediately and stayed blocked forever.

Now the guard retries once: wait for all blocking jobs to complete via
waitForJobCompletion, then re-check before proceeding to create a new job.
If the re-check still shows a running job, the error is returned as before.

The agentCreationMutex already serializes guard-check + job-create, so
tasks naturally queue up and execute one at a time without concurrent jobs.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-25 11:10:27 +00:00
Chris Farhood 2bd8107f1d fix: skills not bundled and resumeLastSession ignored (FAR-56, FAR-57)
Two bugs prevented skill content from reaching K8s Job prompts, and
resumeLastSession: false was silently ignored.

Skills fix (execute.ts, FAR-57):
- Add /paperclip/.claude/skills as additional candidate to
  readPaperclipRuntimeSkillEntries — the relative candidates in
  adapter-utils don't resolve to the PVC-mounted skills home
- Read entry.source/SKILL.md instead of entry.source (which is a
  directory path); fall back to source directly for file-based entries
- Mock readPaperclipRuntimeSkillEntries in execute.test.ts to prevent
  real SKILL.md reads from delaying fake-timer registration

Session fix (job-manifest.ts, FAR-56):
- Gate --session flag on asBoolean(config.resumeLastSession, true)
  so setting resumeLastSession: false actually stops session resumption
- Default true preserves existing behaviour for agents without config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 10:11:47 +00:00
Chris Farhood 0e94e84e2c fix: grace poller checks pod liveness before giving up on log stream (FAR-52)
The K8s log client v1.x closes the follow-stream prematurely due to a
known upstream bug — causing the grace timer to fire 30 s after log
stream exit even when the container is still running.  The old behaviour
(`waitForPodTermination` with a hardcoded 120 s timeout) was too short
for agents whose opencode runs take several minutes, leading to premature
failure and issues stuck in `blocked`.

Fix: the grace poller now calls `readNamespacedPod` before resolving the
completion promise.  If the pod is still Running/Pending, it resets
`logExitTime` to defer the grace deadline.  A `graceCheckPending` guard
prevents concurrent checks.  A `graceMaxWaitMs` cap (= completionTimeoutMs
when set, 20 min otherwise) ensures we never wait forever for unlimited
jobs.  Version bumped to 0.1.21.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-25 01:40:21 +00:00
Chris Farhood 2625b8ffb3 fix: wait for pod termination before readPodLogs fallback (FAR-52)
The @kubernetes/client-node v1.x Log.follow stream closes prematurely
(known upstream TODO). Combined with Node.js buffering stdout to pipes,
the live log stream always returns empty. When the 30s grace timer fires
and the stream is empty, the container may still be running.

Add waitForPodTermination() to block in the empty-stdout fallback path
until the container actually exits (up to 120s), then read its complete
output with readNamespacedPodLog. This makes runs complete successfully
instead of looping indefinitely in in_progress.

Bump version to 0.1.20.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-25 01:21:40 +00:00
Chris Farhood 2b4049464c feat: per-agent mutex, fail-closed guard, SIGTERM cleanup (FAR-40)
- Add agentCreationMutex (Map<agentId, Promise>) that serializes
  guard-check + job-create per agent, eliminating the TOCTOU race where
  two concurrent execute() calls both pass the list-then-create check.
- Change catch {} on listNamespacedJob errors to return
  errorCode: "k8s_concurrency_guard_unreachable" (fail-closed) instead
  of silently bypassing the concurrency guard.
- Add ensureSigtermHandler() which tracks active Jobs in activeJobs Map
  and deletes all of them (plus prompt Secrets) on SIGTERM before exit.
- Track orphaned-job reattaches in activeJobs for consistent cleanup.
- Update execute.test.ts: change "proceeds on list error" test to assert
  k8s_concurrency_guard_unreachable; add mutex serialization test and
  SIGTERM handler registration tests.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-25 00:22:17 +00:00
Chris Farhood c05d1d7515 feat: log stream reconnect, dedup, bail, keepalive (FAR-38)
- Split streamPodLogs into streamPodLogsOnce (with bail timer + stopSignal)
  and streamPodLogs (reconnect loop, up to MAX_LOG_RECONNECT_ATTEMPTS=50)
- LogLineDedupFilter suppresses replayed JSONL events on reconnect, keyed
  by type+sessionID+part.id (OpenCode shape)
- Bail timer (LOG_STREAM_BAIL_TIMEOUT_MS=3s) forces writable.destroy() +
  promise resolution when stopSignal fires and logApi.log hangs
- Keepalive: emits '[paperclip] keepalive — job X running (Ns since last output)'
  every 15s during silent phases, with 2-consecutive-reading latch to avoid
  false-positive terminal detections
- completionGraced uses logExitTime + grace poller so log stream stop signal
  is set immediately when job condition resolves
- All 235 tests pass, tsc clean

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-24 22:14:36 +00:00
Chris Farhood 61d2a42a66 feat: inherit valueFrom/envFrom env from Deployment; prefer paperclip container
- SelfPodInfo gains inheritedEnvValueFrom (V1EnvVar[]) and inheritedEnvFrom (V1EnvFromSource[])
- Container selection now prefers the container named "paperclip", falls back to first
- buildJobManifest appends valueFrom env vars (skipping names already overridden)
  and sets envFrom on the opencode container when present
- Tests updated: mock updated, 5 new cases covering secretKeyRef forwarding,
  dedup, envFrom passthrough, and empty-envFrom omission

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-24 22:12:31 +00:00
Chris Farhood 84dc0f5930 fix: merge parsedError + podFailureDescription so OOMKilled surfaces in errorMessage
When both a JSONL error (e.g. "killed") and a pod terminated reason (e.g. "OOMKilled")
are present, join them with "; " so the richer pod classification is never silently
dropped by the parsedError short-circuit.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-24 22:10:42 +00:00
Chris Farhood d60afaebcd feat: pod-failure classification, partial stdout fallback, llm_api_error
- Replace getPodExitCode with getPodTerminatedInfo to capture exit code
  and reason (OOMKilled, Error, etc.) from terminated container state;
  pod failure description now surfaces in returned errorMessage
- Add partial-stdout fallback: readPodLogs is triggered when stdout is
  non-empty but contains no sessionId (missing session result), not just
  when stdout is fully empty
- Detect empty LLM response: when a session ran but produced 0 output
  tokens and no messages, return errorCode "llm_api_error"
- Add 13 new unit tests covering all three new paths

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-24 22:09:33 +00:00
Chris Farhood 13c2a3032b feat: UI parser kinds, nodeSelector textarea, step-limit session clear, per-line path redaction
- ui-parser: add thinking kind + handler for standalone thinking events, thinking
  blocks in assistant content arrays, and user-turn tool_result blocks
- job-manifest: parseKeyValueOrObject helper so nodeSelector (and labels) accept
  key=value textarea lines in addition to JSON objects
- parse: isOpenCodeStepLimitResult detects step_finish with max_turns / max_steps /
  step_limit reason
- execute: return clearSession:true when step limit reached so next run starts fresh;
  redactHomePathUserSegments moved to per-line to prevent paths split across chunks
- tests: ui-parser.test.ts (new), extended parse.test.ts and job-manifest.test.ts

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-24 22:01:35 +00:00
Pawla Abdul e53bcf2501 Replace local utility stubs with fork's adapter-utils imports
- Replace joinPromptSections, stringifyPaperclipWakePayload, and
  renderPaperclipWakePrompt with imports from adapter-utils/server-utils
  (the fork's renderPaperclipWakePrompt adds execution stage routing,
  resume delta sections, and full comment batch rendering)
- Replace local inferOpenAiCompatibleBiller with import from adapter-utils
- Declare sessionManagement using getAdapterSessionManagement("opencode_local")
  with fallback defaults for proper session compaction policy
- Add log redaction via redactHomePathUserSegments in streamPodLogs
- Bump peerDependency to >=0.3.1 and version to 0.1.14

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-14 11:00:33 +00:00
Chris Farhood be7c525063 Initial commit
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 23:08:05 -04:00