feat: replace k8s log API streaming with filesystem tailing #11

Merged
cpfarhood merged 3 commits from feat/filesystem-log-tail into master 2026-04-28 02:26:02 +00:00
cpfarhood commented 2026-04-28 01:06:37 +00:00 (Migrated from github.com)

Summary

Replaces K8s log API streaming (which was dropping every ~3 seconds at production scale, exhausting the 50-attempt reconnect cap within 2.5 minutes) with filesystem tailing via tee to a pod log file on the shared PVC.

Changes

  • job-manifest.ts: Added tee to claudeInvocation to write the pod log file, added mkdir -p to the init container to create the log directory, added assertSafePathComponent and buildPodLogPath helpers with path sanitization
  • execute.ts: Added tailPodLogFile function with adaptive 250ms/1s polling (backs off from 250ms to 1s after 5 consecutive empty polls), replaced k8s log streaming with tailPodLogFile in Promise.allSettled
  • config-schema.ts: Removed RTK fields (enableRtk, rtkMaxOutputBytes)
  • index.ts: Removed RTK documentation from agentConfigurationDoc
  • Deleted: log-dedup.ts, log-dedup.test.ts (RTK output truncation no longer needed)
  • job-manifest.test.ts: Updated tests for new pod log file tailing approach
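
For reviewers, the adaptive polling described for execute.ts can be pictured roughly as below. This is a minimal sketch, not the code in this PR: the name tailPodLogFileSketch, its signature, the isDone callback, and the helper names are all assumptions made for illustration. Only the 250ms/1s intervals and the 5-empty-poll threshold come from the description above.

```typescript
import { open } from "node:fs/promises";
import { StringDecoder } from "node:string_decoder";

const FAST_POLL_MS = 250;
const SLOW_POLL_MS = 1000;
const EMPTY_POLLS_BEFORE_BACKOFF = 5;

// Back-off rule: poll fast while data flows, slow down once the file has been quiet.
function nextPollDelay(consecutiveEmptyPolls: number): number {
  return consecutiveEmptyPolls >= EMPTY_POLLS_BEFORE_BACKOFF ? SLOW_POLL_MS : FAST_POLL_MS;
}

// Read whatever was appended past `offset`; a missing file counts as "no data yet".
async function readNewBytes(filePath: string, offset: number): Promise<Buffer> {
  const handle = await open(filePath, "r").catch((err: NodeJS.ErrnoException) => {
    if (err.code === "ENOENT") return null;
    throw err;
  });
  if (handle === null) return Buffer.alloc(0);
  try {
    const { size } = await handle.stat();
    if (size <= offset) return Buffer.alloc(0);
    const buf = Buffer.alloc(size - offset);
    const { bytesRead } = await handle.read(buf, 0, buf.length, offset);
    return buf.subarray(0, bytesRead);
  } finally {
    await handle.close();
  }
}

// Hypothetical signature: `isDone` reports that the pod has finished writing.
async function tailPodLogFileSketch(
  filePath: string,
  onLine: (line: string) => void,
  isDone: () => boolean,
): Promise<void> {
  const decoder = new StringDecoder("utf8"); // keeps multi-byte characters intact across reads
  let offset = 0;
  let pending = "";
  let emptyPolls = 0;
  for (;;) {
    const done = isDone(); // sample first so one final read still catches trailing output
    const chunk = await readNewBytes(filePath, offset);
    if (chunk.length > 0) {
      offset += chunk.length;
      emptyPolls = 0;
      const lines = (pending + decoder.write(chunk)).split("\n");
      pending = lines.pop() ?? ""; // last piece may be an incomplete NDJSON line
      for (const line of lines) if (line.length > 0) onLine(line);
    } else {
      emptyPolls += 1;
    }
    if (done) break;
    await new Promise((resolve) => setTimeout(resolve, nextPollDelay(emptyPolls)));
  }
  pending += decoder.end();
  if (pending.length > 0) onLine(pending);
}
```

Reading by byte offset means each poll costs one stat plus at most one read, and a reader that starts before the pod has created the file simply sees empty polls until it appears.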

Design

The pod's Claude command now tees stdout to a file on the shared PVC:

/paperclip/instances/default/run-logs/<companyId>/<agentId>/<runId>.pod.ndjson
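
A rough sketch of how the manifest side might assemble the two shell fragments (the tee pipeline and the init container's mkdir -p). The helper names shQuote, withPodLogTee, and initLogDirCommand are hypothetical, and the use of `set -o pipefail` assumes the container shell is bash; neither is confirmed by this PR.

```typescript
// Hypothetical helper: single-quote a string for a POSIX shell.
function shQuote(s: string): string {
  return "'" + s.replace(/'/g, "'\\''") + "'";
}

// Wrap a command so its stdout is both emitted and copied to the pod log file.
// Assumption: bash is available, so pipefail keeps the command's exit status
// from being masked by tee's.
function withPodLogTee(command: string, podLogPath: string): string {
  return `set -o pipefail; ${command} | tee ${shQuote(podLogPath)}`;
}

// Command for the init container: create the directory that will hold the log file.
function initLogDirCommand(podLogPath: string): string {
  const dir = podLogPath.slice(0, podLogPath.lastIndexOf("/"));
  return `mkdir -p ${shQuote(dir)}`;
}
```

Without something like pipefail, the pipeline's exit status would be tee's rather than the wrapped command's, which is worth checking when reviewing claudeInvocation.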

The adapter tails that file directly instead of using the unreliable K8s log API. Path components are sanitized to [a-zA-Z0-9-] before use.
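
The path construction can be sketched as follows. This assumes assertSafePathComponent rejects unsafe input by throwing rather than rewriting it, and that the root directory is a fixed constant; the real signatures in job-manifest.ts may differ.

```typescript
// Root taken from the path shown above; treating it as a constant is an assumption.
const RUN_LOG_ROOT = "/paperclip/instances/default/run-logs";

// Allow-list from the PR description: letters, digits, and hyphen only.
const SAFE_COMPONENT = /^[a-zA-Z0-9-]+$/;

// Assumed behavior: throw on anything outside the allow-list (including empty strings),
// which rules out "..", "/", and other traversal tricks.
function assertSafePathComponent(name: string, value: string): string {
  if (!SAFE_COMPONENT.test(value)) {
    throw new Error(`unsafe path component for ${name}: ${JSON.stringify(value)}`);
  }
  return value;
}

function buildPodLogPath(companyId: string, agentId: string, runId: string): string {
  return [
    RUN_LOG_ROOT,
    assertSafePathComponent("companyId", companyId),
    assertSafePathComponent("agentId", agentId),
    `${assertSafePathComponent("runId", runId)}.pod.ndjson`,
  ].join("/");
}
```

Because the pod writes and the adapter reads the same path, building it through one helper on both sides keeps the two from drifting apart.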

Test Status

  • Typecheck: passes
  • Tests: 344 passed, 14 pending (358 total)

The 14 pending tests are the streamPodLogsOnce tests mentioned in the issue. They exercise the obsolete k8s log streaming path through mockLogFn, whereas the code now calls tailPodLogFile (filesystem tailing), so they need to be rewritten or deleted. The vitest mock for node:fs/promises does not properly intercept the node: protocol prefix, which makes these tests difficult to update without significant rework.

Test Plan

  • Review the new tailPodLogFile function for correctness
  • Test in a dev environment with actual K8s
  • Decide how to handle the 14 pending streamPodLogsOnce tests

🤖 Generated with Claude Code
