149 Commits

Author SHA1 Message Date
Paperclip 31328dd85b chore: unscope package name to paperclip-adapter-claude-k8s
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-21 10:26:43 +00:00
farhoodliquor-paperclip[bot] 0660749c1f Merge pull request #3 from farhoodliquor/fix/p0-correctness-far107
fix: P0+P1 correctness fixes (FAR-107 PR 1-2/3)
2026-04-20 19:41:16 +00:00
Test User b45cc29787 chore: bump version to 0.1.25 for PR #3
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-20 19:40:26 +00:00
Test User 1e517bb9bb fix: P1 correctness and operational fixes from FAR-104/FAR-105 analysis
5. Cap log stream reconnect attempts at 50 — prevents infinite
   reconnect loops during sustained API partitions.

6. Fire keepalive refresh earlier — tick 1 + every 12 ticks (~3min)
   instead of every 16 ticks (~4min), providing better safety margin
   under the 5-minute reaper window.

7. Catch rejections from onLog inside keepalive — add .catch(() => {})
   to prevent unhandledRejection on SSE backpressure.

8. Prevent sanitized-name collisions — extend slugs to 16 chars each,
   add a 6-char SHA-256 hash suffix, shorten prefix to `ac-` to stay
   well within the 63-char DNS label limit.

10. Fix config-hint parity for nodeSelector and labels — parse both
    `key=value` multiline text and JSON objects, matching what the
    textarea hint promises.

11. Large-prompt fallback via Secret — prompts >256 KiB are staged as a
    K8s Secret and mounted as a volume instead of passed via env var,
    protecting against the ~1 MiB PodSpec limit.

13. Track last-seen log timestamp on reconnect — anchor sinceSeconds at
    the last received log line instead of stream start, fixing the
    duplicate logs reported in FAR-105. Belt-and-braces: also dedupe
    assistantTexts at the parser boundary in parse.ts.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-20 19:05:07 +00:00
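Item 10 of commit 1e517bb9bb describes accepting both `key=value` multiline text and JSON objects for nodeSelector and labels. A minimal sketch of that dual parsing follows, assuming string-only values; the function name `parseKeyValueOrJson` is hypothetical and not taken from the adapter source.

```typescript
// Hypothetical helper: accepts either a JSON object or multiline
// `key=value` text and returns a flat string map.
function parseKeyValueOrJson(input: string): Record<string, string> {
  const out: Record<string, string> = {};
  const text = input.trim();
  if (text === "") return out;

  if (text.startsWith("{")) {
    // JSON form: must be a plain object; values are coerced to strings.
    const parsed: unknown = JSON.parse(text);
    if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
      throw new Error("expected a JSON object");
    }
    for (const [key, value] of Object.entries(parsed as Record<string, unknown>)) {
      out[key] = String(value);
    }
    return out;
  }

  // Text form: one key=value per line; blank lines are skipped.
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue;
    const eq = line.indexOf("=");
    if (eq <= 0) throw new Error(`expected key=value, got: ${line}`);
    out[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return out;
}
```

Only the first `=` on a line splits key from value, so a value may itself contain `=`.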
Test User d74b6d34b3 fix: P0 correctness fixes from FAR-104/FAR-105 analysis
1. Inherit envFrom and env.valueFrom from self pod — secrets wired via
   valueFrom.secretKeyRef or envFrom.secretRef are now forwarded to Job
   pods, fixing credentials silently dropped for K8s-idiomatic secret
   patterns (e.g. ANTHROPIC_API_KEY via Secret).

2. Distinguish 404 vs transient errors in keepalive — only mark the
   keepalive as terminal on 404 (Job deleted). Transient 5xx/connection
   errors are logged and retried on the next tick, preventing premature
   reaper kills during API instability.

3. Fail closed on concurrency-guard read failure — a failing
   listNamespacedJob now returns k8s_concurrency_guard_unreachable
   instead of silently proceeding, protecting against zombie Jobs on
   shared PVCs.

4. Bound the waitForJobCompletion re-check — pass a 60s timeout instead
   of polling forever, preventing indefinite hangs when the K8s API is
   degraded.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-20 18:57:16 +00:00
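Item 2 of commit d74b6d34b3 separates a 404 (Job deleted) from transient failures in the keepalive. The distinction can be sketched as a small classifier. The error shape (`statusCode`) and the names `classifyKeepaliveError` and `KeepaliveAction` are assumptions for illustration, since the actual client error type is not shown in the log.

```typescript
// "stop": the Job is gone, so the keepalive becomes terminal.
// "retry": log the error and try again on the next tick.
type KeepaliveAction = "stop" | "retry";

// Assumed error shape: a numeric HTTP status on `statusCode`.
interface ApiErrorLike {
  statusCode?: number;
}

function classifyKeepaliveError(err: unknown): KeepaliveAction {
  const status =
    typeof err === "object" && err !== null
      ? (err as ApiErrorLike).statusCode
      : undefined;
  // Only a definite 404 is terminal; 5xx and connection errors
  // (which carry no status at all) are treated as transient.
  return status === 404 ? "stop" : "retry";
}
```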
Test User c35253ddd4 0.1.24 v0.1.24 2026-04-20 18:03:53 +00:00
Test User 5f358b2a26 chore: update package-lock.json
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-20 18:03:50 +00:00
Test User 5c28e6c191 fix: use printf instead of echo in init container to prevent prompt corruption
BusyBox echo interprets escape sequences by default (\c, \n, \t, \0NNN, etc.).
If the prompt contains \c (common in file paths or shell references), echo
silently stops output at that point, truncating the prompt file. This can
leave Claude CLI with an empty or garbled stdin, causing it to hang with
zero output — manifesting as endless keepalive messages in the UI.

printf '%s' passes content through verbatim, avoiding the issue.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-20 18:03:37 +00:00
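The printf fix in commit 5c28e6c191 can be illustrated by how an init-container command might be assembled. This is a sketch under assumptions: the prompt is taken to arrive in an environment variable, and `buildPromptWriteCommand`, the variable name, and the target path are all hypothetical.

```typescript
// Hypothetical builder for an init-container command that writes an
// environment variable to a file verbatim.
function buildPromptWriteCommand(envVar: string, targetPath: string): string[] {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(envVar)) {
    throw new Error(`invalid environment variable name: ${envVar}`);
  }
  // Single-quote the path for the shell; embedded quotes become '\''.
  const quotedPath = "'" + targetPath.replace(/'/g, "'\\''") + "'";
  // printf '%s' copies its argument byte for byte, whereas an echo that
  // interprets escapes would stop output at a \c inside the prompt.
  const script = "printf '%s' \"$" + envVar + "\" > " + quotedPath;
  return ["sh", "-c", script];
}
```

For example, `buildPromptWriteCommand("PROMPT", "/work/prompt.txt")` yields the script `printf '%s' "$PROMPT" > '/work/prompt.txt'`.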
Test User 465a947e1d 0.1.23
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
0.1.23
2026-04-20 16:10:40 +00:00
Test User ecd8bfc7f6 fix: correct Bedrock model list — add Sonnet 4.6, fix Sonnet 4.5 version
- Add missing us.anthropic.claude-sonnet-4-6 entry
- Correct sonnet version from v2:0 to v1:0 (verified against AWS docs)
- All model IDs verified against current Bedrock documentation

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-20 16:01:11 +00:00
Test User b14ec960ae 0.1.22 v0.1.22 2026-04-17 02:58:08 +00:00
Test User 5f5ae92ce7 fix: skip keepalive updatedAt refresh once K8s Job is terminal
The previous fix (df856e6) made the keepalive timer call onSpawn every
~4 minutes to refresh the run's updatedAt in the DB, so the stale-run
reaper wouldn't kill live runs in multi-instance deployments.  That was
correct for live jobs, but it was unconditional — if execute() stalled
after the pod terminated (slow K8s API call, hung log stream drain, or
a Job whose Complete condition lags pod termination), the keepalive
kept the run marked "alive" indefinitely even though the pod was gone.

That manifests as the opposite of the original bug: the UI shows jobs
as running when they have actually finished.

Two changes:

1. Verify the Job is still alive before the keepalive refreshes
   updatedAt.  If the Job has reached a terminal Complete/Failed
   condition (or has been deleted / the API read fails), stop
   refreshing.  If execute() truly ends up stuck past that point, the
   reaper will catch the run within the normal 5-minute staleness
   window instead of never.

2. Clear the keepalive interval immediately once Promise.allSettled
   resolves, rather than only in the finally block.  Post-completion
   work (exit-code fetch, log fallback read, job cleanup) must not be
   able to emit another onSpawn refresh that keeps the run "alive".

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-17 02:57:17 +00:00
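The terminal check described in commit 5f5ae92ce7 relies on the Job's status conditions. A minimal sketch, assuming a trimmed-down Job shape (`JobLike`) rather than the full Kubernetes client type, and a hypothetical helper name `isJobTerminal`:

```typescript
// Trimmed-down view of a Kubernetes Job status (only what the check needs).
interface JobCondition {
  type: string;   // e.g. "Complete" or "Failed"
  status: string; // "True", "False" or "Unknown"
}
interface JobLike {
  status?: { conditions?: JobCondition[] };
}

// A Job is terminal once it carries a Complete or Failed condition
// whose status is "True".
function isJobTerminal(job: JobLike): boolean {
  const conditions = job.status?.conditions ?? [];
  return conditions.some(
    (c) => (c.type === "Complete" || c.type === "Failed") && c.status === "True",
  );
}
```

A keepalive built on this would skip the updatedAt refresh whenever `isJobTerminal` returns true, leaving the reaper free to act if execute() stalls afterwards.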
Chris Farhood 20b85b8391 feat: add serviceAccountName field to config schema
Surface SA assignment in the Kubernetes section of the adapter UI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 20:06:43 -04:00
Chris Farhood 2853506a72 feat: add serviceAccountName field to config schema
Surface SA assignment in the Kubernetes section of the adapter UI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 20:04:18 -04:00
Test User ac18cc3ec3 chore: bump version to 0.1.20
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-16 21:56:44 +00:00
Test User df856e6ca5 fix: clean up orphaned K8s Jobs and refresh updatedAt to prevent UI desync
Two root causes behind the "plugin losing sync" issue:

1. After a server restart, the in-memory activeRunExecutions set is lost.
   The K8s Job keeps running but the reaper marks the server-side run as
   failed after 5 min (stale updatedAt).  Next heartbeat fires a new run,
   the adapter's concurrency guard blocks it because the old Job is still
   alive, and this loops indefinitely.

   Fix: the concurrency guard now compares each running Job's
   paperclip.io/run-id label against the current runId.  Jobs from a
   previous (dead) run are cleaned up automatically so the new run
   can proceed.

2. onLog (keepalive) does NOT update the run's updatedAt in the DB —
   it only writes to the log store and publishes SSE events.  In
   multi-instance deployments, a reaper on instance B can mark a run
   being executed on instance A as stale after 5 min of no DB updates.

   Fix: the keepalive timer now calls onSpawn every ~4 min (16 ticks)
   to refresh updatedAt, staying within the 5-min reaper threshold.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-16 21:48:16 +00:00
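The first fix in commit df856e6ca5 compares each running Job's `paperclip.io/run-id` label against the current runId. One way to sketch that split is below. The `RunningJob` shape and `partitionRunningJobs` are hypothetical, and treating a Job without the label as stale is an assumption of this sketch, not something the log states.

```typescript
const RUN_ID_LABEL = "paperclip.io/run-id";

// Hypothetical minimal view of a running Job.
interface RunningJob {
  name: string;
  labels: Record<string, string>;
}

// Splits running Jobs into those owned by the current run and those left
// behind by a previous (dead) run, which are candidates for cleanup.
function partitionRunningJobs(
  jobs: RunningJob[],
  currentRunId: string,
): { current: RunningJob[]; stale: RunningJob[] } {
  const current: RunningJob[] = [];
  const stale: RunningJob[] = [];
  for (const job of jobs) {
    // Assumption: a missing label counts as "not this run".
    if (job.labels[RUN_ID_LABEL] === currentRunId) {
      current.push(job);
    } else {
      stale.push(job);
    }
  }
  return { current, stale };
}
```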
Test User d53559e58b fix: correct Bedrock Opus 4.7 model ID to us.anthropic.claude-opus-4-7
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-16 17:51:47 +00:00
Test User 335b7b50b5 chore: bump version to 0.1.18
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-16 17:07:08 +00:00
Test User 0b67ccc081 feat: add Opus 4.7 models and enable manual model selection
- Add claude-opus-4-7 and Bedrock Opus 4.7 to model lists
- Set models export to undefined (like opencode_k8s) to allow free-text model entry
- Move direct models list into server/models.ts
- Bump version to 0.1.17

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-16 16:57:09 +00:00
Chris Farhood 9a85842add chore: bump version for CI publish
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 11:52:02 -04:00
Test User 4bf5cf64a4 fix: call onSpawn after pod enters Running state to prevent UI desync
The k8s adapter never called ctx.onSpawn(), so the Paperclip server
had no processStartedAt timestamp for the run. The stale-run reaper
(reapOrphanedRuns) would then mark live k8s runs as failed/orphaned,
causing the UI to show no active runs and triggering duplicate run
attempts that hit the concurrency guard.

Uses pid=-1 as a sentinel since there is no local process — the
server's isProcessAlive check safely returns false for pid <= 0.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-16 15:46:34 +00:00
Chris Farhood b8ba457790 fix: don't delete job when returning state-mismatch error to keep UI in sync
When waitForJobCompletion threw and the job was still not terminal, we
were returning an error but still deleting the job in the finally block.
This left the UI holding an error while the job (still alive) would be
cleaned up by Kubernetes, causing the next heartbeat to find nothing and
think it was safe to retry — spawning a concurrent pod.

Now we set skipCleanup=true when returning the mismatch error, so the
job is retained and the heartbeat can still find and wait on it.

Also removes a duplicate empty-stdout fallback block.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 11:29:42 -04:00
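The decision made when waitForJobCompletion throws (commits efbbfbc299 and b8ba457790) can be summarized as a small pure function. This is a sketch only: `WaiterOutcome` and `resolveAfterWaiterError` are hypothetical names, while the error code and the skipCleanup flag come from the commit messages.

```typescript
// Outcome of re-checking the Job after the completion waiter threw.
type WaiterOutcome =
  | { kind: "proceed"; skipCleanup: false }
  | { kind: "error"; errorCode: "k8s_job_state_mismatch"; skipCleanup: true };

function resolveAfterWaiterError(jobIsTerminal: boolean): WaiterOutcome {
  if (jobIsTerminal) {
    // The Job really finished: continue with normal result handling
    // and let the finally block clean up as usual.
    return { kind: "proceed", skipCleanup: false };
  }
  // The Job is still alive: report the mismatch and retain the Job so
  // the next heartbeat can still find it and wait on it.
  return { kind: "error", errorCode: "k8s_job_state_mismatch", skipCleanup: true };
}
```

Tying skipCleanup to the error branch in the type makes it impossible to return the mismatch error while still deleting the Job.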
Chris Farhood fa5fcb94d9 fix: remove duplicate CI/CD section from CLAUDE.md
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 07:56:25 -04:00
Chris Farhood 169636de1d docs: clarify CI/CD handles build, not local builds
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 07:27:14 -04:00
Chris Farhood efbbfbc299 fix: re-check job state when completion waiter throws to prevent UI staleness
When waitForJobCompletion threw a transient error (API disconnect, etc.),
the code fell through with jobTimedOut=true and returned a result even
though the job was still running. This caused the UI to think the run
was complete while the job kept running, resulting in concurrency errors.

Now when completion throws, we re-check the job's actual state. If still
not terminal, we return a k8s_job_state_mismatch error so the UI knows
the run is not done.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 07:26:10 -04:00
Chris Farhood 710cf37f5e chore: rebuild dist files 2026-04-15 19:03:06 -04:00
Chris Farhood 6f85a068f4 chore: bump version to trigger CI publish
Verify upstream canary has adapter-utils exports

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-15 17:59:18 -04:00
Chris Farhood 2412ee427f Merge pull request #2 from farhoodliquor/feat/adapter-plugin-capabilities
feat: declare adapter plugin capabilities on ServerAdapterModule
2026-04-15 17:47:48 -04:00
Test User ddb1ea4311 chore: update lockfile for adapter-utils canary
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-15 21:45:46 +00:00
Test User 3db5229407 feat: declare adapter plugin capabilities on ServerAdapterModule
Adds supportsInstructionsBundle, instructionsPathKey, and
requiresMaterializedRuntimeSkills flags so the UI renders the
bundle editor for claude_k8s agents. Bumps adapter-utils peer
dep to the canary that includes the capability type fields.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-15 21:45:08 +00:00
Pawla Abdul 389bbb6f99 Add ServerAdapterModule capabilities from fork's adapter-utils 0.3.1
Bring the K8s adapter up to parity with the fork's ServerAdapterModule
contract by adding sessionManagement, listSkills/syncSkills, listModels
with Bedrock detection, and promptBundleKey support in the session codec.

- Declare sessionManagement with nativeContextManagement: "confirmed"
  so Paperclip skips threshold-based session compaction (Claude manages
  its own context)
- Add ephemeral skill management (listSkills/syncSkills) mirroring
  claude_local — reports skill state without runtime persistence since
  skills are injected via prompt bundle into ephemeral Job pods
- Add listModels() with Bedrock environment detection, returning
  region-qualified model IDs when CLAUDE_CODE_USE_BEDROCK or
  ANTHROPIC_BEDROCK_BASE_URL are set
- Extend session codec to round-trip promptBundleKey field
- Remove the `as ServerAdapterModule` cast — the return type now
  satisfies the full interface

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-14 11:13:18 +00:00
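The Bedrock detection described for listModels() in commit 389bbb6f99 might look like the sketch below. The two environment variable names come from the commit message; the function names and the one-entry model lists are illustrative assumptions (the IDs are borrowed from later commits in this log), not the adapter's actual lists.

```typescript
type Env = Record<string, string | undefined>;

// Bedrock mode is assumed whenever either variable is set to a
// non-empty value.
function isBedrockEnv(env: Env): boolean {
  return Boolean(env.CLAUDE_CODE_USE_BEDROCK) || Boolean(env.ANTHROPIC_BEDROCK_BASE_URL);
}

// Illustrative one-entry lists only.
const DIRECT_MODELS = ["claude-opus-4-7"];
const BEDROCK_MODELS = ["us.anthropic.claude-opus-4-7"];

function listModelIds(env: Env): string[] {
  return isBedrockEnv(env) ? BEDROCK_MODELS : DIRECT_MODELS;
}
```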
Pawla Abdul 10a5004c02 Revert "Add RTK integration for token-optimized command output"
This reverts commit d074cb2a8c.
2026-04-14 01:35:16 +00:00
Pawla Abdul d074cb2a8c Add RTK integration for token-optimized command output
When enableRtk is set in adapter config, the adapter:
- Adds an init container (curlimages/curl) to download the RTK binary
- Mounts RTK binary in the main container via shared emptyDir volume
- Runs `rtk install claude-code` before invoking Claude to set up hooks
- Disables RTK telemetry (RTK_NO_TELEMETRY=1) for automated environments
- Supports optional rtkVersion config for pinning specific versions

RTK filters CLI command output before it reaches the LLM context,
reducing token consumption by ~80%.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-14 01:27:28 +00:00
Pawla Abdul 77ba40d9bf Reconnect K8s log stream on silent API disconnects
The adapter opened a single follow-stream to the K8s API for pod logs.
If that TCP connection silently dropped (API server hiccup, network
timeout, load-balancer idle cut), streamPodLogs returned early and no
more real Claude output reached the UI — only keepalive pings.  The
pod kept running and producing logs (visible via kubectl), but the
adapter never reconnected.

Splits streamPodLogs into streamPodLogsOnce (single follow attempt) and
a reconnecting wrapper that retries with sinceSeconds until a shared
stop signal fires when waitForJobCompletion resolves.  On reconnect,
requests logs from the original stream start time (+5s overlap) so no
output is lost; the UI deduplicates chunks.

Bumps version to 0.1.12.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-13 10:34:41 +00:00
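The reconnect logic in commit 77ba40d9bf re-requests logs with sinceSeconds plus a 5-second overlap. The arithmetic can be sketched as follows; `sinceSecondsFor` is a hypothetical helper, and the anchor may be the stream start (as in this commit) or the last-seen log timestamp (as in the later FAR-105 fix).

```typescript
const OVERLAP_SECONDS = 5;

// Computes the sinceSeconds value for a reconnect so the new stream
// starts slightly before the anchor and no output is lost.
function sinceSecondsFor(anchorMs: number, nowMs: number): number {
  const elapsed = Math.ceil((nowMs - anchorMs) / 1000);
  // Clamp to at least 1 so clock skew can never produce zero or a
  // negative value.
  return Math.max(1, elapsed + OVERLAP_SECONDS);
}
```

With an anchor 10 seconds in the past this yields 15. The overlap means some lines arrive twice, which is why the commit relies on deduplication downstream.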
Pawla Abdul e760bf9386 Add keepalive pings during job execution to prevent UI timeout desync
The adapter had no mechanism to signal liveness while a K8s Job was
running. When Claude entered long thinking phases with no log output,
the Paperclip UI could lose sync and consider the run stuck even though
the pod was still actively working.

Adds a 15-second interval keepalive that sends status messages via
onLog during execution. The keepalive tracks time since last real log
output and reports it, keeping the connection alive. The timer is
cleaned up in the finally block to prevent leaks on any exit path.

Bumps version to 0.1.11.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-12 18:44:09 +00:00
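The 15-second keepalive from commit e760bf9386 can be sketched with an injectable clock so the idle time is easy to test. The class and function names and the message text are hypothetical, and onLog is simplified to a synchronous callback here.

```typescript
const KEEPALIVE_INTERVAL_MS = 15_000;

// Tracks time since the last real log line.
class KeepaliveTracker {
  private readonly now: () => number;
  private lastLogAt: number;

  constructor(now: () => number) {
    this.now = now;
    this.lastLogAt = now();
  }

  // Call whenever real pod output arrives.
  noteRealLog(): void {
    this.lastLogAt = this.now();
  }

  // Status line reporting how long the Job has been silent.
  message(): string {
    const idleSec = Math.floor((this.now() - this.lastLogAt) / 1000);
    return `[keepalive] job still running, ${idleSec}s since last output`;
  }
}

// Starts the interval and returns a stop function, intended to be
// called from a finally block so the timer cannot leak.
function startKeepalive(tracker: KeepaliveTracker, onLog: (line: string) => void): () => void {
  const timer = setInterval(() => onLog(tracker.message()), KEEPALIVE_INTERVAL_MS);
  return () => clearInterval(timer);
}
```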
Pawla Abdul ac2fe20294 Restore maxTurnsPerRun field, add config schema tests
Keep adapter-specific maxTurnsPerRun (default 1000) in the config
schema since the platform UI does not provide it for external adapters.
Platform-provided fields (model, effort, instructionsFilePath,
timeoutSec, graceSec) remain excluded to avoid duplication.

Add config-schema.test.ts with assertions that platform-provided
fields are absent and adapter-specific fields have correct defaults.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-12 18:32:49 +00:00
Pawla Abdul c8d883d409 Remove duplicate/internal fields from UI config schema
Fields like model, reasoning effort, instructions file path, max turns,
timeout, and grace period are either surfaced elsewhere in the platform
UI or are internal operational settings that shouldn't be user-facing
in the adapter config panel. These values remain functional when set
via the API/backend — only the UI exposure is removed.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-12 17:24:31 +00:00
Chris Farhood df5cb84dca Rename project to 'Claude (Kubernetes) Paperclip Adapter' 2026-04-12 11:22:43 -04:00
Pawla Abdul c8c5e01371 Enhance README with RWX PVC requirements, RBAC examples, and full config docs
Adds detailed prerequisites section covering ReadWriteMany PVC setup,
complete RBAC Role/RoleBinding/ServiceAccount manifests, and API key
secret configuration. Includes full configuration reference tables and
a How It Works section explaining the adapter lifecycle.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-12 15:20:54 +00:00
Pawla Abdul e75a62b329 Fix CI publish failures and add missing config schema fields
- CI publish job failed because it tried to re-publish existing versions
  (npm returns 404 for scoped packages on duplicate version). Added a
  version-exists check before npm publish to skip gracefully.
- Also fixed the auth env var from NPM_TOKEN to NODE_AUTH_TOKEN which
  is what actions/setup-node's registry-url option expects.
- Added missing core and operational fields to getConfigSchema() so the
  Paperclip UI surfaces model, effort, maxTurnsPerRun, skipPermissions,
  instructionsFilePath, timeoutSec, and graceSec alongside existing K8s
  infrastructure fields.
- Bumped version to 0.1.10.

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-04-12 15:02:54 +00:00
Chris Farhood 545950daf2 Add CLI formatter, fix env forwarding, rename job prefix to agent-claude-
- Add src/cli/ with format-event.ts (printClaudeStreamEvent) exported from
  CLIAdapterModule
- Fix env var forwarding: read from pod spec container env dynamically instead
  of static allowlist; agent config env overrides pod values
- Rename K8s Job prefix from agent- to agent-claude-
- Add fsGroupChangePolicy: "OnRootMismatch" to skip PVC chown on subsequent runs
- Add comprehensive test coverage (159 tests across 5 test files)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-12 10:47:27 -04:00
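The env forwarding rule in commit 545950daf2 (read the pod's container env dynamically, with agent config env overriding pod values) amounts to a keyed merge. A minimal sketch, with `EnvVar` trimmed to name and value and `mergeEnv` as a hypothetical helper name:

```typescript
// Trimmed-down container env entry.
interface EnvVar {
  name: string;
  value?: string;
}

// Pod env first, then agent config env, so that config wins on conflict.
function mergeEnv(podEnv: EnvVar[], configEnv: Record<string, string>): EnvVar[] {
  const merged = new Map<string, EnvVar>();
  for (const entry of podEnv) {
    merged.set(entry.name, entry);
  }
  for (const [name, value] of Object.entries(configEnv)) {
    merged.set(name, { name, value });
  }
  return Array.from(merged.values());
}
```

Because a Map keeps the position of a key's first insertion, an overridden variable stays where the pod spec listed it.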
Chris Farhood 514fe15009 Regenerate package-lock.json to sync with package.json
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-12 10:44:52 -04:00
Chris Farhood f5fa41fb3a Fix getConfigSchema to use flat fields array with correct hint keys
The Paperclip AdapterConfigSchema type expects a flat fields array, not
nested sections. Also maps description -> hint per the schema type.
Defines types locally since @paperclipai/adapter-utils@0.3.1 on npm
does not yet export AdapterConfigSchema/ConfigFieldSchema (those exist
in the monorepo but aren't released to npm yet).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-12 10:43:31 -04:00
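Commit f5fa41fb3a moves from nested sections to a flat fields array and maps description to hint. The transformation can be sketched as below. All types here are local assumptions, loosely following the commit's remark that the types were defined locally; the real AdapterConfigSchema may carry more properties.

```typescript
// Assumed nested input shape (sections holding fields).
interface SectionField {
  key: string;
  label: string;
  description?: string;
}
interface Section {
  title: string;
  fields: SectionField[];
}

// Assumed flat output shape, using `hint` instead of `description`.
interface FlatField {
  key: string;
  label: string;
  hint?: string;
}

function flattenSections(sections: Section[]): { fields: FlatField[] } {
  const fields: FlatField[] = [];
  for (const section of sections) {
    for (const field of section.fields) {
      const flat: FlatField = { key: field.key, label: field.label };
      if (field.description !== undefined) flat.hint = field.description;
      fields.push(flat);
    }
  }
  return { fields };
}
```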
Chris Farhood 448889fc94 Add GitHub Actions CI workflow
Runs typecheck and tests on push/PR to master, then publishes to npm
on successful master pushes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-12 10:39:23 -04:00
Chris Farhood 75ba66e504 Add getConfigSchema to surface K8s fields in Paperclip UI
Adds AdapterConfigSchema with three sections (Kubernetes, Resource Limits,
Scheduling) exposing: namespace, image, imagePullPolicy, kubeconfig,
resources.{requests,limits}.{cpu,memory}, nodeSelector, tolerations,
labels, ttlSecondsAfterFinished, retainJobs.

Paperclip's server fetches GET /api/adapters/:type/config-schema and
caches the result, automatically assigning ConfigFields to external
adapters. The adapter now wires getConfigSchema into createServerAdapter().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-12 10:31:55 -04:00
Chris Farhood 98af28a272 Skip PVC chown on subsequent runs with fsGroupChangePolicy
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-12 00:04:35 -04:00
Chris Farhood 4b0baaf05c Add .gitignore and bump version for npm publishing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 23:43:19 -04:00
Chris Farhood 4c310d020d Clarify that Paperclip must be deployed on an RWX PVC 2026-04-11 23:18:16 -04:00
Chris Farhood 9dbb5f337e Initial commit: Paperclip adapter for Claude Code on Kubernetes
Adapter plugin that runs Claude Code agents as Kubernetes Jobs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-11 23:16:31 -04:00