feat: add npx CLI with monorepo, CI/CD, and ephemeral worker architecture (#256)
* feat: integrate npx CLI, CI/CD, and ephemeral worker architecture
Bring in changes from shannon-npx: npx-distributable CLI package (cli/),
semantic-release CI/CD workflows, ephemeral per-scan worker containers,
TOML config support, setup wizard, and workspace management.
Preserves all shannon-only changes: security hardening (localhost-bound
ports, MCP env allowlist, path traversal guard), updated benchmarks
(XBEN 19/31/35/44), README assets, and prompt injection disclaimer.
Applies security hardening to cli/infra/compose.yml as well.
* refactor: migrate to Turborepo + pnpm + Biome monorepo
Restructure into apps/worker, apps/cli, packages/mcp-server with
Turborepo task orchestration, pnpm workspaces, Biome linting/formatting,
and tsdown CLI bundling.
Key changes:
- src/ -> apps/worker/src/, cli/ -> apps/cli/, mcp-server/ -> packages/mcp-server/
- prompts/ and configs/ moved into apps/worker/
- npm replaced with pnpm, package-lock.json replaced with pnpm-lock.yaml
- Dockerfile updated for pnpm-based builds
- CLI logs command rewritten with chokidar for cross-platform reliability
- Router health checking added for auto-detected router mode
- Centralized path resolution via apps/worker/src/paths.ts
* fix: resolve all biome warnings and formatting issues
- Remove unnecessary non-null assertions where values are guaranteed
- Replace array index access with .at() for safer element retrieval
- Use local variables to avoid repeated process.env lookups
- Replace any types with unknown in functional utilities
- Use nullish coalescing for TOTP hash byte access
- Auto-format security patches to match biome config
* fix: pin pnpm to 10.12.1 in Dockerfile for catalog support
* fix: handle Esc cancellation in Bedrock setup flow
Replace p.group() with individual prompts and per-field cancel checks,
matching the pattern used by all other provider setup flows.
* feat: add optional model customization to Anthropic setup
* fix: resolve Docker bind mount permission errors on Linux
Use entrypoint-based UID remapping instead of --user flag so the
container's pentest user matches the host UID/GID, keeping bind-mounted
volumes writable. Git config moved to --system level to survive remapping.
* fix: show resumed workflow ID in splash screen URL
When resuming a workflow, the Temporal Web UI link pointed to the old
(terminated) workflow ID. Now extracts "New Workflow ID" from the resume
header in workflow.log, falling back to the original ID for fresh scans.
* style: fix biome formatting in docker.ts
* fix: align TypeScript config types with JSON Schema
- SuccessCondition.type: use schema values (url_contains,
element_present, url_equals_exactly, text_contains) instead of
stale values (url, cookie, element, redirect)
- Authentication.login_flow: mark optional to match schema which
does not require it
* feat: mark GitHub release as latest during rollback
* fix: use native ARM64 runners for Docker multi-platform builds
Replace QEMU emulation with parallel native builds using a matrix
strategy (ubuntu-latest for amd64, ubuntu-24.04-arm for arm64).
Each platform pushes by digest, then a merge job creates the
multi-arch manifest list before signing with cosign.
* fix: resolve SessionMutex race condition with 3+ concurrent waiters
* fix: skip POSIX permission check on Windows
writeFileSync mode option is ignored on Windows, so config.toml
gets 0o666 and the guard rejects it.
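
A minimal sketch of a platform-aware permission guard, assuming the check rejects any group or other permission bits on POSIX systems. `hasSafePermissions` and `isConfigFileSafe` are hypothetical names used for illustration, not the repository's actual functions.

```typescript
import { statSync } from 'node:fs';

// Hypothetical guard: returns true when the file mode is acceptable.
// On Windows the POSIX mode bits are not meaningful, so the check is skipped.
function hasSafePermissions(mode: number, platform: string = process.platform): boolean {
  if (platform === 'win32') {
    return true;
  }
  // Assumption: any group/other permission bit counts as too permissive.
  return (mode & 0o077) === 0;
}

// Hypothetical wrapper that reads the mode from disk.
function isConfigFileSafe(filePath: string): boolean {
  return hasSafePermissions(statSync(filePath).mode);
}
```

Passing the platform as a parameter keeps the guard testable on any host.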
* fix: resolve unsubstituted placeholders in report prompt
Remove unused {{GITHUB_URL}} placeholder and wire up {{AUTH_CONTEXT}}
with structured auth context (login type, username, URL, MFA status).
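
One way to catch unsubstituted placeholders early is to fail at render time. The sketch below assumes `{{NAME}}` placeholders with upper-case names; `renderPrompt` is a hypothetical helper, not the worker's actual prompt loader.

```typescript
// Hypothetical renderer: replaces {{NAME}} placeholders and throws if a
// placeholder has no value, so a stale token never reaches the model.
function renderPrompt(template: string, values: Record<string, string>): string {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (match: string, name: string) => {
    const value = values[name];
    if (value === undefined) {
      throw new Error(`Unsubstituted placeholder: ${match}`);
    }
    return value;
  });
}
```

For example, `renderPrompt('Auth: {{AUTH_CONTEXT}}', { AUTH_CONTEXT: 'form login' })` returns `'Auth: form login'`, while a template that still contains `{{GITHUB_URL}}` throws.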
* fix: remove duplicate environment gate from merge-docker job
Move DOCKERHUB_USERNAME from vars to secrets so merge-docker can access
credentials without its own environment scope. This eliminates the
redundant double approval since build-docker already gates on
release-publish.
* fix: replace POSIX sleep binary with cross-platform async sleep
execFileSync('sleep') is unavailable on Windows. Use node:timers/promises
setTimeout instead, making ensureInfra async.
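
The replacement can be sketched as follows. `setTimeout` from `node:timers/promises` is the real Node.js API; `waitUntilReady` and its parameters are hypothetical stand-ins for the polling inside `ensureInfra`.

```typescript
import { setTimeout as sleep } from 'node:timers/promises';

// Hypothetical polling loop: checks readiness, then sleeps without spawning
// an external `sleep` binary, so it behaves the same on Windows and POSIX.
async function waitUntilReady(
  isReady: () => boolean | Promise<boolean>,
  attempts: number = 10,
  delayMs: number = 500,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    if (await isReady()) {
      return true;
    }
    await sleep(delayMs);
  }
  return false;
}
```

Because the sleep is now promise-based, every caller up the chain has to await it, which is why the function became async.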
* fix: use session.json for workflow ID on resume instead of parsing workflow.log
On resume, workflow.log already exists with stale headers from the
previous run. The CLI poll found '====' immediately and extracted the
old workflow ID, producing a wrong Temporal Web UI URL.
Read the workflow ID from session.json instead — the worker writes
resume attempts there atomically. For fresh runs, poll until
originalWorkflowId appears. For resumes, poll until a new
resumeAttempts entry is appended.
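
A sketch of the selection logic described above. The field names `originalWorkflowId` and `resumeAttempts` come from this commit message; the shape of each resume entry (`workflowId`) and the helper name `pickWorkflowId` are assumptions for illustration.

```typescript
// Assumed shape of one resume entry; the real entry may carry more fields.
interface ResumeAttempt {
  workflowId: string;
}

interface SessionFile {
  originalWorkflowId?: string;
  resumeAttempts?: ResumeAttempt[];
}

// Hypothetical helper: returns the workflow ID to display, or undefined when
// the worker has not written it yet and the caller should poll again.
function pickWorkflowId(
  session: SessionFile,
  isResume: boolean,
  priorResumeCount: number,
): string | undefined {
  if (!isResume) {
    return session.originalWorkflowId;
  }
  const attempts = session.resumeAttempts ?? [];
  if (attempts.length <= priorResumeCount) {
    return undefined; // no new entry appended yet
  }
  return attempts[attempts.length - 1]?.workflowId;
}
```

The caller would record the number of resume entries before starting the worker, then re-read session.json until this function returns a value.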
* feat: add custom base URL support for Anthropic-compatible proxies
Support ANTHROPIC_BASE_URL + ANTHROPIC_AUTH_TOKEN to route SDK requests
through LiteLLM or any Anthropic-compatible proxy. Adds TUI wizard
option, TOML config mapping, credential validation, and preflight
endpoint reachability check via SDK query.
* fix: remove environment gates and add NPM_TOKEN to publish step
* feat: add beta release and rollback workflows with cosign signing
* fix: remove redundant checkout and pnpm steps from beta release workflow
* docs: normalize README commands to mode-neutral shorthand
Add a substitution note after Quick Start sections so all subsequent
examples use bare `shannon` instead of mixing `./shannon` and
`npx @keygraph/shannon`. Mode-specific commands (build, update,
uninstall) get inline annotations. Also fixes a broken command in the
Custom Base URL section.
* fix: remove redundant `update` command
Image is already auto-pulled by `ensureImage()` during `start` when the
pinned version tag is missing locally. Manual `update` was unnecessary.
* docs: add CLI package README stub
* docs: update README setup instructions for dual CLI modes
* docs: update announcement banner to npx availability
* feat: migrate from MCP tools to CLI based tools (#252)
* feat: migrate from MCP tools to CLI tools
* fix: restore browser action emoji formatters for CLI output
Adapt formatBrowserAction for playwright-cli commands, replacing the old
mcp__playwright__browser_* tool name matching removed during migration.
* fix: mount credential file to fixed container path for Vertex AI
GOOGLE_APPLICATION_CREDENTIALS was forwarded as-is to the container,
causing the relative host path to resolve against the repo mount
instead of the credentials mount. Now both local and npx modes mount
the resolved file to /app/credentials/google-sa-key.json and rewrite
the env var to match.
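
A rough sketch of the mount-and-rewrite step. The container path is the one named above; the helper name, the return shape, and the read-only `:ro` mount flag are assumptions rather than the CLI's actual code.

```typescript
import { resolve } from 'node:path';

const CONTAINER_CREDENTIALS_PATH = '/app/credentials/google-sa-key.json';

interface CredentialMount {
  volumeArgs: string[];
  env: Record<string, string>;
}

// Hypothetical helper: resolves the host path against the caller's working
// directory, mounts that file at a fixed container path, and points the
// env var at the container path instead of the host path.
function mountGoogleCredentials(hostPath: string | undefined, cwd: string): CredentialMount {
  if (!hostPath) {
    return { volumeArgs: [], env: {} };
  }
  const resolved = resolve(cwd, hostPath);
  return {
    // Assumption: a read-only bind mount is sufficient for a key file.
    volumeArgs: ['-v', `${resolved}:${CONTAINER_CREDENTIALS_PATH}:ro`],
    env: { GOOGLE_APPLICATION_CREDENTIALS: CONTAINER_CREDENTIALS_PATH },
  };
}
```

Resolving on the host side means a relative path never reaches the container, where it would be interpreted against a different directory.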
* feat: add git awareness and optional description field to config
* fix: drop redundant --ipc host flag from worker container
* fix: align announcement banner URL with main branch
* feat: add target URL reachability preflight check (#254)
* Moving asset benchmark graph image to this folder
* Move benchmark results to benchmark repo
Windows Defender flags exploit code in the pentest reports as false positives, forcing every Windows user to add a Defender exclusion just to clone Shannon.
* Updated README
* fix: case-insensitive grep for semantic-release version probe
* fix: harden supply chain security (#255)
* fix: patch smol-toml and tsdown vulnerabilities
Update smol-toml 1.6.0→1.6.1 (DoS via recursive comment parsing) and
tsdown 0.21.2→0.21.5 (picomatch ReDoS + method injection).
* fix: pin all unpinned dependency versions in Dockerfile
Pins subfinder v2.13.0, WhatWeb v0.6.3 (switched from git clone to
release tarball), schemathesis 4.13.0, addressable 2.8.9,
claude-code 2.1.84, and playwright-cli 0.1.1 for reproducible builds.
* fix: pin GitHub Actions to commit SHAs for supply chain security
* fix: pin GitHub Actions to commit SHAs in beta and rollback workflows
@@ -0,0 +1,79 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

// Null Object pattern for audit logging - callers never check for null

import type { AuditSession } from '../audit/index.js';
import { formatTimestamp } from '../utils/formatting.js';

export interface AuditLogger {
  logLlmResponse(turn: number, content: string): Promise<void>;
  logToolStart(toolName: string, parameters: unknown): Promise<void>;
  logToolEnd(result: unknown): Promise<void>;
  logError(error: Error, duration: number, turns: number): Promise<void>;
}

class RealAuditLogger implements AuditLogger {
  private auditSession: AuditSession;

  constructor(auditSession: AuditSession) {
    this.auditSession = auditSession;
  }

  async logLlmResponse(turn: number, content: string): Promise<void> {
    await this.auditSession.logEvent('llm_response', {
      turn,
      content,
      timestamp: formatTimestamp(),
    });
  }

  async logToolStart(toolName: string, parameters: unknown): Promise<void> {
    await this.auditSession.logEvent('tool_start', {
      toolName,
      parameters,
      timestamp: formatTimestamp(),
    });
  }

  async logToolEnd(result: unknown): Promise<void> {
    await this.auditSession.logEvent('tool_end', {
      result,
      timestamp: formatTimestamp(),
    });
  }

  async logError(error: Error, duration: number, turns: number): Promise<void> {
    await this.auditSession.logEvent('error', {
      message: error.message,
      errorType: error.constructor.name,
      stack: error.stack,
      duration,
      turns,
      timestamp: formatTimestamp(),
    });
  }
}

/** Null Object implementation - all methods are safe no-ops */
class NullAuditLogger implements AuditLogger {
  async logLlmResponse(_turn: number, _content: string): Promise<void> {}

  async logToolStart(_toolName: string, _parameters: unknown): Promise<void> {}

  async logToolEnd(_result: unknown): Promise<void> {}

  async logError(_error: Error, _duration: number, _turns: number): Promise<void> {}
}

// Returns no-op when auditSession is null
export function createAuditLogger(auditSession: AuditSession | null): AuditLogger {
  if (auditSession) {
    return new RealAuditLogger(auditSession);
  }

  return new NullAuditLogger();
}
@@ -0,0 +1,345 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

// Production Claude agent execution with retry, git checkpoints, and audit logging

import { query } from '@anthropic-ai/claude-agent-sdk';
import { fs, path } from 'zx';
import type { AuditSession } from '../audit/index.js';
import { isRetryableError, PentestError } from '../services/error-handling.js';
import { AGENT_VALIDATORS } from '../session-manager.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import { isSpendingCapBehavior } from '../utils/billing-detection.js';
import { formatTimestamp } from '../utils/formatting.js';
import { Timer } from '../utils/metrics.js';
import { createAuditLogger } from './audit-logger.js';
import { dispatchMessage } from './message-handlers.js';
import { type ModelTier, resolveModel } from './models.js';
import { detectExecutionContext, formatCompletionMessage, formatErrorOutput } from './output-formatters.js';
import { createProgressManager } from './progress-manager.js';
import { getActualModelName } from './router-utils.js';

declare global {
  var SHANNON_DISABLE_LOADER: boolean | undefined;
}

export interface ClaudePromptResult {
  result?: string | null | undefined;
  success: boolean;
  duration: number;
  turns?: number | undefined;
  cost: number;
  model?: string | undefined;
  partialCost?: number | undefined;
  apiErrorDetected?: boolean | undefined;
  error?: string | undefined;
  errorType?: string | undefined;
  prompt?: string | undefined;
  retryable?: boolean | undefined;
}

function outputLines(lines: string[]): void {
  for (const line of lines) {
    console.log(line);
  }
}

async function writeErrorLog(
  err: Error & { code?: string; status?: number },
  sourceDir: string,
  fullPrompt: string,
  duration: number,
): Promise<void> {
  try {
    const errorLog = {
      timestamp: formatTimestamp(),
      agent: 'claude-executor',
      error: {
        name: err.constructor.name,
        message: err.message,
        code: err.code,
        status: err.status,
        stack: err.stack,
      },
      context: {
        sourceDir,
        prompt: `${fullPrompt.slice(0, 200)}...`,
        retryable: isRetryableError(err),
      },
      duration,
    };
    const logPath = path.join(sourceDir, 'error.log');
    await fs.appendFile(logPath, `${JSON.stringify(errorLog)}\n`);
  } catch {
    // Best-effort error log writing - don't propagate failures
  }
}

export async function validateAgentOutput(
  result: ClaudePromptResult,
  agentName: string | null,
  sourceDir: string,
  logger: ActivityLogger,
): Promise<boolean> {
  logger.info(`Validating ${agentName} agent output`);

  try {
    // Check if agent completed successfully
    if (!result.success || !result.result) {
      logger.error('Validation failed: Agent execution was unsuccessful');
      return false;
    }

    // Get validator function for this agent
    const validator = agentName ? AGENT_VALIDATORS[agentName as keyof typeof AGENT_VALIDATORS] : undefined;

    if (!validator) {
      logger.warn(`No validator found for agent "${agentName}" - assuming success`);
      logger.info('Validation passed: Unknown agent with successful result');
      return true;
    }

    logger.info(`Using validator for agent: ${agentName}`, { sourceDir });

    // Apply validation function
    const validationResult = await validator(sourceDir, logger);

    if (validationResult) {
      logger.info('Validation passed: Required files/structure present');
    } else {
      logger.error('Validation failed: Missing required deliverable files');
    }

    return validationResult;
  } catch (error) {
    const errMsg = error instanceof Error ? error.message : String(error);
    logger.error(`Validation failed with error: ${errMsg}`);
    return false;
  }
}

// Low-level SDK execution. Handles message streaming, progress, and audit logging.
// Exported for Temporal activities to call single-attempt execution.
export async function runClaudePrompt(
  prompt: string,
  sourceDir: string,
  context: string = '',
  description: string = 'Claude analysis',
  _agentName: string | null = null,
  auditSession: AuditSession | null = null,
  logger: ActivityLogger,
  modelTier: ModelTier = 'medium',
): Promise<ClaudePromptResult> {
  // 1. Initialize timing and prompt
  const timer = new Timer(`agent-${description.toLowerCase().replace(/\s+/g, '-')}`);
  const fullPrompt = context ? `${context}\n\n${prompt}` : prompt;

  // 2. Set up progress and audit infrastructure
  const execContext = detectExecutionContext(description);
  const progress = createProgressManager(
    { description, useCleanOutput: execContext.useCleanOutput },
    global.SHANNON_DISABLE_LOADER ?? false,
  );
  const auditLogger = createAuditLogger(auditSession);

  logger.info(`Running Claude Code: ${description}...`);

  // 3. Build env vars to pass to SDK subprocesses
  const sdkEnv: Record<string, string> = {
    CLAUDE_CODE_MAX_OUTPUT_TOKENS: process.env.CLAUDE_CODE_MAX_OUTPUT_TOKENS || '64000',
  };
  const passthroughVars = [
    'ANTHROPIC_API_KEY',
    'CLAUDE_CODE_OAUTH_TOKEN',
    'ANTHROPIC_BASE_URL',
    'ANTHROPIC_AUTH_TOKEN',
    'CLAUDE_CODE_USE_BEDROCK',
    'AWS_REGION',
    'AWS_BEARER_TOKEN_BEDROCK',
    'CLAUDE_CODE_USE_VERTEX',
    'CLOUD_ML_REGION',
    'ANTHROPIC_VERTEX_PROJECT_ID',
    'GOOGLE_APPLICATION_CREDENTIALS',
    'ANTHROPIC_SMALL_MODEL',
    'ANTHROPIC_MEDIUM_MODEL',
    'ANTHROPIC_LARGE_MODEL',
    'HOME',
    'PATH',
    'PLAYWRIGHT_MCP_EXECUTABLE_PATH',
  ];
  for (const name of passthroughVars) {
    const val = process.env[name];
    if (val) {
      sdkEnv[name] = val;
    }
  }

  // 4. Configure SDK options
  const options = {
    model: resolveModel(modelTier),
    maxTurns: 10_000,
    cwd: sourceDir,
    permissionMode: 'bypassPermissions' as const,
    allowDangerouslySkipPermissions: true,
    settingSources: ['user'] as ('user' | 'project' | 'local')[],
    env: sdkEnv,
  };

  if (!execContext.useCleanOutput) {
    logger.info(`SDK Options: maxTurns=${options.maxTurns}, cwd=${sourceDir}, permissions=BYPASS`);
  }

  let turnCount = 0;
  let result: string | null = null;
  let apiErrorDetected = false;
  let totalCost = 0;

  progress.start();

  try {
    // 6. Process the message stream
    const messageLoopResult = await processMessageStream(
      fullPrompt,
      options,
      { execContext, description, progress, auditLogger, logger },
      timer,
    );

    turnCount = messageLoopResult.turnCount;
    result = messageLoopResult.result;
    apiErrorDetected = messageLoopResult.apiErrorDetected;
    totalCost = messageLoopResult.cost;
    const model = messageLoopResult.model;

    // === SPENDING CAP SAFEGUARD ===
    // 7. Defense-in-depth: Detect spending cap that slipped through detectApiError().
    // Uses consolidated billing detection from utils/billing-detection.ts
    if (isSpendingCapBehavior(turnCount, totalCost, result || '')) {
      throw new PentestError(
        `Spending cap likely reached (turns=${turnCount}, cost=$0): ${result?.slice(0, 100)}`,
        'billing',
        true, // Retryable - Temporal will use 5-30 min backoff
      );
    }

    // 8. Finalize successful result
    const duration = timer.stop();

    if (apiErrorDetected) {
      logger.warn(`API Error detected in ${description} - will validate deliverables before failing`);
    }

    progress.finish(formatCompletionMessage(execContext, description, turnCount, duration));

    return {
      result,
      success: true,
      duration,
      turns: turnCount,
      cost: totalCost,
      model,
      partialCost: totalCost,
      apiErrorDetected,
    };
  } catch (error) {
    // 9. Handle errors — log, write error file, return failure
    const duration = timer.stop();

    const err = error as Error & { code?: string; status?: number };

    await auditLogger.logError(err, duration, turnCount);
    progress.stop();
    outputLines(formatErrorOutput(err, execContext, description, duration, sourceDir, isRetryableError(err)));
    await writeErrorLog(err, sourceDir, fullPrompt, duration);

    return {
      error: err.message,
      errorType: err.constructor.name,
      prompt: `${fullPrompt.slice(0, 100)}...`,
      success: false,
      duration,
      cost: totalCost,
      retryable: isRetryableError(err),
    };
  }
}

interface MessageLoopResult {
  turnCount: number;
  result: string | null;
  apiErrorDetected: boolean;
  cost: number;
  model?: string | undefined;
}

interface MessageLoopDeps {
  execContext: ReturnType<typeof detectExecutionContext>;
  description: string;
  progress: ReturnType<typeof createProgressManager>;
  auditLogger: ReturnType<typeof createAuditLogger>;
  logger: ActivityLogger;
}

async function processMessageStream(
  fullPrompt: string,
  options: NonNullable<Parameters<typeof query>[0]['options']>,
  deps: MessageLoopDeps,
  timer: Timer,
): Promise<MessageLoopResult> {
  const { execContext, description, progress, auditLogger, logger } = deps;
  const HEARTBEAT_INTERVAL = 30000;

  let turnCount = 0;
  let result: string | null = null;
  let apiErrorDetected = false;
  let cost = 0;
  let model: string | undefined;
  let lastHeartbeat = Date.now();

  for await (const message of query({ prompt: fullPrompt, options })) {
    // Heartbeat logging when loader is disabled
    const now = Date.now();
    if (global.SHANNON_DISABLE_LOADER && now - lastHeartbeat > HEARTBEAT_INTERVAL) {
      logger.info(`[${Math.floor((now - timer.startTime) / 1000)}s] ${description} running... (Turn ${turnCount})`);
      lastHeartbeat = now;
    }

    // Increment turn count for assistant messages
    if (message.type === 'assistant') {
      turnCount++;
    }

    const dispatchResult = await dispatchMessage(message as { type: string; subtype?: string }, turnCount, {
      execContext,
      description,
      progress,
      auditLogger,
      logger,
    });

    if (dispatchResult.type === 'throw') {
      throw dispatchResult.error;
    }

    if (dispatchResult.type === 'complete') {
      result = dispatchResult.result;
      cost = dispatchResult.cost;
      break;
    }

    if (dispatchResult.type === 'continue') {
      if (dispatchResult.apiErrorDetected) {
        apiErrorDetected = true;
      }
      // Capture model from SystemInitMessage, but override with router model if applicable
      if (dispatchResult.model) {
        model = getActualModelName(dispatchResult.model);
      }
    }
  }

  return { turnCount, result, apiErrorDetected, cost, model };
}
@@ -0,0 +1,348 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

import type { SDKAssistantMessageError } from '@anthropic-ai/claude-agent-sdk';
import { PentestError } from '../services/error-handling.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import { ErrorCode } from '../types/errors.js';
import { matchesBillingTextPattern } from '../utils/billing-detection.js';
import { formatTimestamp } from '../utils/formatting.js';
import type { AuditLogger } from './audit-logger.js';
import {
  filterJsonToolCalls,
  formatAssistantOutput,
  formatResultOutput,
  formatToolResultOutput,
  formatToolUseOutput,
} from './output-formatters.js';
import type { ProgressManager } from './progress-manager.js';
import { getActualModelName } from './router-utils.js';
import type {
  ApiErrorDetection,
  AssistantMessage,
  AssistantResult,
  ContentBlock,
  ExecutionContext,
  ResultData,
  ResultMessage,
  SystemInitMessage,
  ToolResultData,
  ToolResultMessage,
  ToolUseData,
  ToolUseMessage,
} from './types.js';

// Handles both array and string content formats from SDK
function extractMessageContent(message: AssistantMessage): string {
  const messageContent = message.message;

  if (Array.isArray(messageContent.content)) {
    return messageContent.content.map((c: ContentBlock) => c.text || JSON.stringify(c)).join('\n');
  }

  return String(messageContent.content);
}

// Extracts only text content (no tool_use JSON) to avoid false positives in error detection
function extractTextOnlyContent(message: AssistantMessage): string {
  const messageContent = message.message;

  if (Array.isArray(messageContent.content)) {
    return messageContent.content
      .filter((c: ContentBlock) => c.type === 'text' || c.text)
      .map((c: ContentBlock) => c.text || '')
      .join('\n');
  }

  return String(messageContent.content);
}

function detectApiError(content: string): ApiErrorDetection {
  if (!content || typeof content !== 'string') {
    return { detected: false };
  }

  const lowerContent = content.toLowerCase();

  // === BILLING/SPENDING CAP ERRORS (Retryable with long backoff) ===
  // When Claude Code hits its spending cap, it returns a short message like
  // "Spending cap reached resets 8am" instead of throwing an error.
  // These should retry with 5-30 min backoff so workflows can recover when cap resets.
  if (matchesBillingTextPattern(content)) {
    return {
      detected: true,
      shouldThrow: new PentestError(
        `Billing limit reached: ${content.slice(0, 100)}`,
        'billing',
        true, // RETRYABLE - Temporal will use 5-30 min backoff
        {},
        ErrorCode.SPENDING_CAP_REACHED,
      ),
    };
  }

  // === SESSION LIMIT (Non-retryable) ===
  // Different from spending cap - usually means something is fundamentally wrong
  if (lowerContent.includes('session limit reached')) {
    return {
      detected: true,
      shouldThrow: new PentestError('Session limit reached', 'billing', false),
    };
  }

  // Non-fatal API errors - detected but continue
  if (lowerContent.includes('api error') || lowerContent.includes('terminated')) {
    return { detected: true };
  }

  return { detected: false };
}

// Maps SDK structured error types to our error handling.
function handleStructuredError(errorType: SDKAssistantMessageError, content: string): ApiErrorDetection {
  switch (errorType) {
    case 'billing_error':
      return {
        detected: true,
        shouldThrow: new PentestError(
          `Billing error (structured): ${content.slice(0, 100)}`,
          'billing',
          true, // Retryable with backoff
          {},
          ErrorCode.INSUFFICIENT_CREDITS,
        ),
      };
    case 'rate_limit':
      return {
        detected: true,
        shouldThrow: new PentestError(
          `Rate limit hit (structured): ${content.slice(0, 100)}`,
          'network',
          true, // Retryable with backoff
          {},
          ErrorCode.API_RATE_LIMITED,
        ),
      };
    case 'authentication_failed':
      return {
        detected: true,
        shouldThrow: new PentestError(
          `Authentication failed: ${content.slice(0, 100)}`,
          'config',
          false, // Not retryable - needs API key fix
        ),
      };
    case 'server_error':
      return {
        detected: true,
        shouldThrow: new PentestError(
          `Server error (structured): ${content.slice(0, 100)}`,
          'network',
          true, // Retryable
        ),
      };
    case 'invalid_request':
      return {
        detected: true,
        shouldThrow: new PentestError(
          `Invalid request: ${content.slice(0, 100)}`,
          'config',
          false, // Not retryable - needs code fix
        ),
      };
    case 'max_output_tokens':
      return {
        detected: true,
        shouldThrow: new PentestError(
          `Max output tokens reached: ${content.slice(0, 100)}`,
          'billing',
          true, // Retryable - may succeed with different content
        ),
      };
    default:
      return { detected: true };
  }
}

function handleAssistantMessage(message: AssistantMessage, turnCount: number): AssistantResult {
  const content = extractMessageContent(message);
  const cleanedContent = filterJsonToolCalls(content);

  // Prefer structured error field from SDK, fall back to text-sniffing
  // Use text-only content for error detection to avoid false positives
  // from tool_use JSON (e.g. security reports containing "usage limit")
  let errorDetection: ApiErrorDetection;
  if (message.error) {
    errorDetection = handleStructuredError(message.error, content);
  } else {
    const textOnlyContent = extractTextOnlyContent(message);
    errorDetection = detectApiError(textOnlyContent);
  }

  const result: AssistantResult = {
    content,
    cleanedContent,
    apiErrorDetected: errorDetection.detected,
    logData: {
      turn: turnCount,
      content,
      timestamp: formatTimestamp(),
    },
  };

  // Only add shouldThrow if it exists (exactOptionalPropertyTypes compliance)
  if (errorDetection.shouldThrow) {
    result.shouldThrow = errorDetection.shouldThrow;
  }

  return result;
}

// Final message of a query with cost/duration info
function handleResultMessage(message: ResultMessage): ResultData {
  const result: ResultData = {
    result: message.result || null,
    cost: message.total_cost_usd || 0,
    duration_ms: message.duration_ms || 0,
    permissionDenials: message.permission_denials?.length || 0,
  };

  // Only add subtype if it exists (exactOptionalPropertyTypes compliance)
  if (message.subtype) {
    result.subtype = message.subtype;
  }

  // Capture stop_reason for diagnostics (helps debug early stops, budget exceeded, etc.)
  if (message.stop_reason !== undefined) {
    result.stop_reason = message.stop_reason;
    if (message.stop_reason && message.stop_reason !== 'end_turn') {
      console.log(` Stop reason: ${message.stop_reason}`);
    }
  }

  return result;
}

function handleToolUseMessage(message: ToolUseMessage): ToolUseData {
  return {
    toolName: message.name,
    parameters: message.input || {},
    timestamp: formatTimestamp(),
  };
}

// Truncates long results for display (500 char limit), preserves full content for logging
function handleToolResultMessage(message: ToolResultMessage): ToolResultData {
  const content = message.content;
  const contentStr = typeof content === 'string' ? content : JSON.stringify(content, null, 2);

  const displayContent =
    contentStr.length > 500
      ? `${contentStr.slice(0, 500)}...\n[Result truncated - ${contentStr.length} total chars]`
      : contentStr;

  return {
    content,
    displayContent,
    timestamp: formatTimestamp(),
  };
}

function outputLines(lines: string[]): void {
  for (const line of lines) {
    console.log(line);
  }
}

export type MessageDispatchAction =
|
||||
| { type: 'continue'; apiErrorDetected?: boolean | undefined; model?: string | undefined }
|
||||
| { type: 'complete'; result: string | null; cost: number }
|
||||
| { type: 'throw'; error: Error };
|
||||
|
||||
export interface MessageDispatchDeps {
|
||||
execContext: ExecutionContext;
|
||||
description: string;
|
||||
progress: ProgressManager;
|
||||
auditLogger: AuditLogger;
|
||||
logger: ActivityLogger;
|
||||
}
|
||||
|
||||
// Dispatches SDK messages to appropriate handlers and formatters
|
||||
export async function dispatchMessage(
  message: { type: string; subtype?: string },
  turnCount: number,
  deps: MessageDispatchDeps,
): Promise<MessageDispatchAction> {
  const { execContext, description, progress, auditLogger, logger } = deps;

  switch (message.type) {
    case 'assistant': {
      const assistantResult = handleAssistantMessage(message as AssistantMessage, turnCount);

      if (assistantResult.shouldThrow) {
        return { type: 'throw', error: assistantResult.shouldThrow };
      }

      if (assistantResult.cleanedContent.trim()) {
        progress.stop();
        outputLines(formatAssistantOutput(assistantResult.cleanedContent, execContext, turnCount, description));
        progress.start();
      }

      await auditLogger.logLlmResponse(turnCount, assistantResult.content);

      if (assistantResult.apiErrorDetected) {
        logger.warn('API Error detected in assistant response');
        return { type: 'continue', apiErrorDetected: true };
      }

      return { type: 'continue' };
    }

    case 'system': {
      if (message.subtype === 'init') {
        const initMsg = message as SystemInitMessage;
        const actualModel = getActualModelName(initMsg.model);
        if (!execContext.useCleanOutput) {
          logger.info(`Model: ${actualModel}, Permission: ${initMsg.permissionMode}`);
        }
        // Return actual model for tracking in audit logs
        return { type: 'continue', model: actualModel };
      }
      return { type: 'continue' };
    }

    case 'user':
    case 'tool_progress':
    case 'tool_use_summary':
    case 'auth_status':
      return { type: 'continue' };

    case 'tool_use': {
      const toolData = handleToolUseMessage(message as unknown as ToolUseMessage);
      outputLines(formatToolUseOutput(toolData.toolName, toolData.parameters));
      await auditLogger.logToolStart(toolData.toolName, toolData.parameters);
      return { type: 'continue' };
    }

    case 'tool_result': {
      const toolResultData = handleToolResultMessage(message as unknown as ToolResultMessage);
      outputLines(formatToolResultOutput(toolResultData.displayContent));
      await auditLogger.logToolEnd(toolResultData.content);
      return { type: 'continue' };
    }

    case 'result': {
      const resultData = handleResultMessage(message as ResultMessage);
      outputLines(formatResultOutput(resultData, !execContext.useCleanOutput));
      return { type: 'complete', result: resultData.result, cost: resultData.cost };
    }

    default:
      logger.info(`Unhandled message type: ${message.type}`);
      return { type: 'continue' };
  }
}
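
The `MessageDispatchAction` union returned by `dispatchMessage` lets the caller decide per message whether to keep reading, finish, or raise. The following is a minimal sketch of such a consumer, not the executor's actual loop: the `Action` type is a simplified mirror of the union, and `runLoop` and `LoopSummary` are hypothetical names introduced only for illustration.

```typescript
// Simplified mirror of MessageDispatchAction (illustrative, not the source type).
type Action =
  | { type: 'continue'; apiErrorDetected?: boolean; model?: string }
  | { type: 'complete'; result: string | null; cost: number }
  | { type: 'throw'; error: Error };

// Hypothetical summary shape collected while consuming actions.
interface LoopSummary {
  result: string | null;
  cost: number;
  model?: string;
  apiErrorDetected: boolean;
  turns: number;
}

// Hypothetical consumer: folds a sequence of dispatch actions into a summary.
function runLoop(actions: Iterable<Action>): LoopSummary {
  const summary: LoopSummary = { result: null, cost: 0, apiErrorDetected: false, turns: 0 };

  for (const action of actions) {
    summary.turns += 1;
    switch (action.type) {
      case 'throw':
        // Surface the error to the caller, which may decide to retry.
        throw action.error;
      case 'complete':
        // A result message ends the loop; later actions are ignored.
        summary.result = action.result;
        summary.cost = action.cost;
        return summary;
      case 'continue':
        // Optional fields carry side information without stopping the loop.
        if (action.apiErrorDetected) summary.apiErrorDetected = true;
        if (action.model) summary.model = action.model;
        break;
    }
  }

  return summary;
}
```

Because `type` is the discriminant, the compiler narrows each branch, so `action.error` and `action.cost` are only reachable where they exist.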
@@ -0,0 +1,37 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Model tier definitions and resolution.
 *
 * Three tiers mapped to capability levels:
 * - "small" (Haiku: summarization, structured extraction)
 * - "medium" (Sonnet: tool use, general analysis)
 * - "large" (Opus: deep reasoning, complex analysis)
 *
 * Users override via ANTHROPIC_SMALL_MODEL / ANTHROPIC_MEDIUM_MODEL / ANTHROPIC_LARGE_MODEL,
 * which works across all providers (direct, Bedrock, Vertex).
 */

export type ModelTier = 'small' | 'medium' | 'large';

const DEFAULT_MODELS: Readonly<Record<ModelTier, string>> = {
  small: 'claude-haiku-4-5-20251001',
  medium: 'claude-sonnet-4-6',
  large: 'claude-opus-4-6',
};

/** Resolve a model tier to a concrete model ID. */
export function resolveModel(tier: ModelTier = 'medium'): string {
  switch (tier) {
    case 'small':
      return process.env.ANTHROPIC_SMALL_MODEL || DEFAULT_MODELS.small;
    case 'large':
      return process.env.ANTHROPIC_LARGE_MODEL || DEFAULT_MODELS.large;
    default:
      return process.env.ANTHROPIC_MEDIUM_MODEL || DEFAULT_MODELS.medium;
  }
}
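
To make the override behavior of `resolveModel` easier to see, here is a small sketch that takes the environment as an explicit argument instead of reading `process.env`. The helper name `resolveModelFrom`, the `Env` alias, and the `ENV_KEYS` table are assumptions for illustration; the default model IDs and variable names are copied from the file above.

```typescript
type ModelTier = 'small' | 'medium' | 'large';
type Env = Record<string, string | undefined>;

// Defaults copied from the file above.
const DEFAULTS: Readonly<Record<ModelTier, string>> = {
  small: 'claude-haiku-4-5-20251001',
  medium: 'claude-sonnet-4-6',
  large: 'claude-opus-4-6',
};

// Hypothetical lookup table from tier to its override variable.
const ENV_KEYS: Readonly<Record<ModelTier, string>> = {
  small: 'ANTHROPIC_SMALL_MODEL',
  medium: 'ANTHROPIC_MEDIUM_MODEL',
  large: 'ANTHROPIC_LARGE_MODEL',
};

// Hypothetical pure variant of resolveModel: the env is injected for testability.
function resolveModelFrom(env: Env, tier: ModelTier = 'medium'): string {
  // `||` (not `??`) means an empty-string override also falls back to the default.
  return env[ENV_KEYS[tier]] || DEFAULTS[tier];
}
```

Injecting the environment keeps the function deterministic in tests; the original reads `process.env` directly, which is simpler at call sites.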
@@ -0,0 +1,386 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

import { AGENTS } from '../session-manager.js';
import { extractAgentType, formatDuration } from '../utils/formatting.js';
import type { ExecutionContext, ResultData } from './types.js';

interface ToolCallInput {
  url?: string;
  element?: string;
  key?: string;
  fields?: unknown[];
  text?: string;
  action?: string;
  description?: string;
  command?: string;
  todos?: Array<{
    status: string;
    content: string;
  }>;
  [key: string]: unknown;
}

interface ToolCall {
  name: string;
  input?: ToolCallInput;
}

/**
 * Get agent prefix for parallel execution
 */
export function getAgentPrefix(description: string): string {
  // Map agent names to their prefixes
  const agentPrefixes: Record<string, string> = {
    'injection-vuln': '[Injection]',
    'xss-vuln': '[XSS]',
    'auth-vuln': '[Auth]',
    'authz-vuln': '[Authz]',
    'ssrf-vuln': '[SSRF]',
    'injection-exploit': '[Injection]',
    'xss-exploit': '[XSS]',
    'auth-exploit': '[Auth]',
    'authz-exploit': '[Authz]',
    'ssrf-exploit': '[SSRF]',
  };

  // First try to match by agent name directly
  for (const [agentName, prefix] of Object.entries(agentPrefixes)) {
    const agent = AGENTS[agentName as keyof typeof AGENTS];
    if (agent && description.includes(agent.displayName)) {
      return prefix;
    }
  }

  // Fallback to partial matches for backwards compatibility
  if (description.includes('injection')) return '[Injection]';
  if (description.includes('xss')) return '[XSS]';
  if (description.includes('authz')) return '[Authz]'; // Check authz before auth
  if (description.includes('auth')) return '[Auth]';
  if (description.includes('ssrf')) return '[SSRF]';

  return '[Agent]';
}

/**
 * Extract domain from URL for display
 */
function extractDomain(url: string): string {
  try {
    const urlObj = new URL(url);
    return urlObj.hostname || url.slice(0, 30);
  } catch {
    return url.slice(0, 30);
  }
}

/**
 * Format playwright-cli commands into clean progress indicators
 */
function formatBrowserAction(command: string): string | null {
  // Extract subcommand after optional session flag (e.g., "playwright-cli -s=session1 navigate https://example.com")
  const match = command.match(/playwright-cli\s+(?:-s=\S+\s+)?(\S+)(?:\s+(.*))?/);
  if (!match) return null;

  const subcommand = match[1];
  const args = match[2] || '';

  switch (subcommand) {
    case 'open':
    case 'goto': {
      const domain = args.trim() ? extractDomain(args.trim()) : '';
      return domain ? `🌐 Navigating to ${domain}` : '🌐 Opening browser';
    }
    case 'go-back':
      return '⬅️ Going back';
    case 'go-forward':
      return '➡️ Going forward';
    case 'reload':
      return '🔄 Reloading page';
    case 'click':
    case 'dblclick':
      return `🖱️ Clicking ${(args || 'element').slice(0, 25)}`;
    case 'hover':
      return `👆 Hovering over ${(args || 'element').slice(0, 20)}`;
    case 'type':
      return `⌨️ Typing ${(args || 'text').slice(0, 20)}`;
    case 'press':
    case 'keydown':
    case 'keyup':
      return `⌨️ Pressing ${args || 'key'}`;
    case 'fill':
      return `📝 Filling ${(args || 'field').slice(0, 25)}`;
    case 'select':
      return '📋 Selecting dropdown option';
    case 'check':
    case 'uncheck':
      return `☑️ ${subcommand === 'check' ? 'Checking' : 'Unchecking'} ${(args || 'element').slice(0, 20)}`;
    case 'upload':
      return '📁 Uploading file';
    case 'drag':
      return '🖱️ Dragging element';
    case 'snapshot':
      return '📸 Taking page snapshot';
    case 'screenshot':
      return '📸 Taking screenshot';
    case 'eval':
    case 'run-code':
      return '🔍 Running JavaScript analysis';
    case 'console':
      return '📜 Checking console logs';
    case 'network':
      return '🌐 Analyzing network traffic';
    case 'tab-list':
    case 'tab-new':
    case 'tab-close':
    case 'tab-select':
      return `🗂️ ${subcommand.replace('tab-', '')} browser tab`;
    case 'dialog-accept':
      return '💬 Accepting dialog';
    case 'dialog-dismiss':
      return '💬 Dismissing dialog';
    case 'pdf':
      return '📄 Saving page as PDF';
    case 'resize':
      return `🖥️ Resizing browser ${args || ''}`.trim();
    default:
      return `🌐 Browser: ${subcommand}`;
  }
}

/**
 * Summarize TodoWrite updates into clean progress indicators
 */
function summarizeTodoUpdate(input: ToolCallInput | undefined): string | null {
  if (!input?.todos || !Array.isArray(input.todos)) {
    return null;
  }

  const todos = input.todos;
  const completed = todos.filter((t) => t.status === 'completed');
  const inProgress = todos.filter((t) => t.status === 'in_progress');

  // Show recently completed tasks
  const recent = completed.at(-1);
  if (recent) {
    return `✅ ${recent.content}`;
  }

  // Show current in-progress task
  const current = inProgress.at(0);
  if (current) {
    return `🔄 ${current.content}`;
  }

  return null;
}

/**
 * Filter out JSON tool calls from content, with special handling for Task calls
 */
export function filterJsonToolCalls(content: string | null | undefined): string {
  if (!content || typeof content !== 'string') {
    return content || '';
  }

  const lines = content.split('\n');
  const processedLines: string[] = [];

  for (const line of lines) {
    const trimmed = line.trim();

    // Skip empty lines
    if (trimmed === '') {
      continue;
    }

    // Check if this is a JSON tool call
    if (trimmed.startsWith('{"type":"tool_use"')) {
      try {
        const toolCall = JSON.parse(trimmed) as ToolCall;

        // Special handling for Task tool calls
        if (toolCall.name === 'Task') {
          const description = toolCall.input?.description || 'analysis agent';
          processedLines.push(`🚀 Launching ${description}`);
          continue;
        }

        // Special handling for TodoWrite tool calls
        if (toolCall.name === 'TodoWrite') {
          const summary = summarizeTodoUpdate(toolCall.input);
          if (summary) {
            processedLines.push(summary);
          }
          continue;
        }

        // Special handling for browser tool calls (playwright-cli via Bash)
        if (toolCall.name === 'Bash') {
          const command = toolCall.input?.command || '';
          if (command.includes('playwright-cli')) {
            const browserAction = formatBrowserAction(command);
            if (browserAction) {
              processedLines.push(browserAction);
            }
          }
        }
      } catch {
        // If JSON parsing fails, treat as regular text
        processedLines.push(line);
      }
    } else {
      // Keep non-JSON lines (assistant text)
      processedLines.push(line);
    }
  }

  return processedLines.join('\n');
}

export function detectExecutionContext(description: string): ExecutionContext {
  const isParallelExecution = description.includes('vuln agent') || description.includes('exploit agent');

  const useCleanOutput =
    description.includes('Pre-recon agent') ||
    description.includes('Recon agent') ||
    description.includes('Executive Summary and Report Cleanup') ||
    description.includes('vuln agent') ||
    description.includes('exploit agent');

  const agentType = extractAgentType(description);

  const agentKey = description.toLowerCase().replace(/\s+/g, '-');

  return { isParallelExecution, useCleanOutput, agentType, agentKey };
}

export function formatAssistantOutput(
  cleanedContent: string,
  context: ExecutionContext,
  turnCount: number,
  description: string,
): string[] {
  if (!cleanedContent.trim()) {
    return [];
  }

  const lines: string[] = [];

  if (context.isParallelExecution) {
    // Compact output for parallel agents with prefixes
    const prefix = getAgentPrefix(description);
    lines.push(`${prefix} ${cleanedContent}`);
  } else {
    // Full turn output for sequential agents
    lines.push(`\n Turn ${turnCount} (${description}):`);
    lines.push(` ${cleanedContent}`);
  }

  return lines;
}

export function formatResultOutput(data: ResultData, showFullResult: boolean): string[] {
  const lines: string[] = [];

  lines.push(`\n COMPLETED:`);
  lines.push(` Duration: ${(data.duration_ms / 1000).toFixed(1)}s, Cost: $${data.cost.toFixed(4)}`);

  if (data.subtype === 'error_max_turns') {
    lines.push(` Stopped: Hit maximum turns limit`);
  } else if (data.subtype === 'error_during_execution') {
    lines.push(` Stopped: Execution error`);
  }

  if (data.permissionDenials > 0) {
    lines.push(` ${data.permissionDenials} permission denials`);
  }

  if (showFullResult && data.result && typeof data.result === 'string') {
    if (data.result.length > 1000) {
      lines.push(` ${data.result.slice(0, 1000)}... [${data.result.length} total chars]`);
    } else {
      lines.push(` ${data.result}`);
    }
  }

  return lines;
}

export function formatErrorOutput(
  error: Error & { code?: string; status?: number },
  context: ExecutionContext,
  description: string,
  duration: number,
  sourceDir: string,
  isRetryable: boolean,
): string[] {
  const lines: string[] = [];

  if (context.isParallelExecution) {
    const prefix = getAgentPrefix(description);
    lines.push(`${prefix} Failed (${formatDuration(duration)})`);
  } else if (context.useCleanOutput) {
    lines.push(`${context.agentType} failed (${formatDuration(duration)})`);
  } else {
    lines.push(` Claude Code failed: ${description} (${formatDuration(duration)})`);
  }

  lines.push(` Error Type: ${error.constructor.name}`);
  lines.push(` Message: ${error.message}`);
  lines.push(` Agent: ${description}`);
  lines.push(` Working Directory: ${sourceDir}`);
  lines.push(` Retryable: ${isRetryable ? 'Yes' : 'No'}`);

  if (error.code) {
    lines.push(` Error Code: ${error.code}`);
  }
  if (error.status) {
    lines.push(` HTTP Status: ${error.status}`);
  }

  return lines;
}

export function formatCompletionMessage(
  context: ExecutionContext,
  description: string,
  turnCount: number,
  duration: number,
): string {
  if (context.isParallelExecution) {
    const prefix = getAgentPrefix(description);
    return `${prefix} Complete (${turnCount} turns, ${formatDuration(duration)})`;
  }

  if (context.useCleanOutput) {
    return `${context.agentType.charAt(0).toUpperCase() + context.agentType.slice(1)} complete! (${turnCount} turns, ${formatDuration(duration)})`;
  }

  return ` Claude Code completed: ${description} (${turnCount} turns) in ${formatDuration(duration)}`;
}

export function formatToolUseOutput(toolName: string, input: Record<string, unknown> | undefined): string[] {
  const lines: string[] = [];

  lines.push(`\n Using Tool: ${toolName}`);
  if (input && Object.keys(input).length > 0) {
    lines.push(` Input: ${JSON.stringify(input, null, 2)}`);
  }

  return lines;
}

export function formatToolResultOutput(displayContent: string): string[] {
  const lines: string[] = [];

  lines.push(` Tool Result:`);
  if (displayContent) {
    lines.push(` ${displayContent}`);
  }

  return lines;
}
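
The regular expression in `formatBrowserAction` is the piece of this file most worth exercising in isolation: it skips an optional `-s=<session>` flag and captures the subcommand and the rest of the line. The sketch below reuses that exact pattern; the `parseBrowserCommand` helper and `ParsedBrowserCommand` type are hypothetical names, and the sample commands in the usage note are assumptions rather than documented playwright-cli invocations.

```typescript
// Hypothetical result shape for the parsed command.
interface ParsedBrowserCommand {
  subcommand: string;
  args: string;
}

// Hypothetical helper isolating the regex used by formatBrowserAction.
function parseBrowserCommand(command: string): ParsedBrowserCommand | null {
  // Group 1: subcommand; group 2: everything after it (may be absent).
  const match = command.match(/playwright-cli\s+(?:-s=\S+\s+)?(\S+)(?:\s+(.*))?/);
  if (!match) return null;

  const subcommand = match[1];
  if (subcommand === undefined) return null;

  return { subcommand, args: (match[2] ?? '').trim() };
}
```

For example, `parseBrowserCommand('playwright-cli -s=session1 goto https://example.com/login')` yields the subcommand `goto` with the URL as `args`, while a command that never mentions `playwright-cli` yields `null`.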
@@ -0,0 +1,73 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

// Null Object pattern for progress indicator - callers never check for null

import { ProgressIndicator } from '../progress-indicator.js';
import { extractAgentType } from '../utils/formatting.js';

export interface ProgressContext {
  description: string;
  useCleanOutput: boolean;
}

export interface ProgressManager {
  start(): void;
  stop(): void;
  finish(message: string): void;
  isActive(): boolean;
}

class RealProgressManager implements ProgressManager {
  private indicator: ProgressIndicator;
  private active: boolean = false;

  constructor(message: string) {
    this.indicator = new ProgressIndicator(message);
  }

  start(): void {
    this.indicator.start();
    this.active = true;
  }

  stop(): void {
    this.indicator.stop();
    this.active = false;
  }

  finish(message: string): void {
    this.indicator.finish(message);
    this.active = false;
  }

  isActive(): boolean {
    return this.active;
  }
}

/** Null Object implementation - all methods are safe no-ops */
class NullProgressManager implements ProgressManager {
  start(): void {}

  stop(): void {}

  finish(_message: string): void {}

  isActive(): boolean {
    return false;
  }
}

// Returns no-op when disabled
export function createProgressManager(context: ProgressContext, disableLoader: boolean): ProgressManager {
  if (!context.useCleanOutput || disableLoader) {
    return new NullProgressManager();
  }

  const agentType = extractAgentType(context.description);
  return new RealProgressManager(`Running ${agentType}...`);
}
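
The benefit of the Null Object pattern used here is that call sites can pause and resume the indicator unconditionally. The sketch below illustrates that with stand-in classes; `Progress`, `RecordingProgress`, `NullProgress`, and `printWithPause` are hypothetical names for illustration and do not depend on the real `ProgressIndicator`.

```typescript
// Reduced stand-in for the ProgressManager interface (illustrative only).
interface Progress {
  start(): void;
  stop(): void;
  isActive(): boolean;
}

// Hypothetical "real" implementation that records calls instead of drawing a spinner.
class RecordingProgress implements Progress {
  readonly events: string[] = [];
  private active = false;

  start(): void {
    this.events.push('start');
    this.active = true;
  }

  stop(): void {
    this.events.push('stop');
    this.active = false;
  }

  isActive(): boolean {
    return this.active;
  }
}

// Null Object: same interface, every method is a safe no-op.
class NullProgress implements Progress {
  start(): void {}

  stop(): void {}

  isActive(): boolean {
    return false;
  }
}

// Caller pauses the indicator around output without any null checks.
function printWithPause(progress: Progress, sink: string[], line: string): void {
  progress.stop();
  sink.push(line);
  progress.start();
}
```

The same `printWithPause` call works with either implementation, which mirrors how `dispatchMessage` wraps assistant output in `progress.stop()` and `progress.start()`.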
@@ -0,0 +1,27 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Get the actual model name being used.
 * When using claude-code-router, the SDK reports its configured model (claude-sonnet)
 * but the actual model is determined by ROUTER_DEFAULT env var.
 */
export function getActualModelName(sdkReportedModel?: string): string | undefined {
  const routerBaseUrl = process.env.ANTHROPIC_BASE_URL;
  const routerDefault = process.env.ROUTER_DEFAULT;

  // If router mode is active and ROUTER_DEFAULT is set, use that
  if (routerBaseUrl && routerDefault) {
    // ROUTER_DEFAULT format: "provider,model" (e.g., "gemini,gemini-2.5-pro")
    const parts = routerDefault.split(',');
    if (parts.length >= 2) {
      return parts.slice(1).join(','); // Handle model names with commas
    }
  }

  // Fall back to SDK-reported model
  return sdkReportedModel;
}
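
The `ROUTER_DEFAULT` parsing has a few edge cases (no router URL, a value without a comma, a model name that itself contains commas). The sketch below restates the same logic with the environment passed in explicitly so those cases can be checked; `actualModelFrom` and `Env` are hypothetical names, and the `http://localhost:3456` URL in the usage note is an assumed placeholder, not a value from the source.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical pure variant of getActualModelName: the env is injected for testability.
function actualModelFrom(env: Env, sdkReportedModel?: string): string | undefined {
  const routerBaseUrl = env['ANTHROPIC_BASE_URL'];
  const routerDefault = env['ROUTER_DEFAULT'];

  // Router mode requires both the base URL and a "provider,model" default.
  if (routerBaseUrl && routerDefault) {
    const parts = routerDefault.split(',');
    if (parts.length >= 2) {
      // Drop the provider and rejoin, so commas inside the model name survive.
      return parts.slice(1).join(',');
    }
  }

  // Otherwise trust what the SDK reported (possibly undefined).
  return sdkReportedModel;
}
```

With `{ ANTHROPIC_BASE_URL: 'http://localhost:3456', ROUTER_DEFAULT: 'gemini,gemini-2.5-pro' }` the helper returns `gemini-2.5-pro`; with `ROUTER_DEFAULT` set but no base URL it returns the SDK-reported model unchanged.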
@@ -0,0 +1,99 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

// Type definitions for Claude executor message processing pipeline

import type { SDKAssistantMessageError } from '@anthropic-ai/claude-agent-sdk';

export interface ExecutionContext {
  isParallelExecution: boolean;
  useCleanOutput: boolean;
  agentType: string;
  agentKey: string;
}

export interface AssistantResult {
  content: string;
  cleanedContent: string;
  apiErrorDetected: boolean;
  shouldThrow?: Error;
  logData: {
    turn: number;
    content: string;
    timestamp: string;
  };
}

export interface ResultData {
  result: string | null;
  cost: number;
  duration_ms: number;
  subtype?: string;
  stop_reason?: string | null;
  permissionDenials: number;
}

export interface ToolUseData {
  toolName: string;
  parameters: Record<string, unknown>;
  timestamp: string;
}

export interface ToolResultData {
  content: unknown;
  displayContent: string;
  timestamp: string;
}

export interface ContentBlock {
  type?: string;
  text?: string;
}

export interface AssistantMessage {
  type: 'assistant';
  error?: SDKAssistantMessageError;
  message: {
    content: ContentBlock[] | string;
  };
}

export interface ResultMessage {
  type: 'result';
  result?: string;
  total_cost_usd?: number;
  duration_ms?: number;
  subtype?: string;
  stop_reason?: string | null;
  permission_denials?: unknown[];
}

export interface ToolUseMessage {
  type: 'tool_use';
  name: string;
  input?: Record<string, unknown>;
}

export interface ToolResultMessage {
  type: 'tool_result';
  content?: unknown;
}

export interface ApiErrorDetection {
  detected: boolean;
  shouldThrow?: Error;
}

export interface SystemInitMessage {
  type: 'system';
  subtype: 'init';
  model?: string;
  permissionMode?: string;
}

export interface UserMessage {
  type: 'user';
}