feat: add npx CLI with monorepo, CI/CD, and ephemeral worker architecture (#256)
* feat: integrate npx CLI, CI/CD, and ephemeral worker architecture
Bring in changes from shannon-npx: npx-distributable CLI package (cli/),
semantic-release CI/CD workflows, ephemeral per-scan worker containers,
TOML config support, setup wizard, and workspace management.
Preserves all shannon-only changes: security hardening (localhost-bound
ports, MCP env allowlist, path traversal guard), updated benchmarks
(XBEN 19/31/35/44), README assets, and prompt injection disclaimer.
Applies security hardening to cli/infra/compose.yml as well.
* refactor: migrate to Turborepo + pnpm + Biome monorepo
Restructure into apps/worker, apps/cli, packages/mcp-server with
Turborepo task orchestration, pnpm workspaces, Biome linting/formatting,
and tsdown CLI bundling.
Key changes:
- src/ -> apps/worker/src/, cli/ -> apps/cli/, mcp-server/ -> packages/mcp-server/
- prompts/ and configs/ moved into apps/worker/
- npm replaced with pnpm, package-lock.json replaced with pnpm-lock.yaml
- Dockerfile updated for pnpm-based builds
- CLI logs command rewritten with chokidar for cross-platform reliability
- Router health checking added for auto-detected router mode
- Centralized path resolution via apps/worker/src/paths.ts
* fix: resolve all biome warnings and formatting issues
- Remove unnecessary non-null assertions where values are guaranteed
- Replace array index access with .at() for safer element retrieval
- Use local variables to avoid repeated process.env lookups
- Replace any types with unknown in functional utilities
- Use nullish coalescing for TOTP hash byte access
- Auto-format security patches to match biome config
* fix: pin pnpm to 10.12.1 in Dockerfile for catalog support
* fix: handle Esc cancellation in Bedrock setup flow
Replace p.group() with individual prompts and per-field cancel checks,
matching the pattern used by all other provider setup flows.
* feat: add optional model customization to Anthropic setup
* fix: resolve Docker bind mount permission errors on Linux
Use entrypoint-based UID remapping instead of --user flag so the
container's pentest user matches the host UID/GID, keeping bind-mounted
volumes writable. Git config moved to --system level to survive remapping.
* fix: show resumed workflow ID in splash screen URL
When resuming a workflow, the Temporal Web UI link pointed to the old
(terminated) workflow ID. Now extracts "New Workflow ID" from the resume
header in workflow.log, falling back to the original ID for fresh scans.
* style: fix biome formatting in docker.ts
* fix: align TypeScript config types with JSON Schema
- SuccessCondition.type: use schema values (url_contains,
element_present, url_equals_exactly, text_contains) instead of
stale values (url, cookie, element, redirect)
- Authentication.login_flow: mark optional to match schema which
does not require it
* feat: mark GitHub release as latest during rollback
* fix: use native ARM64 runners for Docker multi-platform builds
Replace QEMU emulation with parallel native builds using a matrix
strategy (ubuntu-latest for amd64, ubuntu-24.04-arm for arm64).
Each platform pushes by digest, then a merge job creates the
multi-arch manifest list before signing with cosign.
* fix: resolve SessionMutex race condition with 3+ concurrent waiters
* fix: skip POSIX permission check on Windows
writeFileSync mode option is ignored on Windows, so config.toml
gets 0o666 and the guard rejects it.
* fix: resolve unsubstituted placeholders in report prompt
Remove unused {{GITHUB_URL}} placeholder and wire up {{AUTH_CONTEXT}}
with structured auth context (login type, username, URL, MFA status).
* fix: remove duplicate environment gate from merge-docker job
Move DOCKERHUB_USERNAME from vars to secrets so merge-docker can access
credentials without its own environment scope. This eliminates the
redundant double approval since build-docker already gates on
release-publish.
* fix: replace POSIX sleep binary with cross-platform async sleep
execFileSync('sleep') is unavailable on Windows. Use node:timers/promises
setTimeout instead, making ensureInfra async.
* fix: use session.json for workflow ID on resume instead of parsing workflow.log
On resume, workflow.log already exists with stale headers from the
previous run. The CLI poll found '====' immediately and extracted the
old workflow ID, producing a wrong Temporal Web UI URL.
Read the workflow ID from session.json instead — the worker writes
resume attempts there atomically. For fresh runs, poll until
originalWorkflowId appears. For resumes, poll until a new
resumeAttempts entry is appended.
* feat: add custom base URL support for Anthropic-compatible proxies
Support ANTHROPIC_BASE_URL + ANTHROPIC_AUTH_TOKEN to route SDK requests
through LiteLLM or any Anthropic-compatible proxy. Adds TUI wizard
option, TOML config mapping, credential validation, and preflight
endpoint reachability check via SDK query.
* fix: remove environment gates and add NPM_TOKEN to publish step
* feat: add beta release and rollback workflows with cosign signing
* fix: remove redundant checkout and pnpm steps from beta release workflow
* docs: normalize README commands to mode-neutral shorthand
Add a substitution note after Quick Start sections so all subsequent
examples use bare `shannon` instead of mixing `./shannon` and
`npx @keygraph/shannon`. Mode-specific commands (build, update,
uninstall) get inline annotations. Also fixes a broken command in the
Custom Base URL section.
* fix: remove redundant `update` command
Image is already auto-pulled by `ensureImage()` during `start` when the
pinned version tag is missing locally. Manual `update` was unnecessary.
* docs: add CLI package README stub
* docs: update README setup instructions for dual CLI modes
* docs: update announcement banner to npx availability
* feat: migrate from MCP tools to CLI based tools (#252)
* feat: migrate from MCP tools to CLI tools
* fix: restore browser action emoji formatters for CLI output
Adapt formatBrowserAction for playwright-cli commands, replacing the old
mcp__playwright__browser_* tool name matching removed during migration.
* fix: mount credential file to fixed container path for Vertex AI
GOOGLE_APPLICATION_CREDENTIALS was forwarded as-is to the container,
causing the relative host path to resolve against the repo mount
instead of the credentials mount. Now both local and npx modes mount
the resolved file to /app/credentials/google-sa-key.json and rewrite
the env var to match.
* feat: add git awareness and optional description field to config
* fix: drop redundant --ipc host flag from worker container
* fix: align announcement banner URL with main branch
* feat: add target URL reachability preflight check (#254)
* Moving asset benchmark graph image to this folder
* Move benchmark results to benchmark repo
Windows Defender flags exploit code in the pentest reports as false positives, forcing every Windows user to add a Defender exclusion just to clone Shannon.
* Updated README
* fix: case-insensitive grep for semantic-release version probe
* fix: harden supply chain security (#255)
* fix: patch smol-toml and tsdown vulnerabilities
Update smol-toml 1.6.0→1.6.1 (DoS via recursive comment parsing) and
tsdown 0.21.2→0.21.5 (picomatch ReDoS + method injection).
* fix: pin all unpinned dependency versions in Dockerfile
Pins subfinder v2.13.0, WhatWeb v0.6.3 (switched from git clone to
release tarball), schemathesis 4.13.0, addressable 2.8.9,
claude-code 2.1.84, and playwright-cli 0.1.1 for reproducible builds.
* fix: pin GitHub Actions to commit SHAs for supply chain security
* fix: pin GitHub Actions to commit SHAs in beta and rollback workflows
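The "replace POSIX sleep binary" entry above swaps `execFileSync('sleep')` for `setTimeout` from `node:timers/promises`, which is why `ensureInfra` became async. A minimal sketch of that pattern follows. The `waitUntilReady` helper, its `Probe` type, and the default attempt and delay values are hypothetical illustrations, not code from this commit.

```typescript
import { setTimeout as sleep } from 'node:timers/promises';

// Hypothetical readiness probe; stands in for whatever ensureInfra actually checks.
type Probe = () => Promise<boolean>;

// Polls the probe, pausing between attempts without spawning a `sleep` binary,
// so the same code path works on Windows, macOS, and Linux.
async function waitUntilReady(probe: Probe, attempts = 5, delayMs = 100): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    if (await probe()) {
      return true;
    }
    await sleep(delayMs);
  }
  return false;
}
```

Because the delay is awaited rather than blocking, every caller up the chain has to be async as well, which matches the note about `ensureInfra` in the commit message.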
@@ -0,0 +1,79 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

// Null Object pattern for audit logging - callers never check for null

import type { AuditSession } from '../audit/index.js';
import { formatTimestamp } from '../utils/formatting.js';

export interface AuditLogger {
  logLlmResponse(turn: number, content: string): Promise<void>;
  logToolStart(toolName: string, parameters: unknown): Promise<void>;
  logToolEnd(result: unknown): Promise<void>;
  logError(error: Error, duration: number, turns: number): Promise<void>;
}

class RealAuditLogger implements AuditLogger {
  private auditSession: AuditSession;

  constructor(auditSession: AuditSession) {
    this.auditSession = auditSession;
  }

  async logLlmResponse(turn: number, content: string): Promise<void> {
    await this.auditSession.logEvent('llm_response', {
      turn,
      content,
      timestamp: formatTimestamp(),
    });
  }

  async logToolStart(toolName: string, parameters: unknown): Promise<void> {
    await this.auditSession.logEvent('tool_start', {
      toolName,
      parameters,
      timestamp: formatTimestamp(),
    });
  }

  async logToolEnd(result: unknown): Promise<void> {
    await this.auditSession.logEvent('tool_end', {
      result,
      timestamp: formatTimestamp(),
    });
  }

  async logError(error: Error, duration: number, turns: number): Promise<void> {
    await this.auditSession.logEvent('error', {
      message: error.message,
      errorType: error.constructor.name,
      stack: error.stack,
      duration,
      turns,
      timestamp: formatTimestamp(),
    });
  }
}

/** Null Object implementation - all methods are safe no-ops */
class NullAuditLogger implements AuditLogger {
  async logLlmResponse(_turn: number, _content: string): Promise<void> {}

  async logToolStart(_toolName: string, _parameters: unknown): Promise<void> {}

  async logToolEnd(_result: unknown): Promise<void> {}

  async logError(_error: Error, _duration: number, _turns: number): Promise<void> {}
}

// Returns no-op when auditSession is null
export function createAuditLogger(auditSession: AuditSession | null): AuditLogger {
  if (auditSession) {
    return new RealAuditLogger(auditSession);
  }

  return new NullAuditLogger();
}
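The file above uses the Null Object pattern so that callers can log unconditionally. The following self-contained sketch shows why that helps at the call site. It is a reduced illustration, not the module itself: `EventSink`, `ToolLogger`, `createToolLogger`, and `runTool` are hypothetical stand-ins for `AuditSession`, `AuditLogger`, `createAuditLogger`, and a real caller.

```typescript
// Hypothetical stand-in for AuditSession: anything that can record a named event.
interface EventSink {
  logEvent(name: string, data: Record<string, unknown>): Promise<void>;
}

interface ToolLogger {
  logToolStart(toolName: string, parameters: unknown): Promise<void>;
}

// Real implementation: forwards every call to the sink.
class RealToolLogger implements ToolLogger {
  private sink: EventSink;

  constructor(sink: EventSink) {
    this.sink = sink;
  }

  async logToolStart(toolName: string, parameters: unknown): Promise<void> {
    await this.sink.logEvent('tool_start', { toolName, parameters });
  }
}

// Null Object: same interface, every method is a no-op.
class NullToolLogger implements ToolLogger {
  async logToolStart(_toolName: string, _parameters: unknown): Promise<void> {}
}

// The null check happens exactly once, in the factory.
function createToolLogger(sink: EventSink | null): ToolLogger {
  return sink ? new RealToolLogger(sink) : new NullToolLogger();
}

// Caller code is identical whether or not a sink is configured.
async function runTool(logger: ToolLogger): Promise<void> {
  await logger.logToolStart('example-tool', { target: 'http://localhost' });
}
```

The trade-off is that a missing session is silent by design: if audit logging is required in some context, that requirement has to be enforced before the factory is called.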
@@ -0,0 +1,345 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

// Production Claude agent execution with retry, git checkpoints, and audit logging

import { query } from '@anthropic-ai/claude-agent-sdk';
import { fs, path } from 'zx';
import type { AuditSession } from '../audit/index.js';
import { isRetryableError, PentestError } from '../services/error-handling.js';
import { AGENT_VALIDATORS } from '../session-manager.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import { isSpendingCapBehavior } from '../utils/billing-detection.js';
import { formatTimestamp } from '../utils/formatting.js';
import { Timer } from '../utils/metrics.js';
import { createAuditLogger } from './audit-logger.js';
import { dispatchMessage } from './message-handlers.js';
import { type ModelTier, resolveModel } from './models.js';
import { detectExecutionContext, formatCompletionMessage, formatErrorOutput } from './output-formatters.js';
import { createProgressManager } from './progress-manager.js';
import { getActualModelName } from './router-utils.js';

declare global {
  var SHANNON_DISABLE_LOADER: boolean | undefined;
}

export interface ClaudePromptResult {
  result?: string | null | undefined;
  success: boolean;
  duration: number;
  turns?: number | undefined;
  cost: number;
  model?: string | undefined;
  partialCost?: number | undefined;
  apiErrorDetected?: boolean | undefined;
  error?: string | undefined;
  errorType?: string | undefined;
  prompt?: string | undefined;
  retryable?: boolean | undefined;
}

function outputLines(lines: string[]): void {
  for (const line of lines) {
    console.log(line);
  }
}

async function writeErrorLog(
  err: Error & { code?: string; status?: number },
  sourceDir: string,
  fullPrompt: string,
  duration: number,
): Promise<void> {
  try {
    const errorLog = {
      timestamp: formatTimestamp(),
      agent: 'claude-executor',
      error: {
        name: err.constructor.name,
        message: err.message,
        code: err.code,
        status: err.status,
        stack: err.stack,
      },
      context: {
        sourceDir,
        prompt: `${fullPrompt.slice(0, 200)}...`,
        retryable: isRetryableError(err),
      },
      duration,
    };
    const logPath = path.join(sourceDir, 'error.log');
    await fs.appendFile(logPath, `${JSON.stringify(errorLog)}\n`);
  } catch {
    // Best-effort error log writing - don't propagate failures
  }
}

export async function validateAgentOutput(
  result: ClaudePromptResult,
  agentName: string | null,
  sourceDir: string,
  logger: ActivityLogger,
): Promise<boolean> {
  logger.info(`Validating ${agentName} agent output`);

  try {
    // Check if agent completed successfully
    if (!result.success || !result.result) {
      logger.error('Validation failed: Agent execution was unsuccessful');
      return false;
    }

    // Get validator function for this agent
    const validator = agentName ? AGENT_VALIDATORS[agentName as keyof typeof AGENT_VALIDATORS] : undefined;

    if (!validator) {
      logger.warn(`No validator found for agent "${agentName}" - assuming success`);
      logger.info('Validation passed: Unknown agent with successful result');
      return true;
    }

    logger.info(`Using validator for agent: ${agentName}`, { sourceDir });

    // Apply validation function
    const validationResult = await validator(sourceDir, logger);

    if (validationResult) {
      logger.info('Validation passed: Required files/structure present');
    } else {
      logger.error('Validation failed: Missing required deliverable files');
    }

    return validationResult;
  } catch (error) {
    const errMsg = error instanceof Error ? error.message : String(error);
    logger.error(`Validation failed with error: ${errMsg}`);
    return false;
  }
}

// Low-level SDK execution. Handles message streaming, progress, and audit logging.
// Exported for Temporal activities to call single-attempt execution.
export async function runClaudePrompt(
  prompt: string,
  sourceDir: string,
  context: string = '',
  description: string = 'Claude analysis',
  _agentName: string | null = null,
  auditSession: AuditSession | null = null,
  logger: ActivityLogger,
  modelTier: ModelTier = 'medium',
): Promise<ClaudePromptResult> {
  // 1. Initialize timing and prompt
  const timer = new Timer(`agent-${description.toLowerCase().replace(/\s+/g, '-')}`);
  const fullPrompt = context ? `${context}\n\n${prompt}` : prompt;

  // 2. Set up progress and audit infrastructure
  const execContext = detectExecutionContext(description);
  const progress = createProgressManager(
    { description, useCleanOutput: execContext.useCleanOutput },
    global.SHANNON_DISABLE_LOADER ?? false,
  );
  const auditLogger = createAuditLogger(auditSession);

  logger.info(`Running Claude Code: ${description}...`);

  // 3. Build env vars to pass to SDK subprocesses
  const sdkEnv: Record<string, string> = {
    CLAUDE_CODE_MAX_OUTPUT_TOKENS: process.env.CLAUDE_CODE_MAX_OUTPUT_TOKENS || '64000',
  };
  const passthroughVars = [
    'ANTHROPIC_API_KEY',
    'CLAUDE_CODE_OAUTH_TOKEN',
    'ANTHROPIC_BASE_URL',
    'ANTHROPIC_AUTH_TOKEN',
    'CLAUDE_CODE_USE_BEDROCK',
    'AWS_REGION',
    'AWS_BEARER_TOKEN_BEDROCK',
    'CLAUDE_CODE_USE_VERTEX',
    'CLOUD_ML_REGION',
    'ANTHROPIC_VERTEX_PROJECT_ID',
    'GOOGLE_APPLICATION_CREDENTIALS',
    'ANTHROPIC_SMALL_MODEL',
    'ANTHROPIC_MEDIUM_MODEL',
    'ANTHROPIC_LARGE_MODEL',
    'HOME',
    'PATH',
    'PLAYWRIGHT_MCP_EXECUTABLE_PATH',
  ];
  for (const name of passthroughVars) {
    const val = process.env[name];
    if (val) {
      sdkEnv[name] = val;
    }
  }

  // 4. Configure SDK options
  const options = {
    model: resolveModel(modelTier),
    maxTurns: 10_000,
    cwd: sourceDir,
    permissionMode: 'bypassPermissions' as const,
    allowDangerouslySkipPermissions: true,
    settingSources: ['user'] as ('user' | 'project' | 'local')[],
    env: sdkEnv,
  };

  if (!execContext.useCleanOutput) {
    logger.info(`SDK Options: maxTurns=${options.maxTurns}, cwd=${sourceDir}, permissions=BYPASS`);
  }

  let turnCount = 0;
  let result: string | null = null;
  let apiErrorDetected = false;
  let totalCost = 0;

  progress.start();

  try {
    // 6. Process the message stream
    const messageLoopResult = await processMessageStream(
      fullPrompt,
      options,
      { execContext, description, progress, auditLogger, logger },
      timer,
    );

    turnCount = messageLoopResult.turnCount;
    result = messageLoopResult.result;
    apiErrorDetected = messageLoopResult.apiErrorDetected;
    totalCost = messageLoopResult.cost;
    const model = messageLoopResult.model;

    // === SPENDING CAP SAFEGUARD ===
    // 7. Defense-in-depth: Detect spending cap that slipped through detectApiError().
    // Uses consolidated billing detection from utils/billing-detection.ts
    if (isSpendingCapBehavior(turnCount, totalCost, result || '')) {
      throw new PentestError(
        `Spending cap likely reached (turns=${turnCount}, cost=$0): ${result?.slice(0, 100)}`,
        'billing',
        true, // Retryable - Temporal will use 5-30 min backoff
      );
    }

    // 8. Finalize successful result
    const duration = timer.stop();

    if (apiErrorDetected) {
      logger.warn(`API Error detected in ${description} - will validate deliverables before failing`);
    }

    progress.finish(formatCompletionMessage(execContext, description, turnCount, duration));

    return {
      result,
      success: true,
      duration,
      turns: turnCount,
      cost: totalCost,
      model,
      partialCost: totalCost,
      apiErrorDetected,
    };
  } catch (error) {
    // 9. Handle errors — log, write error file, return failure
    const duration = timer.stop();

    const err = error as Error & { code?: string; status?: number };

    await auditLogger.logError(err, duration, turnCount);
    progress.stop();
    outputLines(formatErrorOutput(err, execContext, description, duration, sourceDir, isRetryableError(err)));
    await writeErrorLog(err, sourceDir, fullPrompt, duration);

    return {
      error: err.message,
      errorType: err.constructor.name,
      prompt: `${fullPrompt.slice(0, 100)}...`,
      success: false,
      duration,
      cost: totalCost,
      retryable: isRetryableError(err),
    };
  }
}

interface MessageLoopResult {
  turnCount: number;
  result: string | null;
  apiErrorDetected: boolean;
  cost: number;
  model?: string | undefined;
}

interface MessageLoopDeps {
  execContext: ReturnType<typeof detectExecutionContext>;
  description: string;
  progress: ReturnType<typeof createProgressManager>;
  auditLogger: ReturnType<typeof createAuditLogger>;
  logger: ActivityLogger;
}

async function processMessageStream(
  fullPrompt: string,
  options: NonNullable<Parameters<typeof query>[0]['options']>,
  deps: MessageLoopDeps,
  timer: Timer,
): Promise<MessageLoopResult> {
  const { execContext, description, progress, auditLogger, logger } = deps;
  const HEARTBEAT_INTERVAL = 30000;

  let turnCount = 0;
  let result: string | null = null;
  let apiErrorDetected = false;
  let cost = 0;
  let model: string | undefined;
  let lastHeartbeat = Date.now();

  for await (const message of query({ prompt: fullPrompt, options })) {
    // Heartbeat logging when loader is disabled
    const now = Date.now();
    if (global.SHANNON_DISABLE_LOADER && now - lastHeartbeat > HEARTBEAT_INTERVAL) {
      logger.info(`[${Math.floor((now - timer.startTime) / 1000)}s] ${description} running... (Turn ${turnCount})`);
      lastHeartbeat = now;
    }

    // Increment turn count for assistant messages
    if (message.type === 'assistant') {
      turnCount++;
    }

    const dispatchResult = await dispatchMessage(message as { type: string; subtype?: string }, turnCount, {
      execContext,
      description,
      progress,
      auditLogger,
      logger,
    });

    if (dispatchResult.type === 'throw') {
      throw dispatchResult.error;
    }

    if (dispatchResult.type === 'complete') {
      result = dispatchResult.result;
      cost = dispatchResult.cost;
      break;
    }

    if (dispatchResult.type === 'continue') {
      if (dispatchResult.apiErrorDetected) {
        apiErrorDetected = true;
      }
      // Capture model from SystemInitMessage, but override with router model if applicable
      if (dispatchResult.model) {
        model = getActualModelName(dispatchResult.model);
      }
    }
  }

  return { turnCount, result, apiErrorDetected, cost, model };
}
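Step 3 of `runClaudePrompt` above builds the SDK subprocess environment from an explicit allowlist instead of forwarding all of `process.env`. The sketch below isolates that idea so it can be tested on its own. `buildAllowlistedEnv` is a hypothetical helper (the diff inlines the loop), and the sample variable values are made up for illustration.

```typescript
// Hypothetical helper; the diff inlines this logic inside runClaudePrompt.
// Copies only allowlisted, non-empty variables from `source` on top of `defaults`.
function buildAllowlistedEnv(
  source: Record<string, string | undefined>,
  allowlist: readonly string[],
  defaults: Record<string, string> = {},
): Record<string, string> {
  const env: Record<string, string> = { ...defaults };
  for (const name of allowlist) {
    const value = source[name];
    // Unset and empty-string values are both skipped, matching the `if (val)` check in the diff.
    if (value) {
      env[name] = value;
    }
  }
  return env;
}

// Usage: AWS_SECRET_ACCESS_KEY is not on the allowlist, and PATH is empty, so neither is forwarded.
const exampleEnv = buildAllowlistedEnv(
  { ANTHROPIC_API_KEY: 'sk-example', AWS_SECRET_ACCESS_KEY: 'not-forwarded', PATH: '' },
  ['ANTHROPIC_API_KEY', 'PATH'],
  { CLAUDE_CODE_MAX_OUTPUT_TOKENS: '64000' },
);
```

Taking the source environment as a parameter, rather than reading `process.env` directly, is a choice made for this sketch so the function stays pure and easy to unit-test.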
@@ -0,0 +1,348 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

import type { SDKAssistantMessageError } from '@anthropic-ai/claude-agent-sdk';
import { PentestError } from '../services/error-handling.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import { ErrorCode } from '../types/errors.js';
import { matchesBillingTextPattern } from '../utils/billing-detection.js';
import { formatTimestamp } from '../utils/formatting.js';
import type { AuditLogger } from './audit-logger.js';
import {
  filterJsonToolCalls,
  formatAssistantOutput,
  formatResultOutput,
  formatToolResultOutput,
  formatToolUseOutput,
} from './output-formatters.js';
import type { ProgressManager } from './progress-manager.js';
import { getActualModelName } from './router-utils.js';
import type {
  ApiErrorDetection,
  AssistantMessage,
  AssistantResult,
  ContentBlock,
  ExecutionContext,
  ResultData,
  ResultMessage,
  SystemInitMessage,
  ToolResultData,
  ToolResultMessage,
  ToolUseData,
  ToolUseMessage,
} from './types.js';

// Handles both array and string content formats from SDK
function extractMessageContent(message: AssistantMessage): string {
  const messageContent = message.message;

  if (Array.isArray(messageContent.content)) {
    return messageContent.content.map((c: ContentBlock) => c.text || JSON.stringify(c)).join('\n');
  }

  return String(messageContent.content);
}

// Extracts only text content (no tool_use JSON) to avoid false positives in error detection
function extractTextOnlyContent(message: AssistantMessage): string {
  const messageContent = message.message;

  if (Array.isArray(messageContent.content)) {
    return messageContent.content
      .filter((c: ContentBlock) => c.type === 'text' || c.text)
      .map((c: ContentBlock) => c.text || '')
      .join('\n');
  }

  return String(messageContent.content);
}

function detectApiError(content: string): ApiErrorDetection {
  if (!content || typeof content !== 'string') {
    return { detected: false };
  }

  const lowerContent = content.toLowerCase();

  // === BILLING/SPENDING CAP ERRORS (Retryable with long backoff) ===
  // When Claude Code hits its spending cap, it returns a short message like
  // "Spending cap reached resets 8am" instead of throwing an error.
  // These should retry with 5-30 min backoff so workflows can recover when cap resets.
  if (matchesBillingTextPattern(content)) {
    return {
      detected: true,
      shouldThrow: new PentestError(
        `Billing limit reached: ${content.slice(0, 100)}`,
        'billing',
        true, // RETRYABLE - Temporal will use 5-30 min backoff
        {},
        ErrorCode.SPENDING_CAP_REACHED,
      ),
    };
  }

  // === SESSION LIMIT (Non-retryable) ===
  // Different from spending cap - usually means something is fundamentally wrong
  if (lowerContent.includes('session limit reached')) {
    return {
      detected: true,
      shouldThrow: new PentestError('Session limit reached', 'billing', false),
    };
  }

  // Non-fatal API errors - detected but continue
  if (lowerContent.includes('api error') || lowerContent.includes('terminated')) {
    return { detected: true };
  }

  return { detected: false };
}

// Maps SDK structured error types to our error handling.
function handleStructuredError(errorType: SDKAssistantMessageError, content: string): ApiErrorDetection {
  switch (errorType) {
    case 'billing_error':
      return {
        detected: true,
        shouldThrow: new PentestError(
          `Billing error (structured): ${content.slice(0, 100)}`,
          'billing',
          true, // Retryable with backoff
          {},
          ErrorCode.INSUFFICIENT_CREDITS,
        ),
      };
    case 'rate_limit':
      return {
        detected: true,
        shouldThrow: new PentestError(
          `Rate limit hit (structured): ${content.slice(0, 100)}`,
          'network',
          true, // Retryable with backoff
          {},
          ErrorCode.API_RATE_LIMITED,
        ),
      };
    case 'authentication_failed':
      return {
        detected: true,
        shouldThrow: new PentestError(
          `Authentication failed: ${content.slice(0, 100)}`,
          'config',
          false, // Not retryable - needs API key fix
        ),
      };
    case 'server_error':
      return {
        detected: true,
        shouldThrow: new PentestError(
          `Server error (structured): ${content.slice(0, 100)}`,
          'network',
          true, // Retryable
        ),
      };
    case 'invalid_request':
      return {
        detected: true,
        shouldThrow: new PentestError(
          `Invalid request: ${content.slice(0, 100)}`,
          'config',
          false, // Not retryable - needs code fix
        ),
      };
    case 'max_output_tokens':
      return {
        detected: true,
        shouldThrow: new PentestError(
          `Max output tokens reached: ${content.slice(0, 100)}`,
          'billing',
          true, // Retryable - may succeed with different content
        ),
      };
    default:
      return { detected: true };
  }
}

function handleAssistantMessage(message: AssistantMessage, turnCount: number): AssistantResult {
  const content = extractMessageContent(message);
  const cleanedContent = filterJsonToolCalls(content);

  // Prefer structured error field from SDK, fall back to text-sniffing
  // Use text-only content for error detection to avoid false positives
  // from tool_use JSON (e.g. security reports containing "usage limit")
  let errorDetection: ApiErrorDetection;
  if (message.error) {
    errorDetection = handleStructuredError(message.error, content);
  } else {
    const textOnlyContent = extractTextOnlyContent(message);
    errorDetection = detectApiError(textOnlyContent);
  }

  const result: AssistantResult = {
    content,
    cleanedContent,
    apiErrorDetected: errorDetection.detected,
    logData: {
      turn: turnCount,
      content,
      timestamp: formatTimestamp(),
    },
  };

  // Only add shouldThrow if it exists (exactOptionalPropertyTypes compliance)
  if (errorDetection.shouldThrow) {
    result.shouldThrow = errorDetection.shouldThrow;
  }

  return result;
}

// Final message of a query with cost/duration info
function handleResultMessage(message: ResultMessage): ResultData {
  const result: ResultData = {
    result: message.result || null,
    cost: message.total_cost_usd || 0,
    duration_ms: message.duration_ms || 0,
    permissionDenials: message.permission_denials?.length || 0,
  };

  // Only add subtype if it exists (exactOptionalPropertyTypes compliance)
  if (message.subtype) {
    result.subtype = message.subtype;
  }

  // Capture stop_reason for diagnostics (helps debug early stops, budget exceeded, etc.)
  if (message.stop_reason !== undefined) {
    result.stop_reason = message.stop_reason;
    if (message.stop_reason && message.stop_reason !== 'end_turn') {
      console.log(` Stop reason: ${message.stop_reason}`);
    }
  }

  return result;
}

function handleToolUseMessage(message: ToolUseMessage): ToolUseData {
  return {
    toolName: message.name,
    parameters: message.input || {},
    timestamp: formatTimestamp(),
  };
}

// Truncates long results for display (500 char limit), preserves full content for logging
function handleToolResultMessage(message: ToolResultMessage): ToolResultData {
  const content = message.content;
  const contentStr = typeof content === 'string' ? content : JSON.stringify(content, null, 2);

  const displayContent =
    contentStr.length > 500
      ? `${contentStr.slice(0, 500)}...\n[Result truncated - ${contentStr.length} total chars]`
      : contentStr;

  return {
    content,
    displayContent,
    timestamp: formatTimestamp(),
  };
}

function outputLines(lines: string[]): void {
  for (const line of lines) {
    console.log(line);
  }
}

export type MessageDispatchAction =
|
||||
| { type: 'continue'; apiErrorDetected?: boolean | undefined; model?: string | undefined }
|
||||
| { type: 'complete'; result: string | null; cost: number }
|
||||
| { type: 'throw'; error: Error };
|
||||
|
||||
export interface MessageDispatchDeps {
|
||||
execContext: ExecutionContext;
|
||||
description: string;
|
||||
progress: ProgressManager;
|
||||
auditLogger: AuditLogger;
|
||||
logger: ActivityLogger;
|
||||
}
|
||||
|
||||
// Dispatches SDK messages to appropriate handlers and formatters
export async function dispatchMessage(
  message: { type: string; subtype?: string },
  turnCount: number,
  deps: MessageDispatchDeps,
): Promise<MessageDispatchAction> {
  const { execContext, description, progress, auditLogger, logger } = deps;

  switch (message.type) {
    case 'assistant': {
      const assistantResult = handleAssistantMessage(message as AssistantMessage, turnCount);

      if (assistantResult.shouldThrow) {
        return { type: 'throw', error: assistantResult.shouldThrow };
      }

      if (assistantResult.cleanedContent.trim()) {
        progress.stop();
        outputLines(formatAssistantOutput(assistantResult.cleanedContent, execContext, turnCount, description));
        progress.start();
      }

      await auditLogger.logLlmResponse(turnCount, assistantResult.content);

      if (assistantResult.apiErrorDetected) {
        logger.warn('API Error detected in assistant response');
        return { type: 'continue', apiErrorDetected: true };
      }

      return { type: 'continue' };
    }

    case 'system': {
      if (message.subtype === 'init') {
        const initMsg = message as SystemInitMessage;
        const actualModel = getActualModelName(initMsg.model);
        if (!execContext.useCleanOutput) {
          logger.info(`Model: ${actualModel}, Permission: ${initMsg.permissionMode}`);
        }
        // Return actual model for tracking in audit logs
        return { type: 'continue', model: actualModel };
      }
      return { type: 'continue' };
    }

    case 'user':
    case 'tool_progress':
    case 'tool_use_summary':
    case 'auth_status':
      return { type: 'continue' };

    case 'tool_use': {
      const toolData = handleToolUseMessage(message as unknown as ToolUseMessage);
      outputLines(formatToolUseOutput(toolData.toolName, toolData.parameters));
      await auditLogger.logToolStart(toolData.toolName, toolData.parameters);
      return { type: 'continue' };
    }

    case 'tool_result': {
      const toolResultData = handleToolResultMessage(message as unknown as ToolResultMessage);
      outputLines(formatToolResultOutput(toolResultData.displayContent));
      await auditLogger.logToolEnd(toolResultData.content);
      return { type: 'continue' };
    }

    case 'result': {
      const resultData = handleResultMessage(message as ResultMessage);
      outputLines(formatResultOutput(resultData, !execContext.useCleanOutput));
      return { type: 'complete', result: resultData.result, cost: resultData.cost };
    }

    default:
      logger.info(`Unhandled message type: ${message.type}`);
      return { type: 'continue' };
  }
}
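
Review note: `dispatchMessage` returns a discriminated union rather than throwing or mutating caller state, which leaves control flow to the caller. The sketch below shows one way a caller might fold a stream of such actions. The `Action` type is a local copy of `MessageDispatchAction`; `runLoop` and `LoopOutcome` are hypothetical names, and the real executor loop is not shown in this part of the diff.

```typescript
// Local copy of the MessageDispatchAction shape, for a self-contained sketch.
type Action =
  | { type: 'continue'; apiErrorDetected?: boolean; model?: string }
  | { type: 'complete'; result: string | null; cost: number }
  | { type: 'throw'; error: Error };

// Hypothetical summary of what a caller might track across turns.
interface LoopOutcome {
  result: string | null;
  cost: number;
  model: string | undefined;
  apiErrorDetected: boolean;
  turns: number;
}

function runLoop(actions: Iterable<Action>): LoopOutcome {
  let model: string | undefined;
  let apiErrorDetected = false;
  let turns = 0;

  for (const action of actions) {
    turns++;
    switch (action.type) {
      case 'throw':
        // The handler asked the caller to fail; rethrow at the loop boundary.
        throw action.error;
      case 'complete':
        // A result message ends the loop; later actions are never consumed.
        return { result: action.result, cost: action.cost, model, apiErrorDetected, turns };
      case 'continue':
        if (action.model) model = action.model;
        if (action.apiErrorDetected) apiErrorDetected = true;
        break;
    }
  }

  // Stream ended without a result message.
  return { result: null, cost: 0, model, apiErrorDetected, turns };
}
```

Because each variant carries only the data its branch needs, the compiler can check that every `type` is handled in the `switch`.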
@@ -0,0 +1,37 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Model tier definitions and resolution.
 *
 * Three tiers mapped to capability levels:
 * - "small" (Haiku: summarization, structured extraction)
 * - "medium" (Sonnet: tool use, general analysis)
 * - "large" (Opus: deep reasoning, complex analysis)
 *
 * Users override via ANTHROPIC_SMALL_MODEL / ANTHROPIC_MEDIUM_MODEL / ANTHROPIC_LARGE_MODEL,
 * which works across all providers (direct, Bedrock, Vertex).
 */

export type ModelTier = 'small' | 'medium' | 'large';

const DEFAULT_MODELS: Readonly<Record<ModelTier, string>> = {
  small: 'claude-haiku-4-5-20251001',
  medium: 'claude-sonnet-4-6',
  large: 'claude-opus-4-6',
};

/** Resolve a model tier to a concrete model ID. */
export function resolveModel(tier: ModelTier = 'medium'): string {
  switch (tier) {
    case 'small':
      return process.env.ANTHROPIC_SMALL_MODEL || DEFAULT_MODELS.small;
    case 'large':
      return process.env.ANTHROPIC_LARGE_MODEL || DEFAULT_MODELS.large;
    default:
      return process.env.ANTHROPIC_MEDIUM_MODEL || DEFAULT_MODELS.medium;
  }
}
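
Review note: `resolveModel` uses `||` rather than `??`, so an override that is set but empty falls back to the default. The sketch below illustrates that behavior with the environment passed in as a plain object so it can run anywhere. `resolveFrom`, `ENV_KEYS`, and the placeholder default IDs are hypothetical and only stand in for the real table above.

```typescript
type Tier = 'small' | 'medium' | 'large';
type Env = Record<string, string | undefined>;

// Placeholder defaults for the sketch; the real IDs live in DEFAULT_MODELS.
const SKETCH_DEFAULTS: Record<Tier, string> = {
  small: 'default-small',
  medium: 'default-medium',
  large: 'default-large',
};

// Override variable names as documented in the module comment.
const ENV_KEYS: Record<Tier, string> = {
  small: 'ANTHROPIC_SMALL_MODEL',
  medium: 'ANTHROPIC_MEDIUM_MODEL',
  large: 'ANTHROPIC_LARGE_MODEL',
};

function resolveFrom(env: Env, tier: Tier = 'medium'): string {
  // `||` treats both undefined and '' as "not set".
  return env[ENV_KEYS[tier]] || SKETCH_DEFAULTS[tier];
}
```

With `??` an empty string would be returned as the model ID, which is unlikely to be what a user with a blank variable in their env file intends.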
@@ -0,0 +1,386 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

import { AGENTS } from '../session-manager.js';
import { extractAgentType, formatDuration } from '../utils/formatting.js';
import type { ExecutionContext, ResultData } from './types.js';

interface ToolCallInput {
  url?: string;
  element?: string;
  key?: string;
  fields?: unknown[];
  text?: string;
  action?: string;
  description?: string;
  command?: string;
  todos?: Array<{
    status: string;
    content: string;
  }>;
  [key: string]: unknown;
}

interface ToolCall {
  name: string;
  input?: ToolCallInput;
}

/**
 * Get agent prefix for parallel execution
 */
export function getAgentPrefix(description: string): string {
  // Map agent names to their prefixes
  const agentPrefixes: Record<string, string> = {
    'injection-vuln': '[Injection]',
    'xss-vuln': '[XSS]',
    'auth-vuln': '[Auth]',
    'authz-vuln': '[Authz]',
    'ssrf-vuln': '[SSRF]',
    'injection-exploit': '[Injection]',
    'xss-exploit': '[XSS]',
    'auth-exploit': '[Auth]',
    'authz-exploit': '[Authz]',
    'ssrf-exploit': '[SSRF]',
  };

  // First try to match by agent name directly
  for (const [agentName, prefix] of Object.entries(agentPrefixes)) {
    const agent = AGENTS[agentName as keyof typeof AGENTS];
    if (agent && description.includes(agent.displayName)) {
      return prefix;
    }
  }

  // Fallback to partial matches for backwards compatibility
  if (description.includes('injection')) return '[Injection]';
  if (description.includes('xss')) return '[XSS]';
  if (description.includes('authz')) return '[Authz]'; // Check authz before auth
  if (description.includes('auth')) return '[Auth]';
  if (description.includes('ssrf')) return '[SSRF]';

  return '[Agent]';
}
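
Review note: the "Check authz before auth" comment matters because `'authz'` contains `'auth'` as a substring. The sketch below contrasts the two orderings using hypothetical helper names `prefixSpecificFirst` and `prefixGeneralFirst`, reduced to the two cases in question.

```typescript
// Order used in getAgentPrefix: the longer, more specific token is tested first.
function prefixSpecificFirst(description: string): string {
  if (description.includes('authz')) return '[Authz]';
  if (description.includes('auth')) return '[Auth]';
  return '[Agent]';
}

// Reversed order, shown only to illustrate the failure mode.
function prefixGeneralFirst(description: string): string {
  if (description.includes('auth')) return '[Auth]'; // also matches "authz"
  if (description.includes('authz')) return '[Authz]'; // unreachable for any input
  return '[Agent]';
}
```

Any substring-based dispatch has this property: when one key is a prefix of another, the longer key has to be checked first.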

/**
 * Extract domain from URL for display
 */
function extractDomain(url: string): string {
  try {
    const urlObj = new URL(url);
    return urlObj.hostname || url.slice(0, 30);
  } catch {
    return url.slice(0, 30);
  }
}

/**
 * Format playwright-cli commands into clean progress indicators
 */
function formatBrowserAction(command: string): string | null {
  // Extract subcommand after optional session flag (e.g., "playwright-cli -s=session1 goto https://example.com")
  const match = command.match(/playwright-cli\s+(?:-s=\S+\s+)?(\S+)(?:\s+(.*))?/);
  if (!match) return null;

  const subcommand = match[1];
  const args = match[2] || '';

  switch (subcommand) {
    case 'open':
    case 'goto': {
      const domain = args.trim() ? extractDomain(args.trim()) : '';
      return domain ? `🌐 Navigating to ${domain}` : '🌐 Opening browser';
    }
    case 'go-back':
      return '⬅️ Going back';
    case 'go-forward':
      return '➡️ Going forward';
    case 'reload':
      return '🔄 Reloading page';
    case 'click':
    case 'dblclick':
      return `🖱️ Clicking ${(args || 'element').slice(0, 25)}`;
    case 'hover':
      return `👆 Hovering over ${(args || 'element').slice(0, 20)}`;
    case 'type':
      return `⌨️ Typing ${(args || 'text').slice(0, 20)}`;
    case 'press':
    case 'keydown':
    case 'keyup':
      return `⌨️ Pressing ${args || 'key'}`;
    case 'fill':
      return `📝 Filling ${(args || 'field').slice(0, 25)}`;
    case 'select':
      return '📋 Selecting dropdown option';
    case 'check':
    case 'uncheck':
      return `☑️ ${subcommand === 'check' ? 'Checking' : 'Unchecking'} ${(args || 'element').slice(0, 20)}`;
    case 'upload':
      return '📁 Uploading file';
    case 'drag':
      return '🖱️ Dragging element';
    case 'snapshot':
      return '📸 Taking page snapshot';
    case 'screenshot':
      return '📸 Taking screenshot';
    case 'eval':
    case 'run-code':
      return '🔍 Running JavaScript analysis';
    case 'console':
      return '📜 Checking console logs';
    case 'network':
      return '🌐 Analyzing network traffic';
    case 'tab-list':
    case 'tab-new':
    case 'tab-close':
    case 'tab-select':
      return `🗂️ ${subcommand.replace('tab-', '')} browser tab`;
    case 'dialog-accept':
      return '💬 Accepting dialog';
    case 'dialog-dismiss':
      return '💬 Dismissing dialog';
    case 'pdf':
      return '📄 Saving page as PDF';
    case 'resize':
      return `🖥️ Resizing browser ${args || ''}`.trim();
    default:
      return `🌐 Browser: ${subcommand}`;
  }
}
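
Review note: the regular expression in `formatBrowserAction` does three things in one pass: it skips an optional `-s=<session>` flag, captures the subcommand, and captures the rest of the line as arguments. The sketch below isolates that parsing step. `parseBrowserCommand` is a hypothetical name; the pattern itself is copied from the function above.

```typescript
// Same pattern as formatBrowserAction: optional "-s=..." flag, subcommand, remaining args.
const PLAYWRIGHT_PATTERN = /playwright-cli\s+(?:-s=\S+\s+)?(\S+)(?:\s+(.*))?/;

interface ParsedBrowserCommand {
  subcommand: string;
  args: string;
}

// Hypothetical helper: returns null when the command does not invoke playwright-cli.
function parseBrowserCommand(command: string): ParsedBrowserCommand | null {
  const match = command.match(PLAYWRIGHT_PATTERN);
  if (!match) return null;
  // The args group is optional, so it is undefined for bare subcommands.
  return { subcommand: match[1] ?? '', args: match[2] ?? '' };
}
```

Note that the session flag is only skipped when a subcommand follows it; a command that ends right after `-s=...` would report the flag itself as the subcommand.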

/**
 * Summarize TodoWrite updates into clean progress indicators
 */
function summarizeTodoUpdate(input: ToolCallInput | undefined): string | null {
  if (!input?.todos || !Array.isArray(input.todos)) {
    return null;
  }

  const todos = input.todos;
  const completed = todos.filter((t) => t.status === 'completed');
  const inProgress = todos.filter((t) => t.status === 'in_progress');

  // Show recently completed tasks
  const recent = completed.at(-1);
  if (recent) {
    return `✅ ${recent.content}`;
  }

  // Show current in-progress task
  const current = inProgress.at(0);
  if (current) {
    return `🔄 ${current.content}`;
  }

  return null;
}

/**
 * Filter out JSON tool calls from content, with special handling for Task calls
 */
export function filterJsonToolCalls(content: string | null | undefined): string {
  if (!content || typeof content !== 'string') {
    return content || '';
  }

  const lines = content.split('\n');
  const processedLines: string[] = [];

  for (const line of lines) {
    const trimmed = line.trim();

    // Skip empty lines
    if (trimmed === '') {
      continue;
    }

    // Check if this is a JSON tool call
    if (trimmed.startsWith('{"type":"tool_use"')) {
      try {
        const toolCall = JSON.parse(trimmed) as ToolCall;

        // Special handling for Task tool calls
        if (toolCall.name === 'Task') {
          const description = toolCall.input?.description || 'analysis agent';
          processedLines.push(`🚀 Launching ${description}`);
          continue;
        }

        // Special handling for TodoWrite tool calls
        if (toolCall.name === 'TodoWrite') {
          const summary = summarizeTodoUpdate(toolCall.input);
          if (summary) {
            processedLines.push(summary);
          }
          continue;
        }

        // Special handling for browser tool calls (playwright-cli via Bash)
        if (toolCall.name === 'Bash') {
          const command = toolCall.input?.command || '';
          if (command.includes('playwright-cli')) {
            const browserAction = formatBrowserAction(command);
            if (browserAction) {
              processedLines.push(browserAction);
            }
          }
        }
      } catch {
        // If JSON parsing fails, treat as regular text
        processedLines.push(line);
      }
    } else {
      // Keep non-JSON lines (assistant text)
      processedLines.push(line);
    }
  }

  return processedLines.join('\n');
}

export function detectExecutionContext(description: string): ExecutionContext {
  const isParallelExecution = description.includes('vuln agent') || description.includes('exploit agent');

  const useCleanOutput =
    description.includes('Pre-recon agent') ||
    description.includes('Recon agent') ||
    description.includes('Executive Summary and Report Cleanup') ||
    description.includes('vuln agent') ||
    description.includes('exploit agent');

  const agentType = extractAgentType(description);

  const agentKey = description.toLowerCase().replace(/\s+/g, '-');

  return { isParallelExecution, useCleanOutput, agentType, agentKey };
}

export function formatAssistantOutput(
  cleanedContent: string,
  context: ExecutionContext,
  turnCount: number,
  description: string,
): string[] {
  if (!cleanedContent.trim()) {
    return [];
  }

  const lines: string[] = [];

  if (context.isParallelExecution) {
    // Compact output for parallel agents with prefixes
    const prefix = getAgentPrefix(description);
    lines.push(`${prefix} ${cleanedContent}`);
  } else {
    // Full turn output for sequential agents
    lines.push(`\n Turn ${turnCount} (${description}):`);
    lines.push(` ${cleanedContent}`);
  }

  return lines;
}

export function formatResultOutput(data: ResultData, showFullResult: boolean): string[] {
  const lines: string[] = [];

  lines.push(`\n COMPLETED:`);
  lines.push(` Duration: ${(data.duration_ms / 1000).toFixed(1)}s, Cost: $${data.cost.toFixed(4)}`);

  if (data.subtype === 'error_max_turns') {
    lines.push(` Stopped: Hit maximum turns limit`);
  } else if (data.subtype === 'error_during_execution') {
    lines.push(` Stopped: Execution error`);
  }

  if (data.permissionDenials > 0) {
    lines.push(` ${data.permissionDenials} permission denials`);
  }

  if (showFullResult && data.result && typeof data.result === 'string') {
    if (data.result.length > 1000) {
      lines.push(` ${data.result.slice(0, 1000)}... [${data.result.length} total chars]`);
    } else {
      lines.push(` ${data.result}`);
    }
  }

  return lines;
}

export function formatErrorOutput(
  error: Error & { code?: string; status?: number },
  context: ExecutionContext,
  description: string,
  duration: number,
  sourceDir: string,
  isRetryable: boolean,
): string[] {
  const lines: string[] = [];

  if (context.isParallelExecution) {
    const prefix = getAgentPrefix(description);
    lines.push(`${prefix} Failed (${formatDuration(duration)})`);
  } else if (context.useCleanOutput) {
    lines.push(`${context.agentType} failed (${formatDuration(duration)})`);
  } else {
    lines.push(` Claude Code failed: ${description} (${formatDuration(duration)})`);
  }

  lines.push(` Error Type: ${error.constructor.name}`);
  lines.push(` Message: ${error.message}`);
  lines.push(` Agent: ${description}`);
  lines.push(` Working Directory: ${sourceDir}`);
  lines.push(` Retryable: ${isRetryable ? 'Yes' : 'No'}`);

  if (error.code) {
    lines.push(` Error Code: ${error.code}`);
  }
  if (error.status) {
    lines.push(` HTTP Status: ${error.status}`);
  }

  return lines;
}

export function formatCompletionMessage(
  context: ExecutionContext,
  description: string,
  turnCount: number,
  duration: number,
): string {
  if (context.isParallelExecution) {
    const prefix = getAgentPrefix(description);
    return `${prefix} Complete (${turnCount} turns, ${formatDuration(duration)})`;
  }

  if (context.useCleanOutput) {
    return `${context.agentType.charAt(0).toUpperCase() + context.agentType.slice(1)} complete! (${turnCount} turns, ${formatDuration(duration)})`;
  }

  return ` Claude Code completed: ${description} (${turnCount} turns) in ${formatDuration(duration)}`;
}

export function formatToolUseOutput(toolName: string, input: Record<string, unknown> | undefined): string[] {
  const lines: string[] = [];

  lines.push(`\n Using Tool: ${toolName}`);
  if (input && Object.keys(input).length > 0) {
    lines.push(` Input: ${JSON.stringify(input, null, 2)}`);
  }

  return lines;
}

export function formatToolResultOutput(displayContent: string): string[] {
  const lines: string[] = [];

  lines.push(` Tool Result:`);
  if (displayContent) {
    lines.push(` ${displayContent}`);
  }

  return lines;
}
@@ -0,0 +1,73 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

// Null Object pattern for progress indicator - callers never check for null

import { ProgressIndicator } from '../progress-indicator.js';
import { extractAgentType } from '../utils/formatting.js';

export interface ProgressContext {
  description: string;
  useCleanOutput: boolean;
}

export interface ProgressManager {
  start(): void;
  stop(): void;
  finish(message: string): void;
  isActive(): boolean;
}

class RealProgressManager implements ProgressManager {
  private indicator: ProgressIndicator;
  private active: boolean = false;

  constructor(message: string) {
    this.indicator = new ProgressIndicator(message);
  }

  start(): void {
    this.indicator.start();
    this.active = true;
  }

  stop(): void {
    this.indicator.stop();
    this.active = false;
  }

  finish(message: string): void {
    this.indicator.finish(message);
    this.active = false;
  }

  isActive(): boolean {
    return this.active;
  }
}

/** Null Object implementation - all methods are safe no-ops */
class NullProgressManager implements ProgressManager {
  start(): void {}

  stop(): void {}

  finish(_message: string): void {}

  isActive(): boolean {
    return false;
  }
}

// Returns no-op when disabled
export function createProgressManager(context: ProgressContext, disableLoader: boolean): ProgressManager {
  if (!context.useCleanOutput || disableLoader) {
    return new NullProgressManager();
  }

  const agentType = extractAgentType(context.description);
  return new RealProgressManager(`Running ${agentType}...`);
}
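
Review note: the value of the Null Object here is on the calling side, where code such as the `progress.stop()` / `progress.start()` pair in `dispatchMessage` needs no `if (progress)` guard. The sketch below reproduces that shape with hypothetical names (`Progress`, `TrackingProgress`, `NoopProgress`, `printWithPause`) and a string array standing in for the console.

```typescript
interface Progress {
  start(): void;
  stop(): void;
  isActive(): boolean;
}

// Stand-in for a real spinner: only tracks whether it is running.
class TrackingProgress implements Progress {
  private active = false;
  start(): void {
    this.active = true;
  }
  stop(): void {
    this.active = false;
  }
  isActive(): boolean {
    return this.active;
  }
}

// Null Object: same interface, every method is a safe no-op.
class NoopProgress implements Progress {
  start(): void {}
  stop(): void {}
  isActive(): boolean {
    return false;
  }
}

function makeProgress(enabled: boolean): Progress {
  return enabled ? new TrackingProgress() : new NoopProgress();
}

// Caller code is identical for both implementations: pause, print, resume.
function printWithPause(progress: Progress, lines: string[], sink: string[]): void {
  progress.stop();
  sink.push(...lines);
  progress.start();
}
```

The decision about whether a spinner exists is made once, in the factory, instead of being repeated at every call site.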
@@ -0,0 +1,27 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Get the actual model name being used.
 * When using claude-code-router, the SDK reports its configured model (claude-sonnet)
 * but the actual model is determined by ROUTER_DEFAULT env var.
 */
export function getActualModelName(sdkReportedModel?: string): string | undefined {
  const routerBaseUrl = process.env.ANTHROPIC_BASE_URL;
  const routerDefault = process.env.ROUTER_DEFAULT;

  // If router mode is active and ROUTER_DEFAULT is set, use that
  if (routerBaseUrl && routerDefault) {
    // ROUTER_DEFAULT format: "provider,model" (e.g., "gemini,gemini-2.5-pro")
    const parts = routerDefault.split(',');
    if (parts.length >= 2) {
      return parts.slice(1).join(','); // Handle model names with commas
    }
  }

  // Fall back to SDK-reported model
  return sdkReportedModel;
}
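
Review note: the `split(',')` followed by `slice(1).join(',')` idiom treats only the first comma as the provider/model separator, so a model name that itself contains commas survives intact. The sketch below isolates that parsing step under a hypothetical name, `modelFromRouterDefault`; it takes the value as an argument instead of reading `process.env`.

```typescript
// Hypothetical helper: parse a "provider,model" value as described in the comment above.
function modelFromRouterDefault(routerDefault: string): string | undefined {
  const parts = routerDefault.split(',');
  // Fewer than two parts means there is no model component to report.
  if (parts.length < 2) return undefined;
  // Re-join everything after the provider so commas inside the model name are kept.
  return parts.slice(1).join(',');
}
```

A value with no comma yields `undefined` in this sketch, which corresponds to the original function falling through to the SDK-reported model.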
@@ -0,0 +1,99 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

// Type definitions for Claude executor message processing pipeline

import type { SDKAssistantMessageError } from '@anthropic-ai/claude-agent-sdk';

export interface ExecutionContext {
  isParallelExecution: boolean;
  useCleanOutput: boolean;
  agentType: string;
  agentKey: string;
}

export interface AssistantResult {
  content: string;
  cleanedContent: string;
  apiErrorDetected: boolean;
  shouldThrow?: Error;
  logData: {
    turn: number;
    content: string;
    timestamp: string;
  };
}

export interface ResultData {
  result: string | null;
  cost: number;
  duration_ms: number;
  subtype?: string;
  stop_reason?: string | null;
  permissionDenials: number;
}

export interface ToolUseData {
  toolName: string;
  parameters: Record<string, unknown>;
  timestamp: string;
}

export interface ToolResultData {
  content: unknown;
  displayContent: string;
  timestamp: string;
}

export interface ContentBlock {
  type?: string;
  text?: string;
}

export interface AssistantMessage {
  type: 'assistant';
  error?: SDKAssistantMessageError;
  message: {
    content: ContentBlock[] | string;
  };
}

export interface ResultMessage {
  type: 'result';
  result?: string;
  total_cost_usd?: number;
  duration_ms?: number;
  subtype?: string;
  stop_reason?: string | null;
  permission_denials?: unknown[];
}

export interface ToolUseMessage {
  type: 'tool_use';
  name: string;
  input?: Record<string, unknown>;
}

export interface ToolResultMessage {
  type: 'tool_result';
  content?: unknown;
}

export interface ApiErrorDetection {
  detected: boolean;
  shouldThrow?: Error;
}

export interface SystemInitMessage {
  type: 'system';
  subtype: 'init';
  model?: string;
  permissionMode?: string;
}

export interface UserMessage {
  type: 'user';
}
@@ -0,0 +1,282 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Audit Session - Main Facade
 *
 * Coordinates logger, metrics tracker, and concurrency control for comprehensive
 * crash-safe audit logging.
 */

import { PentestError } from '../services/error-handling.js';
import { ErrorCode } from '../types/errors.js';
import type { AgentEndResult } from '../types/index.js';
import { SessionMutex } from '../utils/concurrency.js';
import { formatTimestamp } from '../utils/formatting.js';
import { AgentLogger } from './logger.js';
import { MetricsTracker } from './metrics-tracker.js';
import { initializeAuditStructure, type SessionMetadata } from './utils.js';
import { type AgentLogDetails, WorkflowLogger, type WorkflowSummary } from './workflow-logger.js';

// Global mutex instance
const sessionMutex = new SessionMutex();

/**
 * AuditSession - Main audit system facade
 */
export class AuditSession {
  private sessionMetadata: SessionMetadata;
  private sessionId: string;
  private metricsTracker: MetricsTracker;
  private workflowLogger: WorkflowLogger;
  private currentLogger: AgentLogger | null = null;
  private currentAgentName: string | null = null;
  private initialized: boolean = false;

  constructor(sessionMetadata: SessionMetadata) {
    this.sessionMetadata = sessionMetadata;
    this.sessionId = sessionMetadata.id;

    // Validate required fields
    if (!this.sessionId) {
      throw new PentestError(
        'sessionMetadata.id is required',
        'config',
        false,
        { field: 'sessionMetadata.id' },
        ErrorCode.CONFIG_VALIDATION_FAILED,
      );
    }
    if (!this.sessionMetadata.webUrl) {
      throw new PentestError(
        'sessionMetadata.webUrl is required',
        'config',
        false,
        { field: 'sessionMetadata.webUrl' },
        ErrorCode.CONFIG_VALIDATION_FAILED,
      );
    }

    // Components
    this.metricsTracker = new MetricsTracker(sessionMetadata);
    this.workflowLogger = new WorkflowLogger(sessionMetadata);
  }

  /**
   * Initialize audit session (creates directories, session.json)
   * Idempotent and race-safe
   *
   * @param workflowId - Optional workflow ID for tracking original or resume workflows
   */
  async initialize(workflowId?: string): Promise<void> {
    if (this.initialized) {
      return; // Already initialized
    }

    // Create directory structure
    await initializeAuditStructure(this.sessionMetadata);

    // Initialize metrics tracker (loads or creates session.json)
    await this.metricsTracker.initialize(workflowId);

    // Initialize workflow logger with actual Temporal workflow ID
    await this.workflowLogger.initialize(workflowId);

    this.initialized = true;
  }

  /**
   * Ensure initialized (helper for lazy initialization)
   */
  private async ensureInitialized(): Promise<void> {
    if (!this.initialized) {
      await this.initialize();
    }
  }

  /**
   * Start agent execution
   */
  async startAgent(agentName: string, promptContent: string, attemptNumber: number = 1): Promise<void> {
    await this.ensureInitialized();

    // 1. Save prompt snapshot (only on first attempt)
    if (attemptNumber === 1) {
      await AgentLogger.savePrompt(this.sessionMetadata, agentName, promptContent);
    }

    // 2. Create and initialize the per-agent logger
    this.currentAgentName = agentName;
    this.currentLogger = new AgentLogger(this.sessionMetadata, agentName, attemptNumber);
    await this.currentLogger.initialize();

    // 3. Start metrics timer
    this.metricsTracker.startAgent(agentName, attemptNumber);

    // 4. Log start event to both agent log and workflow log
    await this.currentLogger.logEvent('agent_start', {
      agentName,
      attemptNumber,
      timestamp: formatTimestamp(),
    });

    await this.workflowLogger.logAgent(agentName, 'start', { attemptNumber });
  }

  /**
   * Log event during agent execution
   */
  async logEvent(eventType: string, eventData: unknown): Promise<void> {
    if (!this.currentLogger) {
      throw new PentestError(
        'No active logger. Call startAgent() first.',
        'validation',
        false,
        {},
        ErrorCode.AGENT_EXECUTION_FAILED,
      );
    }

    // Log to agent-specific log file (JSON format)
    await this.currentLogger.logEvent(eventType, eventData);

    // Also log to unified workflow log (human-readable format)
    const data = eventData as Record<string, unknown>;
    const agentName = this.currentAgentName || 'unknown';
    switch (eventType) {
      case 'tool_start':
        await this.workflowLogger.logToolStart(agentName, String(data.toolName || ''), data.parameters);
        break;
      case 'llm_response':
        await this.workflowLogger.logLlmResponse(agentName, Number(data.turn || 0), String(data.content || ''));
        break;
      // tool_end and error events are intentionally not logged to workflow log
      // to reduce noise - the agent completion message captures the outcome
    }
  }

  /**
   * End agent execution (mutex-protected)
   */
  async endAgent(agentName: string, result: AgentEndResult): Promise<void> {
    // 1. Finalize agent log and close the stream
    if (this.currentLogger) {
      await this.currentLogger.logEvent('agent_end', {
        agentName,
        success: result.success,
        duration_ms: result.duration_ms,
        cost_usd: result.cost_usd,
        timestamp: formatTimestamp(),
      });

      await this.currentLogger.close();
      this.currentLogger = null;
    }

    // 2. Log completion to the unified workflow log
    this.currentAgentName = null;

    const agentLogDetails: AgentLogDetails = {
      attemptNumber: result.attemptNumber,
      duration_ms: result.duration_ms,
      cost_usd: result.cost_usd,
      success: result.success,
      ...(result.error !== undefined && { error: result.error }),
    };
    await this.workflowLogger.logAgent(agentName, 'end', agentLogDetails);

    // 3. Acquire mutex before touching session.json
    const unlock = await sessionMutex.lock(this.sessionId);
    try {
      // 4. Reload-then-write inside mutex to prevent lost updates during parallel phases
      await this.metricsTracker.reload();
      await this.metricsTracker.endAgent(agentName, result);
    } finally {
      unlock();
    }
  }

  /**
   * Update session status
   */
  async updateSessionStatus(status: 'in-progress' | 'completed' | 'failed'): Promise<void> {
    await this.ensureInitialized();

    const unlock = await sessionMutex.lock(this.sessionId);
    try {
      await this.metricsTracker.reload();
      await this.metricsTracker.updateSessionStatus(status);
    } finally {
      unlock();
    }
  }

  /**
   * Get current metrics (read-only)
   */
  async getMetrics(): Promise<unknown> {
    await this.ensureInitialized();
    return this.metricsTracker.getMetrics();
  }

  /**
   * Log phase start to unified workflow log
   */
  async logPhaseStart(phase: string): Promise<void> {
    await this.ensureInitialized();
    await this.workflowLogger.logPhase(phase, 'start');
  }

  /**
   * Log phase completion to unified workflow log
   */
  async logPhaseComplete(phase: string): Promise<void> {
    await this.ensureInitialized();
    await this.workflowLogger.logPhase(phase, 'complete');
  }

  /**
   * Log workflow completion to unified workflow log
   */
  async logWorkflowComplete(summary: WorkflowSummary): Promise<void> {
    await this.ensureInitialized();
    await this.workflowLogger.logWorkflowComplete(summary);
  }

/**
|
||||
* Add a resume attempt to the session
|
||||
* Call this when a workflow is resuming from an existing workspace
|
||||
*
|
||||
* @param workflowId - The new workflow ID for this resume attempt
|
||||
* @param terminatedWorkflows - IDs of workflows that were terminated
|
||||
* @param checkpointHash - Git checkpoint hash that was restored
|
||||
*/
|
||||
async addResumeAttempt(workflowId: string, terminatedWorkflows: string[], checkpointHash?: string): Promise<void> {
|
||||
await this.ensureInitialized();
|
||||
|
||||
const unlock = await sessionMutex.lock(this.sessionId);
|
||||
try {
|
||||
await this.metricsTracker.reload();
|
||||
await this.metricsTracker.addResumeAttempt(workflowId, terminatedWorkflows, checkpointHash);
|
||||
} finally {
|
||||
unlock();
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Log resume header to workflow.log
|
||||
* Call this when a workflow is resuming to add a visual separator
|
||||
*/
|
||||
async logResumeHeader(resumeInfo: {
|
||||
previousWorkflowId: string;
|
||||
newWorkflowId: string;
|
||||
checkpointHash: string;
|
||||
completedAgents: string[];
|
||||
}): Promise<void> {
|
||||
await this.ensureInitialized();
|
||||
await this.workflowLogger.logResumeHeader(resumeInfo);
|
||||
}
|
||||
}
|
||||
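Review note: `endAgent`, `updateSessionStatus`, and `addResumeAttempt` above share one shape: take a per-session lock, reload `session.json`, mutate, save, unlock. The sketch below illustrates that reload-then-write pattern under stated assumptions. `KeyedMutex`, `disk`, `readDisk`, `writeDisk`, and `recordCost` are hypothetical names, and the real `sessionMutex` implementation is not part of this diff, so it may work differently.

```typescript
// Hypothetical keyed async mutex: one promise chain per key.
class KeyedMutex {
  private tails = new Map<string, Promise<void>>();

  // Resolves with an unlock function once all earlier holders have released.
  async lock(key: string): Promise<() => void> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    let release: () => void = () => {};
    const current = new Promise<void>((resolve) => {
      release = () => resolve();
    });
    this.tails.set(key, previous.then(() => current));
    await previous;
    return release;
  }
}

// In-memory stand-in for session.json; every access yields so callers can interleave.
const disk = { totalCost: 0 };
const tick = (): Promise<void> => Promise.resolve();

async function readDisk(): Promise<{ totalCost: number }> {
  await tick();
  return { ...disk };
}

async function writeDisk(data: { totalCost: number }): Promise<void> {
  await tick();
  disk.totalCost = data.totalCost;
}

const mutex = new KeyedMutex();

// Reload-then-write inside the lock so concurrent callers cannot overwrite each other.
async function recordCost(sessionId: string, cost: number): Promise<void> {
  const unlock = await mutex.lock(sessionId);
  try {
    const fresh = await readDisk(); // reload
    fresh.totalCost += cost; // mutate
    await writeDisk(fresh); // persist
  } finally {
    unlock();
  }
}

async function demo(): Promise<number> {
  await Promise.all([recordCost('s1', 1), recordCost('s1', 2), recordCost('s1', 3)]);
  return disk.totalCost;
}
```

Without the lock, the three callers could all read the same starting value and the last write would win. With it, each caller sees the previous caller's result.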
@@ -0,0 +1,19 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Unified Audit & Metrics System
 *
 * Public API for the audit system. Provides crash-safe, append-only logging
 * and comprehensive metrics tracking for Shannon penetration testing sessions.
 *
 * IMPORTANT: Session objects must have an 'id' field (NOT 'sessionId')
 * Example: { id: "uuid", webUrl: "...", repoPath: "..." }
 *
 * @module audit
 */

export { AuditSession } from './audit-session.js';
@@ -0,0 +1,127 @@
|
||||
// Copyright (C) 2025 Keygraph, Inc.
|
||||
//
|
||||
// This program is free software: you can redistribute it and/or modify
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
/**
|
||||
* LogStream - Stream composition utility for append-only logging
|
||||
*
|
||||
* Encapsulates the common stream management pattern used by AgentLogger
|
||||
* and WorkflowLogger: opening streams in append mode, handling backpressure,
|
||||
* and proper cleanup.
|
||||
*/
|
||||
|
||||
import fs from 'node:fs';
|
||||
import path from 'node:path';
|
||||
import { ensureDirectory } from '../utils/file-io.js';
|
||||
|
||||
/**
|
||||
* LogStream - Manages a single append-only log file stream
|
||||
*/
|
||||
export class LogStream {
|
||||
private readonly filePath: string;
|
||||
private stream: fs.WriteStream | null = null;
|
||||
private _isOpen: boolean = false;
|
||||
|
||||
constructor(filePath: string) {
|
||||
this.filePath = filePath;
|
||||
}
|
||||
|
||||
/**
|
||||
* Open the stream for writing (creates parent directories, opens in append mode)
|
||||
*/
|
||||
async open(): Promise<void> {
|
||||
if (this._isOpen) {
|
||||
return;
|
||||
}
|
||||
|
||||
// Ensure parent directory exists
|
||||
await ensureDirectory(path.dirname(this.filePath));
|
||||
|
||||
// Create write stream in append mode
|
||||
this.stream = fs.createWriteStream(this.filePath, {
|
||||
flags: 'a',
|
||||
encoding: 'utf8',
|
||||
autoClose: true,
|
||||
});
|
||||
|
||||
// Handle stream errors to prevent crashes (log and mark closed)
|
||||
this.stream.on('error', (err) => {
|
||||
console.error(`LogStream error for ${this.filePath}:`, err.message);
|
||||
this._isOpen = false;
|
||||
});
|
||||
|
||||
this._isOpen = true;
|
||||
}
|
||||
|
||||
/**
|
||||
* Write text to the stream with backpressure handling
|
||||
*/
|
||||
async write(text: string): Promise<void> {
|
||||
return new Promise((resolve, reject) => {
|
||||
if (!this._isOpen || !this.stream) {
|
||||
reject(new Error('LogStream not open'));
|
||||
return;
|
||||
}
|
||||
|
||||
const stream = this.stream;
|
||||
let drainHandler: (() => void) | null = null;
|
||||
|
||||
const cleanup = () => {
|
||||
if (drainHandler) {
|
||||
stream.removeListener('drain', drainHandler);
|
||||
drainHandler = null;
|
||||
}
|
||||
};
|
||||
|
||||
const needsDrain = !stream.write(text, 'utf8', (error) => {
|
||||
cleanup();
|
||||
if (error) {
|
||||
reject(error);
|
||||
} else if (!needsDrain) {
|
||||
resolve();
|
||||
}
|
||||
});
|
||||
|
||||
if (needsDrain) {
|
||||
drainHandler = () => {
|
||||
cleanup();
|
||||
resolve();
|
||||
};
|
||||
stream.once('drain', drainHandler);
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
/**
|
||||
* Close the stream (flush and close)
|
||||
*/
|
||||
async close(): Promise<void> {
|
||||
if (!this._isOpen || !this.stream) {
|
||||
return;
|
||||
}
|
||||
|
||||
return new Promise((resolve) => {
|
||||
this.stream?.end(() => {
|
||||
this._isOpen = false;
|
||||
this.stream = null;
|
||||
resolve();
|
||||
});
|
||||
});
|
||||
}
|
||||
|
||||
/**
|
||||
* Check if the stream is currently open
|
||||
*/
|
||||
get isOpen(): boolean {
|
||||
return this._isOpen;
|
||||
}
|
||||
|
||||
/**
|
||||
* Get the file path this stream writes to
|
||||
*/
|
||||
get path(): string {
|
||||
return this.filePath;
|
||||
}
|
||||
}
|
||||
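Review note: `LogStream.write` above wires the `drain` listener by hand. The same backpressure idea can be sketched more compactly with `events.once`, which is a swapped-in technique and not what the diff does. `writeWithBackpressure`, `createSink`, and the tiny `highWaterMark` are assumptions chosen so that backpressure triggers in a small demo.

```typescript
import { once } from 'node:events';
import { Writable } from 'node:stream';

// Write one chunk; if the internal buffer is full, wait for 'drain' before returning.
async function writeWithBackpressure(stream: Writable, text: string): Promise<void> {
  const hasRoom = stream.write(text, 'utf8');
  if (!hasRoom) {
    await once(stream, 'drain');
  }
}

// Hypothetical in-memory sink with a tiny buffer so backpressure triggers quickly.
function createSink(received: string[]): Writable {
  return new Writable({
    highWaterMark: 8,
    write(chunk, _encoding, callback) {
      received.push(chunk.toString());
      setImmediate(callback); // simulate slow I/O
    },
  });
}

async function demo(): Promise<string> {
  const received: string[] = [];
  const sink = createSink(received);
  for (let i = 0; i < 5; i++) {
    await writeWithBackpressure(sink, `line ${i}\n`);
  }
  // Flush remaining data and wait for the stream to finish.
  await new Promise<void>((resolve) => sink.end(() => resolve()));
  return received.join('');
}
```

One difference worth noting: `events.once` rejects if the stream emits `error` while waiting, whereas the hand-written version in the diff relies on the write callback to surface the error.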
@@ -0,0 +1,122 @@
|
||||
// Copyright (C) 2025 Keygraph, Inc.
|
||||
//
|
||||
// This program is free software: you can redistribute it and/or modify
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
/**
|
||||
* Append-Only Agent Logger
|
||||
*
|
||||
* Provides crash-safe, append-only logging for agent execution.
|
||||
* Uses LogStream for stream management with backpressure handling.
|
||||
*/
|
||||
|
||||
import { atomicWrite } from '../utils/file-io.js';
|
||||
import { formatTimestamp } from '../utils/formatting.js';
|
||||
import { LogStream } from './log-stream.js';
|
||||
import { generateLogPath, generatePromptPath, type SessionMetadata } from './utils.js';
|
||||
|
||||
interface LogEvent {
|
||||
type: string;
|
||||
timestamp: string;
|
||||
data: unknown;
|
||||
}
|
||||
|
||||
/**
|
||||
* AgentLogger - Manages append-only logging for a single agent execution
|
||||
*/
|
||||
export class AgentLogger {
|
||||
private readonly sessionMetadata: SessionMetadata;
|
||||
private readonly agentName: string;
|
||||
private readonly attemptNumber: number;
|
||||
private readonly timestamp: number;
|
||||
private readonly logStream: LogStream;
|
||||
|
||||
constructor(sessionMetadata: SessionMetadata, agentName: string, attemptNumber: number) {
|
||||
this.sessionMetadata = sessionMetadata;
|
||||
this.agentName = agentName;
|
||||
this.attemptNumber = attemptNumber;
|
||||
this.timestamp = Date.now();
|
||||
|
||||
const logPath = generateLogPath(sessionMetadata, agentName, this.timestamp, attemptNumber);
|
||||
this.logStream = new LogStream(logPath);
|
||||
}
|
||||
|
||||
/**
|
||||
* Initialize the log stream (creates file and opens stream)
|
||||
*/
|
||||
async initialize(): Promise<void> {
|
||||
if (this.logStream.isOpen) {
|
||||
return; // Already initialized
|
||||
}
|
||||
|
||||
await this.logStream.open();
|
||||
|
||||
// Write header
|
||||
await this.writeHeader();
|
||||
}
|
||||
|
||||
/**
|
||||
* Write header to log file
|
||||
*/
|
||||
private async writeHeader(): Promise<void> {
|
||||
const header = [
|
||||
`========================================`,
|
||||
`Agent: ${this.agentName}`,
|
||||
`Attempt: ${this.attemptNumber}`,
|
||||
`Started: ${formatTimestamp(this.timestamp)}`,
|
||||
`Session: ${this.sessionMetadata.id}`,
|
||||
`Web URL: ${this.sessionMetadata.webUrl}`,
|
||||
`========================================\n`,
|
||||
].join('\n');
|
||||
|
||||
return this.logStream.write(header);
|
||||
}
|
||||
|
||||
/**
|
||||
* Log an event (tool_start, tool_end, llm_response, etc.)
|
||||
* Events are logged as JSON for parseability
|
||||
*/
|
||||
async logEvent(eventType: string, eventData: unknown): Promise<void> {
|
||||
const event: LogEvent = {
|
||||
type: eventType,
|
||||
timestamp: formatTimestamp(),
|
||||
data: eventData,
|
||||
};
|
||||
|
||||
const eventLine = `${JSON.stringify(event)}\n`;
|
||||
return this.logStream.write(eventLine);
|
||||
}
|
||||
|
||||
/**
|
||||
* Close the log stream
|
||||
*/
|
||||
async close(): Promise<void> {
|
||||
return this.logStream.close();
|
||||
}
|
||||
|
||||
/**
|
||||
* Save prompt snapshot to prompts directory
|
||||
* Static method - doesn't require logger instance
|
||||
*/
|
||||
static async savePrompt(sessionMetadata: SessionMetadata, agentName: string, promptContent: string): Promise<void> {
|
||||
const promptPath = generatePromptPath(sessionMetadata, agentName);
|
||||
|
||||
// Create header with metadata
|
||||
const header = [
|
||||
`# Prompt Snapshot: ${agentName}`,
|
||||
``,
|
||||
`**Session:** ${sessionMetadata.id}`,
|
||||
`**Web URL:** ${sessionMetadata.webUrl}`,
|
||||
`**Saved:** ${formatTimestamp()}`,
|
||||
``,
|
||||
`---`,
|
||||
``,
|
||||
].join('\n');
|
||||
|
||||
const fullContent = header + promptContent;
|
||||
|
||||
// Use atomic write for safety
|
||||
await atomicWrite(promptPath, fullContent);
|
||||
}
|
||||
}
|
||||
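Review note: `AgentLogger` writes a plain-text header followed by one JSON object per line. A reader for that format might look like the sketch below. `parseAgentLog` is a hypothetical helper, and the sample timestamps are made up because the output format of `formatTimestamp` is not visible in this diff.

```typescript
interface LogEvent {
  type: string;
  timestamp: string;
  data: unknown;
}

// Parse an agent log: skip the plain-text header, decode one JSON event per line.
function parseAgentLog(text: string): LogEvent[] {
  const events: LogEvent[] = [];
  for (const line of text.split('\n')) {
    if (!line.startsWith('{')) continue; // header, separators, blank lines
    try {
      events.push(JSON.parse(line) as LogEvent);
    } catch {
      // A crash mid-write can leave a truncated final line; skip it.
    }
  }
  return events;
}

// Made-up sample: header, two complete events, one truncated event.
const sample = [
  '========================================',
  'Agent: recon',
  'Attempt: 1',
  '========================================',
  '',
  '{"type":"tool_start","timestamp":"2025-01-01T00:00:00.000Z","data":{"tool":"bash"}}',
  '{"type":"tool_end","timestamp":"2025-01-01T00:00:01.000Z","data":null}',
  '{"type":"llm_resp',
].join('\n');

const events = parseAgentLog(sample);
```

Because each event is a single appended line, a crash can damage at most the last line, which is why the parser tolerates one unparseable entry instead of failing the whole file.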
@@ -0,0 +1,380 @@
|
||||
// Copyright (C) 2025 Keygraph, Inc.
|
||||
//
|
||||
// This program is free software: you can redistribute it and/or modify
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
/**
|
||||
* Metrics Tracker
|
||||
*
|
||||
* Manages session.json with comprehensive timing, cost, and validation metrics.
|
||||
* Tracks attempt-level data for complete forensic trail.
|
||||
*/
|
||||
|
||||
import { PentestError } from '../services/error-handling.js';
|
||||
import { AGENT_PHASE_MAP, type PhaseName } from '../session-manager.js';
|
||||
import { ErrorCode } from '../types/errors.js';
|
||||
import type { AgentEndResult, AgentName } from '../types/index.js';
|
||||
import { atomicWrite, fileExists, readJson } from '../utils/file-io.js';
|
||||
import { calculatePercentage, formatTimestamp } from '../utils/formatting.js';
|
||||
import { generateSessionJsonPath, type SessionMetadata } from './utils.js';
|
||||
|
||||
interface AttemptData {
|
||||
attempt_number: number;
|
||||
duration_ms: number;
|
||||
cost_usd: number;
|
||||
success: boolean;
|
||||
timestamp: string;
|
||||
model?: string | undefined;
|
||||
error?: string | undefined;
|
||||
}
|
||||
|
||||
interface AgentAuditMetrics {
|
||||
status: 'in-progress' | 'success' | 'failed';
|
||||
attempts: AttemptData[];
|
||||
final_duration_ms: number;
|
||||
total_cost_usd: number;
|
||||
model?: string | undefined;
|
||||
checkpoint?: string | undefined;
|
||||
}
|
||||
|
||||
interface PhaseMetrics {
|
||||
duration_ms: number;
|
||||
duration_percentage: number;
|
||||
cost_usd: number;
|
||||
agent_count: number;
|
||||
}
|
||||
|
||||
export interface ResumeAttempt {
|
||||
workflowId: string;
|
||||
timestamp: string;
|
||||
terminatedPrevious?: string;
|
||||
resumedFromCheckpoint?: string;
|
||||
}
|
||||
|
||||
interface SessionData {
|
||||
session: {
|
||||
id: string;
|
||||
webUrl: string;
|
||||
repoPath?: string;
|
||||
status: 'in-progress' | 'completed' | 'failed';
|
||||
createdAt: string;
|
||||
completedAt?: string;
|
||||
originalWorkflowId?: string; // First workflow that created this workspace
|
||||
resumeAttempts?: ResumeAttempt[]; // Track all resume attempts
|
||||
};
|
||||
metrics: {
|
||||
total_duration_ms: number;
|
||||
total_cost_usd: number;
|
||||
phases: Record<string, PhaseMetrics>;
|
||||
agents: Record<string, AgentAuditMetrics>;
|
||||
};
|
||||
}
|
||||
|
||||
interface ActiveTimer {
|
||||
startTime: number;
|
||||
attemptNumber: number;
|
||||
}
|
||||
|
||||
/**
|
||||
* MetricsTracker - Manages metrics for a session
|
||||
*/
|
||||
export class MetricsTracker {
|
||||
private sessionMetadata: SessionMetadata;
|
||||
private sessionJsonPath: string;
|
||||
private data: SessionData | null = null;
|
||||
private activeTimers: Map<string, ActiveTimer> = new Map();
|
||||
|
||||
constructor(sessionMetadata: SessionMetadata) {
|
||||
this.sessionMetadata = sessionMetadata;
|
||||
this.sessionJsonPath = generateSessionJsonPath(sessionMetadata);
|
||||
}
|
||||
|
||||
/**
|
||||
* Initialize session.json (idempotent)
|
||||
*
|
||||
* @param workflowId - Optional workflow ID to set as originalWorkflowId for new sessions
|
||||
*/
|
||||
async initialize(workflowId?: string): Promise<void> {
|
||||
// Check if session.json already exists
|
||||
const exists = await fileExists(this.sessionJsonPath);
|
||||
|
||||
if (exists) {
|
||||
// Load existing data
|
||||
this.data = await readJson<SessionData>(this.sessionJsonPath);
|
||||
} else {
|
||||
// Create new session.json
|
||||
this.data = this.createInitialData(workflowId);
|
||||
await this.save();
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Create initial session.json structure
|
||||
*
|
||||
* @param workflowId - Optional workflow ID to set as originalWorkflowId
|
||||
*/
|
||||
private createInitialData(workflowId?: string): SessionData {
|
||||
const sessionData: SessionData = {
|
||||
session: {
|
||||
id: this.sessionMetadata.id,
|
||||
webUrl: this.sessionMetadata.webUrl,
|
||||
status: 'in-progress',
|
||||
createdAt: (this.sessionMetadata as { createdAt?: string }).createdAt || formatTimestamp(),
|
||||
resumeAttempts: [],
|
||||
},
|
||||
metrics: {
|
||||
total_duration_ms: 0,
|
||||
total_cost_usd: 0,
|
||||
phases: {}, // Phase-level aggregations
|
||||
agents: {}, // Agent-level metrics
|
||||
},
|
||||
};
|
||||
|
||||
// Set originalWorkflowId if provided (for new workspaces)
|
||||
if (workflowId) {
|
||||
sessionData.session.originalWorkflowId = workflowId;
|
||||
}
|
||||
|
||||
// Only add repoPath if it exists
|
||||
if (this.sessionMetadata.repoPath) {
|
||||
sessionData.session.repoPath = this.sessionMetadata.repoPath;
|
||||
}
|
||||
return sessionData;
|
||||
}
|
||||
|
||||
/**
|
||||
* Start tracking an agent execution
|
||||
*/
|
||||
startAgent(agentName: string, attemptNumber: number): void {
|
||||
this.activeTimers.set(agentName, {
|
||||
startTime: Date.now(),
|
||||
attemptNumber,
|
||||
});
|
||||
}
|
||||
|
||||
/**
|
||||
* End agent execution and update metrics
|
||||
*/
|
||||
async endAgent(agentName: string, result: AgentEndResult): Promise<void> {
|
||||
if (!this.data) {
|
||||
throw new PentestError(
|
||||
'MetricsTracker not initialized',
|
||||
'validation',
|
||||
false,
|
||||
{},
|
||||
ErrorCode.AGENT_EXECUTION_FAILED,
|
||||
);
|
||||
}
|
||||
|
||||
// 1. Initialize agent metrics if first time seeing this agent
|
||||
const existingAgent = this.data.metrics.agents[agentName];
|
||||
const agent = existingAgent ?? {
|
||||
status: 'in-progress' as const,
|
||||
attempts: [],
|
||||
final_duration_ms: 0,
|
||||
total_cost_usd: 0,
|
||||
};
|
||||
this.data.metrics.agents[agentName] = agent;
|
||||
|
||||
// 2. Build attempt record with optional model/error fields
|
||||
const attempt: AttemptData = {
|
||||
attempt_number: result.attemptNumber,
|
||||
duration_ms: result.duration_ms,
|
||||
cost_usd: result.cost_usd,
|
||||
success: result.success,
|
||||
timestamp: formatTimestamp(),
|
||||
};
|
||||
|
||||
if (result.model) {
|
||||
attempt.model = result.model;
|
||||
}
|
||||
|
||||
if (result.error) {
|
||||
attempt.error = result.error;
|
||||
}
|
||||
|
||||
// 3. Append attempt to history
|
||||
agent.attempts.push(attempt);
|
||||
|
||||
// 4. Recalculate total cost across all attempts (includes failures)
|
||||
agent.total_cost_usd = agent.attempts.reduce((sum, a) => sum + a.cost_usd, 0);
|
||||
|
||||
// 5. Update agent status based on outcome
|
||||
if (result.success) {
|
||||
agent.status = 'success';
|
||||
agent.final_duration_ms = result.duration_ms;
|
||||
|
||||
// 6. Attach model and checkpoint metadata on success
|
||||
if (result.model) {
|
||||
agent.model = result.model;
|
||||
}
|
||||
|
||||
if (result.checkpoint) {
|
||||
agent.checkpoint = result.checkpoint;
|
||||
}
|
||||
} else {
|
||||
if (result.isFinalAttempt) {
|
||||
agent.status = 'failed';
|
||||
}
|
||||
}
|
||||
|
||||
// 7. Clear active timer
|
||||
this.activeTimers.delete(agentName);
|
||||
|
||||
// 8. Recalculate phase and session-level aggregations
|
||||
this.recalculateAggregations();
|
||||
|
||||
// 9. Persist to session.json
|
||||
await this.save();
|
||||
}
|
||||
|
||||
/**
|
||||
* Update session status
|
||||
*/
|
||||
async updateSessionStatus(status: 'in-progress' | 'completed' | 'failed'): Promise<void> {
|
||||
if (!this.data) return;
|
||||
|
||||
this.data.session.status = status;
|
||||
|
||||
if (status === 'completed' || status === 'failed') {
|
||||
this.data.session.completedAt = formatTimestamp();
|
||||
}
|
||||
|
||||
await this.save();
|
||||
}
|
||||
|
||||
/**
|
||||
* Add a resume attempt to the session
|
||||
*
|
||||
* @param workflowId - The new workflow ID for this resume attempt
|
||||
* @param terminatedWorkflows - IDs of workflows that were terminated
|
||||
* @param checkpointHash - Git checkpoint hash that was restored
|
||||
*/
|
||||
async addResumeAttempt(workflowId: string, terminatedWorkflows: string[], checkpointHash?: string): Promise<void> {
|
||||
if (!this.data) {
|
||||
throw new PentestError(
|
||||
'MetricsTracker not initialized',
|
||||
'validation',
|
||||
false,
|
||||
{},
|
||||
ErrorCode.AGENT_EXECUTION_FAILED,
|
||||
);
|
||||
}
|
||||
|
||||
// Ensure originalWorkflowId is set (backfill if missing from old sessions)
|
||||
if (!this.data.session.originalWorkflowId) {
|
||||
this.data.session.originalWorkflowId = this.data.session.id;
|
||||
}
|
||||
|
||||
// Ensure resumeAttempts array exists
|
||||
if (!this.data.session.resumeAttempts) {
|
||||
this.data.session.resumeAttempts = [];
|
||||
}
|
||||
|
||||
// Add new resume attempt
|
||||
const resumeAttempt: ResumeAttempt = {
|
||||
workflowId,
|
||||
timestamp: formatTimestamp(),
|
||||
};
|
||||
|
||||
if (terminatedWorkflows.length > 0) {
|
||||
resumeAttempt.terminatedPrevious = terminatedWorkflows.join(',');
|
||||
}
|
||||
|
||||
if (checkpointHash) {
|
||||
resumeAttempt.resumedFromCheckpoint = checkpointHash;
|
||||
}
|
||||
|
||||
this.data.session.resumeAttempts.push(resumeAttempt);
|
||||
|
||||
await this.save();
|
||||
}
|
||||
|
||||
/**
|
||||
* Recalculate aggregations (total duration, total cost, phases)
|
||||
*/
|
||||
private recalculateAggregations(): void {
|
||||
if (!this.data) return;
|
||||
|
||||
const agents = this.data.metrics.agents;
|
||||
|
||||
// Only count successful agents
|
||||
const successfulAgents = Object.entries(agents).filter(([, data]) => data.status === 'success');
|
||||
|
||||
// Calculate total duration and cost
|
||||
const totalDuration = successfulAgents.reduce((sum, [, data]) => sum + data.final_duration_ms, 0);
|
||||
|
||||
const totalCost = successfulAgents.reduce((sum, [, data]) => sum + data.total_cost_usd, 0);
|
||||
|
||||
this.data.metrics.total_duration_ms = totalDuration;
|
||||
this.data.metrics.total_cost_usd = totalCost;
|
||||
|
||||
// Calculate phase-level metrics
|
||||
this.data.metrics.phases = this.calculatePhaseMetrics(successfulAgents);
|
||||
}
|
||||
|
||||
/**
|
||||
* Calculate phase-level metrics
|
||||
*/
|
||||
private calculatePhaseMetrics(successfulAgents: Array<[string, AgentAuditMetrics]>): Record<string, PhaseMetrics> {
|
||||
const phases: Record<PhaseName, AgentAuditMetrics[]> = {
|
||||
'pre-recon': [],
|
||||
recon: [],
|
||||
'vulnerability-analysis': [],
|
||||
exploitation: [],
|
||||
reporting: [],
|
||||
};
|
||||
|
||||
// Group agents by phase using imported AGENT_PHASE_MAP
|
||||
for (const [agentName, agentData] of successfulAgents) {
|
||||
const phase = AGENT_PHASE_MAP[agentName as AgentName];
|
||||
if (phase) {
|
||||
phases[phase].push(agentData);
|
||||
}
|
||||
}
|
||||
|
||||
// Calculate metrics per phase
|
||||
const phaseMetrics: Record<string, PhaseMetrics> = {};
|
||||
// biome-ignore lint/style/noNonNullAssertion: called from recalculateAggregations which guards this.data
|
||||
const totalDuration = this.data!.metrics.total_duration_ms;
|
||||
|
||||
for (const [phaseName, agentList] of Object.entries(phases)) {
|
||||
if (agentList.length === 0) continue;
|
||||
|
||||
const phaseDuration = agentList.reduce((sum, agent) => sum + agent.final_duration_ms, 0);
|
||||
const phaseCost = agentList.reduce((sum, agent) => sum + agent.total_cost_usd, 0);
|
||||
|
||||
phaseMetrics[phaseName] = {
|
||||
duration_ms: phaseDuration,
|
||||
duration_percentage: calculatePercentage(phaseDuration, totalDuration),
|
||||
cost_usd: phaseCost,
|
||||
agent_count: agentList.length,
|
||||
};
|
||||
}
|
||||
|
||||
return phaseMetrics;
|
||||
}
|
||||
|
||||
/**
|
||||
* Get current metrics as a deep copy, so callers cannot mutate tracker state
|
||||
*/
|
||||
getMetrics(): SessionData {
|
||||
return JSON.parse(JSON.stringify(this.data)) as SessionData;
|
||||
}
|
||||
|
||||
/**
|
||||
* Save metrics to session.json (atomic write)
|
||||
*/
|
||||
private async save(): Promise<void> {
|
||||
if (!this.data) return;
|
||||
await atomicWrite(this.sessionJsonPath, this.data);
|
||||
}
|
||||
|
||||
/**
|
||||
* Reload metrics from disk
|
||||
*/
|
||||
async reload(): Promise<void> {
|
||||
this.data = await readJson<SessionData>(this.sessionJsonPath);
|
||||
}
|
||||
}
|
||||
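Review note: the aggregation in `recalculateAggregations` and `calculatePhaseMetrics` counts only agents whose status is `success`, then splits the totals by phase. The sketch below restates that logic as a single pure function. `PHASE_OF` is a hypothetical stand-in for `AGENT_PHASE_MAP`, the agent name `injection-vuln` is invented, and `percentage` is an assumption about how `calculatePercentage` behaves.

```typescript
type AgentStatus = 'in-progress' | 'success' | 'failed';

interface AgentMetrics {
  status: AgentStatus;
  final_duration_ms: number;
  total_cost_usd: number;
}

interface PhaseMetrics {
  duration_ms: number;
  duration_percentage: number;
  cost_usd: number;
  agent_count: number;
}

// Hypothetical agent-to-phase mapping; the real AGENT_PHASE_MAP lives in session-manager.
const PHASE_OF: Record<string, string> = {
  'pre-recon': 'pre-recon',
  recon: 'recon',
  'injection-vuln': 'vulnerability-analysis',
};

// Assumed behavior: share of total as a percentage, 0 when the total is 0.
function percentage(part: number, total: number): number {
  return total === 0 ? 0 : (part / total) * 100;
}

function aggregate(agents: Record<string, AgentMetrics>) {
  // Failed and in-progress agents are excluded from every total.
  const successful = Object.entries(agents).filter(([, a]) => a.status === 'success');
  const totalDuration = successful.reduce((sum, [, a]) => sum + a.final_duration_ms, 0);
  const totalCost = successful.reduce((sum, [, a]) => sum + a.total_cost_usd, 0);

  const phases: Record<string, PhaseMetrics> = {};
  for (const [name, a] of successful) {
    const phase = PHASE_OF[name];
    if (!phase) continue;
    const entry = phases[phase] ?? { duration_ms: 0, duration_percentage: 0, cost_usd: 0, agent_count: 0 };
    entry.duration_ms += a.final_duration_ms;
    entry.cost_usd += a.total_cost_usd;
    entry.agent_count += 1;
    entry.duration_percentage = percentage(entry.duration_ms, totalDuration);
    phases[phase] = entry;
  }
  return { totalDuration, totalCost, phases };
}

const result = aggregate({
  'pre-recon': { status: 'success', final_duration_ms: 1000, total_cost_usd: 0.5 },
  recon: { status: 'success', final_duration_ms: 3000, total_cost_usd: 1.5 },
  'injection-vuln': { status: 'failed', final_duration_ms: 0, total_cost_usd: 2 },
});
```

One consequence of the success-only filter is that money spent on an agent that ultimately failed does not appear in the session total, even though it is recorded in that agent's own `total_cost_usd`.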
@@ -0,0 +1,130 @@
|
||||
// Copyright (C) 2025 Keygraph, Inc.
|
||||
//
|
||||
// This program is free software: you can redistribute it and/or modify
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
/**
|
||||
* Audit System Utilities
|
||||
*
|
||||
* Path generation and directory setup helpers for the audit system.
* Path generators are pure; initializeAuditStructure and copyDeliverablesToAudit touch the filesystem.
|
||||
*/
|
||||
|
||||
import fs from 'node:fs/promises';
|
||||
import path from 'node:path';
|
||||
import { WORKSPACES_DIR } from '../paths.js';
|
||||
import { ensureDirectory } from '../utils/file-io.js';
|
||||
|
||||
export type { SessionMetadata } from '../types/audit.js';
|
||||
|
||||
import type { SessionMetadata } from '../types/audit.js';
|
||||
|
||||
/**
|
||||
* Extract and sanitize hostname from URL for use in identifiers
|
||||
*/
|
||||
export function sanitizeHostname(url: string): string {
|
||||
return new URL(url).hostname.replace(/[^a-zA-Z0-9-]/g, '-');
|
||||
}
|
||||
|
||||
/**
|
||||
* Generate standardized session identifier from workflow ID
|
||||
* Workflow IDs already contain hostname, so we use them directly
|
||||
*/
|
||||
export function generateSessionIdentifier(sessionMetadata: SessionMetadata): string {
|
||||
return sessionMetadata.id;
|
||||
}
|
||||
|
||||
/**
|
||||
* Generate path to audit log directory for a session
|
||||
* Uses custom outputPath if provided, otherwise defaults to WORKSPACES_DIR
|
||||
*/
|
||||
export function generateAuditPath(sessionMetadata: SessionMetadata): string {
|
||||
const sessionIdentifier = generateSessionIdentifier(sessionMetadata);
|
||||
const baseDir = sessionMetadata.outputPath || WORKSPACES_DIR;
|
||||
return path.join(baseDir, sessionIdentifier);
|
||||
}
|
||||
|
||||
/**
|
||||
* Generate path to agent log file
|
||||
*/
|
||||
export function generateLogPath(
|
||||
sessionMetadata: SessionMetadata,
|
||||
agentName: string,
|
||||
timestamp: number,
|
||||
attemptNumber: number,
|
||||
): string {
|
||||
const auditPath = generateAuditPath(sessionMetadata);
|
||||
const filename = `${timestamp}_${agentName}_attempt-${attemptNumber}.log`;
|
||||
return path.join(auditPath, 'agents', filename);
|
||||
}
|
||||
|
||||
/**
|
||||
* Generate path to prompt snapshot file
|
||||
*/
|
||||
export function generatePromptPath(sessionMetadata: SessionMetadata, agentName: string): string {
|
||||
const auditPath = generateAuditPath(sessionMetadata);
|
||||
return path.join(auditPath, 'prompts', `${agentName}.md`);
|
||||
}
|
||||
|
||||
/**
|
||||
* Generate path to session.json file
|
||||
*/
|
||||
export function generateSessionJsonPath(sessionMetadata: SessionMetadata): string {
|
||||
const auditPath = generateAuditPath(sessionMetadata);
|
||||
return path.join(auditPath, 'session.json');
|
||||
}
|
||||
|
||||
/**
|
||||
* Generate path to workflow.log file
|
||||
*/
|
||||
export function generateWorkflowLogPath(sessionMetadata: SessionMetadata): string {
|
||||
const auditPath = generateAuditPath(sessionMetadata);
|
||||
return path.join(auditPath, 'workflow.log');
|
||||
}
|
||||
|
||||
/**
|
||||
* Initialize audit directory structure for a session
|
||||
* Creates: workspaces/{sessionId}/, agents/, prompts/, deliverables/
|
||||
*/
|
||||
export async function initializeAuditStructure(sessionMetadata: SessionMetadata): Promise<void> {
|
||||
const auditPath = generateAuditPath(sessionMetadata);
|
||||
const agentsPath = path.join(auditPath, 'agents');
|
||||
const promptsPath = path.join(auditPath, 'prompts');
|
||||
const deliverablesPath = path.join(auditPath, 'deliverables');
|
||||
|
||||
await ensureDirectory(auditPath);
|
||||
await ensureDirectory(agentsPath);
|
||||
await ensureDirectory(promptsPath);
|
||||
await ensureDirectory(deliverablesPath);
|
||||
}
|
||||
|
||||
/**
|
||||
* Copy deliverable files from repo to workspaces for self-contained audit trail.
|
||||
* No-ops if source directory doesn't exist. Idempotent and parallel-safe.
|
||||
*/
|
||||
export async function copyDeliverablesToAudit(sessionMetadata: SessionMetadata, repoPath: string): Promise<void> {
|
||||
const sourceDir = path.join(repoPath, 'deliverables');
|
||||
const destDir = path.join(generateAuditPath(sessionMetadata), 'deliverables');
|
||||
|
||||
let entries: string[];
|
||||
try {
|
||||
entries = await fs.readdir(sourceDir);
|
||||
} catch {
|
||||
// Source directory doesn't exist yet — nothing to copy
|
||||
return;
|
||||
}
|
||||
|
||||
await ensureDirectory(destDir);
|
||||
|
||||
for (const entry of entries) {
|
||||
const sourcePath = path.join(sourceDir, entry);
|
||||
const destPath = path.join(destDir, entry);
|
||||
|
||||
// Only copy files, skip subdirectories
|
||||
const stat = await fs.stat(sourcePath);
|
||||
if (stat.isFile()) {
|
||||
await fs.copyFile(sourcePath, destPath);
|
||||
}
|
||||
}
|
||||
}
|
||||
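Review note: the path helpers above imply a fixed on-disk layout per session. The sketch below collects that layout in one place for reference. `auditPaths` is a hypothetical helper, the `WORKSPACES_DIR` value and session ID are placeholders, and `path.posix.join` is used only to keep the output identical on every platform (the diff uses `path.join`).

```typescript
import * as path from 'node:path';

interface SessionMetadata {
  id: string;
  webUrl: string;
  outputPath?: string;
}

// Placeholder value; the real WORKSPACES_DIR is defined in paths.ts.
const WORKSPACES_DIR = '/tmp/workspaces';

// Build every audit path for one session and one agent attempt.
function auditPaths(meta: SessionMetadata, agentName: string, timestamp: number, attempt: number) {
  const root = path.posix.join(meta.outputPath || WORKSPACES_DIR, meta.id);
  return {
    root,
    sessionJson: path.posix.join(root, 'session.json'),
    workflowLog: path.posix.join(root, 'workflow.log'),
    prompt: path.posix.join(root, 'prompts', `${agentName}.md`),
    agentLog: path.posix.join(root, 'agents', `${timestamp}_${agentName}_attempt-${attempt}.log`),
    deliverables: path.posix.join(root, 'deliverables'),
  };
}

const paths = auditPaths(
  { id: 'example-com_abc123', webUrl: 'https://example.com' },
  'recon',
  1700000000000,
  2,
);
```

Prompt snapshots are keyed by agent name only, so a retry overwrites the earlier snapshot, while agent logs include the timestamp and attempt number and therefore accumulate.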
@@ -0,0 +1,374 @@
|
||||
// Copyright (C) 2025 Keygraph, Inc.
|
||||
//
|
||||
// This program is free software: you can redistribute it and/or modify
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
/**
|
||||
* Workflow Logger
|
||||
*
|
||||
* Provides a unified, human-readable log file per workflow.
|
||||
* Optimized for `tail -f` viewing during concurrent workflow execution.
|
||||
*/
|
||||
|
||||
import fs from 'node:fs/promises';
|
||||
import { formatDuration, formatTimestamp } from '../utils/formatting.js';
|
||||
import { LogStream } from './log-stream.js';
|
||||
import { generateWorkflowLogPath, type SessionMetadata } from './utils.js';
|
||||
|
||||
export interface AgentLogDetails {
|
||||
attemptNumber?: number;
|
||||
duration_ms?: number;
|
||||
cost_usd?: number;
|
||||
success?: boolean;
|
||||
error?: string;
|
||||
}
|
||||
|
||||
export interface AgentMetricsSummary {
|
||||
durationMs: number;
|
||||
costUsd: number | null;
|
||||
}
|
||||
|
||||
export interface WorkflowSummary {
|
||||
status: 'completed' | 'failed';
|
||||
totalDurationMs: number;
|
||||
totalCostUsd: number;
|
||||
completedAgents: string[];
|
||||
agentMetrics: Record<string, AgentMetricsSummary>;
|
||||
error?: string;
|
||||
}
|
||||
|
||||
/**
|
||||
* WorkflowLogger - Manages the unified workflow log file
|
||||
*/
|
||||
export class WorkflowLogger {
|
||||
private readonly sessionMetadata: SessionMetadata;
|
||||
private readonly logStream: LogStream;
|
||||
private workflowId: string | undefined;
|
||||
|
||||
constructor(sessionMetadata: SessionMetadata) {
|
||||
this.sessionMetadata = sessionMetadata;
|
||||
const logPath = generateWorkflowLogPath(sessionMetadata);
|
||||
this.logStream = new LogStream(logPath);
|
||||
}
|
||||
|
||||
/**
|
||||
* Initialize the log stream (opens the file in append mode; writes the header only if the file is empty)
|
||||
*/
|
||||
async initialize(workflowId?: string): Promise<void> {
|
||||
if (workflowId) {
|
||||
this.workflowId = workflowId;
|
||||
}
|
||||
|
||||
if (this.logStream.isOpen) {
|
||||
return;
|
||||
}
|
||||
|
||||
await this.logStream.open();
|
||||
|
||||
// Write header only if file is new (empty)
|
||||
const stats = await fs.stat(this.logStream.path).catch(() => null);
|
||||
if (!stats || stats.size === 0) {
|
||||
await this.writeHeader();
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Write header to log file
|
||||
*/
|
||||
private async writeHeader(): Promise<void> {
|
||||
const header = [
|
||||
`================================================================================`,
|
||||
`Shannon Pentest - Workflow Log`,
|
||||
`================================================================================`,
|
||||
`Workflow ID: ${this.workflowId ?? this.sessionMetadata.id}`,
|
||||
`Target URL: ${this.sessionMetadata.webUrl}`,
|
||||
`Started: ${formatTimestamp()}`,
|
||||
`================================================================================`,
|
||||
``,
|
||||
].join('\n');
|
||||
|
||||
return this.logStream.write(header);
|
||||
}
|
||||
|
||||
/**
|
||||
* Write resume header to log file when workflow is resumed
|
||||
*/
|
||||
async logResumeHeader(resumeInfo: {
|
||||
previousWorkflowId: string;
|
||||
newWorkflowId: string;
|
||||
checkpointHash: string;
|
||||
completedAgents: string[];
|
||||
}): Promise<void> {
|
||||
await this.ensureInitialized();
|
||||
|
||||
const header = [
|
||||
``,
|
||||
`================================================================================`,
|
||||
`RESUMED`,
|
||||
`================================================================================`,
|
||||
`Previous Workflow ID: ${resumeInfo.previousWorkflowId}`,
|
||||
`New Workflow ID: ${resumeInfo.newWorkflowId}`,
|
||||
`Resumed At: ${formatTimestamp()}`,
|
||||
`Checkpoint: ${resumeInfo.checkpointHash}`,
|
||||
`Completed: ${resumeInfo.completedAgents.length} agents (${resumeInfo.completedAgents.join(', ')})`,
|
||||
`================================================================================`,
|
||||
``,
|
||||
].join('\n');
|
||||
|
||||
return this.logStream.write(header);
|
||||
}
|
||||
|
||||
/**
|
||||
* Format timestamp for log line (UTC, human readable, e.g. 2025-01-01 12:34:56)
|
||||
*/
|
||||
private formatLogTime(): string {
|
||||
const now = new Date();
|
||||
return now.toISOString().replace('T', ' ').slice(0, 19);
|
||||
}
|
||||
|
||||
/**
|
||||
* Log a phase transition event
|
||||
*/
|
||||
async logPhase(phase: string, event: 'start' | 'complete'): Promise<void> {
|
||||
await this.ensureInitialized();
|
||||
|
||||
const action = event === 'start' ? 'Starting' : 'Completed';
|
||||
const line = `[${this.formatLogTime()}] [PHASE] ${action}: ${phase}\n`;
|
||||
|
||||
// Add blank line before phase start for readability
|
||||
if (event === 'start') {
|
||||
await this.logStream.write('\n');
|
||||
}
|
||||
|
||||
await this.logStream.write(line);
|
||||
}
|
||||
|
||||
/**
|
||||
* Log an agent event
|
||||
*/
|
||||
async logAgent(agentName: string, event: 'start' | 'end', details?: AgentLogDetails): Promise<void> {
|
||||
await this.ensureInitialized();
|
||||
|
||||
let message: string;
|
||||
|
||||
if (event === 'start') {
|
||||
const attempt = details?.attemptNumber ?? 1;
|
||||
message = `${agentName}: Starting (attempt ${attempt})`;
|
||||
} else {
|
||||
const parts: string[] = [`${agentName}:`];
|
||||
|
||||
if (details?.success === false) {
|
||||
parts.push('Failed');
|
||||
if (details?.error) {
|
||||
parts.push(`- ${details.error}`);
|
||||
}
|
||||
} else {
|
||||
parts.push('Completed');
|
||||
}
|
||||
|
||||
if (details?.duration_ms !== undefined) {
|
||||
const cost = details.cost_usd !== undefined ? ` $${details.cost_usd.toFixed(2)}` : '';
// Build the parenthesized suffix in one piece so join(' ') cannot leave a space before ')'
parts.push(`(${formatDuration(details.duration_ms)}${cost})`);
|
||||
}
|
||||
|
||||
message = parts.join(' ');
|
||||
}
|
||||
|
||||
const line = `[${this.formatLogTime()}] [AGENT] ${message}\n`;
|
||||
await this.logStream.write(line);
|
||||
}
|
||||
|
||||
/**
|
||||
* Log a general event
|
||||
*/
|
||||
async logEvent(eventType: string, message: string): Promise<void> {
|
||||
await this.ensureInitialized();
|
||||
|
||||
const line = `[${this.formatLogTime()}] [${eventType.toUpperCase()}] ${message}\n`;
|
||||
await this.logStream.write(line);
|
||||
}
|
||||
|
||||
/**
|
||||
* Log an error
|
||||
*/
|
||||
async logError(error: Error, context?: string): Promise<void> {
|
||||
await this.ensureInitialized();
|
||||
|
||||
const contextStr = context ? ` (${context})` : '';
|
||||
const line = `[${this.formatLogTime()}] [ERROR] ${error.message}${contextStr}\n`;
|
||||
await this.logStream.write(line);
|
||||
}
|
||||
|
||||
/**
|
||||
* Truncate string to max length with ellipsis
|
||||
*/
|
||||
private truncate(str: string, maxLen: number): string {
|
||||
if (str.length <= maxLen) return str;
|
||||
return `${str.slice(0, maxLen - 3)}...`;
|
||||
}
|
||||
|
||||
/**
|
||||
* Format tool parameters for human-readable display
|
||||
*/
|
||||
private formatToolParams(toolName: string, params: unknown): string {
|
||||
if (!params || typeof params !== 'object') {
|
||||
return '';
|
||||
}
|
||||
|
||||
const p = params as Record<string, unknown>;
|
||||
|
||||
// Tool-specific formatting for common tools
|
||||
switch (toolName) {
|
||||
case 'Bash':
|
||||
if (p.command) {
|
||||
return this.truncate(String(p.command).replace(/\n/g, ' '), 100);
|
||||
}
|
||||
break;
|
||||
case 'Read':
|
||||
if (p.file_path) {
|
||||
return String(p.file_path);
|
||||
}
|
||||
break;
|
||||
case 'Write':
|
||||
if (p.file_path) {
|
||||
return String(p.file_path);
|
||||
}
|
||||
break;
|
||||
case 'Edit':
|
||||
if (p.file_path) {
|
||||
return String(p.file_path);
|
||||
}
|
||||
break;
|
||||
case 'Glob':
|
||||
if (p.pattern) {
|
||||
return String(p.pattern);
|
||||
}
|
||||
break;
|
||||
case 'Grep':
|
||||
if (p.pattern) {
|
||||
const path = p.path ? ` in ${p.path}` : '';
|
||||
return `"${this.truncate(String(p.pattern), 50)}"${path}`;
|
||||
}
|
||||
break;
|
||||
case 'WebFetch':
|
||||
if (p.url) {
|
||||
return String(p.url);
|
||||
}
|
||||
break;
|
||||
}
|
||||
|
||||
// Default: show first string-valued param truncated
|
||||
for (const [key, val] of Object.entries(p)) {
|
||||
if (typeof val === 'string' && val.length > 0) {
|
||||
return `${key}=${this.truncate(val, 60)}`;
|
||||
}
|
||||
}
|
||||
|
||||
return '';
|
||||
}
|
||||
|
||||
/**
|
||||
* Log tool start event
|
||||
*/
|
||||
async logToolStart(agentName: string, toolName: string, parameters: unknown): Promise<void> {
|
||||
await this.ensureInitialized();
|
||||
|
||||
const params = this.formatToolParams(toolName, parameters);
|
||||
const paramStr = params ? `: ${params}` : '';
|
||||
const line = `[${this.formatLogTime()}] [${agentName}] [TOOL] ${toolName}${paramStr}\n`;
|
||||
await this.logStream.write(line);
|
||||
}
|
||||
|
||||
/**
|
||||
* Log LLM response
|
||||
*/
|
||||
async logLlmResponse(agentName: string, turn: number, content: string): Promise<void> {
|
||||
await this.ensureInitialized();
|
||||
|
||||
// Show full content, replacing newlines with escaped version for single-line output
|
||||
const escaped = content.replace(/\n/g, '\\n');
|
||||
const line = `[${this.formatLogTime()}] [${agentName}] [LLM] Turn ${turn}: ${escaped}\n`;
|
||||
await this.logStream.write(line);
|
||||
}
|
||||
|
||||
/**
|
||||
* Format a pipe-delimited error string into indented multi-line display.
|
||||
*
|
||||
* Input: "phase context|ErrorType|message|Hint: ..."
|
||||
* Output: "Error: phase context\n ErrorType\n ..."
|
||||
*/
|
||||
private formatErrorBlock(errorString: string): string {
|
||||
const segments = errorString.split('|');
|
||||
const label = 'Error: ';
|
||||
const indent = ' '.repeat(label.length);
|
||||
|
||||
const lines = segments.map((segment, i) => (i === 0 ? `${label}${segment.trim()}` : `${indent}${segment.trim()}`));
|
||||
|
||||
return `${lines.join('\n')}\n`;
|
||||
}
|
||||
|
||||
/**
|
||||
* Log workflow completion with full summary
|
||||
*/
|
||||
async logWorkflowComplete(summary: WorkflowSummary): Promise<void> {
|
||||
await this.ensureInitialized();
|
||||
|
||||
const status = summary.status === 'completed' ? 'COMPLETED' : 'FAILED';
|
||||
|
||||
const lines: string[] = [
|
||||
'',
|
||||
'================================================================================',
|
||||
`Workflow ${status}`,
|
||||
'────────────────────────────────────────',
|
||||
`Workflow ID: ${this.workflowId ?? this.sessionMetadata.id}`,
|
||||
`Status: ${summary.status}`,
|
||||
`Duration: ${formatDuration(summary.totalDurationMs)}`,
|
||||
`Total Cost: $${summary.totalCostUsd.toFixed(4)}`,
|
||||
`Agents: ${summary.completedAgents.length} completed`,
|
||||
];
|
||||
|
||||
if (summary.error) {
|
||||
lines.push(this.formatErrorBlock(summary.error).trimEnd());
|
||||
}
|
||||
|
||||
lines.push('');
|
||||
lines.push('Agent Breakdown:');
|
||||
|
||||
for (const agentName of summary.completedAgents) {
|
||||
const metrics = summary.agentMetrics[agentName];
|
||||
if (metrics) {
|
||||
const duration = formatDuration(metrics.durationMs);
|
||||
const cost = metrics.costUsd !== null ? `$${metrics.costUsd.toFixed(4)}` : 'N/A';
|
||||
lines.push(` - ${agentName} (${duration}, ${cost})`);
|
||||
} else {
|
||||
lines.push(` - ${agentName}`);
|
||||
}
|
||||
}
|
||||
|
||||
lines.push('================================================================================');
|
||||
|
||||
// Single atomic write to prevent interleaved/duplicate output in log tailers
|
||||
await this.logStream.write(`${lines.join('\n')}\n`);
|
||||
}
|
||||
|
||||
/**
|
||||
* Ensure initialized (helper for lazy initialization)
|
||||
*/
|
||||
private async ensureInitialized(): Promise<void> {
|
||||
if (!this.logStream.isOpen) {
|
||||
await this.initialize();
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Close the log stream
|
||||
*/
|
||||
async close(): Promise<void> {
|
||||
return this.logStream.close();
|
||||
}
|
||||
}
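The pipe-delimited error format consumed by `formatErrorBlock` can be illustrated with a standalone sketch. The function body mirrors the private method above; the `sample` string is a made-up input, not output from a real run.

```typescript
// Standalone sketch of the pipe-delimited error formatting used by the workflow logger.
// `formatErrorBlock` mirrors the private method above; `sample` is a hypothetical input.
function formatErrorBlock(errorString: string): string {
  const label = 'Error: ';
  const indent = ' '.repeat(label.length);

  // The first segment gets the label; later segments are aligned under it
  const lines = errorString
    .split('|')
    .map((segment, i) => `${i === 0 ? label : indent}${segment.trim()}`);

  return `${lines.join('\n')}\n`;
}

const sample = 'exploit phase|ConfigError|login_url is missing|Hint: check the config file';
process.stdout.write(formatErrorBlock(sample));
```

Each segment after the first is indented by the width of the `Error: ` label, so the summary block stays aligned however many segments the error string carries.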
|
||||
@@ -0,0 +1,569 @@
|
||||
// Copyright (C) 2025 Keygraph, Inc.
|
||||
//
|
||||
// This program is free software: you can redistribute it and/or modify
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
import { createRequire } from 'node:module';
|
||||
import { Ajv, type ErrorObject, type ValidateFunction } from 'ajv';
|
||||
import type { FormatsPlugin } from 'ajv-formats';
|
||||
import yaml from 'js-yaml';
|
||||
import { fs } from 'zx';
|
||||
import { PentestError } from './services/error-handling.js';
|
||||
import type { Authentication, Config, DistributedConfig, Rule } from './types/config.js';
|
||||
import { ErrorCode } from './types/errors.js';
|
||||
|
||||
// Handle ESM/CJS interop for ajv-formats using require
|
||||
const require = createRequire(import.meta.url);
|
||||
const addFormats: FormatsPlugin = require('ajv-formats');
|
||||
|
||||
const ajv = new Ajv({ allErrors: true, verbose: true });
|
||||
addFormats(ajv);
|
||||
|
||||
let configSchema: object;
|
||||
let validateSchema: ValidateFunction;
|
||||
|
||||
try {
|
||||
const schemaPath = new URL('../configs/config-schema.json', import.meta.url);
|
||||
const schemaContent = await fs.readFile(schemaPath, 'utf8');
|
||||
configSchema = JSON.parse(schemaContent) as object;
|
||||
validateSchema = ajv.compile(configSchema);
|
||||
} catch (error) {
|
||||
const errMsg = error instanceof Error ? error.message : String(error);
|
||||
throw new PentestError(`Failed to load configuration schema: ${errMsg}`, 'config', false, {
|
||||
schemaPath: '../configs/config-schema.json',
|
||||
originalError: errMsg,
|
||||
});
|
||||
}
|
||||
|
||||
const DANGEROUS_PATTERNS: RegExp[] = [
|
||||
/\.\.\//, // Path traversal
|
||||
/[<>]/, // HTML/XML injection
|
||||
/javascript:/i, // JavaScript URLs
|
||||
/data:/i, // Data URLs
|
||||
/file:/i, // File URLs
|
||||
];
|
||||
|
||||
/**
|
||||
* Format a single AJV error into a human-readable message.
|
||||
* Translates AJV error keywords into plain English descriptions.
|
||||
*/
|
||||
function formatAjvError(error: ErrorObject): string {
|
||||
const path = error.instancePath || 'root';
|
||||
const params = error.params as Record<string, unknown>;
|
||||
|
||||
switch (error.keyword) {
|
||||
case 'required': {
|
||||
const missingProperty = params.missingProperty as string;
|
||||
return `Missing required field: "${missingProperty}" at ${path}`;
|
||||
}
|
||||
|
||||
case 'type': {
|
||||
const expectedType = params.type as string;
|
||||
return `Invalid type at ${path}: expected ${expectedType}`;
|
||||
}
|
||||
|
||||
case 'enum': {
|
||||
const allowedValues = params.allowedValues as unknown[];
|
||||
const formattedValues = allowedValues.map((v) => `"${v}"`).join(', ');
|
||||
return `Invalid value at ${path}: must be one of [${formattedValues}]`;
|
||||
}
|
||||
|
||||
case 'additionalProperties': {
|
||||
const additionalProperty = params.additionalProperty as string;
|
||||
return `Unknown field at ${path}: "${additionalProperty}" is not allowed`;
|
||||
}
|
||||
|
||||
case 'minLength': {
|
||||
const limit = params.limit as number;
|
||||
return `Value at ${path} is too short: must have at least ${limit} character(s)`;
|
||||
}
|
||||
|
||||
case 'maxLength': {
|
||||
const limit = params.limit as number;
|
||||
return `Value at ${path} is too long: must have at most ${limit} character(s)`;
|
||||
}
|
||||
|
||||
case 'minimum': {
|
||||
const limit = params.limit as number;
|
||||
return `Value at ${path} is too small: must be >= ${limit}`;
|
||||
}
|
||||
|
||||
case 'maximum': {
|
||||
const limit = params.limit as number;
|
||||
return `Value at ${path} is too large: must be <= ${limit}`;
|
||||
}
|
||||
|
||||
case 'minItems': {
|
||||
const limit = params.limit as number;
|
||||
return `Array at ${path} has too few items: must have at least ${limit} item(s)`;
|
||||
}
|
||||
|
||||
case 'maxItems': {
|
||||
const limit = params.limit as number;
|
||||
return `Array at ${path} has too many items: must have at most ${limit} item(s)`;
|
||||
}
|
||||
|
||||
case 'pattern': {
|
||||
const pattern = params.pattern as string;
|
||||
return `Value at ${path} does not match required pattern: ${pattern}`;
|
||||
}
|
||||
|
||||
case 'format': {
|
||||
const format = params.format as string;
|
||||
return `Value at ${path} must be a valid ${format}`;
|
||||
}
|
||||
|
||||
case 'const': {
|
||||
const allowedValue = params.allowedValue as unknown;
|
||||
return `Value at ${path} must be exactly "${allowedValue}"`;
|
||||
}
|
||||
|
||||
case 'oneOf': {
|
||||
return `Value at ${path} must match exactly one schema (matched ${params.passingSchemas ?? 0})`;
|
||||
}
|
||||
|
||||
case 'anyOf': {
|
||||
return `Value at ${path} must match at least one of the allowed schemas`;
|
||||
}
|
||||
|
||||
case 'not': {
|
||||
return `Value at ${path} matches a schema it should not match`;
|
||||
}
|
||||
|
||||
case 'if': {
|
||||
return `Value at ${path} does not satisfy conditional schema requirements`;
|
||||
}
|
||||
|
||||
case 'uniqueItems': {
|
||||
const i = params.i as number;
|
||||
const j = params.j as number;
|
||||
return `Array at ${path} contains duplicate items at positions ${j} and ${i}`;
|
||||
}
|
||||
|
||||
case 'propertyNames': {
|
||||
const propertyName = params.propertyName as string;
|
||||
return `Invalid property name at ${path}: "${propertyName}" does not match naming requirements`;
|
||||
}
|
||||
|
||||
case 'dependencies':
|
||||
case 'dependentRequired': {
|
||||
const property = params.property as string;
|
||||
const missingProperty = params.missingProperty as string;
|
||||
return `Missing dependent field at ${path}: "${missingProperty}" is required when "${property}" is present`;
|
||||
}
|
||||
|
||||
default: {
|
||||
// Fallback for any unhandled keywords - use AJV's message if available
|
||||
const message = error.message || `validation failed for keyword "${error.keyword}"`;
|
||||
return `${path}: ${message}`;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Format all AJV errors into a list of human-readable messages.
|
||||
* Returns an array of formatted error strings.
|
||||
*/
|
||||
function formatAjvErrors(errors: ErrorObject[]): string[] {
|
||||
return errors.map(formatAjvError);
|
||||
}
|
||||
|
||||
export const parseConfig = async (configPath: string): Promise<Config> => {
|
||||
try {
|
||||
// 1. Verify file exists
|
||||
if (!(await fs.pathExists(configPath))) {
|
||||
throw new PentestError(
|
||||
`Configuration file not found: ${configPath}`,
|
||||
'config',
|
||||
false,
|
||||
{ configPath },
|
||||
ErrorCode.CONFIG_NOT_FOUND,
|
||||
);
|
||||
}
|
||||
|
||||
// 2. Check file size
|
||||
const stats = await fs.stat(configPath);
|
||||
const maxFileSize = 1024 * 1024; // 1MB
|
||||
if (stats.size > maxFileSize) {
|
||||
throw new PentestError(
|
||||
`Configuration file too large: ${stats.size} bytes (maximum: ${maxFileSize} bytes)`,
|
||||
'config',
|
||||
false,
|
||||
{ configPath, fileSize: stats.size, maxFileSize },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
|
||||
// 3. Read and check for empty content
|
||||
const configContent = await fs.readFile(configPath, 'utf8');
|
||||
|
||||
if (!configContent.trim()) {
|
||||
throw new PentestError(
|
||||
'Configuration file is empty',
|
||||
'config',
|
||||
false,
|
||||
{ configPath },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
|
||||
// 4. Parse YAML with safe schema
|
||||
let config: unknown;
|
||||
try {
|
||||
config = yaml.load(configContent, {
|
||||
schema: yaml.FAILSAFE_SCHEMA, // Strings, sequences, and mappings only; scalars are never coerced to numbers or booleans
|
||||
json: false, // Throw on duplicate mapping keys (json: true would let later keys override earlier ones)
|
||||
filename: configPath,
|
||||
});
|
||||
} catch (yamlError) {
|
||||
const errMsg = yamlError instanceof Error ? yamlError.message : String(yamlError);
|
||||
throw new PentestError(
|
||||
`YAML parsing failed: ${errMsg}`,
|
||||
'config',
|
||||
false,
|
||||
{ configPath, originalError: errMsg },
|
||||
ErrorCode.CONFIG_PARSE_ERROR,
|
||||
);
|
||||
}
|
||||
|
||||
// 5. Guard against null/undefined parse result
|
||||
if (config === null || config === undefined) {
|
||||
throw new PentestError(
|
||||
'Configuration file resulted in null/undefined after parsing',
|
||||
'config',
|
||||
false,
|
||||
{ configPath },
|
||||
ErrorCode.CONFIG_PARSE_ERROR,
|
||||
);
|
||||
}
|
||||
|
||||
// 6. Validate schema, security rules, and return
|
||||
validateConfig(config as Config);
|
||||
|
||||
return config as Config;
|
||||
} catch (error) {
|
||||
// PentestError instances are already well-formatted, re-throw as-is
|
||||
if (error instanceof PentestError) {
|
||||
throw error;
|
||||
}
|
||||
const errMsg = error instanceof Error ? error.message : String(error);
|
||||
throw new PentestError(
|
||||
`Failed to parse configuration file '${configPath}': ${errMsg}`,
|
||||
'config',
|
||||
false,
|
||||
{ configPath, originalError: errMsg },
|
||||
ErrorCode.CONFIG_PARSE_ERROR,
|
||||
);
|
||||
}
|
||||
};
|
||||
|
||||
const validateConfig = (config: Config): void => {
|
||||
if (!config || typeof config !== 'object') {
|
||||
throw new PentestError(
|
||||
'Configuration must be a valid object',
|
||||
'config',
|
||||
false,
|
||||
{},
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
|
||||
if (Array.isArray(config)) {
|
||||
throw new PentestError(
|
||||
'Configuration must be an object, not an array',
|
||||
'config',
|
||||
false,
|
||||
{},
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
|
||||
const isValid = validateSchema(config);
|
||||
if (!isValid) {
|
||||
const errors = validateSchema.errors || [];
|
||||
const errorMessages = formatAjvErrors(errors);
|
||||
throw new PentestError(
|
||||
`Configuration validation failed:\n - ${errorMessages.join('\n - ')}`,
|
||||
'config',
|
||||
false,
|
||||
{ validationErrors: errorMessages },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
|
||||
performSecurityValidation(config);
|
||||
|
||||
if (!config.rules && !config.authentication && !config.description) {
|
||||
console.warn(
|
||||
'⚠️ Configuration file contains no rules, authentication, or description. The pentest will run without any scoping restrictions or login capabilities.',
|
||||
);
|
||||
} else if (config.rules && !config.rules.avoid && !config.rules.focus) {
|
||||
console.warn('⚠️ Configuration file contains no rules. The pentest will run without any scoping restrictions.');
|
||||
}
|
||||
};
|
||||
|
||||
const performSecurityValidation = (config: Config): void => {
|
||||
if (config.authentication) {
|
||||
const auth = config.authentication;
|
||||
|
||||
// Check login_url for dangerous patterns (AJV's "uri" format allows javascript: per RFC 3986)
|
||||
if (auth.login_url) {
|
||||
for (const pattern of DANGEROUS_PATTERNS) {
|
||||
if (pattern.test(auth.login_url)) {
|
||||
throw new PentestError(
|
||||
`authentication.login_url contains potentially dangerous pattern: ${pattern.source}`,
|
||||
'config',
|
||||
false,
|
||||
{ field: 'login_url', pattern: pattern.source },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if (auth.credentials) {
|
||||
for (const pattern of DANGEROUS_PATTERNS) {
|
||||
if (pattern.test(auth.credentials.username)) {
|
||||
throw new PentestError(
|
||||
`authentication.credentials.username contains potentially dangerous pattern: ${pattern.source}`,
|
||||
'config',
|
||||
false,
|
||||
{ field: 'credentials.username', pattern: pattern.source },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
if (pattern.test(auth.credentials.password)) {
|
||||
throw new PentestError(
|
||||
`authentication.credentials.password contains potentially dangerous pattern: ${pattern.source}`,
|
||||
'config',
|
||||
false,
|
||||
{ field: 'credentials.password', pattern: pattern.source },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if (auth.login_flow) {
|
||||
auth.login_flow.forEach((step, index) => {
|
||||
for (const pattern of DANGEROUS_PATTERNS) {
|
||||
if (pattern.test(step)) {
|
||||
throw new PentestError(
|
||||
`authentication.login_flow[${index}] contains potentially dangerous pattern: ${pattern.source}`,
|
||||
'config',
|
||||
false,
|
||||
{ field: `login_flow[${index}]`, pattern: pattern.source },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
}
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
if (config.rules) {
|
||||
validateRulesSecurity(config.rules.avoid, 'avoid');
|
||||
validateRulesSecurity(config.rules.focus, 'focus');
|
||||
|
||||
checkForDuplicates(config.rules.avoid || [], 'avoid');
|
||||
checkForDuplicates(config.rules.focus || [], 'focus');
|
||||
checkForConflicts(config.rules.avoid, config.rules.focus);
|
||||
}
|
||||
|
||||
if (config.description) {
|
||||
for (const pattern of DANGEROUS_PATTERNS) {
|
||||
if (pattern.test(config.description)) {
|
||||
throw new PentestError(
|
||||
`description contains potentially dangerous pattern: ${pattern.source}`,
|
||||
'config',
|
||||
false,
|
||||
{ field: 'description', pattern: pattern.source },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
const validateRulesSecurity = (rules: Rule[] | undefined, ruleType: string): void => {
|
||||
if (!rules) return;
|
||||
|
||||
rules.forEach((rule, index) => {
|
||||
for (const pattern of DANGEROUS_PATTERNS) {
|
||||
if (pattern.test(rule.url_path)) {
|
||||
throw new PentestError(
|
||||
`rules.${ruleType}[${index}].url_path contains potentially dangerous pattern: ${pattern.source}`,
|
||||
'config',
|
||||
false,
|
||||
{ field: `rules.${ruleType}[${index}].url_path`, pattern: pattern.source },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
if (pattern.test(rule.description)) {
|
||||
throw new PentestError(
|
||||
`rules.${ruleType}[${index}].description contains potentially dangerous pattern: ${pattern.source}`,
|
||||
'config',
|
||||
false,
|
||||
{ field: `rules.${ruleType}[${index}].description`, pattern: pattern.source },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
validateRuleTypeSpecific(rule, ruleType, index);
|
||||
});
|
||||
};
|
||||
|
||||
const validateRuleTypeSpecific = (rule: Rule, ruleType: string, index: number): void => {
|
||||
const field = `rules.${ruleType}[${index}].url_path`;
|
||||
|
||||
switch (rule.type) {
|
||||
case 'path':
|
||||
if (!rule.url_path.startsWith('/')) {
|
||||
throw new PentestError(
|
||||
`${field} for type 'path' must start with '/'`,
|
||||
'config',
|
||||
false,
|
||||
{ field, ruleType: rule.type },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
break;
|
||||
|
||||
case 'subdomain':
|
||||
case 'domain':
|
||||
// Basic domain validation - no slashes allowed
|
||||
if (rule.url_path.includes('/')) {
|
||||
throw new PentestError(
|
||||
`${field} for type '${rule.type}' cannot contain '/' characters`,
|
||||
'config',
|
||||
false,
|
||||
{ field, ruleType: rule.type },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
// Must contain at least one dot for domains
|
||||
if (rule.type === 'domain' && !rule.url_path.includes('.')) {
|
||||
throw new PentestError(
|
||||
`${field} for type 'domain' must be a valid domain name`,
|
||||
'config',
|
||||
false,
|
||||
{ field, ruleType: rule.type },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
break;
|
||||
|
||||
case 'method': {
|
||||
const allowedMethods = ['GET', 'POST', 'PUT', 'DELETE', 'PATCH', 'HEAD', 'OPTIONS'];
|
||||
if (!allowedMethods.includes(rule.url_path.toUpperCase())) {
|
||||
throw new PentestError(
|
||||
`${field} for type 'method' must be one of: ${allowedMethods.join(', ')}`,
|
||||
'config',
|
||||
false,
|
||||
{ field, ruleType: rule.type, allowedMethods },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
break;
|
||||
}
|
||||
|
||||
case 'header':
|
||||
if (!rule.url_path.match(/^[a-zA-Z0-9\-_]+$/)) {
|
||||
throw new PentestError(
|
||||
`${field} for type 'header' must be a valid header name (alphanumeric, hyphens, underscores only)`,
|
||||
'config',
|
||||
false,
|
||||
{ field, ruleType: rule.type },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
break;
|
||||
|
||||
case 'parameter':
|
||||
if (!rule.url_path.match(/^[a-zA-Z0-9\-_]+$/)) {
|
||||
throw new PentestError(
|
||||
`${field} for type 'parameter' must be a valid parameter name (alphanumeric, hyphens, underscores only)`,
|
||||
'config',
|
||||
false,
|
||||
{ field, ruleType: rule.type },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
break;
|
||||
}
|
||||
};
|
||||
|
||||
const checkForDuplicates = (rules: Rule[], ruleType: string): void => {
|
||||
const seen = new Set<string>();
|
||||
rules.forEach((rule, index) => {
|
||||
const key = `${rule.type}:${rule.url_path}`;
|
||||
if (seen.has(key)) {
|
||||
throw new PentestError(
|
||||
`Duplicate rule found in rules.${ruleType}[${index}]: ${rule.type} '${rule.url_path}'`,
|
||||
'config',
|
||||
false,
|
||||
{ field: `rules.${ruleType}[${index}]`, ruleType: rule.type, urlPath: rule.url_path },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
seen.add(key);
|
||||
});
|
||||
};
|
||||
|
||||
const checkForConflicts = (avoidRules: Rule[] = [], focusRules: Rule[] = []): void => {
|
||||
const avoidSet = new Set(avoidRules.map((rule) => `${rule.type}:${rule.url_path}`));
|
||||
|
||||
focusRules.forEach((rule, index) => {
|
||||
const key = `${rule.type}:${rule.url_path}`;
|
||||
if (avoidSet.has(key)) {
|
||||
throw new PentestError(
|
||||
`Conflicting rule found: rules.focus[${index}] '${rule.url_path}' also exists in rules.avoid`,
|
||||
'config',
|
||||
false,
|
||||
{ field: `rules.focus[${index}]`, urlPath: rule.url_path },
|
||||
ErrorCode.CONFIG_VALIDATION_FAILED,
|
||||
);
|
||||
}
|
||||
});
|
||||
};
|
||||
|
||||
const sanitizeRule = (rule: Rule): Rule => {
|
||||
return {
|
||||
description: rule.description.trim(),
|
||||
type: rule.type.toLowerCase().trim() as Rule['type'],
|
||||
url_path: rule.url_path.trim(),
|
||||
};
|
||||
};
|
||||
|
||||
export const distributeConfig = (config: Config | null): DistributedConfig => {
|
||||
const avoid = config?.rules?.avoid || [];
|
||||
const focus = config?.rules?.focus || [];
|
||||
const authentication = config?.authentication || null;
|
||||
const description = config?.description?.trim() || '';
|
||||
|
||||
return {
|
||||
avoid: avoid.map(sanitizeRule),
|
||||
focus: focus.map(sanitizeRule),
|
||||
authentication: authentication ? sanitizeAuthentication(authentication) : null,
|
||||
description,
|
||||
};
|
||||
};
|
||||
|
||||
const sanitizeAuthentication = (auth: Authentication): Authentication => {
|
||||
return {
|
||||
login_type: auth.login_type.toLowerCase().trim() as Authentication['login_type'],
|
||||
login_url: auth.login_url.trim(),
|
||||
credentials: {
|
||||
username: auth.credentials.username.trim(),
|
||||
password: auth.credentials.password,
|
||||
...(auth.credentials.totp_secret && { totp_secret: auth.credentials.totp_secret.trim() }),
|
||||
},
|
||||
...(auth.login_flow && { login_flow: auth.login_flow.map((step) => step.trim()) }),
|
||||
success_condition: {
|
||||
type: auth.success_condition.type.toLowerCase().trim() as Authentication['success_condition']['type'],
|
||||
value: auth.success_condition.value.trim(),
|
||||
},
|
||||
};
|
||||
};
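The duplicate and conflict checks above both compare rules by a `type:url_path` key. A minimal sketch of that idea, assuming a simplified `RuleLike` shape; the helper names `ruleKey`, `firstDuplicateIndex`, and `firstConflictIndex` are illustrative and not part of the worker package:

```typescript
// Sketch of key-based duplicate/conflict detection for scoping rules.
// `RuleLike` and all helper names are hypothetical stand-ins for the real types.
interface RuleLike {
  type: string;
  url_path: string;
}

const ruleKey = (rule: RuleLike): string => `${rule.type}:${rule.url_path}`;

/** Index of the first rule whose key was already seen, or -1 if all keys are unique. */
function firstDuplicateIndex(rules: RuleLike[]): number {
  const seen = new Set<string>();
  return rules.findIndex((rule) => {
    const key = ruleKey(rule);
    if (seen.has(key)) return true;
    seen.add(key);
    return false;
  });
}

/** Index of the first focus rule that also appears in avoid, or -1 if none does. */
function firstConflictIndex(avoid: RuleLike[], focus: RuleLike[]): number {
  const avoidKeys = new Set(avoid.map(ruleKey));
  return focus.findIndex((rule) => avoidKeys.has(ruleKey(rule)));
}

const avoid: RuleLike[] = [{ type: 'path', url_path: '/logout' }];
const focus: RuleLike[] = [
  { type: 'path', url_path: '/admin' },
  { type: 'path', url_path: '/logout' },
];

console.log(firstDuplicateIndex(focus)); // -1
console.log(firstConflictIndex(avoid, focus)); // 1
```

The comparison is an exact string match, so entries that differ only in case or surrounding whitespace count as distinct rules.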
|
||||
@@ -0,0 +1,30 @@
/** Centralized path constants for the worker package */

import fs from 'node:fs';
import path from 'node:path';

/** Worker package root (apps/worker/) resolved from compiled dist/ files */
const WORKER_ROOT = path.resolve(import.meta.dirname, '..');

export const PROMPTS_DIR = path.join(WORKER_ROOT, 'prompts');
export const CONFIGS_DIR = path.join(WORKER_ROOT, 'configs');

/**
 * Repository root: walk up from WORKER_ROOT looking for pnpm-workspace.yaml.
 * Falls back to two levels up (apps/worker/ → repo root) if not found.
 */
function findRepoRoot(): string {
  let dir = WORKER_ROOT;
  for (let i = 0; i < 5; i++) {
    if (fs.existsSync(path.join(dir, 'pnpm-workspace.yaml'))) {
      return dir;
    }
    const parent = path.dirname(dir);
    if (parent === dir) break;
    dir = parent;
  }
  return path.resolve(WORKER_ROOT, '..', '..');
}

const REPO_ROOT = findRepoRoot();
export const WORKSPACES_DIR = path.join(REPO_ROOT, 'workspaces');
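The bounded upward walk in `findRepoRoot` can be sketched as a generic helper and exercised against a throwaway directory tree. The name `findUp`, its `maxLevels` parameter, and the temp-directory layout are assumptions made for this illustration only:

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical generalization of findRepoRoot: walk up from `startDir`, checking at
// most `maxLevels` directories for `marker`. Returns null instead of a fallback path.
function findUp(startDir: string, marker: string, maxLevels = 5): string | null {
  let dir = startDir;
  for (let i = 0; i < maxLevels; i++) {
    if (fs.existsSync(path.join(dir, marker))) {
      return dir;
    }
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return null;
}

// Build <tmp>/find-up-XXXX/apps/worker with the marker file at the top level
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'find-up-'));
fs.writeFileSync(path.join(root, 'pnpm-workspace.yaml'), '');
const nested = path.join(root, 'apps', 'worker');
fs.mkdirSync(nested, { recursive: true });

console.log(findUp(nested, 'pnpm-workspace.yaml') === root); // true
```

Returning `null` leaves the fallback decision to the caller, whereas the module above resolves two levels up when no marker is found.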
|
||||
@@ -0,0 +1,48 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

export class ProgressIndicator {
  private message: string;
  private frames: string[] = ['⠋', '⠙', '⠹', '⠸', '⠼', '⠴', '⠦', '⠧', '⠇', '⠏'];
  private frameIndex: number = 0;
  private interval: ReturnType<typeof setInterval> | null = null;
  private isRunning: boolean = false;

  constructor(message: string = 'Working...') {
    this.message = message;
  }

  start(): void {
    if (this.isRunning) return;

    this.isRunning = true;
    this.frameIndex = 0;

    this.interval = setInterval(() => {
      // Clear the line and write the spinner
      process.stdout.write(`\r${this.frames[this.frameIndex]} ${this.message}`);
      this.frameIndex = (this.frameIndex + 1) % this.frames.length;
    }, 100);
  }

  stop(): void {
    if (!this.isRunning) return;

    if (this.interval) {
      clearInterval(this.interval);
      this.interval = null;
    }

    // Clear the spinner line
    process.stdout.write(`\r${' '.repeat(this.message.length + 5)}\r`);
    this.isRunning = false;
  }

  finish(successMessage: string = 'Complete'): void {
    this.stop();
    console.log(`✓ ${successMessage}`);
  }
}
@@ -0,0 +1,137 @@
#!/usr/bin/env node

// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * generate-totp CLI
 *
 * Generates 6-digit TOTP codes for authentication.
 * Replaces the MCP generate_totp tool.
 * Based on RFC 6238 (TOTP) and RFC 4226 (HOTP).
 *
 * Usage:
 *   generate-totp --secret JBSWY3DPEHPK3PXP
 */

import { createHmac } from 'node:crypto';

// === Base32 Decoding ===

function base32Decode(encoded: string): Buffer {
  const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
  const cleanInput = encoded.toUpperCase().replace(/[^A-Z2-7]/g, '');

  if (cleanInput.length === 0) {
    throw new Error('TOTP secret is empty after cleaning');
  }

  const output: number[] = [];
  let bits = 0;
  let value = 0;

  for (const char of cleanInput) {
    const index = alphabet.indexOf(char);
    if (index === -1) {
      throw new Error(`Invalid base32 character: ${char}`);
    }

    value = (value << 5) | index;
    bits += 5;

    if (bits >= 8) {
      output.push((value >>> (bits - 8)) & 255);
      bits -= 8;
    }
  }

  return Buffer.from(output);
}

// === HOTP/TOTP Generation (RFC 4226, RFC 6238) ===

function generateHOTP(secret: string, counter: number, digits: number = 6): string {
  const key = base32Decode(secret);

  // Convert counter to 8-byte buffer (big-endian)
  const counterBuffer = Buffer.alloc(8);
  counterBuffer.writeBigUInt64BE(BigInt(counter));

  // Generate HMAC-SHA1
  const hmac = createHmac('sha1', key);
  hmac.update(counterBuffer);
  const hash = hmac.digest();

  // Dynamic truncation (SHA-1 always produces 20 bytes)
  const lastByte = hash[hash.length - 1] ?? 0;
  const offset = lastByte & 0x0f;
  const code =
    (((hash[offset] ?? 0) & 0x7f) << 24) |
    (((hash[offset + 1] ?? 0) & 0xff) << 16) |
    (((hash[offset + 2] ?? 0) & 0xff) << 8) |
    ((hash[offset + 3] ?? 0) & 0xff);

  return (code % 10 ** digits).toString().padStart(digits, '0');
}

function generateTOTP(secret: string, timeStep: number = 30, digits: number = 6): string {
  const counter = Math.floor(Date.now() / 1000 / timeStep);
  return generateHOTP(secret, counter, digits);
}

// === Argument Parsing ===

function parseSecret(argv: string[]): string {
  for (let i = 2; i < argv.length; i++) {
    const next = argv[i + 1];
    if (argv[i] === '--secret' && next) {
      return next;
    }
  }
  return '';
}

// === Main ===

function main(): void {
  const secret = parseSecret(process.argv);

  if (!secret) {
    console.log(JSON.stringify({ status: 'error', message: 'Missing required --secret argument', retryable: false }));
    process.exit(1);
  }

  const base32Regex = /^[A-Z2-7]+$/i;
  if (!base32Regex.test(secret)) {
    console.log(
      JSON.stringify({
        status: 'error',
        message: 'Secret must be base32-encoded (characters A-Z and 2-7)',
        retryable: false,
      }),
    );
    process.exit(1);
  }

  try {
    const totpCode = generateTOTP(secret);
    const expiresIn = 30 - (Math.floor(Date.now() / 1000) % 30);

    console.log(
      JSON.stringify({
        status: 'success',
        totpCode,
        expiresIn,
      }),
    );
  } catch (error) {
    const msg = error instanceof Error ? error.message : String(error);
    console.log(JSON.stringify({ status: 'error', message: `TOTP generation failed: ${msg}`, retryable: false }));
    process.exit(1);
  }
}

main();
|
||||
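One way to sanity-check the HOTP truncation used by the script above is to run the same steps against the published RFC 4226 Appendix D test vectors. This is a minimal sketch and not part of the change: `hotpFromKey` is a hypothetical helper that takes raw key bytes instead of a base32 string, and `rfcKey` is the ASCII test secret from the RFC.

```typescript
import { createHmac } from 'node:crypto';

// Hypothetical helper (not in the PR): HOTP over raw key bytes, per RFC 4226.
function hotpFromKey(key: Buffer, counter: number, digits: number = 6): string {
  // Counter as an 8-byte big-endian buffer
  const counterBuffer = Buffer.alloc(8);
  counterBuffer.writeBigUInt64BE(BigInt(counter));

  const hash = createHmac('sha1', key).update(counterBuffer).digest();

  // Dynamic truncation: the low nibble of the last byte selects a 4-byte window
  const offset = (hash[hash.length - 1] ?? 0) & 0x0f;
  const code =
    (((hash[offset] ?? 0) & 0x7f) << 24) |
    (((hash[offset + 1] ?? 0) & 0xff) << 16) |
    (((hash[offset + 2] ?? 0) & 0xff) << 8) |
    ((hash[offset + 3] ?? 0) & 0xff);

  return (code % 10 ** digits).toString().padStart(digits, '0');
}

// RFC 4226 Appendix D uses the ASCII secret "12345678901234567890".
const rfcKey = Buffer.from('12345678901234567890', 'ascii');
const firstCodes = [0, 1, 2].map((counter) => hotpFromKey(rfcKey, counter));
console.log(firstCodes.join(' '));
```

For counters 0, 1 and 2 the RFC lists 755224, 287082 and 359152. A TOTP code is the same computation with the counter set to `floor(unixSeconds / timeStep)`.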
@@ -0,0 +1,191 @@
#!/usr/bin/env node

// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * save-deliverable CLI
 *
 * Standalone script to save deliverable files with validation.
 * Replaces the MCP save_deliverable tool.
 *
 * Usage:
 *   node save-deliverable.js --type INJECTION_QUEUE --content '{"vulnerabilities": [...]}'
 *   node save-deliverable.js --type INJECTION_ANALYSIS --file-path deliverables/injection_analysis_deliverable.md
 */

import { mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { join, resolve, sep } from 'node:path';
import { DELIVERABLE_FILENAMES, type DeliverableType, isQueueType } from '../types/deliverables.js';

// === Argument Parsing ===

interface ParsedArgs {
  type: string;
  content?: string;
  filePath?: string;
}

function parseArgs(argv: string[]): ParsedArgs {
  const args: ParsedArgs = { type: '' };

  for (let i = 2; i < argv.length; i++) {
    const arg = argv[i];
    const next = argv[i + 1];

    if (arg === '--type' && next) {
      args.type = next;
      i++;
    } else if (arg === '--content' && next) {
      args.content = next;
      i++;
    } else if (arg === '--file-path' && next) {
      args.filePath = next;
      i++;
    }
  }

  return args;
}

// === Queue Validation ===

interface ValidationResult {
  valid: boolean;
  message?: string;
}

function validateQueueJson(content: string): ValidationResult {
  try {
    const parsed = JSON.parse(content) as unknown;

    if (typeof parsed !== 'object' || parsed === null) {
      return {
        valid: false,
        message: `Invalid queue structure: Expected an object. Got: ${typeof parsed}`,
      };
    }

    const obj = parsed as Record<string, unknown>;

    if (!('vulnerabilities' in obj)) {
      return {
        valid: false,
        message: `Invalid queue structure: Missing 'vulnerabilities' property. Expected: {"vulnerabilities": [...]}`,
      };
    }

    if (!Array.isArray(obj.vulnerabilities)) {
      return {
        valid: false,
        message: `Invalid queue structure: 'vulnerabilities' must be an array. Expected: {"vulnerabilities": [...]}`,
      };
    }

    return { valid: true };
  } catch (error) {
    return {
      valid: false,
      message: `Invalid JSON: ${error instanceof Error ? error.message : String(error)}`,
    };
  }
}

// === File Operations ===

function saveDeliverableFile(targetDir: string, filename: string, content: string): string {
  const deliverablesDir = join(targetDir, 'deliverables');
  const filepath = join(deliverablesDir, filename);

  try {
    mkdirSync(deliverablesDir, { recursive: true });
  } catch {
    throw new Error(`Cannot create deliverables directory at ${deliverablesDir}`);
  }

  writeFileSync(filepath, content, 'utf8');
  return filepath;
}

// === Main ===

function main(): void {
  const args = parseArgs(process.argv);

  // 1. Validate --type
  if (!args.type) {
    console.log(JSON.stringify({ status: 'error', message: 'Missing required --type argument', retryable: false }));
    process.exit(1);
  }

  const deliverableType = args.type as DeliverableType;
  const filename = DELIVERABLE_FILENAMES[deliverableType];

  if (!filename) {
    console.log(
      JSON.stringify({ status: 'error', message: `Unknown deliverable type: ${args.type}`, retryable: false }),
    );
    process.exit(1);
  }

  // 2. Resolve content from --content or --file-path
  let content: string;

  if (args.content) {
    content = args.content;
  } else if (args.filePath) {
    // Path traversal protection: must resolve inside cwd.
    // This is a lexical check (symlinks are not resolved); `sep` keeps it
    // correct on platforms whose path separator is not '/'.
    const cwd = process.cwd();
    const resolved = resolve(cwd, args.filePath);
    if (!resolved.startsWith(`${cwd}${sep}`) && resolved !== cwd) {
      console.log(
        JSON.stringify({ status: 'error', message: `Path traversal detected: ${args.filePath}`, retryable: false }),
      );
      process.exit(1);
    }

    try {
      content = readFileSync(resolved, 'utf8');
    } catch (error) {
      const msg = error instanceof Error ? error.message : String(error);
      console.log(JSON.stringify({ status: 'error', message: `Failed to read file: ${msg}`, retryable: true }));
      process.exit(1);
    }
  } else {
    console.log(
      JSON.stringify({
        status: 'error',
        message: 'Either --content or --file-path is required',
        retryable: false,
      }),
    );
    process.exit(1);
  }

  // 3. Validate queue types
  let validated = false;
  if (isQueueType(args.type)) {
    const validation = validateQueueJson(content);
    if (!validation.valid) {
      console.log(JSON.stringify({ status: 'error', message: validation.message, retryable: true }));
      process.exit(1);
    }
    validated = true;
  }

  // 4. Save the file
  try {
    const targetDir = process.cwd();
    const filepath = saveDeliverableFile(targetDir, filename, content);
    console.log(JSON.stringify({ status: 'success', filepath, validated }));
  } catch (error) {
    const msg = error instanceof Error ? error.message : String(error);
    console.log(JSON.stringify({ status: 'error', message: `Failed to save: ${msg}`, retryable: true }));
    process.exit(1);
  }
}

main();
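The `--file-path` guard in the script above is a containment check: the resolved path must be the working directory or sit beneath it. The sketch below shows one alternative way to express that check with `path.relative`. It is an illustration under stated assumptions, not the PR's code: `isInsideDir` is a hypothetical helper, and like the original it is purely lexical (symlinks are not followed).

```typescript
import { isAbsolute, relative, resolve, sep } from 'node:path';

// Hypothetical helper (not in the PR): true when `candidate` resolves to
// `baseDir` itself or to a path beneath it. Lexical only; no symlink resolution.
function isInsideDir(baseDir: string, candidate: string): boolean {
  const root = resolve(baseDir);
  const target = resolve(root, candidate);
  const rel = relative(root, target);

  // Empty relative path means the candidate is the base directory itself
  if (rel === '') return true;

  // Escapes show up as a leading '..' segment, or as an absolute path
  // (for example a different drive on Windows)
  const firstSegment = rel.split(sep)[0];
  return firstSegment !== '..' && !isAbsolute(rel);
}

console.log(isInsideDir('/work', 'deliverables/report.md'));
console.log(isInsideDir('/work', '../etc/passwd'));
```

Comparing path segments rather than string prefixes avoids treating a sibling such as `/work-evil` as being inside `/work`. The script's prefix check gets the same protection by appending the separator before calling `startsWith`.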
@@ -0,0 +1,272 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Agent Execution Service
 *
 * Handles the full agent lifecycle:
 * - Load config via ConfigLoaderService
 * - Load prompt template using AGENTS[agentName].promptTemplate
 * - Create git checkpoint
 * - Start audit logging
 * - Invoke Claude SDK via runClaudePrompt
 * - Spending cap check using isSpendingCapBehavior
 * - Handle failure (rollback, audit)
 * - Validate output using AGENTS[agentName].deliverableFilename
 * - Commit on success, log metrics
 *
 * No Temporal dependencies - pure domain logic.
 */

import { type ClaudePromptResult, runClaudePrompt, validateAgentOutput } from '../ai/claude-executor.js';
import type { AuditSession } from '../audit/index.js';
import { AGENTS } from '../session-manager.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import type { AgentName } from '../types/agents.js';
import type { AgentEndResult } from '../types/audit.js';
import { ErrorCode, type PentestErrorType } from '../types/errors.js';
import type { AgentMetrics } from '../types/metrics.js';
import { err, isErr, ok, type Result } from '../types/result.js';
import { isSpendingCapBehavior } from '../utils/billing-detection.js';
import type { ConfigLoaderService } from './config-loader.js';
import { PentestError } from './error-handling.js';
import { commitGitSuccess, createGitCheckpoint, getGitCommitHash, rollbackGitWorkspace } from './git-manager.js';
import { loadPrompt } from './prompt-manager.js';

/**
 * Input for agent execution.
 */
export interface AgentExecutionInput {
  webUrl: string;
  repoPath: string;
  configPath?: string | undefined;
  pipelineTestingMode?: boolean | undefined;
  attemptNumber: number;
}

interface FailAgentOpts {
  attemptNumber: number;
  result: ClaudePromptResult;
  rollbackReason: string;
  errorMessage: string;
  errorCode: ErrorCode;
  category: PentestErrorType;
  retryable: boolean;
  context: Record<string, unknown>;
}

/**
 * Service for executing agents with full lifecycle management.
 *
 * NOTE: AuditSession is passed per-execution, NOT stored on the service.
 * This is critical for parallel agent execution - each agent needs its own
 * AuditSession instance because AuditSession uses instance state (currentAgentName)
 * to track which agent is currently logging.
 */
export class AgentExecutionService {
  private readonly configLoader: ConfigLoaderService;

  constructor(configLoader: ConfigLoaderService) {
    this.configLoader = configLoader;
  }

  /**
   * Execute an agent with full lifecycle management.
   *
   * @param agentName - Name of the agent to execute
   * @param input - Execution input parameters
   * @param auditSession - Audit session for this specific agent execution
   * @param logger - Activity logger used throughout this execution
   * @returns Result containing AgentEndResult on success, PentestError on failure
   */
  async execute(
    agentName: AgentName,
    input: AgentExecutionInput,
    auditSession: AuditSession,
    logger: ActivityLogger,
  ): Promise<Result<AgentEndResult, PentestError>> {
    const { webUrl, repoPath, configPath, pipelineTestingMode = false, attemptNumber } = input;

    // 1. Load config (if provided)
    const configResult = await this.configLoader.loadOptional(configPath);
    if (isErr(configResult)) {
      return configResult;
    }
    const distributedConfig = configResult.value;

    // 2. Load prompt
    const promptTemplate = AGENTS[agentName].promptTemplate;
    let prompt: string;
    try {
      prompt = await loadPrompt(promptTemplate, { webUrl, repoPath }, distributedConfig, pipelineTestingMode, logger);
    } catch (error) {
      const errorMessage = error instanceof Error ? error.message : String(error);
      return err(
        new PentestError(
          `Failed to load prompt for ${agentName}: ${errorMessage}`,
          'prompt',
          false,
          { agentName, promptTemplate, originalError: errorMessage },
          ErrorCode.PROMPT_LOAD_FAILED,
        ),
      );
    }

    // 3. Create git checkpoint before execution
    try {
      await createGitCheckpoint(repoPath, agentName, attemptNumber, logger);
    } catch (error) {
      const errorMessage = error instanceof Error ? error.message : String(error);
      return err(
        new PentestError(
          `Failed to create git checkpoint for ${agentName}: ${errorMessage}`,
          'filesystem',
          false,
          { agentName, repoPath, originalError: errorMessage },
          ErrorCode.GIT_CHECKPOINT_FAILED,
        ),
      );
    }

    // 4. Start audit logging
    await auditSession.startAgent(agentName, prompt, attemptNumber);

    // 5. Execute agent
    const result: ClaudePromptResult = await runClaudePrompt(
      prompt,
      repoPath,
      '', // context
      agentName, // description
      agentName,
      auditSession,
      logger,
      AGENTS[agentName].modelTier,
    );

    // 6. Spending cap check - defense-in-depth
    if (result.success && (result.turns ?? 0) <= 2 && (result.cost || 0) === 0) {
      const resultText = result.result || '';
      if (isSpendingCapBehavior(result.turns ?? 0, result.cost || 0, resultText)) {
        return this.failAgent(agentName, repoPath, auditSession, logger, {
          attemptNumber,
          result,
          rollbackReason: 'spending cap detected',
          errorMessage: `Spending cap likely reached: ${resultText.slice(0, 100)}`,
          errorCode: ErrorCode.SPENDING_CAP_REACHED,
          category: 'billing',
          retryable: true,
          context: { agentName, turns: result.turns, cost: result.cost },
        });
      }
    }

    // 7. Handle execution failure
    if (!result.success) {
      return this.failAgent(agentName, repoPath, auditSession, logger, {
        attemptNumber,
        result,
        rollbackReason: 'execution failure',
        errorMessage: result.error || 'Agent execution failed',
        errorCode: ErrorCode.AGENT_EXECUTION_FAILED,
        category: 'validation',
        retryable: result.retryable ?? true,
        context: { agentName, originalError: result.error },
      });
    }

    // 8. Validate output
    const validationPassed = await validateAgentOutput(result, agentName, repoPath, logger);
    if (!validationPassed) {
      return this.failAgent(agentName, repoPath, auditSession, logger, {
        attemptNumber,
        result,
        rollbackReason: 'validation failure',
        errorMessage: `Agent ${agentName} failed output validation`,
        errorCode: ErrorCode.OUTPUT_VALIDATION_FAILED,
        category: 'validation',
        retryable: true,
        context: { agentName, deliverableFilename: AGENTS[agentName].deliverableFilename },
      });
    }

    // 9. Success - commit deliverables, then capture checkpoint hash
    await commitGitSuccess(repoPath, agentName, logger);
    const commitHash = await getGitCommitHash(repoPath);

    const endResult: AgentEndResult = {
      attemptNumber,
      duration_ms: result.duration,
      cost_usd: result.cost || 0,
      success: true,
      model: result.model,
      ...(commitHash && { checkpoint: commitHash }),
    };
    await auditSession.endAgent(agentName, endResult);

    return ok(endResult);
  }

  private async failAgent(
    agentName: AgentName,
    repoPath: string,
    auditSession: AuditSession,
    logger: ActivityLogger,
    opts: FailAgentOpts,
  ): Promise<Result<AgentEndResult, PentestError>> {
    await rollbackGitWorkspace(repoPath, opts.rollbackReason, logger);

    const endResult: AgentEndResult = {
      attemptNumber: opts.attemptNumber,
      duration_ms: opts.result.duration,
      cost_usd: opts.result.cost || 0,
      success: false,
      model: opts.result.model,
      error: opts.errorMessage,
    };
    await auditSession.endAgent(agentName, endResult);

    return err(new PentestError(opts.errorMessage, opts.category, opts.retryable, opts.context, opts.errorCode));
  }

  /**
   * Execute an agent, throwing PentestError on failure.
   *
   * This is the preferred method for Temporal activities, which need to
   * catch errors and classify them into ApplicationFailure. Avoids requiring
   * activities to import Result utilities, keeping the boundary clean.
   *
   * @param agentName - Name of the agent to execute
   * @param input - Execution input parameters
   * @param auditSession - Audit session for this specific agent execution
   * @param logger - Activity logger used throughout this execution
   * @returns AgentEndResult on success
   * @throws PentestError on failure
   */
  async executeOrThrow(
    agentName: AgentName,
    input: AgentExecutionInput,
    auditSession: AuditSession,
    logger: ActivityLogger,
  ): Promise<AgentEndResult> {
    const result = await this.execute(agentName, input, auditSession, logger);
    if (isErr(result)) {
      throw result.error;
    }
    return result.value;
  }

  /**
   * Convert AgentEndResult to AgentMetrics for workflow state.
   */
  static toMetrics(endResult: AgentEndResult, result: ClaudePromptResult): AgentMetrics {
    return {
      durationMs: endResult.duration_ms,
      inputTokens: null, // Not currently exposed by SDK wrapper
      outputTokens: null,
      costUsd: endResult.cost_usd,
      numTurns: result.turns ?? null,
      model: result.model,
    };
  }
}
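The service above returns a `Result` from `execute` and converts it to an exception in `executeOrThrow`. The `../types/result.js` module is not shown in this diff, so the sketch below uses hypothetical stand-ins for `ok`, `err`, and `isErr` to illustrate the pattern. The real definitions may differ, and `unwrapOrThrow` and `parsePort` are illustrative names only.

```typescript
// Hypothetical stand-ins for ../types/result.js, which is not shown in this diff.
type Ok<T> = { readonly ok: true; readonly value: T };
type Err<E> = { readonly ok: false; readonly error: E };
type Result<T, E> = Ok<T> | Err<E>;

const ok = <T>(value: T): Ok<T> => ({ ok: true, value });
const err = <E>(error: E): Err<E> => ({ ok: false, error });
const isErr = <T, E>(result: Result<T, E>): result is Err<E> => !result.ok;

// Same shape as executeOrThrow: unwrap the value or rethrow the carried error.
function unwrapOrThrow<T, E extends Error>(result: Result<T, E>): T {
  if (isErr(result)) {
    throw result.error;
  }
  return result.value;
}

// Illustrative producer: failures are returned as values, never thrown.
function parsePort(raw: string): Result<number, RangeError> {
  const port = Number(raw);
  if (Number.isInteger(port) && port > 0 && port < 65536) {
    return ok(port);
  }
  return err(new RangeError(`Invalid port: ${raw}`));
}

console.log(unwrapOrThrow(parsePort('8080')));
```

Callers that want to branch on failure stay with the `Result` form. Callers at a boundary that already speaks exceptions, such as the Temporal activities mentioned above, use the throwing wrapper.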
@@ -0,0 +1,73 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Config Loader Service
 *
 * Wraps parseConfig + distributeConfig with Result type for explicit error handling.
 * Pure service with no Temporal dependencies.
 */

import { distributeConfig, parseConfig } from '../config-parser.js';
import type { DistributedConfig } from '../types/config.js';
import { ErrorCode } from '../types/errors.js';
import { err, ok, type Result } from '../types/result.js';
import { PentestError } from './error-handling.js';

/**
 * Service for loading and distributing configuration files.
 *
 * Provides a Result-based API for explicit error handling,
 * allowing callers to decide how to handle failures.
 */
export class ConfigLoaderService {
  /**
   * Load and distribute a configuration file.
   *
   * @param configPath - Path to the YAML configuration file
   * @returns Result containing DistributedConfig on success, PentestError on failure
   */
  async load(configPath: string): Promise<Result<DistributedConfig, PentestError>> {
    try {
      const config = await parseConfig(configPath);
      const distributed = distributeConfig(config);
      return ok(distributed);
    } catch (error) {
      const errorMessage = error instanceof Error ? error.message : String(error);

      // Determine appropriate error code based on error message
      let errorCode = ErrorCode.CONFIG_PARSE_ERROR;
      if (errorMessage.includes('not found') || errorMessage.includes('ENOENT')) {
        errorCode = ErrorCode.CONFIG_NOT_FOUND;
      } else if (errorMessage.includes('validation failed')) {
        errorCode = ErrorCode.CONFIG_VALIDATION_FAILED;
      }

      return err(
        new PentestError(
          `Failed to load config ${configPath}: ${errorMessage}`,
          'config',
          false,
          { configPath, originalError: errorMessage },
          errorCode,
        ),
      );
    }
  }

  /**
   * Load config if path is provided, otherwise return null config.
   *
   * @param configPath - Optional path to the YAML configuration file
   * @returns Result containing DistributedConfig (or null) on success, PentestError on failure
   */
  async loadOptional(configPath: string | undefined): Promise<Result<DistributedConfig | null, PentestError>> {
    if (!configPath) {
      return ok(null);
    }
    return this.load(configPath);
  }
}
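The message-to-code mapping inside `load` can be pulled into a pure function so it is easy to test in isolation. The sketch below is a hypothetical extraction and not part of the PR: `ConfigErrorCode` and `configErrorCodeFor` are illustrative names, and the string values only mirror the `ErrorCode` member names used above, since the real enum's values are not shown in this diff.

```typescript
// Hypothetical local type; the real ErrorCode lives in ../types/errors.js (not shown).
type ConfigErrorCode = 'CONFIG_NOT_FOUND' | 'CONFIG_VALIDATION_FAILED' | 'CONFIG_PARSE_ERROR';

// Mirrors the branch order in ConfigLoaderService.load: "not found" wins,
// then "validation failed", and everything else is treated as a parse error.
function configErrorCodeFor(errorMessage: string): ConfigErrorCode {
  if (errorMessage.includes('not found') || errorMessage.includes('ENOENT')) {
    return 'CONFIG_NOT_FOUND';
  }
  if (errorMessage.includes('validation failed')) {
    return 'CONFIG_VALIDATION_FAILED';
  }
  return 'CONFIG_PARSE_ERROR';
}

console.log(configErrorCodeFor("ENOENT: no such file or directory, open 'config.yaml'"));
```

Matching on message text is sensitive to wording changes in the parser, so any rewording of those messages needs the substrings here updated to match.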
@@ -0,0 +1,114 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Dependency Injection Container
 *
 * Provides a per-workflow container for service instances.
 * Services are wired with explicit constructor injection.
 *
 * Usage:
 *   const container = getOrCreateContainer(workflowId, sessionMetadata);
 *   const auditSession = new AuditSession(sessionMetadata); // Per-agent
 *   await auditSession.initialize(workflowId);
 *   const result = await container.agentExecution.executeOrThrow(agentName, input, auditSession, logger);
 */

import type { SessionMetadata } from '../audit/utils.js';
import { AgentExecutionService } from './agent-execution.js';
import { ConfigLoaderService } from './config-loader.js';
import { ExploitationCheckerService } from './exploitation-checker.js';

/**
 * Dependencies required to create a Container.
 *
 * NOTE: AuditSession is NOT stored in the container.
 * Each agent execution receives its own AuditSession instance
 * because AuditSession uses instance state (currentAgentName) that
 * cannot be shared across parallel agents.
 */
export interface ContainerDependencies {
  readonly sessionMetadata: SessionMetadata;
}

/**
 * DI Container for a single workflow.
 *
 * Holds all service instances for the workflow lifecycle.
 * Services are instantiated once and reused across agent executions.
 *
 * NOTE: AuditSession is NOT stored here - it's passed per agent execution
 * to support parallel agents each having their own logging context.
 */
export class Container {
  readonly sessionMetadata: SessionMetadata;
  readonly agentExecution: AgentExecutionService;
  readonly configLoader: ConfigLoaderService;
  readonly exploitationChecker: ExploitationCheckerService;

  constructor(deps: ContainerDependencies) {
    this.sessionMetadata = deps.sessionMetadata;

    // Wire services with explicit constructor injection
    this.configLoader = new ConfigLoaderService();
    this.exploitationChecker = new ExploitationCheckerService();
    this.agentExecution = new AgentExecutionService(this.configLoader);
  }
}

/**
 * Map of workflowId to Container instance.
 * Each workflow gets its own container scoped to its lifecycle.
 */
const containers = new Map<string, Container>();

/**
 * Get or create a Container for a workflow.
 *
 * If a container already exists for the workflowId, returns it.
 * Otherwise, creates a new container with the provided dependencies.
 *
 * @param workflowId - Unique workflow identifier
 * @param sessionMetadata - Session metadata for audit paths
 * @returns Container instance for the workflow
 */
export function getOrCreateContainer(workflowId: string, sessionMetadata: SessionMetadata): Container {
  let container = containers.get(workflowId);

  if (!container) {
    container = new Container({ sessionMetadata });
    containers.set(workflowId, container);
  }

  return container;
}

/**
 * Remove a Container when a workflow completes.
 *
 * Should be called in logWorkflowComplete to clean up resources.
 *
 * @param workflowId - Unique workflow identifier
 */
export function removeContainer(workflowId: string): void {
  containers.delete(workflowId);
}

/**
 * Get an existing Container for a workflow, if one exists.
 *
 * Unlike getOrCreateContainer, this does NOT create a new container.
 * Returns undefined if no container exists for the workflowId.
 *
 * Useful for lightweight activities that can benefit from an existing
 * container but don't need to create one.
 *
 * @param workflowId - Unique workflow identifier
 * @returns Container instance or undefined
 */
export function getContainer(workflowId: string): Container | undefined {
  return containers.get(workflowId);
}
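The three module-level functions above (`getOrCreateContainer`, `getContainer`, `removeContainer`) form a small keyed registry. As a rough illustration of that lifecycle, here is a generic sketch. `ScopedRegistry` and the `wf-1` key are hypothetical names used only for this example and do not appear in the PR.

```typescript
// Hypothetical generic version of the workflowId -> Container map.
class ScopedRegistry<V> {
  private readonly entries = new Map<string, V>();

  // Return the existing entry for `key`, or build and store one via `create`.
  getOrCreate(key: string, create: () => V): V {
    let entry = this.entries.get(key);
    if (entry === undefined) {
      entry = create();
      this.entries.set(key, entry);
    }
    return entry;
  }

  // Look up without creating.
  get(key: string): V | undefined {
    return this.entries.get(key);
  }

  // Drop the entry; returns true if something was removed.
  remove(key: string): boolean {
    return this.entries.delete(key);
  }
}

const registry = new ScopedRegistry<{ createdAt: number }>();
const first = registry.getOrCreate('wf-1', () => ({ createdAt: 1 }));
const second = registry.getOrCreate('wf-1', () => ({ createdAt: 2 }));
console.log(first === second);
```

As with `getOrCreateContainer`, the factory arguments of a second call for the same key are ignored, so the first caller's inputs determine what every later caller receives until the entry is removed.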
@@ -0,0 +1,244 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

import { ErrorCode, type PentestErrorContext, type PentestErrorType, type PromptErrorResult } from '../types/errors.js';
import { matchesBillingApiPattern, matchesBillingTextPattern } from '../utils/billing-detection.js';

export class PentestError extends Error {
  override name = 'PentestError' as const;
  type: PentestErrorType;
  retryable: boolean;
  context: PentestErrorContext;
  timestamp: string;
  /** Optional specific error code for reliable classification */
  code?: ErrorCode;

  constructor(
    message: string,
    type: PentestErrorType,
    retryable: boolean = false,
    context: PentestErrorContext = {},
    code?: ErrorCode,
  ) {
    super(message);
    this.type = type;
    this.retryable = retryable;
    this.context = context;
    this.timestamp = new Date().toISOString();
    if (code !== undefined) {
      this.code = code;
    }
  }
}

export function handlePromptError(promptName: string, error: Error): PromptErrorResult {
  return {
    success: false,
    error: new PentestError(`Failed to load prompt '${promptName}': ${error.message}`, 'prompt', false, {
      promptName,
      originalError: error.message,
    }),
  };
}

const RETRYABLE_PATTERNS = [
  // Network and connection errors
  'network',
  'connection',
  'timeout',
  'econnreset',
  'enotfound',
  'econnrefused',
  // Rate limiting
  'rate limit',
  '429',
  'too many requests',
  // Server errors
  'server error',
  '5xx',
  'internal server error',
  'service unavailable',
  'bad gateway',
  // Claude API errors
  'model unavailable',
  'service temporarily unavailable',
  'api error',
  'terminated',
  // Max turns
  'max turns',
  'maximum turns',
];

// Patterns that indicate non-retryable errors (checked first, so they override RETRYABLE_PATTERNS)
const NON_RETRYABLE_PATTERNS = [
  'authentication',
  'invalid prompt',
  'out of memory',
  'permission denied',
  'session limit reached',
  'invalid api key',
];

// Conservative retry classification - unknown errors don't retry (fail-safe default)
export function isRetryableError(error: Error): boolean {
  const message = error.message.toLowerCase();

  if (NON_RETRYABLE_PATTERNS.some((pattern) => message.includes(pattern))) {
    return false;
  }

  return RETRYABLE_PATTERNS.some((pattern) => message.includes(pattern));
}

/**
 * Classifies errors by ErrorCode for reliable, code-based classification.
 * Used when error is a PentestError with a specific ErrorCode.
 */
function classifyByErrorCode(code: ErrorCode, retryableFromError: boolean): { type: string; retryable: boolean } {
  switch (code) {
    // Billing errors - retryable (wait for cap reset or credits added)
    case ErrorCode.SPENDING_CAP_REACHED:
    case ErrorCode.INSUFFICIENT_CREDITS:
      return { type: 'BillingError', retryable: true };

    case ErrorCode.API_RATE_LIMITED:
      return { type: 'RateLimitError', retryable: true };

    // Config errors - non-retryable (need manual fix)
    case ErrorCode.CONFIG_NOT_FOUND:
    case ErrorCode.CONFIG_VALIDATION_FAILED:
    case ErrorCode.CONFIG_PARSE_ERROR:
      return { type: 'ConfigurationError', retryable: false };

    // Prompt errors - non-retryable (need manual fix)
    case ErrorCode.PROMPT_LOAD_FAILED:
      return { type: 'ConfigurationError', retryable: false };

    // Git errors - non-retryable (indicates workspace corruption)
    case ErrorCode.GIT_CHECKPOINT_FAILED:
    case ErrorCode.GIT_ROLLBACK_FAILED:
      return { type: 'GitError', retryable: false };

    // Validation errors - retryable (agent may succeed on retry)
    case ErrorCode.OUTPUT_VALIDATION_FAILED:
    case ErrorCode.DELIVERABLE_NOT_FOUND:
      return { type: 'OutputValidationError', retryable: true };

    // Agent execution - use the retryable flag from the error
    case ErrorCode.AGENT_EXECUTION_FAILED:
      return { type: 'AgentExecutionError', retryable: retryableFromError };

    // Preflight validation errors
    case ErrorCode.REPO_NOT_FOUND:
      return { type: 'ConfigurationError', retryable: false };

    case ErrorCode.AUTH_FAILED:
      return { type: 'AuthenticationError', retryable: false };

    case ErrorCode.BILLING_ERROR:
      return { type: 'BillingError', retryable: true };

    default:
      // Unrecognized code - report UnknownError and keep the error's own
      // retryable flag (string matching is not applied on this path)
      return { type: 'UnknownError', retryable: retryableFromError };
  }
}

/**
 * Classifies errors for Temporal workflow retry behavior.
 * Returns error type and whether Temporal should retry.
 *
 * Used by activities to wrap errors in ApplicationFailure:
 * - Retryable errors: Temporal retries with configured backoff
 * - Non-retryable errors: Temporal fails immediately
 *
 * Classification priority:
 * 1. If error is PentestError with ErrorCode, classify by code (reliable)
 * 2. Fall through to string matching for external errors (SDK, network, etc.)
 */
export function classifyErrorForTemporal(error: unknown): { type: string; retryable: boolean } {
  // === CODE-BASED CLASSIFICATION (Preferred for internal errors) ===
  if (error instanceof PentestError && error.code !== undefined) {
    return classifyByErrorCode(error.code, error.retryable);
  }

  // === STRING-BASED CLASSIFICATION (Fallback for external errors) ===
  const message = (error instanceof Error ? error.message : String(error)).toLowerCase();

  // === BILLING ERRORS (Retryable with long backoff) ===
  // Anthropic returns billing as 400 invalid_request_error
  // Human can add credits OR wait for spending cap to reset (5-30 min backoff)
  // Check both API patterns and text patterns for comprehensive detection
  if (matchesBillingApiPattern(message) || matchesBillingTextPattern(message)) {
    return { type: 'BillingError', retryable: true };
  }

  // === PERMANENT ERRORS (Non-retryable) ===

  // Authentication (401) - bad API key won't fix itself
  if (
    message.includes('authentication') ||
    message.includes('api key') ||
    message.includes('401') ||
    message.includes('authentication_error')
  ) {
    return { type: 'AuthenticationError', retryable: false };
  }

  // Permission (403) - access won't be granted
  if (message.includes('permission') || message.includes('forbidden') || message.includes('403')) {
    return { type: 'PermissionError', retryable: false };
  }

  // === OUTPUT VALIDATION ERRORS (Retryable) ===
  // Agent didn't produce expected deliverables - retry may succeed
  // IMPORTANT: Must come BEFORE generic 'validation' check below
  if (message.includes('failed output validation') || message.includes('output validation failed')) {
    return { type: 'OutputValidationError', retryable: true };
  }

  // Invalid Request (400) - malformed request is permanent
  // Note: Checked AFTER billing and AFTER output validation
  if (message.includes('invalid_request_error') || message.includes('malformed') || message.includes('validation')) {
    return { type: 'InvalidRequestError', retryable: false };
  }

  // Request Too Large (413) - won't fit no matter how many retries
  if (message.includes('request_too_large') || message.includes('too large') || message.includes('413')) {
    return { type: 'RequestTooLargeError', retryable: false };
  }

  // Configuration errors - missing files need manual fix
|
||||
if (message.includes('enoent') || message.includes('no such file') || message.includes('cli not installed')) {
|
||||
return { type: 'ConfigurationError', retryable: false };
|
||||
}
|
||||
|
||||
// Execution limits - max turns/budget reached
|
||||
if (
|
||||
message.includes('max turns') ||
|
||||
message.includes('budget') ||
|
||||
message.includes('execution limit') ||
|
||||
message.includes('error_max_turns') ||
|
||||
message.includes('error_max_budget')
|
||||
) {
|
||||
return { type: 'ExecutionLimitError', retryable: false };
|
||||
}
|
||||
|
||||
// Invalid target URL - bad URL format won't fix itself
|
||||
if (
|
||||
message.includes('invalid url') ||
|
||||
message.includes('invalid target') ||
|
||||
message.includes('malformed url') ||
|
||||
message.includes('invalid uri')
|
||||
) {
|
||||
return { type: 'InvalidTargetError', retryable: false };
|
||||
}
|
||||
|
||||
// === TRANSIENT ERRORS (Retryable) ===
|
||||
// Rate limits (429), server errors (5xx), network issues
|
||||
// Let Temporal retry with configured backoff
|
||||
return { type: 'TransientError', retryable: true };
|
||||
}
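Review note: the string-matching fallback depends on branch order, because the generic `'validation'` substring also matches output-validation messages. The reduced sketch below illustrates that constraint. `classifyMessage` is a hypothetical stand-in that keeps only the two validation branches and the transient default; it is not the function in this diff.

```typescript
// Hypothetical, reduced stand-in for the string-matching fallback above.
// Only the two 'validation' branches and the transient default are kept.
type Classification = { type: string; retryable: boolean };

function classifyMessage(raw: string): Classification {
  const message = raw.toLowerCase();

  // Specific check first: a missing deliverable may succeed on retry.
  if (message.includes('failed output validation') || message.includes('output validation failed')) {
    return { type: 'OutputValidationError', retryable: true };
  }

  // Generic check second: 'validation' also matches the messages above,
  // so swapping the two branches would make output validation non-retryable.
  if (message.includes('invalid_request_error') || message.includes('malformed') || message.includes('validation')) {
    return { type: 'InvalidRequestError', retryable: false };
  }

  // Anything unrecognized is treated as transient.
  return { type: 'TransientError', retryable: true };
}

const outputFailure = classifyMessage('Agent failed output validation');
const badRequest = classifyMessage('Schema validation error');
const unknown = classifyMessage('socket hang up');
```

The same reasoning applies to the billing check, which has to run before the `invalid_request_error` branch because billing failures arrive as 400 responses.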
@@ -0,0 +1,67 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Exploitation Checker Service
 *
 * Pure domain logic for determining whether exploitation should run.
 * Reads queue file, parses JSON, returns decision.
 *
 * No Temporal dependencies - this is pure business logic.
 */

import type { ActivityLogger } from '../types/activity-logger.js';
import { isOk } from '../types/result.js';
import { type ExploitationDecision, type VulnType, validateQueueSafe } from './queue-validation.js';

/**
 * Service for checking exploitation queue decisions.
 *
 * Determines whether an exploit agent should run based on
 * the vulnerability analysis deliverables and queue files.
 */
export class ExploitationCheckerService {
  /**
   * Check if exploitation should run for a given vulnerability type.
   *
   * Reads the vulnerability queue file and returns the decision.
   * This is pure domain logic - reads queue file, parses JSON, returns decision.
   *
   * @param vulnType - Type of vulnerability (injection, xss, auth, ssrf, authz)
   * @param repoPath - Path to the repository containing deliverables
   * @param logger - ActivityLogger for structured logging
   * @returns ExploitationDecision indicating whether to exploit
   * @throws PentestError if validation fails and is retryable
   */
  async checkQueue(vulnType: VulnType, repoPath: string, logger: ActivityLogger): Promise<ExploitationDecision> {
    const result = await validateQueueSafe(vulnType, repoPath);

    if (isOk(result)) {
      const decision = result.value;
      logger.info(
        `${vulnType}: ${decision.shouldExploit ? `${decision.vulnerabilityCount} vulnerabilities found` : 'no vulnerabilities, skipping exploitation'}`,
      );
      return decision;
    }

    // Validation failed - check if we should retry or skip
    const error = result.error;
    if (error.retryable) {
      // Re-throw retryable errors so caller can handle retry
      logger.warn(`${vulnType}: ${error.message} (retryable)`);
      throw error;
    }

    // Non-retryable error - skip exploitation gracefully
    logger.warn(`${vulnType}: ${error.message}, skipping exploitation`);
    return {
      shouldExploit: false,
      shouldRetry: false,
      vulnerabilityCount: 0,
      vulnType,
    };
  }
}
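Review note: `checkQueue` has three outcomes: return the decision, rethrow a retryable error, or skip exploitation. The sketch below walks through them with a reduced `Result` type. The `Result`, `ok`, and `err` definitions are assumptions inferred from how `result.value`, `result.error`, and `.ok` are used in this diff; the real `../types/result.js` is not shown here. `QueueError`, `Decision`, and `decide` are hypothetical names.

```typescript
// Assumed shape of the Result helpers imported from '../types/result.js';
// the real definitions are not part of this diff.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// Hypothetical error and decision types, reduced to the fields checkQueue reads.
interface QueueError {
  message: string;
  retryable: boolean;
}
interface Decision {
  shouldExploit: boolean;
  vulnerabilityCount: number;
}

// Mirrors the three outcomes: return the decision, rethrow, or skip.
function decide(result: Result<Decision, QueueError>): Decision {
  if (result.ok) {
    return result.value;
  }
  if (result.error.retryable) {
    throw new Error(result.error.message);
  }
  return { shouldExploit: false, vulnerabilityCount: 0 };
}

const found = decide(ok({ shouldExploit: true, vulnerabilityCount: 2 }));
const skipped = decide(err({ message: 'queue file malformed', retryable: false }));

let rethrown = false;
try {
  decide(err({ message: 'queue file missing', retryable: true }));
} catch {
  rethrown = true;
}
```

Throwing only for retryable failures lets the caller's retry policy handle them, while a permanent validation failure degrades to a skipped exploitation phase instead of failing the run.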
@@ -0,0 +1,304 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

import { $ } from 'zx';
import type { ActivityLogger } from '../types/activity-logger.js';
import { ErrorCode } from '../types/errors.js';
import { PentestError } from './error-handling.js';

/**
 * Check if a directory is a git repository.
 * Returns true if the directory contains a .git folder or is inside a git repo.
 */
export async function isGitRepository(dir: string): Promise<boolean> {
  try {
    await $`cd ${dir} && git rev-parse --git-dir`.quiet();
    return true;
  } catch {
    return false;
  }
}

interface GitOperationResult {
  success: boolean;
  hadChanges?: boolean;
  error?: Error;
}

/**
 * Get list of changed files from git status --porcelain output
 */
async function getChangedFiles(sourceDir: string, operationDescription: string): Promise<string[]> {
  const status = await executeGitCommandWithRetry(['git', 'status', '--porcelain'], sourceDir, operationDescription);
  return status.stdout
    .trim()
    .split('\n')
    .filter((line) => line.length > 0);
}

/**
 * Log a summary of changed files with truncation for long lists
 */
function logChangeSummary(
  changes: string[],
  messageWithChanges: string,
  messageWithoutChanges: string,
  logger: ActivityLogger,
  level: 'info' | 'warn' = 'info',
  maxToShow: number = 5,
): void {
  if (changes.length > 0) {
    const msg = messageWithChanges.replace('{count}', String(changes.length));
    const fileList = changes
      .slice(0, maxToShow)
      .map((c) => ` ${c}`)
      .join(', ');
    const suffix = changes.length > maxToShow ? ` ... and ${changes.length - maxToShow} more files` : '';
    logger[level](`${msg} ${fileList}${suffix}`);
  } else {
    logger[level](messageWithoutChanges);
  }
}
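Review note: the truncation in `logChangeSummary` is easier to check as a pure function. The sketch below uses a hypothetical `formatChangeSummary` that returns the log line instead of writing it to a logger; the file names in the example are made up.

```typescript
// Hypothetical pure variant of logChangeSummary: returns the line instead of logging it.
function formatChangeSummary(
  changes: string[],
  messageWithChanges: string,
  messageWithoutChanges: string,
  maxToShow: number = 5,
): string {
  if (changes.length === 0) {
    return messageWithoutChanges;
  }
  // '{count}' is replaced once with the total number of changed entries.
  const msg = messageWithChanges.replace('{count}', String(changes.length));
  // Only the first maxToShow entries are listed; the rest are summarized.
  const fileList = changes
    .slice(0, maxToShow)
    .map((c) => ` ${c}`)
    .join(', ');
  const suffix = changes.length > maxToShow ? ` ... and ${changes.length - maxToShow} more files` : '';
  return `${msg} ${fileList}${suffix}`;
}

// Four porcelain-style entries, truncated to three as in rollbackGitWorkspace.
const summary = formatChangeSummary(
  ['M a.ts', '?? b.ts', 'M c.ts', 'M d.ts'],
  'Rollback completed - removed {count} contaminated changes:',
  'Rollback completed - no changes to remove',
  3,
);
const emptySummary = formatChangeSummary([], 'unused {count}', 'Rollback completed - no changes to remove', 3);
```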

/**
 * Convert unknown error to GitOperationResult
 */
function toErrorResult(error: unknown): GitOperationResult {
  const errMsg = error instanceof Error ? error.message : String(error);
  return {
    success: false,
    error: error instanceof Error ? error : new Error(errMsg),
  };
}

// Serializes git operations to prevent index.lock conflicts during parallel agent execution
class GitSemaphore {
  private queue: Array<() => void> = [];
  private running: boolean = false;

  async acquire(): Promise<void> {
    return new Promise((resolve) => {
      this.queue.push(resolve);
      this.process();
    });
  }

  release(): void {
    this.running = false;
    this.process();
  }

  private process(): void {
    if (!this.running && this.queue.length > 0) {
      this.running = true;
      const resolve = this.queue.shift();
      resolve?.();
    }
  }
}

const gitSemaphore = new GitSemaphore();
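Review note: `GitSemaphore` admits one holder at a time, so it behaves as a FIFO mutex. The sketch below copies the same queue design into a standalone `Mutex` and adds a hypothetical `withLock` helper to show that overlapping tasks run strictly one after another. `Mutex`, `withLock`, and `demo` are illustrative names, not part of this diff.

```typescript
// Same queue-based design as GitSemaphore above, copied so the sketch is self-contained.
class Mutex {
  private queue: Array<() => void> = [];
  private running = false;

  acquire(): Promise<void> {
    return new Promise((resolve) => {
      this.queue.push(resolve);
      this.process();
    });
  }

  release(): void {
    this.running = false;
    this.process();
  }

  private process(): void {
    if (!this.running && this.queue.length > 0) {
      this.running = true;
      const next = this.queue.shift();
      next?.();
    }
  }
}

// Hypothetical helper: run a task while holding the lock, always releasing it.
async function withLock<T>(mutex: Mutex, task: () => Promise<T>): Promise<T> {
  await mutex.acquire();
  try {
    return await task();
  } finally {
    mutex.release();
  }
}

// Two overlapping tasks: without the lock, the short task 'b' would finish first.
async function demo(): Promise<string[]> {
  const mutex = new Mutex();
  const events: string[] = [];
  const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

  const task = (name: string, ms: number) => async () => {
    events.push(`${name}:start`);
    await sleep(ms);
    events.push(`${name}:end`);
  };

  await Promise.all([withLock(mutex, task('a', 20)), withLock(mutex, task('b', 1))]);
  return events;
}
```

Releasing in `finally`, as `executeGitCommandWithRetry` does, matters here: a holder that throws without releasing would leave every queued caller waiting forever.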

const GIT_LOCK_ERROR_PATTERNS = [
  'index.lock',
  'unable to lock',
  'Another git process',
  'fatal: Unable to create',
  'fatal: index file',
];

function isGitLockError(errorMessage: string): boolean {
  return GIT_LOCK_ERROR_PATTERNS.some((pattern) => errorMessage.includes(pattern));
}

// Retries git commands on lock conflicts with exponential backoff
export async function executeGitCommandWithRetry(
  commandArgs: string[],
  sourceDir: string,
  description: string,
  maxRetries: number = 5,
): Promise<{ stdout: string; stderr: string }> {
  await gitSemaphore.acquire();

  try {
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        const [cmd, ...args] = commandArgs;
        const result = await $`cd ${sourceDir} && ${cmd} ${args}`;
        return result;
      } catch (error) {
        const errMsg = error instanceof Error ? error.message : String(error);

        if (isGitLockError(errMsg) && attempt < maxRetries) {
          const delay = 2 ** (attempt - 1) * 1000;
          // executeGitCommandWithRetry is also called outside activity context
          // (e.g., from resume logic), so we use console.warn as a fallback here
          console.warn(
            `Git lock conflict during ${description} (attempt ${attempt}/${maxRetries}). Retrying in ${delay}ms...`,
          );
          await new Promise((resolve) => setTimeout(resolve, delay));
          continue;
        }

        throw error;
      }
    }
    throw new PentestError(
      `Git command failed after ${maxRetries} retries`,
      'filesystem',
      true, // Retryable - transient git lock issues
      { maxRetries, description },
      ErrorCode.GIT_CHECKPOINT_FAILED,
    );
  } finally {
    gitSemaphore.release();
  }
}
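Review note: the backoff expression `2 ** (attempt - 1) * 1000` doubles the wait after each lock conflict. The sketch below tabulates the schedule for the default `maxRetries = 5`. `backoffDelayMs` and `backoffSchedule` are hypothetical helper names used only for illustration.

```typescript
// Delay before the next attempt, matching `2 ** (attempt - 1) * 1000` above.
function backoffDelayMs(attempt: number): number {
  return 2 ** (attempt - 1) * 1000;
}

// A delay follows every attempt except the last one: the retry branch requires
// `attempt < maxRetries`, so a failure on the final attempt is rethrown at once.
function backoffSchedule(maxRetries: number): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt < maxRetries; attempt++) {
    delays.push(backoffDelayMs(attempt));
  }
  return delays;
}

const schedule = backoffSchedule(5); // [1000, 2000, 4000, 8000]
const worstCaseWaitMs = schedule.reduce((sum, ms) => sum + ms, 0); // 15000
```

Because the semaphore is acquired before the retry loop and released only in `finally`, other queued git operations wait for the full duration of these sleeps, up to 15 seconds with the defaults.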

// Two-phase reset: hard reset (tracked files) + clean (untracked files)
export async function rollbackGitWorkspace(
  sourceDir: string,
  reason: string = 'retry preparation',
  logger: ActivityLogger,
): Promise<GitOperationResult> {
  // Skip git operations if not a git repository
  if (!(await isGitRepository(sourceDir))) {
    logger.info('Skipping git rollback (not a git repository)');
    return { success: true };
  }

  logger.info(`Rolling back workspace for ${reason}`);
  try {
    const changes = await getChangedFiles(sourceDir, 'status check for rollback');

    await executeGitCommandWithRetry(['git', 'reset', '--hard', 'HEAD'], sourceDir, 'hard reset for rollback');
    await executeGitCommandWithRetry(['git', 'clean', '-fd'], sourceDir, 'cleaning untracked files for rollback');

    logChangeSummary(
      changes,
      'Rollback completed - removed {count} contaminated changes:',
      'Rollback completed - no changes to remove',
      logger,
      'info',
      3,
    );
    return { success: true };
  } catch (error) {
    const errMsg = error instanceof Error ? error.message : String(error);
    logger.error(`Rollback failed after retries: ${errMsg}`);
    return {
      success: false,
      error: new PentestError(
        `Git rollback failed: ${errMsg}`,
        'filesystem',
        false, // Non-retryable - rollback is best-effort cleanup
        { sourceDir, reason },
        ErrorCode.GIT_ROLLBACK_FAILED,
      ),
    };
  }
}

// Creates checkpoint before each attempt. First attempt preserves workspace; retries clean it.
export async function createGitCheckpoint(
  sourceDir: string,
  description: string,
  attempt: number,
  logger: ActivityLogger,
): Promise<GitOperationResult> {
  // Skip git operations if not a git repository
  if (!(await isGitRepository(sourceDir))) {
    logger.info('Skipping git checkpoint (not a git repository)');
    return { success: true };
  }

  logger.info(`Creating checkpoint for ${description} (attempt ${attempt})`);
  try {
    // 1. On retries, clean workspace to prevent pollution from previous attempt
    if (attempt > 1) {
      const cleanResult = await rollbackGitWorkspace(sourceDir, `${description} (retry cleanup)`, logger);
      if (!cleanResult.success) {
        logger.warn(`Workspace cleanup failed, continuing anyway: ${cleanResult.error?.message}`);
      }
    }

    // 2. Detect existing changes
    const changes = await getChangedFiles(sourceDir, 'status check');
    const hasChanges = changes.length > 0;

    // 3. Stage and commit checkpoint
    await executeGitCommandWithRetry(['git', 'add', '-A'], sourceDir, 'staging changes');
    await executeGitCommandWithRetry(
      ['git', 'commit', '-m', `📍 Checkpoint: ${description} (attempt ${attempt})`, '--allow-empty'],
      sourceDir,
      'creating commit',
    );

    // 4. Log result
    if (hasChanges) {
      logger.info('Checkpoint created with uncommitted changes staged');
    } else {
      logger.info('Empty checkpoint created (no workspace changes)');
    }
    return { success: true };
  } catch (error) {
    const result = toErrorResult(error);
    logger.warn(`Checkpoint creation failed after retries: ${result.error?.message}`);
    return result;
  }
}

export async function commitGitSuccess(
  sourceDir: string,
  description: string,
  logger: ActivityLogger,
): Promise<GitOperationResult> {
  // Skip git operations if not a git repository
  if (!(await isGitRepository(sourceDir))) {
    logger.info('Skipping git commit (not a git repository)');
    return { success: true };
  }

  logger.info(`Committing successful results for ${description}`);
  try {
    const changes = await getChangedFiles(sourceDir, 'status check for success commit');

    await executeGitCommandWithRetry(['git', 'add', '-A'], sourceDir, 'staging changes for success commit');
    await executeGitCommandWithRetry(
      ['git', 'commit', '-m', `✅ ${description}: completed successfully`, '--allow-empty'],
      sourceDir,
      'creating success commit',
    );

    logChangeSummary(
      changes,
      'Success commit created with {count} file changes:',
      'Empty success commit created (agent made no file changes)',
      logger,
    );
    return { success: true };
  } catch (error) {
    const result = toErrorResult(error);
    logger.warn(`Success commit failed after retries: ${result.error?.message}`);
    return result;
  }
}

/**
 * Get current git commit hash.
 * Returns null if not a git repository.
 */
export async function getGitCommitHash(sourceDir: string): Promise<string | null> {
  if (!(await isGitRepository(sourceDir))) {
    return null;
  }
  try {
    const result = await $`cd ${sourceDir} && git rev-parse HEAD`;
    return result.stdout.trim();
  } catch {
    return null;
  }
}
@@ -0,0 +1,22 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Services Module
 *
 * Exports DI container and service classes for Shannon agent execution.
 * Services are pure domain logic with no Temporal dependencies.
 */

export type { AgentExecutionInput } from './agent-execution.js';
export { AgentExecutionService } from './agent-execution.js';

export { ConfigLoaderService } from './config-loader.js';
export type { ContainerDependencies } from './container.js';
export { Container, getOrCreateContainer, removeContainer } from './container.js';
export { ExploitationCheckerService } from './exploitation-checker.js';
export { loadPrompt } from './prompt-manager.js';
export { assembleFinalReport, injectModelIntoReport } from './reporting.js';
@@ -0,0 +1,489 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Preflight Validation Service
 *
 * Runs cheap, fast checks before any agent execution begins.
 * Catches configuration and credential problems early, saving
 * time and API costs compared to failing mid-pipeline.
 *
 * Checks run sequentially, cheapest first:
 * 1. Repository path exists and contains .git
 * 2. Config file parses and validates (if provided)
 * 3. Credentials validate via Claude Agent SDK query (API key, OAuth, Bedrock, Vertex AI, or router mode)
 * 4. Target URL is reachable from the container (DNS + HTTP)
 */

import { lookup } from 'node:dns/promises';
import fs from 'node:fs/promises';
import http from 'node:http';
import https from 'node:https';
import type { SDKAssistantMessageError } from '@anthropic-ai/claude-agent-sdk';
import { query } from '@anthropic-ai/claude-agent-sdk';
import { resolveModel } from '../ai/models.js';
import { parseConfig } from '../config-parser.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import { ErrorCode } from '../types/errors.js';
import { err, ok, type Result } from '../types/result.js';
import { isRetryableError, PentestError } from './error-handling.js';

const TARGET_URL_TIMEOUT_MS = 10_000;

function isLoopbackAddress(address: string): boolean {
  return address === '127.0.0.1' || address === '::1' || address === '0.0.0.0';
}

// === Repository Validation ===

async function validateRepo(repoPath: string, logger: ActivityLogger): Promise<Result<void, PentestError>> {
  logger.info('Checking repository path...', { repoPath });

  // 1. Check repo directory exists
  try {
    const stats = await fs.stat(repoPath);
    if (!stats.isDirectory()) {
      return err(
        new PentestError(
          `Repository path is not a directory: ${repoPath}`,
          'config',
          false,
          { repoPath },
          ErrorCode.REPO_NOT_FOUND,
        ),
      );
    }
  } catch {
    return err(
      new PentestError(
        `Repository path does not exist: ${repoPath}`,
        'config',
        false,
        { repoPath },
        ErrorCode.REPO_NOT_FOUND,
      ),
    );
  }

  // 2. Check .git directory exists
  try {
    const gitStats = await fs.stat(`${repoPath}/.git`);
    if (!gitStats.isDirectory()) {
      return err(
        new PentestError(
          `Not a git repository (no .git directory): ${repoPath}`,
          'config',
          false,
          { repoPath },
          ErrorCode.REPO_NOT_FOUND,
        ),
      );
    }
  } catch {
    return err(
      new PentestError(
        `Not a git repository (no .git directory): ${repoPath}`,
        'config',
        false,
        { repoPath },
        ErrorCode.REPO_NOT_FOUND,
      ),
    );
  }

  logger.info('Repository path OK');
  return ok(undefined);
}

// === Config Validation ===

async function validateConfig(configPath: string, logger: ActivityLogger): Promise<Result<void, PentestError>> {
  logger.info('Validating configuration file...', { configPath });

  try {
    await parseConfig(configPath);
    logger.info('Configuration file OK');
    return ok(undefined);
  } catch (error) {
    if (error instanceof PentestError) {
      return err(error);
    }
    const message = error instanceof Error ? error.message : String(error);
    return err(
      new PentestError(
        `Configuration validation failed: ${message}`,
        'config',
        false,
        { configPath },
        ErrorCode.CONFIG_VALIDATION_FAILED,
      ),
    );
  }
}

// === Credential Validation ===

/** Map SDK error type to a human-readable preflight PentestError. */
function classifySdkError(sdkError: SDKAssistantMessageError, authType: string): Result<void, PentestError> {
  switch (sdkError) {
    case 'authentication_failed':
      return err(
        new PentestError(
          `Invalid ${authType}. Check your credentials in .env and try again.`,
          'config',
          false,
          { authType, sdkError },
          ErrorCode.AUTH_FAILED,
        ),
      );
    case 'billing_error':
      return err(
        new PentestError(
          `Anthropic account has a billing issue. Add credits or check your billing dashboard.`,
          'billing',
          true,
          { authType, sdkError },
          ErrorCode.BILLING_ERROR,
        ),
      );
    case 'rate_limit':
      return err(
        new PentestError(
          `Anthropic rate limit or spending cap reached. Wait a few minutes and try again.`,
          'billing',
          true,
          { authType, sdkError },
          ErrorCode.BILLING_ERROR,
        ),
      );
    case 'server_error':
      return err(
        new PentestError(`Anthropic API is temporarily unavailable. Try again shortly.`, 'network', true, {
          authType,
          sdkError,
        }),
      );
    default:
      return err(
        new PentestError(
          `${authType} validation failed unexpectedly. Check your credentials in .env.`,
          'config',
          false,
          { authType, sdkError },
          ErrorCode.AUTH_FAILED,
        ),
      );
  }
}

/** Validate credentials via a minimal Claude Agent SDK query. */
async function validateCredentials(logger: ActivityLogger): Promise<Result<void, PentestError>> {
  // 1. Custom base URL - validate endpoint is reachable via SDK query
  if (process.env.ANTHROPIC_BASE_URL) {
    const baseUrl = process.env.ANTHROPIC_BASE_URL;
    logger.info(`Validating custom base URL: ${baseUrl}`);

    try {
      for await (const message of query({ prompt: 'hi', options: { model: resolveModel('small'), maxTurns: 1 } })) {
        if (message.type === 'assistant' && message.error) {
          return classifySdkError(message.error, `custom endpoint (${baseUrl})`);
        }
        if (message.type === 'result') {
          break;
        }
      }

      logger.info('Custom base URL OK');
      return ok(undefined);
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      return err(
        new PentestError(
          `Custom base URL unreachable: ${baseUrl} — ${message}`,
          'network',
          false,
          { baseUrl },
          ErrorCode.AUTH_FAILED,
        ),
      );
    }
  }

  // 2. Bedrock mode - validate required AWS credentials are present
  if (process.env.CLAUDE_CODE_USE_BEDROCK === '1') {
    const required = [
      'AWS_REGION',
      'AWS_BEARER_TOKEN_BEDROCK',
      'ANTHROPIC_SMALL_MODEL',
      'ANTHROPIC_MEDIUM_MODEL',
      'ANTHROPIC_LARGE_MODEL',
    ];
    const missing = required.filter((v) => !process.env[v]);
    if (missing.length > 0) {
      return err(
        new PentestError(
          `Bedrock mode requires the following env vars in .env: ${missing.join(', ')}`,
          'config',
          false,
          { missing },
          ErrorCode.AUTH_FAILED,
        ),
      );
    }
    logger.info('Bedrock credentials OK');
    return ok(undefined);
  }

  // 3. Vertex AI mode - validate required GCP credentials are present
  if (process.env.CLAUDE_CODE_USE_VERTEX === '1') {
    const required = [
      'CLOUD_ML_REGION',
      'ANTHROPIC_VERTEX_PROJECT_ID',
      'ANTHROPIC_SMALL_MODEL',
      'ANTHROPIC_MEDIUM_MODEL',
      'ANTHROPIC_LARGE_MODEL',
    ];
    const missing = required.filter((v) => !process.env[v]);
    if (missing.length > 0) {
      return err(
        new PentestError(
          `Vertex AI mode requires the following env vars in .env: ${missing.join(', ')}`,
          'config',
          false,
          { missing },
          ErrorCode.AUTH_FAILED,
        ),
      );
    }
    // Validate service account credentials file is accessible
    const credPath = process.env.GOOGLE_APPLICATION_CREDENTIALS;
    if (!credPath) {
      return err(
        new PentestError(
          'Vertex AI mode requires GOOGLE_APPLICATION_CREDENTIALS pointing to a service account key JSON file',
          'config',
          false,
          {},
          ErrorCode.AUTH_FAILED,
        ),
      );
    }
    try {
      await fs.access(credPath);
    } catch {
      return err(
        new PentestError(
          `Service account key file not found at: ${credPath}`,
          'config',
          false,
          { credPath },
          ErrorCode.AUTH_FAILED,
        ),
      );
    }
    logger.info('Vertex AI credentials OK');
    return ok(undefined);
  }

  // 4. Check that at least one credential is present
  if (!process.env.ANTHROPIC_API_KEY && !process.env.CLAUDE_CODE_OAUTH_TOKEN) {
    return err(
      new PentestError(
        'No API credentials found. Set ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN in .env (or use CLAUDE_CODE_USE_BEDROCK=1 for AWS Bedrock, or CLAUDE_CODE_USE_VERTEX=1 for Google Vertex AI)',
        'config',
        false,
        {},
        ErrorCode.AUTH_FAILED,
      ),
    );
  }

  // 5. Validate via SDK query
  const authType = process.env.CLAUDE_CODE_OAUTH_TOKEN ? 'OAuth token' : 'API key';
  logger.info(`Validating ${authType} via SDK...`);

  try {
    for await (const message of query({ prompt: 'hi', options: { model: resolveModel('small'), maxTurns: 1 } })) {
      if (message.type === 'assistant' && message.error) {
        return classifySdkError(message.error, authType);
      }
      if (message.type === 'result') {
        break;
      }
    }

    logger.info(`${authType} OK`);
    return ok(undefined);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    const retryable = isRetryableError(error instanceof Error ? error : new Error(message));

    return err(
      new PentestError(
        retryable
          ? `Failed to reach Anthropic API. Check your network connection.`
          : `${authType} validation failed: ${message}`,
        retryable ? 'network' : 'config',
        retryable,
        { authType },
        retryable ? undefined : ErrorCode.AUTH_FAILED,
      ),
    );
  }
}
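Review note: the Bedrock and Vertex AI branches above repeat the same "required variables present" filter. The sketch below shows that check in isolation. `missingEnvVars` is a hypothetical helper, the environment is passed in as a plain object so the example does not depend on `process.env`, and the values are made up.

```typescript
// Hypothetical helper: the diff inlines this filter separately for Bedrock and Vertex AI.
type Env = Record<string, string | undefined>;

function missingEnvVars(required: readonly string[], env: Env): string[] {
  // An unset or empty variable counts as missing, matching `!process.env[v]`.
  return required.filter((name) => !env[name]);
}

// Variable names copied from the Bedrock branch in the diff.
const BEDROCK_REQUIRED = [
  'AWS_REGION',
  'AWS_BEARER_TOKEN_BEDROCK',
  'ANTHROPIC_SMALL_MODEL',
  'ANTHROPIC_MEDIUM_MODEL',
  'ANTHROPIC_LARGE_MODEL',
] as const;

// Example environment (made-up values) with one empty and one absent variable.
const exampleEnv: Env = {
  AWS_REGION: 'us-east-1',
  AWS_BEARER_TOKEN_BEDROCK: '',
  ANTHROPIC_SMALL_MODEL: 'small-model-id',
  ANTHROPIC_MEDIUM_MODEL: 'medium-model-id',
};

const missing = missingEnvVars(BEDROCK_REQUIRED, exampleEnv);
```

Note that both cloud branches only verify that the variables exist; unlike the API key and OAuth paths, they return before any SDK round-trip is made.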

// === Target URL Validation ===

/** HTTP HEAD with TLS verification disabled - we check reachability, not certificate validity. */
function httpHead(url: string, timeoutMs: number): Promise<number> {
  return new Promise((resolve, reject) => {
    const parsed = new URL(url);
    const isHttps = parsed.protocol === 'https:';
    const transport = isHttps ? https : http;

    const req = transport.request(
      url,
      {
        method: 'HEAD',
        timeout: timeoutMs,
        ...(isHttps && { rejectUnauthorized: false }),
      },
      (res) => {
        res.resume();
        resolve(res.statusCode ?? 0);
      },
    );

    req.on('timeout', () => {
      req.destroy();
      reject(new Error(`Connection timed out after ${timeoutMs}ms`));
    });
    req.on('error', reject);
    req.end();
  });
}

/** Check that the target URL is reachable from inside the container. */
async function validateTargetUrl(targetUrl: string, logger: ActivityLogger): Promise<Result<void, PentestError>> {
  logger.info('Checking target URL reachability...', { targetUrl });

  // 1. Parse URL
  let parsed: URL;
  try {
    parsed = new URL(targetUrl);
  } catch {
    return err(
      new PentestError(
        `Invalid target URL: ${targetUrl}`,
        'config',
        false,
        { targetUrl },
        ErrorCode.TARGET_UNREACHABLE,
      ),
    );
  }

  // 2. DNS lookup - detect loopback addresses early for a better hint
  const hostname = parsed.hostname;
  let resolvedAddress: string | undefined;
  try {
    const result = await lookup(hostname);
    resolvedAddress = result.address;
  } catch {
    return err(
      new PentestError(
        `Target URL ${targetUrl} is not reachable. Verify the URL is correct and the site is up.`,
        'network',
        false,
        { targetUrl, hostname },
        ErrorCode.TARGET_UNREACHABLE,
      ),
    );
  }

  // 3. HTTP reachability check
  try {
    await httpHead(targetUrl, TARGET_URL_TIMEOUT_MS);

    logger.info('Target URL OK');
    return ok(undefined);
  } catch (error) {
    const isLoopback = isLoopbackAddress(resolvedAddress);
    const detail = error instanceof Error ? error.message : String(error);

    if (isLoopback) {
      const suggestion = targetUrl.replace(hostname, 'host.docker.internal');
      return err(
        new PentestError(
          `Target URL ${targetUrl} resolves to ${resolvedAddress} (loopback) and is not reachable. ` +
            `For local services, use host.docker.internal instead of ${hostname} (e.g., ${suggestion})`,
          'network',
          false,
          { targetUrl, resolvedAddress, hostname },
          ErrorCode.TARGET_UNREACHABLE,
        ),
      );
    }

    return err(
      new PentestError(
        `Target URL ${targetUrl} is not reachable: ${detail}`,
        'network',
        false,
        { targetUrl, resolvedAddress },
        ErrorCode.TARGET_UNREACHABLE,
      ),
    );
  }
}
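Review note: the loopback hint in `validateTargetUrl` combines an address comparison with a hostname substitution. The sketch below isolates both steps. `dockerHostSuggestion` is a hypothetical wrapper around the `replace` call in the diff, and the example URL is made up.

```typescript
// Same literal comparison as isLoopbackAddress in the diff: it is applied to the
// resolved IP address, not to the hostname typed by the user.
function isLoopbackAddress(address: string): boolean {
  return address === '127.0.0.1' || address === '::1' || address === '0.0.0.0';
}

// Hypothetical helper wrapping the suggestion logic from validateTargetUrl.
function dockerHostSuggestion(targetUrl: string): string {
  const { hostname } = new URL(targetUrl);
  // String.prototype.replace with a string pattern swaps only the first match,
  // so a path that happens to contain the hostname is left alone.
  return targetUrl.replace(hostname, 'host.docker.internal');
}

const suggestion = dockerHostSuggestion('http://localhost:3000/login');
const loopbackIp = isLoopbackAddress('127.0.0.1');
const hostnameOnly = isLoopbackAddress('localhost');
```

The hint is only produced after the HEAD request fails, so a loopback target that is actually reachable from the container still passes the check.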
|
||||
|
||||
// === Preflight Orchestrator ===
|
||||
|
||||
/**
|
||||
* Run all preflight checks sequentially (cheapest first).
|
||||
*
|
||||
* 1. Repository path exists and contains .git
|
||||
* 2. Config file parses and validates (if configPath provided)
|
||||
* 3. Credentials validate (API key, OAuth, or router mode)
|
||||
* 4. Target URL is reachable from the container
|
||||
*
|
||||
* Returns on first failure.
|
||||
*/
|
||||
export async function runPreflightChecks(
|
||||
targetUrl: string,
|
||||
repoPath: string,
|
||||
configPath: string | undefined,
|
||||
logger: ActivityLogger,
|
||||
): Promise<Result<void, PentestError>> {
|
||||
// 1. Repository check (free — filesystem only)
|
||||
const repoResult = await validateRepo(repoPath, logger);
|
||||
if (!repoResult.ok) {
|
||||
return repoResult;
|
||||
}
|
||||
|
||||
// 2. Config check (free — filesystem + CPU)
|
||||
if (configPath) {
|
||||
const configResult = await validateConfig(configPath, logger);
|
||||
if (!configResult.ok) {
|
||||
return configResult;
|
||||
}
|
||||
}
|
||||
|
||||
// 3. Credential check (cheap — 1 SDK round-trip)
|
||||
const credResult = await validateCredentials(logger);
|
||||
if (!credResult.ok) {
|
||||
return credResult;
|
||||
}
|
||||
|
||||
// 4. Target URL reachability check (cheap — 1 HTTP round-trip)
|
||||
const urlResult = await validateTargetUrl(targetUrl, logger);
|
||||
if (!urlResult.ok) {
|
||||
return urlResult;
|
||||
}
|
||||
|
||||
logger.info('All preflight checks passed');
|
||||
return ok(undefined);
|
||||
}
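The orchestrator above runs its checks in order of cost and returns the first failure. A minimal sketch of that pattern, assuming a simplified `Result` type and a hypothetical `runInOrder` helper (neither is the worker's actual code):

```typescript
// Simplified stand-in for the Result type imported from '../types/result.js'.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// A check is any async function that reports success or a failure value.
type Check = () => Promise<Result<void, Error>>;

// Hypothetical helper: run checks one at a time and stop at the first failure,
// so later (more expensive) checks never execute once an earlier one fails.
async function runInOrder(checks: readonly Check[]): Promise<Result<void, Error>> {
  for (const check of checks) {
    const result = await check();
    if (!result.ok) {
      return result;
    }
  }
  return ok(undefined);
}

// Usage: the third check is never called because the second one fails.
const calls: string[] = [];
const outcome = runInOrder([
  async () => {
    calls.push('repo');
    return ok(undefined);
  },
  async () => {
    calls.push('config');
    return err(new Error('config invalid'));
  },
  async () => {
    calls.push('credentials');
    return ok(undefined);
  },
]);
```

Ordering the list from cheapest to most expensive means a typo in a local path fails before any network round-trip is attempted.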
|
||||
@@ -0,0 +1,267 @@
|
||||
// Copyright (C) 2025 Keygraph, Inc.
|
||||
//
|
||||
// This program is free software: you can redistribute it and/or modify
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
import { fs, path } from 'zx';
|
||||
import { PROMPTS_DIR } from '../paths.js';
|
||||
import { PLAYWRIGHT_SESSION_MAPPING } from '../session-manager.js';
|
||||
import type { ActivityLogger } from '../types/activity-logger.js';
|
||||
import type { Authentication, DistributedConfig } from '../types/config.js';
|
||||
import { handlePromptError, PentestError } from './error-handling.js';
|
||||
|
||||
interface PromptVariables {
|
||||
webUrl: string;
|
||||
repoPath: string;
|
||||
PLAYWRIGHT_SESSION?: string;
|
||||
}
|
||||
|
||||
interface IncludeReplacement {
|
||||
placeholder: string;
|
||||
content: string;
|
||||
}
|
||||
|
||||
// Build complete login instructions from config (reads the shared template from disk)
|
||||
async function buildLoginInstructions(authentication: Authentication, logger: ActivityLogger): Promise<string> {
|
||||
try {
|
||||
// 1. Load the login instructions template
|
||||
const loginInstructionsPath = path.join(PROMPTS_DIR, 'shared', 'login-instructions.txt');
|
||||
|
||||
if (!(await fs.pathExists(loginInstructionsPath))) {
|
||||
throw new PentestError('Login instructions template not found', 'filesystem', false, { loginInstructionsPath });
|
||||
}
|
||||
|
||||
const fullTemplate = await fs.readFile(loginInstructionsPath, 'utf8');
|
||||
|
||||
const getSection = (content: string, sectionName: string): string => {
|
||||
const regex = new RegExp(`<!-- BEGIN:${sectionName} -->([\\s\\S]*?)<!-- END:${sectionName} -->`, 'g');
|
||||
const match = regex.exec(content);
|
||||
return match?.[1]?.trim() ?? '';
|
||||
};
|
||||
|
||||
// 2. Extract sections based on login type
|
||||
const loginType = authentication.login_type?.toUpperCase();
|
||||
let loginInstructions = '';
|
||||
|
||||
const commonSection = getSection(fullTemplate, 'COMMON');
|
||||
const authSection = loginType ? getSection(fullTemplate, loginType) : ''; // FORM or SSO
|
||||
const verificationSection = getSection(fullTemplate, 'VERIFICATION');
|
||||
|
||||
// 3. Assemble instructions from sections (fallback to full template if markers missing)
|
||||
if (!commonSection && !authSection && !verificationSection) {
|
||||
logger.warn('Section markers not found, using full login instructions template');
|
||||
loginInstructions = fullTemplate;
|
||||
} else {
|
||||
loginInstructions = [commonSection, authSection, verificationSection].filter((section) => section).join('\n\n');
|
||||
}
|
||||
|
||||
// 4. Interpolate login flow and credential placeholders
|
||||
let userInstructions = (authentication.login_flow ?? []).join('\n');
|
||||
|
||||
    // Function replacers insert each value literally. A replacement *string*
    // would expand "$&", "$1", or "$$" if a credential happened to contain them.
    const credentials = authentication.credentials;
    const username = credentials?.username;
    const password = credentials?.password;
    const totpSecret = credentials?.totp_secret;

    if (username) {
      userInstructions = userInstructions.replace(/\$username/g, () => username);
    }
    if (password) {
      userInstructions = userInstructions.replace(/\$password/g, () => password);
    }
    if (totpSecret) {
      userInstructions = userInstructions.replace(
        /\$totp/g,
        () => `generated TOTP code using secret "${totpSecret}"`,
      );
    }

    loginInstructions = loginInstructions.replace(/{{user_instructions}}/g, () => userInstructions);

    // 5. Replace TOTP secret placeholder if present in template
    if (totpSecret) {
      loginInstructions = loginInstructions.replace(/{{totp_secret}}/g, () => totpSecret);
    }
|
||||
|
||||
return loginInstructions;
|
||||
} catch (error) {
|
||||
if (error instanceof PentestError) {
|
||||
throw error;
|
||||
}
|
||||
const errMsg = error instanceof Error ? error.message : String(error);
|
||||
throw new PentestError(`Failed to build login instructions: ${errMsg}`, 'config', false, {
|
||||
authentication,
|
||||
originalError: errMsg,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
// Process @include() directives (reads each included file; rejects paths outside baseDir)
|
||||
async function processIncludes(content: string, baseDir: string): Promise<string> {
|
||||
const includeRegex = /@include\(([^)]+)\)/g;
|
||||
const resolvedBase = path.resolve(baseDir);
|
||||
|
||||
const replacements: IncludeReplacement[] = await Promise.all(
|
||||
Array.from(content.matchAll(includeRegex)).map(async (match) => {
|
||||
const rawPath = match[1] ?? '';
|
||||
const includePath = path.resolve(baseDir, rawPath);
|
||||
if (!includePath.startsWith(resolvedBase + path.sep) && includePath !== resolvedBase) {
|
||||
throw new PentestError(`Path traversal detected in @include(): ${rawPath}`, 'prompt', false, {
|
||||
includePath,
|
||||
baseDir: resolvedBase,
|
||||
});
|
||||
}
|
||||
const sharedContent = await fs.readFile(includePath, 'utf8');
|
||||
return {
|
||||
placeholder: match[0],
|
||||
content: sharedContent,
|
||||
};
|
||||
}),
|
||||
);
|
||||
|
||||
for (const replacement of replacements) {
|
||||
    // Function replacer: included text is inserted literally, even if it contains "$" sequences
    content = content.replace(replacement.placeholder, () => replacement.content);
|
||||
}
|
||||
return content;
|
||||
}
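The traversal guard above resolves each `@include()` path against the base directory and rejects anything that lands outside it. A minimal sketch of just that check, with a hypothetical helper name `isInsideBase`:

```typescript
import * as path from 'node:path';

// Hypothetical helper mirroring the guard: true only when rawPath, resolved
// against baseDir, is baseDir itself or a path beneath it.
function isInsideBase(baseDir: string, rawPath: string): boolean {
  const resolvedBase = path.resolve(baseDir);
  const candidate = path.resolve(resolvedBase, rawPath);
  // Appending path.sep stops a sibling such as "/prompts-evil" from passing
  // a plain prefix test against "/prompts".
  return candidate === resolvedBase || candidate.startsWith(resolvedBase + path.sep);
}

const insideOk = isInsideBase('/prompts', 'shared/login-instructions.txt');
const parentEscape = isInsideBase('/prompts', '../etc/passwd');
const siblingPrefix = isInsideBase('/prompts', '../prompts-evil/payload.txt');
```

This is a purely lexical check and does not follow symlinks. Resolving both paths with `fs.realpath` first would be a stricter variant, at the cost of extra filesystem calls.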
|
||||
|
||||
function buildAuthContext(config: DistributedConfig | null): string {
|
||||
if (!config?.authentication) {
|
||||
return 'No authentication configured - unauthenticated testing only';
|
||||
}
|
||||
|
||||
const auth = config.authentication;
|
||||
const lines = [
|
||||
`- Login type: ${auth.login_type.toUpperCase()}`,
|
||||
`- Username: ${auth.credentials.username}`,
|
||||
`- Login URL: ${auth.login_url}`,
|
||||
];
|
||||
|
||||
if (auth.credentials?.totp_secret) {
|
||||
lines.push('- MFA: TOTP enabled');
|
||||
}
|
||||
|
||||
return lines.join('\n');
|
||||
}
|
||||
|
||||
// Variable interpolation (may read the login template from disk via buildLoginInstructions)
|
||||
async function interpolateVariables(
|
||||
template: string,
|
||||
variables: PromptVariables,
|
||||
config: DistributedConfig | null = null,
|
||||
logger: ActivityLogger,
|
||||
): Promise<string> {
|
||||
try {
|
||||
if (!template || typeof template !== 'string') {
|
||||
throw new PentestError('Template must be a non-empty string', 'validation', false, {
|
||||
templateType: typeof template,
|
||||
templateLength: template?.length,
|
||||
});
|
||||
}
|
||||
|
||||
if (!variables || !variables.webUrl || !variables.repoPath) {
|
||||
throw new PentestError('Variables must include webUrl and repoPath', 'validation', false, {
|
||||
variables: Object.keys(variables || {}),
|
||||
});
|
||||
}
|
||||
|
||||
let result = template
|
||||
.replace(/{{WEB_URL}}/g, variables.webUrl)
|
||||
.replace(/{{REPO_PATH}}/g, variables.repoPath)
|
||||
.replace(/{{PLAYWRIGHT_SESSION}}/g, variables.PLAYWRIGHT_SESSION || 'agent1')
|
||||
.replace(/{{AUTH_CONTEXT}}/g, buildAuthContext(config))
|
||||
.replace(/{{DESCRIPTION}}/g, config?.description ? `Description: ${config.description}` : '');
|
||||
|
||||
if (config) {
|
||||
// Handle rules section - if both are empty, use cleaner messaging
|
||||
const hasAvoidRules = config.avoid && config.avoid.length > 0;
|
||||
const hasFocusRules = config.focus && config.focus.length > 0;
|
||||
|
||||
if (!hasAvoidRules && !hasFocusRules) {
|
||||
// Replace the entire rules section with a clean message
|
||||
const cleanRulesSection = '<rules>\nNo specific rules or focus areas provided for this test.\n</rules>';
|
||||
result = result.replace(/<rules>[\s\S]*?<\/rules>/g, cleanRulesSection);
|
||||
} else {
|
||||
const avoidRules = hasAvoidRules ? config.avoid?.map((r) => `- ${r.description}`).join('\n') : 'None';
|
||||
const focusRules = hasFocusRules ? config.focus?.map((r) => `- ${r.description}`).join('\n') : 'None';
|
||||
|
||||
result = result.replace(/{{RULES_AVOID}}/g, avoidRules).replace(/{{RULES_FOCUS}}/g, focusRules);
|
||||
}
|
||||
|
||||
// Extract and inject login instructions from config
|
||||
if (config.authentication?.login_flow) {
|
||||
const loginInstructions = await buildLoginInstructions(config.authentication, logger);
|
||||
        // Function replacer: the instructions embed credentials, which may contain "$" sequences
        result = result.replace(/{{LOGIN_INSTRUCTIONS}}/g, () => loginInstructions);
|
||||
} else {
|
||||
result = result.replace(/{{LOGIN_INSTRUCTIONS}}/g, '');
|
||||
}
|
||||
} else {
|
||||
// Replace the entire rules section with a clean message when no config provided
|
||||
const cleanRulesSection = '<rules>\nNo specific rules or focus areas provided for this test.\n</rules>';
|
||||
result = result.replace(/<rules>[\s\S]*?<\/rules>/g, cleanRulesSection);
|
||||
result = result.replace(/{{LOGIN_INSTRUCTIONS}}/g, '');
|
||||
}
|
||||
|
||||
// Validate that all placeholders have been replaced (excluding instructional text)
|
||||
const remainingPlaceholders = result.match(/\{\{[^}]+\}\}/g);
|
||||
if (remainingPlaceholders) {
|
||||
logger.warn(`Found unresolved placeholders in prompt: ${remainingPlaceholders.join(', ')}`);
|
||||
}
|
||||
|
||||
return result;
|
||||
} catch (error) {
|
||||
if (error instanceof PentestError) {
|
||||
throw error;
|
||||
}
|
||||
const errMsg = error instanceof Error ? error.message : String(error);
|
||||
throw new PentestError(`Variable interpolation failed: ${errMsg}`, 'prompt', false, { originalError: errMsg });
|
||||
}
|
||||
}
|
||||
|
||||
// Load a prompt template from disk, expand @include() directives, and interpolate variables
|
||||
export async function loadPrompt(
|
||||
promptName: string,
|
||||
variables: PromptVariables,
|
||||
config: DistributedConfig | null = null,
|
||||
pipelineTestingMode: boolean = false,
|
||||
logger: ActivityLogger,
|
||||
): Promise<string> {
|
||||
try {
|
||||
// 1. Resolve prompt file path
|
||||
const promptsDir = pipelineTestingMode ? path.join(PROMPTS_DIR, 'pipeline-testing') : PROMPTS_DIR;
|
||||
const promptPath = path.join(promptsDir, `${promptName}.txt`);
|
||||
|
||||
if (pipelineTestingMode) {
|
||||
logger.info(`Using pipeline testing prompt: ${promptPath}`);
|
||||
}
|
||||
|
||||
if (!(await fs.pathExists(promptPath))) {
|
||||
throw new PentestError(`Prompt file not found: ${promptPath}`, 'prompt', false, { promptName, promptPath });
|
||||
}
|
||||
|
||||
// 2. Assign Playwright session based on agent name
|
||||
const enhancedVariables: PromptVariables = { ...variables };
|
||||
|
||||
const session = PLAYWRIGHT_SESSION_MAPPING[promptName as keyof typeof PLAYWRIGHT_SESSION_MAPPING];
|
||||
if (session) {
|
||||
enhancedVariables.PLAYWRIGHT_SESSION = session;
|
||||
logger.info(`Assigned ${promptName} -> ${enhancedVariables.PLAYWRIGHT_SESSION}`);
|
||||
} else {
|
||||
enhancedVariables.PLAYWRIGHT_SESSION = 'agent1';
|
||||
logger.warn(`Unknown agent ${promptName}, using fallback -> ${enhancedVariables.PLAYWRIGHT_SESSION}`);
|
||||
}
|
||||
|
||||
// 3. Read template file
|
||||
let template = await fs.readFile(promptPath, 'utf8');
|
||||
|
||||
// 4. Process @include directives
|
||||
template = await processIncludes(template, promptsDir);
|
||||
|
||||
// 5. Interpolate variables and return final prompt
|
||||
return await interpolateVariables(template, enhancedVariables, config, logger);
|
||||
} catch (error) {
|
||||
if (error instanceof PentestError) {
|
||||
throw error;
|
||||
}
|
||||
const promptError = handlePromptError(promptName, error as Error);
|
||||
throw promptError.error;
|
||||
}
|
||||
}
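Placeholder substitution in this file goes through `String.prototype.replace`. When the replacement is passed as a string, sequences such as `$&` and `$$` are expanded; a function replacer's return value is inserted as-is. A small sketch of the difference, using a made-up password and a hypothetical `fillLiteral` helper:

```typescript
// Hypothetical helper: substitute every match of a placeholder pattern with a
// value, inserting the value literally even if it contains "$" sequences.
function fillLiteral(template: string, placeholder: RegExp, value: string): string {
  return template.replace(placeholder, () => value);
}

const template = 'Type $password into the password field';
const password = 'pa$$w0rd'; // made-up value containing "$$"

// String replacement: "$$" is an escape for a single "$", so one "$" is lost.
const viaString = template.replace(/\$password/g, password);

// Function replacement: the value is used verbatim.
const viaFunction = fillLiteral(template, /\$password/g, password);
```

Here `viaString` ends up containing `pa$w0rd`, while `viaFunction` keeps `pa$$w0rd`. For values that come from user configuration, the function form is the safer default.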
|
||||
@@ -0,0 +1,307 @@
|
||||
// Copyright (C) 2025 Keygraph, Inc.
|
||||
//
|
||||
// This program is free software: you can redistribute it and/or modify
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
import { fs, path } from 'zx';
|
||||
import type { ExploitationDecision, VulnType } from '../types/agents.js';
|
||||
import { ErrorCode } from '../types/errors.js';
|
||||
import { err, ok, type Result } from '../types/result.js';
|
||||
import { asyncPipe } from '../utils/functional.js';
|
||||
import { PentestError } from './error-handling.js';
|
||||
|
||||
export type { ExploitationDecision, VulnType } from '../types/agents.js';
|
||||
|
||||
interface VulnTypeConfigItem {
|
||||
deliverable: string;
|
||||
queue: string;
|
||||
}
|
||||
|
||||
type VulnTypeConfig = Record<VulnType, VulnTypeConfigItem>;
|
||||
|
||||
type ErrorMessageResolver = string | ((existence: FileExistence) => string);
|
||||
|
||||
interface ValidationRule {
|
||||
predicate: (existence: FileExistence) => boolean;
|
||||
errorMessage: ErrorMessageResolver;
|
||||
retryable: boolean;
|
||||
}
|
||||
|
||||
interface FileExistence {
|
||||
deliverableExists: boolean;
|
||||
queueExists: boolean;
|
||||
}
|
||||
|
||||
interface PathsBase {
|
||||
vulnType: VulnType;
|
||||
deliverable: string;
|
||||
queue: string;
|
||||
sourceDir: string;
|
||||
}
|
||||
|
||||
interface PathsWithExistence extends PathsBase {
|
||||
existence: FileExistence;
|
||||
}
|
||||
|
||||
interface PathsWithQueue extends PathsWithExistence {
|
||||
queueData: QueueData;
|
||||
}
|
||||
|
||||
interface PathsWithError {
|
||||
error: PentestError;
|
||||
}
|
||||
|
||||
interface QueueData {
|
||||
vulnerabilities: unknown[];
|
||||
[key: string]: unknown;
|
||||
}
|
||||
|
||||
interface QueueValidationResult {
|
||||
valid: boolean;
|
||||
data: QueueData | null;
|
||||
error: string | null;
|
||||
}
|
||||
|
||||
/**
|
||||
* Result type for safe validation - explicit error handling.
|
||||
*/
|
||||
export type SafeValidationResult = Result<ExploitationDecision, PentestError>;
|
||||
|
||||
// Vulnerability type configuration as immutable data
|
||||
const VULN_TYPE_CONFIG: VulnTypeConfig = Object.freeze({
|
||||
injection: Object.freeze({
|
||||
deliverable: 'injection_analysis_deliverable.md',
|
||||
queue: 'injection_exploitation_queue.json',
|
||||
}),
|
||||
xss: Object.freeze({
|
||||
deliverable: 'xss_analysis_deliverable.md',
|
||||
queue: 'xss_exploitation_queue.json',
|
||||
}),
|
||||
auth: Object.freeze({
|
||||
deliverable: 'auth_analysis_deliverable.md',
|
||||
queue: 'auth_exploitation_queue.json',
|
||||
}),
|
||||
ssrf: Object.freeze({
|
||||
deliverable: 'ssrf_analysis_deliverable.md',
|
||||
queue: 'ssrf_exploitation_queue.json',
|
||||
}),
|
||||
authz: Object.freeze({
|
||||
deliverable: 'authz_analysis_deliverable.md',
|
||||
queue: 'authz_exploitation_queue.json',
|
||||
}),
|
||||
}) as VulnTypeConfig;
|
||||
|
||||
// Pure function to create validation rule
|
||||
function createValidationRule(
|
||||
predicate: (existence: FileExistence) => boolean,
|
||||
errorMessage: ErrorMessageResolver,
|
||||
retryable: boolean = true,
|
||||
): ValidationRule {
|
||||
return Object.freeze({ predicate, errorMessage, retryable });
|
||||
}
|
||||
|
||||
// Symmetric deliverable rules: queue and deliverable must exist together (prevents partial analysis from triggering exploitation)
|
||||
const fileExistenceRules: readonly ValidationRule[] = Object.freeze([
|
||||
createValidationRule(
|
||||
({ deliverableExists, queueExists }) => deliverableExists && queueExists,
|
||||
getExistenceErrorMessage,
|
||||
),
|
||||
]);
|
||||
|
||||
// Generate appropriate error message based on which files are missing
|
||||
function getExistenceErrorMessage(existence: FileExistence): string {
|
||||
const { deliverableExists, queueExists } = existence;
|
||||
|
||||
if (!deliverableExists && !queueExists) {
|
||||
return 'Analysis failed: Neither deliverable nor queue file exists. Analysis agent must create both files.';
|
||||
}
|
||||
if (!queueExists) {
|
||||
return 'Analysis incomplete: Deliverable exists but queue file missing. Analysis agent must create both files.';
|
||||
}
|
||||
return 'Analysis incomplete: Queue exists but deliverable file missing. Analysis agent must create both files.';
|
||||
}
|
||||
|
||||
// Pure function to create file paths
|
||||
const createPaths = (vulnType: VulnType, sourceDir: string): PathsBase | PathsWithError => {
|
||||
const config = VULN_TYPE_CONFIG[vulnType];
|
||||
if (!config) {
|
||||
return {
|
||||
error: new PentestError(`Unknown vulnerability type: ${vulnType}`, 'validation', false, { vulnType }),
|
||||
};
|
||||
}
|
||||
|
||||
return Object.freeze({
|
||||
vulnType,
|
||||
deliverable: path.join(sourceDir, 'deliverables', config.deliverable),
|
||||
queue: path.join(sourceDir, 'deliverables', config.queue),
|
||||
sourceDir,
|
||||
});
|
||||
};
|
||||
|
||||
// Check which of the deliverable and queue files exist on disk
|
||||
const checkFileExistence = async (paths: PathsBase | PathsWithError): Promise<PathsWithExistence | PathsWithError> => {
|
||||
if ('error' in paths) return paths;
|
||||
|
||||
const [deliverableExists, queueExists] = await Promise.all([
|
||||
fs.pathExists(paths.deliverable),
|
||||
fs.pathExists(paths.queue),
|
||||
]);
|
||||
|
||||
return Object.freeze({
|
||||
...paths,
|
||||
existence: Object.freeze({ deliverableExists, queueExists }),
|
||||
});
|
||||
};
|
||||
|
||||
// Validates that the deliverable and queue both exist (one missing, or both missing, is an error)
|
||||
const validateExistenceRules = (
|
||||
pathsWithExistence: PathsWithExistence | PathsWithError,
|
||||
): PathsWithExistence | PathsWithError => {
|
||||
if ('error' in pathsWithExistence) return pathsWithExistence;
|
||||
|
||||
const { existence, vulnType } = pathsWithExistence;
|
||||
|
||||
// Find the first rule that fails
|
||||
const failedRule = fileExistenceRules.find((rule) => !rule.predicate(existence));
|
||||
|
||||
if (failedRule) {
|
||||
const message =
|
||||
typeof failedRule.errorMessage === 'function' ? failedRule.errorMessage(existence) : failedRule.errorMessage;
|
||||
|
||||
return {
|
||||
error: new PentestError(
|
||||
`${message} (${vulnType})`,
|
||||
'validation',
|
||||
failedRule.retryable,
|
||||
{
|
||||
vulnType,
|
||||
deliverablePath: pathsWithExistence.deliverable,
|
||||
queuePath: pathsWithExistence.queue,
|
||||
existence,
|
||||
},
|
||||
ErrorCode.DELIVERABLE_NOT_FOUND,
|
||||
),
|
||||
};
|
||||
}
|
||||
|
||||
return pathsWithExistence;
|
||||
};
|
||||
|
||||
// Pure function to validate queue structure
|
||||
const validateQueueStructure = (content: string): QueueValidationResult => {
|
||||
try {
|
||||
const parsed = JSON.parse(content) as unknown;
|
||||
const isValid =
|
||||
typeof parsed === 'object' &&
|
||||
parsed !== null &&
|
||||
'vulnerabilities' in parsed &&
|
||||
Array.isArray((parsed as QueueData).vulnerabilities);
|
||||
|
||||
return Object.freeze({
|
||||
valid: isValid,
|
||||
data: isValid ? (parsed as QueueData) : null,
|
||||
error: null,
|
||||
});
|
||||
} catch (parseError) {
|
||||
return Object.freeze({
|
||||
valid: false,
|
||||
data: null,
|
||||
error: parseError instanceof Error ? parseError.message : String(parseError),
|
||||
});
|
||||
}
|
||||
};
|
||||
|
||||
// Queue parse failures are retryable - agent can fix malformed JSON on retry
|
||||
const validateQueueContent = async (
|
||||
pathsWithExistence: PathsWithExistence | PathsWithError,
|
||||
): Promise<PathsWithQueue | PathsWithError> => {
|
||||
if ('error' in pathsWithExistence) return pathsWithExistence;
|
||||
|
||||
try {
|
||||
const queueContent = await fs.readFile(pathsWithExistence.queue, 'utf8');
|
||||
const queueValidation = validateQueueStructure(queueContent);
|
||||
|
||||
if (!queueValidation.valid) {
|
||||
// Rule 6: Both exist, queue invalid
|
||||
return {
|
||||
error: new PentestError(
|
||||
queueValidation.error
|
||||
? `Queue validation failed for ${pathsWithExistence.vulnType}: Invalid JSON structure. Analysis agent must fix queue format.`
|
||||
: `Queue validation failed for ${pathsWithExistence.vulnType}: Missing or invalid 'vulnerabilities' array. Analysis agent must fix queue structure.`,
|
||||
'validation',
|
||||
true, // retryable
|
||||
{
|
||||
vulnType: pathsWithExistence.vulnType,
|
||||
queuePath: pathsWithExistence.queue,
|
||||
originalError: queueValidation.error,
|
||||
queueStructure: queueValidation.data ? Object.keys(queueValidation.data) : [],
|
||||
},
|
||||
),
|
||||
};
|
||||
}
|
||||
|
||||
return Object.freeze({
|
||||
...pathsWithExistence,
|
||||
queueData: queueValidation.data as QueueData,
|
||||
});
|
||||
} catch (readError) {
|
||||
return {
|
||||
error: new PentestError(
|
||||
`Failed to read queue file for ${pathsWithExistence.vulnType}: ${readError instanceof Error ? readError.message : String(readError)}`,
|
||||
'filesystem',
|
||||
false,
|
||||
{
|
||||
vulnType: pathsWithExistence.vulnType,
|
||||
queuePath: pathsWithExistence.queue,
|
||||
originalError: readError instanceof Error ? readError.message : String(readError),
|
||||
},
|
||||
),
|
||||
};
|
||||
}
|
||||
};
|
||||
|
||||
// Final decision: skip if queue says no vulns, proceed if vulns found, error otherwise
|
||||
const determineExploitationDecision = (validatedData: PathsWithQueue | PathsWithError): ExploitationDecision => {
|
||||
if ('error' in validatedData) {
|
||||
throw validatedData.error;
|
||||
}
|
||||
|
||||
const hasVulnerabilities = validatedData.queueData.vulnerabilities.length > 0;
|
||||
|
||||
// Rule 4: Both exist, queue valid and populated
|
||||
// Rule 5: Both exist, queue valid but empty
|
||||
return Object.freeze({
|
||||
shouldExploit: hasVulnerabilities,
|
||||
shouldRetry: false,
|
||||
vulnerabilityCount: validatedData.queueData.vulnerabilities.length,
|
||||
vulnType: validatedData.vulnType,
|
||||
});
|
||||
};
|
||||
|
||||
// Main functional validation pipeline
|
||||
export async function validateQueueAndDeliverable(
|
||||
vulnType: VulnType,
|
||||
sourceDir: string,
|
||||
): Promise<ExploitationDecision> {
|
||||
return asyncPipe<ExploitationDecision>(
|
||||
createPaths(vulnType, sourceDir),
|
||||
checkFileExistence,
|
||||
validateExistenceRules,
|
||||
validateQueueContent,
|
||||
determineExploitationDecision,
|
||||
);
|
||||
}
|
||||
|
||||
/**
|
||||
* Safely validate queue and deliverable files.
|
||||
* Returns Result<ExploitationDecision, PentestError> for explicit error handling.
|
||||
*/
|
||||
export async function validateQueueSafe(vulnType: VulnType, sourceDir: string): Promise<SafeValidationResult> {
|
||||
try {
|
||||
const result = await validateQueueAndDeliverable(vulnType, sourceDir);
|
||||
return ok(result);
|
||||
} catch (error) {
|
||||
return err(error as PentestError);
|
||||
}
|
||||
}
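The queue check above comes down to three outcomes: unparseable JSON, JSON without a `vulnerabilities` array, or a valid queue whose length decides whether exploitation runs. A condensed sketch, assuming hypothetical names `inspectQueue` and `QueueCheck`:

```typescript
// Hypothetical condensed result type; the worker reports failures as PentestError instead.
type QueueCheck =
  | { valid: true; shouldExploit: boolean; vulnerabilityCount: number }
  | { valid: false; reason: 'invalid-json' | 'missing-vulnerabilities' };

function inspectQueue(content: string): QueueCheck {
  let parsed: unknown;
  try {
    parsed = JSON.parse(content);
  } catch {
    return { valid: false, reason: 'invalid-json' };
  }

  // The queue must be an object with an array under "vulnerabilities".
  if (typeof parsed !== 'object' || parsed === null || !('vulnerabilities' in parsed)) {
    return { valid: false, reason: 'missing-vulnerabilities' };
  }
  const vulnerabilities = (parsed as { vulnerabilities: unknown }).vulnerabilities;
  if (!Array.isArray(vulnerabilities)) {
    return { valid: false, reason: 'missing-vulnerabilities' };
  }

  // An empty queue is valid: analysis finished and found nothing to exploit.
  return {
    valid: true,
    shouldExploit: vulnerabilities.length > 0,
    vulnerabilityCount: vulnerabilities.length,
  };
}

const emptyQueue = inspectQueue('{"vulnerabilities": []}');
const populatedQueue = inspectQueue('{"vulnerabilities": [{"id": "INJ-001"}]}');
const malformedQueue = inspectQueue('{"vulnerabilities": [');
const wrongShape = inspectQueue('{"findings": []}');
```

Keeping the empty-queue case distinct from the invalid cases matters: the first skips exploitation cleanly, while the others are worth a retry by the analysis agent.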
|
||||
@@ -0,0 +1,154 @@
|
||||
// Copyright (C) 2025 Keygraph, Inc.
|
||||
//
|
||||
// This program is free software: you can redistribute it and/or modify
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|
||||
|
||||
import { fs, path } from 'zx';
|
||||
import type { ActivityLogger } from '../types/activity-logger.js';
|
||||
import { ErrorCode } from '../types/errors.js';
|
||||
import { PentestError } from './error-handling.js';
|
||||
|
||||
interface DeliverableFile {
|
||||
name: string;
|
||||
path: string;
|
||||
required: boolean;
|
||||
}
|
||||
|
||||
// Assemble final report from specialist deliverables and write it to disk
|
||||
export async function assembleFinalReport(sourceDir: string, logger: ActivityLogger): Promise<string> {
|
||||
const deliverableFiles: DeliverableFile[] = [
|
||||
{ name: 'Injection', path: 'injection_exploitation_evidence.md', required: false },
|
||||
{ name: 'XSS', path: 'xss_exploitation_evidence.md', required: false },
|
||||
{ name: 'Authentication', path: 'auth_exploitation_evidence.md', required: false },
|
||||
{ name: 'SSRF', path: 'ssrf_exploitation_evidence.md', required: false },
|
||||
{ name: 'Authorization', path: 'authz_exploitation_evidence.md', required: false },
|
||||
];
|
||||
|
||||
const sections: string[] = [];
|
||||
|
||||
for (const file of deliverableFiles) {
|
||||
const filePath = path.join(sourceDir, 'deliverables', file.path);
|
||||
try {
|
||||
if (await fs.pathExists(filePath)) {
|
||||
const content = await fs.readFile(filePath, 'utf8');
|
||||
sections.push(content);
|
||||
logger.info(`Added ${file.name} findings`);
|
||||
} else if (file.required) {
|
||||
throw new PentestError(
|
||||
`Required deliverable file not found: ${file.path}`,
|
||||
'filesystem',
|
||||
false,
|
||||
{ deliverableFile: file.path, sourceDir },
|
||||
ErrorCode.DELIVERABLE_NOT_FOUND,
|
||||
);
|
||||
} else {
|
||||
logger.info(`No ${file.name} deliverable found`);
|
||||
}
|
||||
} catch (error) {
|
||||
if (file.required) {
|
||||
throw error;
|
||||
}
|
||||
const err = error as Error;
|
||||
logger.warn(`Could not read ${file.path}: ${err.message}`);
|
||||
}
|
||||
}
|
||||
|
||||
const finalContent = sections.join('\n\n');
|
||||
const deliverablesDir = path.join(sourceDir, 'deliverables');
|
||||
const finalReportPath = path.join(deliverablesDir, 'comprehensive_security_assessment_report.md');
|
||||
|
||||
try {
|
||||
// Ensure deliverables directory exists
|
||||
await fs.ensureDir(deliverablesDir);
|
||||
await fs.writeFile(finalReportPath, finalContent);
|
||||
logger.info(`Final report assembled at ${finalReportPath}`);
|
||||
} catch (error) {
|
||||
const err = error as Error;
|
||||
throw new PentestError(`Failed to write final report: ${err.message}`, 'filesystem', false, {
|
||||
finalReportPath,
|
||||
originalError: err.message,
|
||||
});
|
||||
}
|
||||
|
||||
return finalContent;
|
||||
}
|
||||
|
||||
/**
|
||||
* Inject model information into the final security report.
|
||||
* Reads session.json to get the model(s) used, then injects a "Model:" line
|
||||
* into the Executive Summary section of the report.
|
||||
*/
|
||||
export async function injectModelIntoReport(
|
||||
repoPath: string,
|
||||
outputPath: string,
|
||||
logger: ActivityLogger,
|
||||
): Promise<void> {
|
||||
// 1. Read session.json to get model information
|
||||
const sessionJsonPath = path.join(outputPath, 'session.json');
|
||||
|
||||
if (!(await fs.pathExists(sessionJsonPath))) {
|
||||
logger.warn('session.json not found, skipping model injection');
|
||||
return;
|
||||
}
|
||||
|
||||
interface SessionData {
|
||||
metrics: {
|
||||
agents: Record<string, { model?: string }>;
|
||||
};
|
||||
}
|
||||
|
||||
const sessionData: SessionData = await fs.readJson(sessionJsonPath);
|
||||
|
||||
// 2. Extract unique models from all agents
|
||||
const models = new Set<string>();
|
||||
for (const agent of Object.values(sessionData.metrics.agents)) {
|
||||
if (agent.model) {
|
||||
models.add(agent.model);
|
||||
}
|
||||
}
|
||||
|
||||
if (models.size === 0) {
|
||||
logger.warn('No model information found in session.json');
|
||||
return;
|
||||
}
|
||||
|
||||
const modelStr = Array.from(models).join(', ');
|
||||
logger.info(`Injecting model info into report: ${modelStr}`);
|
||||
|
||||
// 3. Read the final report
|
||||
const reportPath = path.join(repoPath, 'deliverables', 'comprehensive_security_assessment_report.md');
|
||||
|
||||
if (!(await fs.pathExists(reportPath))) {
|
||||
logger.warn('Final report not found, skipping model injection');
|
||||
return;
|
||||
}
|
||||
|
||||
let reportContent = await fs.readFile(reportPath, 'utf8');
|
||||
|
||||
// 4. Find and inject model line after "Assessment Date" in Executive Summary
|
||||
// Pattern: "- Assessment Date: <date>" followed by a newline
|
||||
const assessmentDatePattern = /^(- Assessment Date: .+)$/m;
|
||||
const match = reportContent.match(assessmentDatePattern);
|
||||
|
||||
if (match) {
|
||||
// Inject model line after Assessment Date
|
||||
const modelLine = `- Model: ${modelStr}`;
|
||||
reportContent = reportContent.replace(assessmentDatePattern, `$1\n${modelLine}`);
|
||||
logger.info('Model info injected into Executive Summary');
|
||||
} else {
|
||||
// If no Assessment Date line found, try to add after Executive Summary header
|
||||
const execSummaryPattern = /^## Executive Summary$/m;
|
||||
if (reportContent.match(execSummaryPattern)) {
|
||||
// Add model as first item in Executive Summary
|
||||
reportContent = reportContent.replace(execSummaryPattern, `## Executive Summary\n- Model: ${modelStr}`);
|
||||
logger.info('Model info added to Executive Summary header');
|
||||
} else {
|
||||
logger.warn('Could not find Executive Summary section');
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
// 5. Write modified report back
|
||||
await fs.writeFile(reportPath, reportContent);
|
||||
}
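The string manipulation in the model injection above can be separated from the file I/O. A minimal sketch under that assumption, with a hypothetical pure helper `withModelLine` and made-up model names:

```typescript
// Hypothetical pure helper: returns the report with a "- Model:" line added,
// or null when there is nothing to inject or no Executive Summary to attach it to.
function withModelLine(report: string, models: readonly string[]): string | null {
  const unique = Array.from(new Set(models.filter((m) => m.length > 0)));
  if (unique.length === 0) return null;
  const modelLine = `- Model: ${unique.join(', ')}`;

  // Preferred spot: directly after the "- Assessment Date: ..." line.
  const assessmentDate = /^(- Assessment Date: .+)$/m;
  if (assessmentDate.test(report)) {
    // Function replacer, so a "$" in a model name cannot act as a pattern.
    return report.replace(assessmentDate, (line) => `${line}\n${modelLine}`);
  }

  // Fallback: first item under the Executive Summary header.
  const execSummary = /^## Executive Summary$/m;
  if (execSummary.test(report)) {
    return report.replace(execSummary, (header) => `${header}\n${modelLine}`);
  }
  return null;
}

const report = [
  '## Executive Summary',
  '- Assessment Date: 2025-01-01',
  '- Target: https://example.test',
].join('\n');
const updated = withModelLine(report, ['model-a', 'model-b', 'model-a']);
```

A pure helper like this can be unit-tested without a `session.json` or a report file on disk.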
|
||||
@@ -0,0 +1,218 @@
|
||||
// Copyright (C) 2025 Keygraph, Inc.
|
||||
//
|
||||
// This program is free software: you can redistribute it and/or modify
|
||||
// it under the terms of the GNU Affero General Public License version 3
|
||||
// as published by the Free Software Foundation.
|

import { fs, path } from 'zx';
import { validateQueueAndDeliverable } from './services/queue-validation.js';
import type { ActivityLogger } from './types/activity-logger.js';
import type { AgentDefinition, AgentName, AgentValidator, PlaywrightSession, VulnType } from './types/index.js';

// Agent definitions according to PRD
export const AGENTS: Readonly<Record<AgentName, AgentDefinition>> = Object.freeze({
  'pre-recon': {
    name: 'pre-recon',
    displayName: 'Pre-recon agent',
    prerequisites: [],
    promptTemplate: 'pre-recon-code',
    deliverableFilename: 'code_analysis_deliverable.md',
    modelTier: 'large',
  },
  recon: {
    name: 'recon',
    displayName: 'Recon agent',
    prerequisites: ['pre-recon'],
    promptTemplate: 'recon',
    deliverableFilename: 'recon_deliverable.md',
  },
  'injection-vuln': {
    name: 'injection-vuln',
    displayName: 'Injection vuln agent',
    prerequisites: ['recon'],
    promptTemplate: 'vuln-injection',
    deliverableFilename: 'injection_analysis_deliverable.md',
  },
  'xss-vuln': {
    name: 'xss-vuln',
    displayName: 'XSS vuln agent',
    prerequisites: ['recon'],
    promptTemplate: 'vuln-xss',
    deliverableFilename: 'xss_analysis_deliverable.md',
  },
  'auth-vuln': {
    name: 'auth-vuln',
    displayName: 'Auth vuln agent',
    prerequisites: ['recon'],
    promptTemplate: 'vuln-auth',
    deliverableFilename: 'auth_analysis_deliverable.md',
  },
  'ssrf-vuln': {
    name: 'ssrf-vuln',
    displayName: 'SSRF vuln agent',
    prerequisites: ['recon'],
    promptTemplate: 'vuln-ssrf',
    deliverableFilename: 'ssrf_analysis_deliverable.md',
  },
  'authz-vuln': {
    name: 'authz-vuln',
    displayName: 'Authz vuln agent',
    prerequisites: ['recon'],
    promptTemplate: 'vuln-authz',
    deliverableFilename: 'authz_analysis_deliverable.md',
  },
  'injection-exploit': {
    name: 'injection-exploit',
    displayName: 'Injection exploit agent',
    prerequisites: ['injection-vuln'],
    promptTemplate: 'exploit-injection',
    deliverableFilename: 'injection_exploitation_evidence.md',
  },
  'xss-exploit': {
    name: 'xss-exploit',
    displayName: 'XSS exploit agent',
    prerequisites: ['xss-vuln'],
    promptTemplate: 'exploit-xss',
    deliverableFilename: 'xss_exploitation_evidence.md',
  },
  'auth-exploit': {
    name: 'auth-exploit',
    displayName: 'Auth exploit agent',
    prerequisites: ['auth-vuln'],
    promptTemplate: 'exploit-auth',
    deliverableFilename: 'auth_exploitation_evidence.md',
  },
  'ssrf-exploit': {
    name: 'ssrf-exploit',
    displayName: 'SSRF exploit agent',
    prerequisites: ['ssrf-vuln'],
    promptTemplate: 'exploit-ssrf',
    deliverableFilename: 'ssrf_exploitation_evidence.md',
  },
  'authz-exploit': {
    name: 'authz-exploit',
    displayName: 'Authz exploit agent',
    prerequisites: ['authz-vuln'],
    promptTemplate: 'exploit-authz',
    deliverableFilename: 'authz_exploitation_evidence.md',
  },
  report: {
    name: 'report',
    displayName: 'Report agent',
    prerequisites: ['injection-exploit', 'xss-exploit', 'auth-exploit', 'ssrf-exploit', 'authz-exploit'],
    promptTemplate: 'report-executive',
    deliverableFilename: 'comprehensive_security_assessment_report.md',
    modelTier: 'small',
  },
});

// Phase names for metrics aggregation
export type PhaseName = 'pre-recon' | 'recon' | 'vulnerability-analysis' | 'exploitation' | 'reporting';

// Map agents to their corresponding phases (single source of truth)
export const AGENT_PHASE_MAP: Readonly<Record<AgentName, PhaseName>> = Object.freeze({
  'pre-recon': 'pre-recon',
  recon: 'recon',
  'injection-vuln': 'vulnerability-analysis',
  'xss-vuln': 'vulnerability-analysis',
  'auth-vuln': 'vulnerability-analysis',
  'authz-vuln': 'vulnerability-analysis',
  'ssrf-vuln': 'vulnerability-analysis',
  'injection-exploit': 'exploitation',
  'xss-exploit': 'exploitation',
  'auth-exploit': 'exploitation',
  'authz-exploit': 'exploitation',
  'ssrf-exploit': 'exploitation',
  report: 'reporting',
});

// Factory function for vulnerability queue validators
function createVulnValidator(vulnType: VulnType): AgentValidator {
  return async (sourceDir: string, logger: ActivityLogger): Promise<boolean> => {
    try {
      await validateQueueAndDeliverable(vulnType, sourceDir);
      return true;
    } catch (error) {
      const errMsg = error instanceof Error ? error.message : String(error);
      logger.warn(`Queue validation failed for ${vulnType}: ${errMsg}`);
      return false;
    }
  };
}

// Factory function for exploit deliverable validators
function createExploitValidator(vulnType: VulnType): AgentValidator {
  return async (sourceDir: string): Promise<boolean> => {
    const evidenceFile = path.join(sourceDir, 'deliverables', `${vulnType}_exploitation_evidence.md`);
    return await fs.pathExists(evidenceFile);
  };
}

// Playwright session mapping - assigns each agent to a specific session for browser isolation
// Keys are promptTemplate values from AGENTS registry
export const PLAYWRIGHT_SESSION_MAPPING: Record<string, PlaywrightSession> = Object.freeze({
  // Phase 1: Pre-reconnaissance
  'pre-recon-code': 'agent1',

  // Phase 2: Reconnaissance
  recon: 'agent2',

  // Phase 3: Vulnerability Analysis (5 parallel agents)
  'vuln-injection': 'agent1',
  'vuln-xss': 'agent2',
  'vuln-auth': 'agent3',
  'vuln-ssrf': 'agent4',
  'vuln-authz': 'agent5',

  // Phase 4: Exploitation (5 parallel agents - same as vuln counterparts)
  'exploit-injection': 'agent1',
  'exploit-xss': 'agent2',
  'exploit-auth': 'agent3',
  'exploit-ssrf': 'agent4',
  'exploit-authz': 'agent5',

  // Phase 5: Reporting
  'report-executive': 'agent3',
});

// Direct agent-to-validator mapping - much simpler than pattern matching
export const AGENT_VALIDATORS: Record<AgentName, AgentValidator> = Object.freeze({
  // Pre-reconnaissance agent - validates the code analysis deliverable created by the agent
  'pre-recon': async (sourceDir: string): Promise<boolean> => {
    const codeAnalysisFile = path.join(sourceDir, 'deliverables', 'code_analysis_deliverable.md');
    return await fs.pathExists(codeAnalysisFile);
  },

  // Reconnaissance agent
  recon: async (sourceDir: string): Promise<boolean> => {
    const reconFile = path.join(sourceDir, 'deliverables', 'recon_deliverable.md');
    return await fs.pathExists(reconFile);
  },

  // Vulnerability analysis agents
  'injection-vuln': createVulnValidator('injection'),
  'xss-vuln': createVulnValidator('xss'),
  'auth-vuln': createVulnValidator('auth'),
  'ssrf-vuln': createVulnValidator('ssrf'),
  'authz-vuln': createVulnValidator('authz'),

  // Exploitation agents
  'injection-exploit': createExploitValidator('injection'),
  'xss-exploit': createExploitValidator('xss'),
  'auth-exploit': createExploitValidator('auth'),
  'ssrf-exploit': createExploitValidator('ssrf'),
  'authz-exploit': createExploitValidator('authz'),

  // Executive report agent
  report: async (sourceDir: string, logger: ActivityLogger): Promise<boolean> => {
    const reportFile = path.join(sourceDir, 'deliverables', 'comprehensive_security_assessment_report.md');

    const reportExists = await fs.pathExists(reportFile);

    if (!reportExists) {
      logger.error('Missing required deliverable: comprehensive_security_assessment_report.md');
    }

    return reportExists;
  },
});
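The `prerequisites` arrays in `AGENTS` form a dependency graph. The block below is a minimal sketch of how such a registry could be walked into a valid execution order. The trimmed-down `MINI_AGENTS` registry and the `executionOrder` helper are hypothetical illustrations, not code from this commit, and the real workflow may schedule agents differently (for example, running the vuln agents in parallel).

```typescript
// Hypothetical, trimmed-down registry: only `prerequisites` matters for ordering.
type MiniAgent = { prerequisites: string[] };

const MINI_AGENTS: Record<string, MiniAgent> = {
  'pre-recon': { prerequisites: [] },
  recon: { prerequisites: ['pre-recon'] },
  'xss-vuln': { prerequisites: ['recon'] },
  'xss-exploit': { prerequisites: ['xss-vuln'] },
  report: { prerequisites: ['xss-exploit'] },
};

// Depth-first topological sort: each agent is emitted after all of its prerequisites.
function executionOrder(agents: Record<string, MiniAgent>): string[] {
  const ordered: string[] = [];
  const visiting = new Set<string>();
  const done = new Set<string>();

  const visit = (name: string): void => {
    if (done.has(name)) return;
    if (visiting.has(name)) throw new Error(`Cycle detected at ${name}`);
    const agent = agents[name];
    if (!agent) throw new Error(`Unknown agent: ${name}`);

    visiting.add(name);
    for (const dep of agent.prerequisites) visit(dep);
    visiting.delete(name);

    done.add(name);
    ordered.push(name);
  };

  for (const name of Object.keys(agents)) visit(name);
  return ordered;
}

const order = executionOrder(MINI_AGENTS);
console.log(order.join(' -> '));
```

A registry with a missing or circular prerequisite fails fast here instead of silently producing a partial order.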
@@ -0,0 +1,646 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Temporal activities for Shannon agent execution.
 *
 * Each activity wraps service calls with Temporal-specific concerns:
 * - Heartbeat loop (2s interval) to signal worker liveness
 * - Error classification into ApplicationFailure
 * - Container lifecycle management
 *
 * Business logic is delegated to services in src/services/.
 */

import fs from 'node:fs/promises';
import path from 'node:path';
import { ApplicationFailure, Context, heartbeat } from '@temporalio/activity';
import { AuditSession } from '../audit/index.js';
import type { ResumeAttempt } from '../audit/metrics-tracker.js';
import { copyDeliverablesToAudit, type SessionMetadata } from '../audit/utils.js';
import type { WorkflowSummary } from '../audit/workflow-logger.js';
import { getContainer, getOrCreateContainer, removeContainer } from '../services/container.js';
import { classifyErrorForTemporal, PentestError } from '../services/error-handling.js';
import { ExploitationCheckerService } from '../services/exploitation-checker.js';
import { executeGitCommandWithRetry } from '../services/git-manager.js';
import { runPreflightChecks } from '../services/preflight.js';
import type { ExploitationDecision, VulnType } from '../services/queue-validation.js';
import { assembleFinalReport, injectModelIntoReport } from '../services/reporting.js';
import { AGENTS } from '../session-manager.js';
import type { AgentName } from '../types/agents.js';
import { ALL_AGENTS } from '../types/agents.js';
import { ErrorCode } from '../types/errors.js';
import { isErr } from '../types/result.js';
import { fileExists, readJson } from '../utils/file-io.js';
import { createActivityLogger } from './activity-logger.js';
import type { AgentMetrics, ResumeState } from './shared.js';

// Max lengths to prevent Temporal protobuf buffer overflow
const MAX_ERROR_MESSAGE_LENGTH = 2000;
const MAX_STACK_TRACE_LENGTH = 1000;

// Max retries for output validation errors (agent didn't save deliverables)
const MAX_OUTPUT_VALIDATION_RETRIES = 3;

const HEARTBEAT_INTERVAL_MS = 2000;

/**
 * Input for all agent activities.
 */
export interface ActivityInput {
  webUrl: string;
  repoPath: string;
  configPath?: string;
  outputPath?: string;
  pipelineTestingMode?: boolean;
  workflowId: string;
  sessionId: string;
}

/**
 * Truncate error message to prevent buffer overflow in Temporal serialization.
 */
function truncateErrorMessage(message: string): string {
  if (message.length <= MAX_ERROR_MESSAGE_LENGTH) {
    return message;
  }
  return `${message.slice(0, MAX_ERROR_MESSAGE_LENGTH - 20)}\n[truncated]`;
}

/**
 * Truncate stack trace on an ApplicationFailure to prevent buffer overflow.
 */
function truncateStackTrace(failure: ApplicationFailure): void {
  if (failure.stack && failure.stack.length > MAX_STACK_TRACE_LENGTH) {
    failure.stack = `${failure.stack.slice(0, MAX_STACK_TRACE_LENGTH)}\n[stack truncated]`;
  }
}

/**
 * Build SessionMetadata from ActivityInput.
 */
function buildSessionMetadata(input: ActivityInput): SessionMetadata {
  const { webUrl, repoPath, outputPath, sessionId } = input;
  return {
    id: sessionId,
    webUrl,
    repoPath,
    ...(outputPath && { outputPath }),
  };
}

/**
 * Core activity implementation using services.
 *
 * Executes a single agent with:
 * 1. Heartbeat loop for worker liveness
 * 2. Container creation/reuse
 * 3. Service-based agent execution
 * 4. Error classification for Temporal retry
 */
async function runAgentActivity(agentName: AgentName, input: ActivityInput): Promise<AgentMetrics> {
  const { repoPath, configPath, pipelineTestingMode = false, workflowId, webUrl } = input;
  const startTime = Date.now();
  const attemptNumber = Context.current().info.attempt;

  // Heartbeat loop - signals worker is alive to Temporal server
  const heartbeatInterval = setInterval(() => {
    const elapsed = Math.floor((Date.now() - startTime) / 1000);
    heartbeat({ agent: agentName, elapsedSeconds: elapsed, attempt: attemptNumber });
  }, HEARTBEAT_INTERVAL_MS);

  try {
    const logger = createActivityLogger();

    // 1. Build session metadata and get/create container
    const sessionMetadata = buildSessionMetadata(input);
    const container = getOrCreateContainer(workflowId, sessionMetadata);

    // 2. Create audit session for THIS agent execution
    // NOTE: Each agent needs its own AuditSession because AuditSession uses
    // instance state (currentAgentName) that cannot be shared across parallel agents
    const auditSession = new AuditSession(sessionMetadata);
    await auditSession.initialize(workflowId);

    // 3. Execute agent via service (throws PentestError on failure)
    const endResult = await container.agentExecution.executeOrThrow(
      agentName,
      {
        webUrl,
        repoPath,
        configPath,
        pipelineTestingMode,
        attemptNumber,
      },
      auditSession,
      logger,
    );

    // 4. Return metrics
    return {
      durationMs: Date.now() - startTime,
      inputTokens: null,
      outputTokens: null,
      costUsd: endResult.cost_usd,
      numTurns: null,
      model: endResult.model,
    };
  } catch (error) {
    // If error is already an ApplicationFailure, re-throw directly
    if (error instanceof ApplicationFailure) {
      throw error;
    }

    // Check if output validation retry limit reached (PentestError with code)
    if (
      error instanceof PentestError &&
      error.code === ErrorCode.OUTPUT_VALIDATION_FAILED &&
      attemptNumber >= MAX_OUTPUT_VALIDATION_RETRIES
    ) {
      throw ApplicationFailure.nonRetryable(
        `Agent ${agentName} failed output validation after ${attemptNumber} attempts`,
        'OutputValidationError',
        [{ agentName, attemptNumber, elapsed: Date.now() - startTime }],
      );
    }

    // Classify error for Temporal retry behavior
    const classified = classifyErrorForTemporal(error);
    const rawMessage = error instanceof Error ? error.message : String(error);
    const message = truncateErrorMessage(rawMessage);

    if (classified.retryable) {
      const failure = ApplicationFailure.create({
        message,
        type: classified.type,
        details: [{ agentName, attemptNumber, elapsed: Date.now() - startTime }],
      });
      truncateStackTrace(failure);
      throw failure;
    } else {
      const failure = ApplicationFailure.nonRetryable(message, classified.type, [
        { agentName, attemptNumber, elapsed: Date.now() - startTime },
      ]);
      truncateStackTrace(failure);
      throw failure;
    }
  } finally {
    clearInterval(heartbeatInterval);
  }
}

export async function runPreReconAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('pre-recon', input);
}

export async function runReconAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('recon', input);
}

export async function runInjectionVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('injection-vuln', input);
}

export async function runXssVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('xss-vuln', input);
}

export async function runAuthVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('auth-vuln', input);
}

export async function runSsrfVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('ssrf-vuln', input);
}

export async function runAuthzVulnAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('authz-vuln', input);
}

export async function runInjectionExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('injection-exploit', input);
}

export async function runXssExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('xss-exploit', input);
}

export async function runAuthExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('auth-exploit', input);
}

export async function runSsrfExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('ssrf-exploit', input);
}

export async function runAuthzExploitAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('authz-exploit', input);
}

export async function runReportAgent(input: ActivityInput): Promise<AgentMetrics> {
  return runAgentActivity('report', input);
}

/**
 * Preflight validation activity.
 *
 * Runs cheap checks before any agent execution:
 * 1. Repository path exists with .git
 * 2. Config file validates (if provided)
 * 3. Credential validation (API key, OAuth, or router mode)
 * 4. Target URL reachable from the container
 *
 * NOT using runAgentActivity: preflight doesn't run an agent via the SDK.
 */
export async function runPreflightValidation(input: ActivityInput): Promise<void> {
  const startTime = Date.now();
  const attemptNumber = Context.current().info.attempt;

  const heartbeatInterval = setInterval(() => {
    const elapsed = Math.floor((Date.now() - startTime) / 1000);
    heartbeat({ phase: 'preflight', elapsedSeconds: elapsed, attempt: attemptNumber });
  }, HEARTBEAT_INTERVAL_MS);

  try {
    const logger = createActivityLogger();
    logger.info('Running preflight validation...', { attempt: attemptNumber });

    const result = await runPreflightChecks(input.webUrl, input.repoPath, input.configPath, logger);

    if (isErr(result)) {
      const classified = classifyErrorForTemporal(result.error);
      const message = truncateErrorMessage(result.error.message);

      if (classified.retryable) {
        const failure = ApplicationFailure.create({
          message,
          type: classified.type,
          details: [{ phase: 'preflight', attemptNumber, elapsed: Date.now() - startTime }],
        });
        truncateStackTrace(failure);
        throw failure;
      } else {
        const failure = ApplicationFailure.nonRetryable(message, classified.type, [
          { phase: 'preflight', attemptNumber, elapsed: Date.now() - startTime },
        ]);
        truncateStackTrace(failure);
        throw failure;
      }
    }

    logger.info('Preflight validation passed');
  } catch (error) {
    if (error instanceof ApplicationFailure) {
      throw error;
    }

    const classified = classifyErrorForTemporal(error);
    const rawMessage = error instanceof Error ? error.message : String(error);
    const message = truncateErrorMessage(rawMessage);

    const failure = ApplicationFailure.nonRetryable(message, classified.type, [
      { phase: 'preflight', attemptNumber, elapsed: Date.now() - startTime },
    ]);
    truncateStackTrace(failure);
    throw failure;
  } finally {
    clearInterval(heartbeatInterval);
  }
}

/**
 * Assemble the final report by concatenating exploitation evidence files.
 */
export async function assembleReportActivity(input: ActivityInput): Promise<void> {
  const { repoPath } = input;
  const logger = createActivityLogger();
  logger.info('Assembling deliverables from specialist agents...');
  try {
    await assembleFinalReport(repoPath, logger);
  } catch (error) {
    // Narrow the unknown error instead of casting, so non-Error throws still log a useful message
    const errMsg = error instanceof Error ? error.message : String(error);
    logger.warn(`Error assembling final report: ${errMsg}`);
  }
}

/**
 * Inject model metadata into the final report.
 */
export async function injectReportMetadataActivity(input: ActivityInput): Promise<void> {
  const { repoPath, sessionId, outputPath } = input;
  const logger = createActivityLogger();
  const effectiveOutputPath = outputPath ? path.join(outputPath, sessionId) : path.join('./workspaces', sessionId);
  try {
    await injectModelIntoReport(repoPath, effectiveOutputPath, logger);
  } catch (error) {
    // Narrow the unknown error instead of casting, so non-Error throws still log a useful message
    const errMsg = error instanceof Error ? error.message : String(error);
    logger.warn(`Error injecting model into report: ${errMsg}`);
  }
}

/**
 * Check if exploitation should run for a given vulnerability type.
 *
 * Uses existing container if available (from prior agent runs),
 * otherwise creates service directly (stateless, no dependencies).
 */
export async function checkExploitationQueue(input: ActivityInput, vulnType: VulnType): Promise<ExploitationDecision> {
  const { repoPath, workflowId } = input;
  const logger = createActivityLogger();

  // Reuse container's service if available (from prior vuln agent runs)
  const existingContainer = getContainer(workflowId);
  const checker = existingContainer?.exploitationChecker ?? new ExploitationCheckerService();

  return checker.checkQueue(vulnType, repoPath, logger);
}

interface SessionJson {
  session: {
    id: string;
    webUrl: string;
    repoPath?: string;
    originalWorkflowId?: string;
    resumeAttempts?: ResumeAttempt[];
  };
  metrics: {
    agents: Record<
      string,
      {
        status: 'in-progress' | 'success' | 'failed';
        checkpoint?: string;
      }
    >;
  };
}

/**
 * Load resume state from an existing workspace.
 */
export async function loadResumeState(
  workspaceName: string,
  expectedUrl: string,
  expectedRepoPath: string,
): Promise<ResumeState> {
  // 1. Validate workspace exists
  const sessionPath = path.join('./workspaces', workspaceName, 'session.json');

  const exists = await fileExists(sessionPath);
  if (!exists) {
    throw ApplicationFailure.nonRetryable(
      `Workspace not found: ${workspaceName}\nExpected path: ${sessionPath}`,
      'WorkspaceNotFoundError',
    );
  }

  // 2. Parse session.json and validate URL match
  let session: SessionJson;
  try {
    session = await readJson<SessionJson>(sessionPath);
  } catch (error) {
    const errorMsg = error instanceof Error ? error.message : String(error);
    throw ApplicationFailure.nonRetryable(
      `Corrupted session.json in workspace ${workspaceName}: ${errorMsg}`,
      'CorruptedSessionError',
    );
  }

  if (session.session.webUrl !== expectedUrl) {
    throw ApplicationFailure.nonRetryable(
      `URL mismatch with workspace\n Workspace URL: ${session.session.webUrl}\n Provided URL: ${expectedUrl}`,
      'URLMismatchError',
    );
  }

  // 3. Cross-check agent status with deliverables on disk
  const completedAgents: string[] = [];
  const agents = session.metrics.agents;

  for (const agentName of ALL_AGENTS) {
    const agentData = agents[agentName];
    if (!agentData || agentData.status !== 'success') {
      continue;
    }

    const deliverableFilename = AGENTS[agentName].deliverableFilename;
    // path.join keeps this consistent with the validators in session-manager.ts
    const deliverablePath = path.join(expectedRepoPath, 'deliverables', deliverableFilename);
    const deliverableExists = await fileExists(deliverablePath);

    if (!deliverableExists) {
      const logger = createActivityLogger();
      logger.warn(`Agent ${agentName} shows success but deliverable missing, will re-run`);
      continue;
    }

    completedAgents.push(agentName);
  }

  // 4. Collect git checkpoints and validate at least one exists
  const checkpoints = completedAgents
    .map((name) => agents[name]?.checkpoint)
    .filter((hash): hash is string => hash != null);

  if (checkpoints.length === 0) {
    const successAgents = Object.entries(agents)
      .filter(([, data]) => data.status === 'success')
      .map(([name]) => name);

    throw ApplicationFailure.nonRetryable(
      `Cannot resume workspace ${workspaceName}: ` +
        (successAgents.length > 0
          ? `${successAgents.length} agent(s) show success in session.json (${successAgents.join(', ')}) ` +
            `but their deliverable files are missing from disk. ` +
            `Start a fresh run instead.`
          : `No agents completed successfully. Start a fresh run instead.`),
      'NoCheckpointsError',
    );
  }

  // 5. Find the most recent checkpoint commit
  const checkpointHash = await findLatestCommit(expectedRepoPath, checkpoints);
  const originalWorkflowId = session.session.originalWorkflowId || session.session.id;

  // 6. Log summary and return resume state
  const logger = createActivityLogger();
  logger.info('Resume state loaded', {
    workspace: workspaceName,
    completedAgents: completedAgents.length,
    checkpoint: checkpointHash,
  });

  return {
    workspaceName,
    originalUrl: session.session.webUrl,
    completedAgents,
    checkpointHash,
    originalWorkflowId,
  };
}

async function findLatestCommit(repoPath: string, commitHashes: string[]): Promise<string> {
  if (commitHashes.length === 1) {
    const hash = commitHashes[0];
    if (!hash) {
      throw new PentestError(
        'Empty commit hash in array',
        'filesystem',
        false, // Non-retryable - corrupt workspace state
        { phase: 'resume' },
        ErrorCode.GIT_CHECKPOINT_FAILED,
      );
    }
    return hash;
  }

  const result = await executeGitCommandWithRetry(
    ['git', 'rev-list', '--max-count=1', ...commitHashes],
    repoPath,
    'find latest commit',
  );

  return result.stdout.trim();
}

/**
 * Restore git workspace to a checkpoint and clean up partial deliverables.
 */
export async function restoreGitCheckpoint(
  repoPath: string,
  checkpointHash: string,
  incompleteAgents: AgentName[],
): Promise<void> {
  const logger = createActivityLogger();
  logger.info(`Restoring git workspace to ${checkpointHash}...`);

  await executeGitCommandWithRetry(
    ['git', 'reset', '--hard', checkpointHash],
    repoPath,
    'reset to checkpoint for resume',
  );
  await executeGitCommandWithRetry(['git', 'clean', '-fd'], repoPath, 'clean untracked files for resume');

  for (const agentName of incompleteAgents) {
    const deliverableFilename = AGENTS[agentName].deliverableFilename;
    // path.join keeps this consistent with the validators in session-manager.ts
    const deliverablePath = path.join(repoPath, 'deliverables', deliverableFilename);
    try {
      const exists = await fileExists(deliverablePath);
      if (exists) {
        logger.warn(`Cleaning partial deliverable: ${agentName}`);
        await fs.unlink(deliverablePath);
      }
    } catch (error) {
      logger.info(`Note: Failed to delete ${deliverablePath}: ${error}`);
    }
  }

  logger.info('Workspace restored to clean state');
}

/**
 * Record a resume attempt in session.json and write resume header to workflow.log.
 */
export async function recordResumeAttempt(
  input: ActivityInput,
  terminatedWorkflows: string[],
  checkpointHash: string,
  previousWorkflowId: string,
  completedAgents: string[],
): Promise<void> {
  const sessionMetadata = buildSessionMetadata(input);
  const auditSession = new AuditSession(sessionMetadata);
  await auditSession.initialize();

  // Update session.json with resume attempt
  await auditSession.addResumeAttempt(input.workflowId, terminatedWorkflows, checkpointHash);

  // Write resume header to workflow.log
  await auditSession.logResumeHeader({
    previousWorkflowId,
    newWorkflowId: input.workflowId,
    checkpointHash,
    completedAgents,
  });
}

/**
 * Log phase transition to the unified workflow log.
 */
export async function logPhaseTransition(
  input: ActivityInput,
  phase: string,
  event: 'start' | 'complete',
): Promise<void> {
  const sessionMetadata = buildSessionMetadata(input);
  const auditSession = new AuditSession(sessionMetadata);
  await auditSession.initialize(input.workflowId);

  if (event === 'start') {
    await auditSession.logPhaseStart(phase);
  } else {
    await auditSession.logPhaseComplete(phase);
  }
}

/**
 * Log workflow completion with full summary.
 * Cleans up container when done.
 */
export async function logWorkflowComplete(input: ActivityInput, summary: WorkflowSummary): Promise<void> {
  const { repoPath, workflowId } = input;
  const sessionMetadata = buildSessionMetadata(input);

  // 1. Initialize audit session and mark final status
  const auditSession = new AuditSession(sessionMetadata);
  await auditSession.initialize(workflowId);
  await auditSession.updateSessionStatus(summary.status);

  // 2. Load cumulative metrics from session.json
  const sessionData = (await auditSession.getMetrics()) as {
    metrics: {
      total_duration_ms: number;
      total_cost_usd: number;
      agents: Record<string, { final_duration_ms: number; total_cost_usd: number }>;
    };
  };

  // 3. Fill in metrics for skipped agents (resumed from previous run)
  const agentMetrics = { ...summary.agentMetrics };
  for (const agentName of summary.completedAgents) {
    if (!agentMetrics[agentName]) {
      const agentData = sessionData.metrics.agents[agentName];
      if (agentData) {
        agentMetrics[agentName] = {
          durationMs: agentData.final_duration_ms,
          costUsd: agentData.total_cost_usd,
        };
      }
    }
  }

  // 4. Build cumulative summary with cross-run totals
  const cumulativeSummary: WorkflowSummary = {
    ...summary,
    totalDurationMs: sessionData.metrics.total_duration_ms,
    totalCostUsd: sessionData.metrics.total_cost_usd,
    agentMetrics,
  };

  // 5. Write completion entry to workflow.log
  await auditSession.logWorkflowComplete(cumulativeSummary);

  // 6. Copy deliverables to workspaces
  try {
    await copyDeliverablesToAudit(sessionMetadata, repoPath);
  } catch (copyErr) {
    const logger = createActivityLogger();
    logger.error('Failed to copy deliverables to workspaces', {
      error: copyErr instanceof Error ? copyErr.message : String(copyErr),
    });
  }

  // 7. Clean up container
  removeContainer(workflowId);
}
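`loadResumeState` above counts an agent as completed only when session.json reports `success` and its deliverable is still on disk, and it collects checkpoints only from those agents. The block below is a pure sketch of that rule with the file checks replaced by a set lookup. `selectCompletedAgents`, its parameters, and the sample data are hypothetical and exist only for illustration.

```typescript
type AgentStatus = 'in-progress' | 'success' | 'failed';

interface AgentRecord {
  status: AgentStatus;
  checkpoint?: string;
}

// Hypothetical pure helper: `onDisk` stands in for the fileExists() calls made by the activity.
function selectCompletedAgents(
  agentOrder: readonly string[],
  records: Record<string, AgentRecord | undefined>,
  deliverables: Record<string, string>,
  onDisk: ReadonlySet<string>,
): { completed: string[]; checkpoints: string[] } {
  const completed: string[] = [];
  const checkpoints: string[] = [];

  for (const name of agentOrder) {
    const record = records[name];
    if (!record || record.status !== 'success') continue;

    // A recorded success is not trusted on its own: the deliverable must also exist.
    const deliverable = deliverables[name];
    if (!deliverable || !onDisk.has(deliverable)) continue;

    completed.push(name);
    if (record.checkpoint != null) checkpoints.push(record.checkpoint);
  }

  return { completed, checkpoints };
}

const resumeSample = selectCompletedAgents(
  ['pre-recon', 'recon', 'xss-vuln'],
  {
    'pre-recon': { status: 'success', checkpoint: 'abc123' },
    recon: { status: 'success', checkpoint: 'def456' }, // deliverable missing below
    'xss-vuln': { status: 'failed' },
  },
  {
    'pre-recon': 'code_analysis_deliverable.md',
    recon: 'recon_deliverable.md',
    'xss-vuln': 'xss_analysis_deliverable.md',
  },
  new Set(['code_analysis_deliverable.md']),
);
console.log(resumeSample);
```

In this sample `recon` is dropped even though its status is `success`, because its deliverable is absent, so it would be re-run on resume.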
@@ -0,0 +1,34 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

import { Context } from '@temporalio/activity';
import type { ActivityLogger } from '../types/activity-logger.js';

/**
 * ActivityLogger backed by Temporal's Context.current().log.
 * Must be called inside a running Temporal activity; throws otherwise.
 */
export class TemporalActivityLogger implements ActivityLogger {
  info(message: string, attrs?: Record<string, unknown>): void {
    Context.current().log.info(message, attrs ?? {});
  }

  warn(message: string, attrs?: Record<string, unknown>): void {
    Context.current().log.warn(message, attrs ?? {});
  }

  error(message: string, attrs?: Record<string, unknown>): void {
    Context.current().log.error(message, attrs ?? {});
  }
}

/**
 * Create an ActivityLogger. Must be called inside a Temporal activity.
 * Throws if called outside an activity context.
 */
export function createActivityLogger(): ActivityLogger {
  return new TemporalActivityLogger();
}
||||
@@ -0,0 +1,66 @@
import { defineQuery } from '@temporalio/workflow';

export type { AgentMetrics } from '../types/metrics.js';

import type { PipelineConfig } from '../types/config.js';
import type { AgentMetrics } from '../types/metrics.js';

export interface PipelineInput {
  webUrl: string;
  repoPath: string;
  configPath?: string;
  outputPath?: string;
  pipelineTestingMode?: boolean;
  pipelineConfig?: PipelineConfig;
  workflowId?: string; // Used for audit correlation
  sessionId?: string; // Workspace directory name (distinct from workflowId for named workspaces)
  resumeFromWorkspace?: string; // Workspace name to resume from
  terminatedWorkflows?: string[]; // Workflows terminated during resume
}

export interface ResumeState {
  workspaceName: string;
  originalUrl: string;
  completedAgents: string[];
  checkpointHash: string;
  originalWorkflowId: string;
}

export interface PipelineSummary {
  totalCostUsd: number;
  totalDurationMs: number; // Wall-clock time (end - start)
  totalTurns: number;
  agentCount: number;
}

export interface PipelineState {
  status: 'running' | 'completed' | 'failed';
  currentPhase: string | null;
  currentAgent: string | null;
  completedAgents: string[];
  failedAgent: string | null;
  error: string | null;
  startTime: number;
  agentMetrics: Record<string, AgentMetrics>;
  summary: PipelineSummary | null;
}

// Extended state returned by getProgress query (includes computed fields)
export interface PipelineProgress extends PipelineState {
  workflowId: string;
  elapsedMs: number;
}

// Result from a single vuln→exploit pipeline
export interface VulnExploitPipelineResult {
  vulnType: string;
  vulnMetrics: AgentMetrics | null;
  exploitMetrics: AgentMetrics | null;
  exploitDecision: {
    shouldExploit: boolean;
    vulnerabilityCount: number;
  } | null;
  error: string | null;
}

export const getProgress = defineQuery<PipelineProgress>('getProgress');
@@ -0,0 +1,39 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Maps PipelineState to WorkflowSummary for audit logging.
 * Pure function with no side effects.
 */

import type { WorkflowSummary } from '../audit/workflow-logger.js';
import type { PipelineState } from './shared.js';

/**
 * Maps PipelineState to WorkflowSummary.
 *
 * This function is deterministic (no Date.now() or I/O) so it can be
 * safely imported into Temporal workflows. The caller must ensure
 * state.summary is set before calling (via computeSummary).
 */
export function toWorkflowSummary(state: PipelineState, status: 'completed' | 'failed'): WorkflowSummary {
  // state.summary must be computed before calling this mapper
  const summary = state.summary;
  if (!summary) {
    throw new Error('toWorkflowSummary: state.summary must be set before calling');
  }

  return {
    status,
    totalDurationMs: summary.totalDurationMs,
    totalCostUsd: summary.totalCostUsd,
    completedAgents: state.completedAgents,
    agentMetrics: Object.fromEntries(
      Object.entries(state.agentMetrics).map(([name, m]) => [name, { durationMs: m.durationMs, costUsd: m.costUsd }]),
    ),
    ...(state.error && { error: state.error }),
  };
}
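The `...(state.error && { error: state.error })` idiom in the mapper adds the `error` key only when a value is present, instead of emitting `error: undefined`. Below is a minimal sketch of the idiom in isolation. The `Summary` type and `buildSummary` helper are hypothetical names used for illustration and are not part of this commit.

```typescript
// Hypothetical type for illustration; not taken from the commit.
interface Summary {
  status: 'completed' | 'failed';
  error?: string;
}

// Hypothetical helper showing the conditional-spread idiom.
function buildSummary(status: Summary['status'], error: string | null): Summary {
  return {
    status,
    // Spreading a falsy value (null or '') adds no keys, so `error`
    // is omitted entirely unless it is a non-empty string.
    ...(error && { error }),
  };
}

const ok = buildSummary('completed', null);
const failed = buildSummary('failed', 'boom');
```

Here `ok` has no `error` key at all, which matters when a consumer checks for the key's presence or when the project compiles with `exactOptionalPropertyTypes`.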
@@ -0,0 +1,454 @@
#!/usr/bin/env node

// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Combined Temporal worker + client for Shannon pentest pipeline.
 *
 * Starts a worker on a per-invocation task queue, submits a workflow,
 * waits for the result, and exits. Designed to run as a single ephemeral
 * container per scan.
 *
 * Usage:
 *   node dist/temporal/worker.js <webUrl> <repoPath> [options]
 *
 * Options:
 *   --task-queue <name>  Task queue name (required, unique per scan)
 *   --config <path>      Configuration file path
 *   --output <path>      Output directory for workspaces
 *   --workspace <name>   Resume from existing workspace
 *   --pipeline-testing   Use minimal prompts for fast testing
 *
 * Environment:
 *   TEMPORAL_ADDRESS - Temporal server address (default: localhost:7233)
 */

import fs from 'node:fs';
import path from 'node:path';
import { fileURLToPath } from 'node:url';
import { Client, Connection, type WorkflowHandle, WorkflowNotFoundError } from '@temporalio/client';
import { bundleWorkflowCode, NativeConnection, Worker } from '@temporalio/worker';
import dotenv from 'dotenv';
import { sanitizeHostname } from '../audit/utils.js';
import { parseConfig } from '../config-parser.js';
import type { PipelineConfig } from '../types/config.js';
import { fileExists, readJson } from '../utils/file-io.js';
import * as activities from './activities.js';
import type { PipelineInput, PipelineProgress, PipelineState } from './shared.js';

dotenv.config();

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const PROGRESS_QUERY = 'getProgress';

// === CLI Argument Parsing ===

interface CliArgs {
  webUrl: string;
  repoPath: string;
  taskQueue: string;
  configPath?: string;
  outputPath?: string;
  pipelineTestingMode: boolean;
  resumeFromWorkspace?: string;
}

function showUsage(): void {
  console.log('\nShannon Worker');
  console.log('Combined worker + client for pentest pipeline\n');
  console.log('Usage:');
  console.log(' node dist/temporal/worker.js <webUrl> <repoPath> --task-queue <name> [options]\n');
  console.log('Options:');
  console.log(' --task-queue <name> Task queue name (required)');
  console.log(' --config <path> Configuration file path');
  // --output is parsed below and listed in the file header, so it is shown here too
  console.log(' --output <path> Output directory for workspaces');
  console.log(' --workspace <name> Resume from existing workspace');
  console.log(' --pipeline-testing Use minimal prompts for fast testing\n');
}

function parseCliArgs(argv: string[]): CliArgs {
  if (argv.includes('--help') || argv.includes('-h') || argv.length === 0) {
    showUsage();
    process.exit(0);
  }

  let webUrl: string | undefined;
  let repoPath: string | undefined;
  let taskQueue: string | undefined;
  let configPath: string | undefined;
  let outputPath: string | undefined;
  let pipelineTestingMode = false;
  let resumeFromWorkspace: string | undefined;

  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === '--task-queue') {
      const nextArg = argv[i + 1];
      if (nextArg && !nextArg.startsWith('-')) {
        taskQueue = nextArg;
        i++;
      }
    } else if (arg === '--config') {
      const nextArg = argv[i + 1];
      if (nextArg && !nextArg.startsWith('-')) {
        configPath = nextArg;
        i++;
      }
    } else if (arg === '--output') {
      const nextArg = argv[i + 1];
      if (nextArg && !nextArg.startsWith('-')) {
        outputPath = nextArg;
        i++;
      }
    } else if (arg === '--workspace') {
      const nextArg = argv[i + 1];
      if (nextArg && !nextArg.startsWith('-')) {
        resumeFromWorkspace = nextArg;
        i++;
      }
    } else if (arg === '--pipeline-testing') {
      pipelineTestingMode = true;
    } else if (arg && !arg.startsWith('-')) {
      if (!webUrl) {
        webUrl = arg;
      } else if (!repoPath) {
        repoPath = arg;
      }
    }
  }

  if (!webUrl || !repoPath) {
    console.error('Error: webUrl and repoPath are required');
    showUsage();
    process.exit(1);
  }

  if (!taskQueue) {
    console.error('Error: --task-queue is required');
    showUsage();
    process.exit(1);
  }

  return {
    webUrl,
    repoPath,
    taskQueue,
    pipelineTestingMode,
    ...(configPath && { configPath }),
    ...(outputPath && { outputPath }),
    ...(resumeFromWorkspace && { resumeFromWorkspace }),
  };
}

// === Workspace Resolution ===

interface SessionJson {
  session: {
    id: string;
    webUrl: string;
    originalWorkflowId?: string;
    resumeAttempts?: Array<{ workflowId: string }>;
  };
  metrics: {
    total_cost_usd: number;
  };
}

function isValidWorkspaceName(name: string): boolean {
  return /^[a-zA-Z0-9][a-zA-Z0-9_-]{0,127}$/.test(name);
}

interface WorkspaceResolution {
  workflowId: string;
  sessionId: string;
  isResume: boolean;
  terminatedWorkflows: string[];
}

async function terminateExistingWorkflows(client: Client, workspaceName: string): Promise<string[]> {
  const sessionPath = path.join('./workspaces', workspaceName, 'session.json');

  if (!(await fileExists(sessionPath))) {
    throw new Error(`Workspace not found: ${workspaceName}\n` + `Expected path: ${sessionPath}`);
  }

  const session = await readJson<SessionJson>(sessionPath);

  const workflowIds = [
    session.session.originalWorkflowId || session.session.id,
    ...(session.session.resumeAttempts?.map((r) => r.workflowId) || []),
  ].filter((id): id is string => id != null);

  const terminated: string[] = [];

  for (const wfId of workflowIds) {
    try {
      const handle = client.workflow.getHandle(wfId);
      const description = await handle.describe();

      if (description.status.name === 'RUNNING') {
        console.log(`Terminating running workflow: ${wfId}`);
        await handle.terminate('Superseded by resume workflow');
        terminated.push(wfId);
        console.log(`Terminated: ${wfId}`);
      } else {
        console.log(`Workflow already ${description.status.name}: ${wfId}`);
      }
    } catch (error) {
      if (error instanceof WorkflowNotFoundError) {
        console.log(`Workflow not found (already cleaned up): ${wfId}`);
      } else {
        console.log(`Failed to terminate ${wfId}: ${error}`);
      }
    }
  }

  return terminated;
}

async function resolveWorkspace(client: Client, args: CliArgs): Promise<WorkspaceResolution> {
  if (!args.resumeFromWorkspace) {
    const hostname = sanitizeHostname(args.webUrl);
    const workflowId = `${hostname}_shannon-${Date.now()}`;
    return {
      workflowId,
      sessionId: workflowId,
      isResume: false,
      terminatedWorkflows: [],
    };
  }

  const workspace = args.resumeFromWorkspace;
  const sessionPath = path.join('./workspaces', workspace, 'session.json');
  const workspaceExists = await fileExists(sessionPath);

  if (workspaceExists) {
    console.log('=== RESUME MODE ===');
    console.log(`Workspace: ${workspace}\n`);

    const terminatedWorkflows = await terminateExistingWorkflows(client, workspace);
    if (terminatedWorkflows.length > 0) {
      console.log(`Terminated ${terminatedWorkflows.length} previous workflow(s)\n`);
    }

    const session = await readJson<SessionJson>(sessionPath);
    if (session.session.webUrl !== args.webUrl) {
      console.error('ERROR: URL mismatch with workspace');
      console.error(` Workspace URL: ${session.session.webUrl}`);
      console.error(` Provided URL: ${args.webUrl}`);
      process.exit(1);
    }

    return {
      workflowId: `${workspace}_resume_${Date.now()}`,
      sessionId: workspace,
      isResume: true,
      terminatedWorkflows,
    };
  }

  if (!isValidWorkspaceName(workspace)) {
    console.error(`ERROR: Invalid workspace name: "${workspace}"`);
    console.error(' Must be 1-128 characters, alphanumeric/hyphens/underscores, starting with alphanumeric');
    process.exit(1);
  }

  console.log('=== NEW NAMED WORKSPACE ===');
  console.log(`Workspace: ${workspace}\n`);

  // If the workspace name already looks like a CLI-generated ID
  // (ends with _shannon-<digits>), use it directly to avoid double _shannon- suffixes
  const workflowId = /_shannon-\d+$/.test(workspace) ? workspace : `${workspace}_shannon-${Date.now()}`;

  return {
    workflowId,
    sessionId: workspace,
    isResume: false,
    terminatedWorkflows: [],
  };
}

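The workspace-name check and the `_shannon-<digits>` suffix rule in `resolveWorkspace` can be exercised in isolation. The sketch below reuses the two regular expressions from the diff. `deriveWorkflowId` is a hypothetical wrapper, not a function in this commit, and it takes the timestamp as a parameter so the result is reproducible.

```typescript
// Same pattern as isValidWorkspaceName in the diff.
function isValidWorkspaceName(name: string): boolean {
  return /^[a-zA-Z0-9][a-zA-Z0-9_-]{0,127}$/.test(name);
}

// Hypothetical wrapper around the suffix check; `now` stands in for Date.now().
function deriveWorkflowId(workspace: string, now: number): string {
  return /_shannon-\d+$/.test(workspace) ? workspace : `${workspace}_shannon-${now}`;
}

// Rejected: must start with an alphanumeric and may not contain dots or slashes.
const traversal = isValidWorkspaceName('../etc');
// Accepted: letters, digits, hyphens and underscores only.
const plain = isValidWorkspaceName('acme-audit_01');

// A plain name gets the suffix appended.
const named = deriveWorkflowId('acme-audit', 1700000000000);
// A name that already ends in _shannon-<digits> is used unchanged.
const reused = deriveWorkflowId('example-com_shannon-1700000000000', 1);
```

Note that in `resolveWorkspace` the name check only runs on the new-workspace path, after the `session.json` lookup has already used the name to build a path.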
// === Pipeline Input Construction ===

async function loadPipelineConfig(configPath: string | undefined): Promise<PipelineConfig> {
  if (!configPath) return {};
  try {
    const config = await parseConfig(configPath);
    const raw = config.pipeline;
    if (!raw) return {};

    const result: PipelineConfig = {};
    if (raw.retry_preset !== undefined) {
      result.retry_preset = raw.retry_preset;
    }
    if (raw.max_concurrent_pipelines !== undefined) {
      result.max_concurrent_pipelines = Number(raw.max_concurrent_pipelines);
    }
    return result;
  } catch {
    return {};
  }
}

function buildPipelineInput(
  args: CliArgs,
  workspace: WorkspaceResolution,
  pipelineConfig: PipelineConfig,
): PipelineInput {
  return {
    webUrl: args.webUrl,
    repoPath: args.repoPath,
    workflowId: workspace.workflowId,
    sessionId: workspace.sessionId,
    ...(args.configPath && { configPath: args.configPath }),
    ...(args.pipelineTestingMode && { pipelineTestingMode: args.pipelineTestingMode }),
    ...(workspace.isResume && args.resumeFromWorkspace && { resumeFromWorkspace: args.resumeFromWorkspace }),
    ...(workspace.terminatedWorkflows.length > 0 && { terminatedWorkflows: workspace.terminatedWorkflows }),
    ...(Object.keys(pipelineConfig).length > 0 && { pipelineConfig }),
  };
}

// === Workflow Result Handling ===

async function waitForWorkflowResult(
  handle: WorkflowHandle<(input: PipelineInput) => Promise<PipelineState>>,
  workspace: WorkspaceResolution,
): Promise<void> {
  const progressInterval = setInterval(async () => {
    try {
      const progress = await handle.query<PipelineProgress>(PROGRESS_QUERY);
      const elapsed = Math.floor(progress.elapsedMs / 1000);
      console.log(
        `[${elapsed}s] Phase: ${progress.currentPhase || 'unknown'} | Agent: ${progress.currentAgent || 'none'} | Completed: ${progress.completedAgents.length}/13`,
      );
    } catch {
      // Workflow may have completed
    }
  }, 30000);

  try {
    const result = await handle.result();
    clearInterval(progressInterval);

    console.log('\nPipeline completed successfully!');
    if (result.summary) {
      console.log(`Duration: ${Math.floor(result.summary.totalDurationMs / 1000)}s`);
      console.log(`Agents completed: ${result.summary.agentCount}`);
      console.log(`Total turns: ${result.summary.totalTurns}`);
      console.log(`Run cost: $${result.summary.totalCostUsd.toFixed(4)}`);

      if (workspace.isResume) {
        try {
          const session = await readJson<SessionJson>(path.join('./workspaces', workspace.sessionId, 'session.json'));
          console.log(`Cumulative cost: $${session.metrics.total_cost_usd.toFixed(4)}`);
        } catch {
          // Non-fatal
        }
      }
    }
  } catch (error) {
    clearInterval(progressInterval);
    console.error('\nPipeline failed:', error);
    process.exit(1);
  }
}

// === Deliverables Copy ===

function copyDeliverables(repoPath: string, outputPath: string): void {
  const deliverablesDir = path.join(repoPath, 'deliverables');
  if (!fs.existsSync(deliverablesDir)) {
    console.log('No deliverables directory found, skipping copy');
    return;
  }

  const files = fs.readdirSync(deliverablesDir);
  if (files.length === 0) {
    console.log('No deliverables to copy');
    return;
  }

  fs.mkdirSync(outputPath, { recursive: true });

  for (const file of files) {
    const src = path.join(deliverablesDir, file);
    const dest = path.join(outputPath, file);
    fs.cpSync(src, dest, { recursive: true });
  }

  console.log(`Copied ${files.length} deliverable(s) to ${outputPath}`);
}

// === Main Entry Point ===

async function run(): Promise<void> {
  // 1. Parse CLI args
  const args = parseCliArgs(process.argv.slice(2));

  // 2. Connect to Temporal server
  const address = process.env.TEMPORAL_ADDRESS || 'localhost:7233';
  console.log(`Connecting to Temporal at ${address}...`);

  const connection = await NativeConnection.connect({ address });
  const clientConnection = await Connection.connect({ address });
  const client = new Client({ connection: clientConnection });

  try {
    // 3. Bundle workflows and create worker on per-invocation task queue
    console.log('Bundling workflows...');
    const workflowBundle = await bundleWorkflowCode({
      workflowsPath: path.join(__dirname, 'workflows.js'),
    });

    const worker = await Worker.create({
      connection,
      namespace: 'default',
      workflowBundle,
      activities,
      taskQueue: args.taskQueue,
      maxConcurrentActivityTaskExecutions: 25,
    });

    // 4. Resolve workspace and build pipeline input
    const workspace = await resolveWorkspace(client, args);
    const pipelineConfig = await loadPipelineConfig(args.configPath);
    const input = buildPipelineInput(args, workspace, pipelineConfig);

    // 5. Start worker polling in the background
    const workerDone = worker.run();

    // 6. Submit workflow to the same task queue
    const handle = await client.workflow.start<(input: PipelineInput) => Promise<PipelineState>>(
      'pentestPipelineWorkflow',
      {
        taskQueue: args.taskQueue,
        workflowId: workspace.workflowId,
        args: [input],
      },
    );

    // 7. Wait for workflow result
    await waitForWorkflowResult(handle, workspace);

    // 8. Copy deliverables to output directory
    if (args.outputPath) {
      copyDeliverables(args.repoPath, args.outputPath);
    }

    // 9. Shut down worker gracefully
    worker.shutdown();
    await workerDone;
  } finally {
    await connection.close();
    await clientConnection.close();
  }
}

run().catch((err) => {
  console.error('Worker failed:', err);
  process.exit(1);
});
@@ -0,0 +1,88 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Workflow error formatting utilities.
 * Pure functions with no side effects — safe for Temporal workflow sandbox.
 */

/** Maps Temporal error type strings to actionable remediation hints. */
const REMEDIATION_HINTS: Record<string, string> = {
  AuthenticationError: 'Verify ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN in .env is valid and not expired.',
  ConfigurationError: 'Check your CONFIG file path and contents.',
  BillingError: 'Check your Anthropic billing dashboard. Add credits or wait for spending cap reset.',
  GitError: 'Check repository path and git state.',
  InvalidTargetError: 'Verify the target URL is correct and accessible.',
  PermissionError: 'Check file and network permissions.',
  ExecutionLimitError: 'Agent exceeded maximum turns or budget. Review prompt complexity.',
};

/**
 * Walk the .cause chain to find the innermost error with a .type property.
 * Temporal wraps ApplicationFailure in ActivityFailure — the useful info is inside.
 *
 * Uses duck-typing because workflow code cannot import @temporalio/activity types.
 */
function unwrapActivityError(error: unknown): {
  message: string;
  type: string | null;
} {
  let current: unknown = error;
  let typed: { message: string; type: string } | null = null;

  while (current instanceof Error) {
    if ('type' in current && typeof (current as { type: unknown }).type === 'string') {
      typed = {
        message: current.message,
        type: (current as { type: string }).type,
      };
    }
    current = (current as { cause?: unknown }).cause;
  }

  if (typed) {
    return typed;
  }

  return {
    message: error instanceof Error ? error.message : String(error),
    type: null,
  };
}

/**
 * Format a structured error string from workflow catch context.
 * Segments are delimited by | for multi-line rendering by WorkflowLogger.
 */
export function formatWorkflowError(error: unknown, currentPhase: string | null, currentAgent: string | null): string {
  const unwrapped = unwrapActivityError(error);

  // Phase context (first segment)
  let phaseContext = 'Pipeline failed';
  if (currentPhase && currentAgent && currentPhase !== currentAgent) {
    phaseContext = `${currentPhase} failed (agent: ${currentAgent})`;
  } else if (currentPhase) {
    phaseContext = `${currentPhase} failed`;
  }

  const segments: string[] = [phaseContext];

  if (unwrapped.type) {
    segments.push(unwrapped.type);
  }

  // Sanitize pipe characters from message to preserve delimiter format
  segments.push(unwrapped.message.replaceAll('|', '/'));

  if (unwrapped.type) {
    const hint = REMEDIATION_HINTS[unwrapped.type];
    if (hint) {
      segments.push(`Hint: ${hint}`);
    }
  }

  return segments.join('|');
}
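`formatWorkflowError` joins its segments with `|` and replaces any pipe inside the message with `/`, so a consumer can split on the delimiter without ambiguity. The sketch below shows one way such a string could be turned back into lines. `renderErrorLines` is a hypothetical helper written for illustration; the actual WorkflowLogger rendering is not shown in this diff and may differ.

```typescript
// Hypothetical renderer: splits a pipe-delimited error string into lines,
// keeping the first segment flush left and indenting the rest.
function renderErrorLines(formatted: string, indent = '  '): string[] {
  return formatted
    .split('|')
    .map((segment) => segment.trim())
    .filter((segment) => segment.length > 0)
    .map((segment, i) => (i === 0 ? segment : `${indent}${segment}`));
}

// Sample input in the shape formatWorkflowError produces:
// phase context | error type | sanitized message | hint
const sample =
  'recon failed|ConfigurationError|bad path a/b|Hint: Check your CONFIG file path and contents.';
const lines = renderErrorLines(sample);
```

Because the message segment never contains a pipe, the split yields at most four segments: phase context, optional type, message, and optional hint.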
@@ -0,0 +1,484 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Temporal workflow for Shannon pentest pipeline.
 *
 * Orchestrates the penetration testing workflow:
 *   1. Pre-Reconnaissance (sequential)
 *   2. Reconnaissance (sequential)
 *   3-4. Vulnerability + Exploitation (5 pipelined pairs in parallel)
 *        Each pair: vuln agent → queue check → conditional exploit
 *        No synchronization barrier - exploits start when their vuln finishes
 *   5. Reporting (sequential)
 *
 * Features:
 *   - Queryable state via getProgress
 *   - Automatic retry with backoff for transient/billing errors
 *   - Non-retryable classification for permanent errors
 *   - Audit correlation via workflowId
 *   - Graceful failure handling: pipelines continue if one fails
 */

import { log, proxyActivities, setHandler, workflowInfo } from '@temporalio/workflow';
import type { AgentName, VulnType } from '../types/agents.js';
import { ALL_AGENTS } from '../types/agents.js';
import type * as activities from './activities.js';
import type { ActivityInput } from './activities.js';
import {
  type AgentMetrics,
  getProgress,
  type PipelineInput,
  type PipelineProgress,
  type PipelineState,
  type PipelineSummary,
  type ResumeState,
  type VulnExploitPipelineResult,
} from './shared.js';
import { toWorkflowSummary } from './summary-mapper.js';
import { formatWorkflowError } from './workflow-errors.js';

// Retry configuration for production (long intervals for billing recovery)
const PRODUCTION_RETRY = {
  initialInterval: '5 minutes',
  maximumInterval: '30 minutes',
  backoffCoefficient: 2,
  maximumAttempts: 50,
  nonRetryableErrorTypes: [
    'AuthenticationError',
    'PermissionError',
    'InvalidRequestError',
    'RequestTooLargeError',
    'ConfigurationError',
    'InvalidTargetError',
    'ExecutionLimitError',
  ],
};

// Retry configuration for pipeline testing (fast iteration)
const TESTING_RETRY = {
  initialInterval: '10 seconds',
  maximumInterval: '30 seconds',
  backoffCoefficient: 2,
  maximumAttempts: 5,
  nonRetryableErrorTypes: PRODUCTION_RETRY.nonRetryableErrorTypes,
};

// Activity proxy with production retry configuration (default)
const acts = proxyActivities<typeof activities>({
  startToCloseTimeout: '2 hours',
  heartbeatTimeout: '60 minutes', // Extended for sub-agent execution (SDK blocks event loop during Task tool calls)
  retry: PRODUCTION_RETRY,
});

// Activity proxy with testing retry configuration (fast)
const testActs = proxyActivities<typeof activities>({
  startToCloseTimeout: '30 minutes',
  heartbeatTimeout: '30 minutes', // Extended for sub-agent execution in testing
  retry: TESTING_RETRY,
});

// Retry configuration for subscription plans (5h+ rolling rate limit windows)
const SUBSCRIPTION_RETRY = {
  initialInterval: '5 minutes',
  maximumInterval: '6 hours',
  backoffCoefficient: 2,
  maximumAttempts: 100,
  nonRetryableErrorTypes: PRODUCTION_RETRY.nonRetryableErrorTypes,
};

// Activity proxy for subscription plan recovery (extended timeouts)
const subscriptionActs = proxyActivities<typeof activities>({
  startToCloseTimeout: '8 hours',
  heartbeatTimeout: '2 hours',
  retry: SUBSCRIPTION_RETRY,
});

// Retry configuration for preflight validation (short timeout, few retries)
const PREFLIGHT_RETRY = {
  initialInterval: '10 seconds',
  maximumInterval: '1 minute',
  backoffCoefficient: 2,
  maximumAttempts: 3,
  nonRetryableErrorTypes: PRODUCTION_RETRY.nonRetryableErrorTypes,
};

// Activity proxy for preflight validation (short timeout)
const preflightActs = proxyActivities<typeof activities>({
  startToCloseTimeout: '2 minutes',
  heartbeatTimeout: '2 minutes',
  retry: PREFLIGHT_RETRY,
});

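Each retry preset describes exponential backoff: the wait before a retry starts at `initialInterval`, is multiplied by `backoffCoefficient` after every failed attempt, and is capped at `maximumInterval`. The sketch below works through that arithmetic for two of the presets, with intervals expressed in minutes. `backoffSchedule` is a hypothetical helper for illustration only; Temporal computes these waits itself from the retry policy.

```typescript
// Hypothetical helper: wait in minutes before retry 1, 2, 3, ...
// following min(initial * coefficient^n, max) for n = 0, 1, 2, ...
function backoffSchedule(initialMin: number, maxMin: number, coefficient: number, retries: number): number[] {
  const waits: number[] = [];
  for (let n = 0; n < retries; n++) {
    waits.push(Math.min(initialMin * Math.pow(coefficient, n), maxMin));
  }
  return waits;
}

// PRODUCTION_RETRY values: 5 minutes initial, 30 minutes cap, coefficient 2.
const production = backoffSchedule(5, 30, 2, 5); // [5, 10, 20, 30, 30]

// SUBSCRIPTION_RETRY values: 5 minutes initial, 6 hours (360 minutes) cap.
const subscription = backoffSchedule(5, 360, 2, 8); // [5, 10, 20, 40, 80, 160, 320, 360]
```

With the production preset the cap is reached on the fourth retry, so most of the 50 attempts are spaced 30 minutes apart. The subscription preset keeps doubling until the eighth retry, which fits rate-limit windows that last several hours.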
/**
 * Compute aggregated metrics from the current pipeline state.
 * Called on both success and failure to provide partial metrics.
 */
function computeSummary(state: PipelineState): PipelineSummary {
  const metrics = Object.values(state.agentMetrics);
  return {
    totalCostUsd: metrics.reduce((sum, m) => sum + (m.costUsd ?? 0), 0),
    totalDurationMs: Date.now() - state.startTime,
    totalTurns: metrics.reduce((sum, m) => sum + (m.numTurns ?? 0), 0),
    agentCount: state.completedAgents.length,
  };
}

export async function pentestPipelineWorkflow(input: PipelineInput): Promise<PipelineState> {
  const { workflowId } = workflowInfo();

  // Select activity proxy based on mode: testing (fast), subscription (extended), or default
  function selectActivityProxy(pipelineInput: PipelineInput) {
    if (pipelineInput.pipelineTestingMode) return testActs;
    if (pipelineInput.pipelineConfig?.retry_preset === 'subscription') return subscriptionActs;
    return acts;
  }

  const a = selectActivityProxy(input);

  const state: PipelineState = {
    status: 'running',
    currentPhase: null,
    currentAgent: null,
    completedAgents: [],
    failedAgent: null,
    error: null,
    startTime: Date.now(),
    agentMetrics: {},
    summary: null,
  };

  setHandler(
    getProgress,
    (): PipelineProgress => ({
      ...state,
      workflowId,
      elapsedMs: Date.now() - state.startTime,
    }),
  );

  // Build ActivityInput with required workflowId for audit correlation
  // Activities require workflowId (non-optional), PipelineInput has it optional
  // Use spread to conditionally include optional properties (exactOptionalPropertyTypes)
  // sessionId is workspace name for resume, or workflowId for new runs
  const sessionId = input.sessionId || input.resumeFromWorkspace || workflowId;

  const activityInput: ActivityInput = {
    webUrl: input.webUrl,
    repoPath: input.repoPath,
    workflowId,
    sessionId,
    ...(input.configPath !== undefined && { configPath: input.configPath }),
    ...(input.outputPath !== undefined && { outputPath: input.outputPath }),
    ...(input.pipelineTestingMode !== undefined && {
      pipelineTestingMode: input.pipelineTestingMode,
    }),
  };

  let resumeState: ResumeState | null = null;

  if (input.resumeFromWorkspace) {
    // 1. Load resume state (validates workspace, cross-checks deliverables)
    resumeState = await a.loadResumeState(input.resumeFromWorkspace, input.webUrl, input.repoPath);

    // 2. Restore git workspace and clean up incomplete deliverables
    const incompleteAgents = ALL_AGENTS.filter(
      (agentName) => !resumeState?.completedAgents.includes(agentName),
    ) as AgentName[];

    await a.restoreGitCheckpoint(input.repoPath, resumeState.checkpointHash, incompleteAgents);

    // 3. Short-circuit if all agents already completed
    if (resumeState.completedAgents.length === ALL_AGENTS.length) {
      log.info(`All ${ALL_AGENTS.length} agents already completed. Nothing to resume.`);
      state.status = 'completed';
      state.completedAgents = [...resumeState.completedAgents];
      state.summary = computeSummary(state);
      return state;
    }

    // 4. Record this resume attempt in session.json and workflow.log
    await a.recordResumeAttempt(
      activityInput,
      input.terminatedWorkflows || [],
      resumeState.checkpointHash,
      resumeState.originalWorkflowId,
      resumeState.completedAgents,
    );

    log.info('Resume state loaded and workspace restored');
  }

  const shouldSkip = (agentName: string): boolean => {
    return resumeState?.completedAgents.includes(agentName) ?? false;
  };

  // Run a sequential agent phase (pre-recon, recon)
  async function runSequentialPhase(
    phaseName: string,
    agentName: AgentName,
    runAgent: (input: ActivityInput) => Promise<AgentMetrics>,
  ): Promise<void> {
    if (!shouldSkip(agentName)) {
      state.currentPhase = phaseName;
      state.currentAgent = agentName;
      await a.logPhaseTransition(activityInput, phaseName, 'start');
      state.agentMetrics[agentName] = await runAgent(activityInput);
      state.completedAgents.push(agentName);
      await a.logPhaseTransition(activityInput, phaseName, 'complete');
    } else {
      log.info(`Skipping ${agentName} (already complete)`);
      state.completedAgents.push(agentName);
    }
  }

  // Build pipeline configs for the 5 vuln→exploit pairs
  function buildPipelineConfigs(): Array<{
    vulnType: VulnType;
    vulnAgent: string;
    exploitAgent: string;
    runVuln: () => Promise<AgentMetrics>;
    runExploit: () => Promise<AgentMetrics>;
  }> {
    return [
      {
        vulnType: 'injection',
        vulnAgent: 'injection-vuln',
        exploitAgent: 'injection-exploit',
        runVuln: () => a.runInjectionVulnAgent(activityInput),
        runExploit: () => a.runInjectionExploitAgent(activityInput),
      },
      {
        vulnType: 'xss',
        vulnAgent: 'xss-vuln',
        exploitAgent: 'xss-exploit',
        runVuln: () => a.runXssVulnAgent(activityInput),
        runExploit: () => a.runXssExploitAgent(activityInput),
      },
      {
        vulnType: 'auth',
        vulnAgent: 'auth-vuln',
        exploitAgent: 'auth-exploit',
        runVuln: () => a.runAuthVulnAgent(activityInput),
        runExploit: () => a.runAuthExploitAgent(activityInput),
      },
      {
        vulnType: 'ssrf',
        vulnAgent: 'ssrf-vuln',
        exploitAgent: 'ssrf-exploit',
        runVuln: () => a.runSsrfVulnAgent(activityInput),
        runExploit: () => a.runSsrfExploitAgent(activityInput),
      },
      {
        vulnType: 'authz',
        vulnAgent: 'authz-vuln',
        exploitAgent: 'authz-exploit',
        runVuln: () => a.runAuthzVulnAgent(activityInput),
        runExploit: () => a.runAuthzExploitAgent(activityInput),
      },
    ];
  }

  // Aggregate results from settled pipeline promises into workflow state
  function aggregatePipelineResults(results: PromiseSettledResult<VulnExploitPipelineResult>[]): void {
    const failedPipelines: string[] = [];

    for (const result of results) {
      if (result.status === 'fulfilled') {
        const { vulnType, vulnMetrics, exploitMetrics } = result.value;

        const vulnAgentName = `${vulnType}-vuln`;
        if (vulnMetrics) {
          state.agentMetrics[vulnAgentName] = vulnMetrics;
          state.completedAgents.push(vulnAgentName);
        } else if (shouldSkip(vulnAgentName)) {
          state.completedAgents.push(vulnAgentName);
        }

        const exploitAgentName = `${vulnType}-exploit`;
        if (exploitMetrics) {
          state.agentMetrics[exploitAgentName] = exploitMetrics;
          state.completedAgents.push(exploitAgentName);
        } else if (shouldSkip(exploitAgentName)) {
          state.completedAgents.push(exploitAgentName);
        }
      } else {
        const errorMsg = result.reason instanceof Error ? result.reason.message : String(result.reason);
        failedPipelines.push(errorMsg);
      }
    }

    if (failedPipelines.length > 0) {
      log.warn(`${failedPipelines.length} pipeline(s) failed`, {
        failures: failedPipelines,
      });
    }
  }

  // Run thunks with a concurrency limit, returning PromiseSettledResult for each.
  // When limit >= thunks.length (default), all thunks launch concurrently, as with Promise.allSettled,
  // except that results arrive in completion order (see NOTE).
  // NOTE: Results are in completion order, not input order. Callers must key on value fields, not index.
  async function runWithConcurrencyLimit(
    thunks: Array<() => Promise<VulnExploitPipelineResult>>,
    limit: number,
  ): Promise<PromiseSettledResult<VulnExploitPipelineResult>[]> {
    const results: PromiseSettledResult<VulnExploitPipelineResult>[] = [];
    const inFlight = new Set<Promise<void>>();

    for (const thunk of thunks) {
      const slot = thunk()
        .then(
          (value) => {
            results.push({ status: 'fulfilled', value });
          },
          (reason: unknown) => {
            results.push({ status: 'rejected', reason });
          },
        )
        .finally(() => {
          inFlight.delete(slot);
        });

      inFlight.add(slot);

      if (inFlight.size >= limit) {
        await Promise.race(inFlight);
      }
    }

    await Promise.allSettled(inFlight);
    return results;
  }

  try {
    // === Preflight Validation ===
    // Quick sanity checks before committing to expensive agent runs.
    // NOT using runSequentialPhase — preflight doesn't produce AgentMetrics.
    state.currentPhase = 'preflight';
    state.currentAgent = null;
    await preflightActs.runPreflightValidation(activityInput);
    log.info('Preflight validation passed');

    // === Phase 1: Pre-Reconnaissance ===
    await runSequentialPhase('pre-recon', 'pre-recon', a.runPreReconAgent);

    // === Phase 2: Reconnaissance ===
    await runSequentialPhase('recon', 'recon', a.runReconAgent);

    // === Phases 3-4: Vulnerability Analysis + Exploitation (Pipelined) ===
    // Each vuln type runs as an independent pipeline:
    // vuln agent → queue check → conditional exploit agent
    // Exploits start immediately when their vuln finishes, not waiting for all.
    state.currentPhase = 'vulnerability-exploitation';
    state.currentAgent = 'pipelines';
    await a.logPhaseTransition(activityInput, 'vulnerability-exploitation', 'start');

    // Closure over shouldSkip and activityInput by design (Temporal replay safety)
    async function runVulnExploitPipeline(
      vulnType: VulnType,
      runVulnAgent: () => Promise<AgentMetrics>,
      runExploitAgent: () => Promise<AgentMetrics>,
    ): Promise<VulnExploitPipelineResult> {
      const vulnAgentName = `${vulnType}-vuln`;
      const exploitAgentName = `${vulnType}-exploit`;

      // 1. Run vulnerability analysis (or skip if resumed)
      let vulnMetrics: AgentMetrics | null = null;
      if (!shouldSkip(vulnAgentName)) {
        vulnMetrics = await runVulnAgent();
      } else {
        log.info(`Skipping ${vulnAgentName} (already complete)`);
      }

      // 2. Check exploitation queue for actionable findings
      const decision = await a.checkExploitationQueue(activityInput, vulnType);

      // 3. Conditionally run exploitation agent
      let exploitMetrics: AgentMetrics | null = null;
      if (decision.shouldExploit) {
        if (!shouldSkip(exploitAgentName)) {
          exploitMetrics = await runExploitAgent();
        } else {
          log.info(`Skipping ${exploitAgentName} (already complete)`);
        }
      }

      return {
        vulnType,
        vulnMetrics,
        exploitMetrics,
        exploitDecision: {
          shouldExploit: decision.shouldExploit,
          vulnerabilityCount: decision.vulnerabilityCount,
        },
        error: null,
      };
    }

    const maxConcurrent = input.pipelineConfig?.max_concurrent_pipelines ?? 5;

    const pipelineConfigs = buildPipelineConfigs();
    const pipelineThunks: Array<() => Promise<VulnExploitPipelineResult>> = [];

    for (const config of pipelineConfigs) {
      if (!shouldSkip(config.vulnAgent) || !shouldSkip(config.exploitAgent)) {
        pipelineThunks.push(() => runVulnExploitPipeline(config.vulnType, config.runVuln, config.runExploit));
      } else {
        log.info(`Skipping entire ${config.vulnType} pipeline (both agents complete)`);
        state.completedAgents.push(config.vulnAgent, config.exploitAgent);
      }
    }

    const pipelineResults = await runWithConcurrencyLimit(pipelineThunks, maxConcurrent);
    aggregatePipelineResults(pipelineResults);

    state.currentPhase = 'exploitation';
    state.currentAgent = null;
    await a.logPhaseTransition(activityInput, 'vulnerability-exploitation', 'complete');

    // === Phase 5: Reporting ===
    if (!shouldSkip('report')) {
      state.currentPhase = 'reporting';
      state.currentAgent = 'report';
      await a.logPhaseTransition(activityInput, 'reporting', 'start');

      // First, assemble the concatenated report from exploitation evidence files
      await a.assembleReportActivity(activityInput);

      // Then run the report agent to add executive summary and clean up
      state.agentMetrics.report = await a.runReportAgent(activityInput);
      state.completedAgents.push('report');

      // Inject model metadata into the final report
      await a.injectReportMetadataActivity(activityInput);

      await a.logPhaseTransition(activityInput, 'reporting', 'complete');
    } else {
      log.info('Skipping report (already complete)');
      state.completedAgents.push('report');
    }

    state.status = 'completed';
    state.currentPhase = null;
    state.currentAgent = null;
    state.summary = computeSummary(state);

    // Log workflow completion summary
    await a.logWorkflowComplete(activityInput, toWorkflowSummary(state, 'completed'));

    return state;
  } catch (error) {
    state.status = 'failed';
    state.failedAgent = state.currentAgent;
    state.error = formatWorkflowError(error, state.currentPhase, state.currentAgent);
    state.summary = computeSummary(state);

    // Log workflow failure summary
    await a.logWorkflowComplete(activityInput, toWorkflowSummary(state, 'failed'));

    throw error;
  }
}
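The concurrency-limited runner in the workflow above can be exercised on its own. What follows is a minimal, generic sketch under stated assumptions, not the workflow's actual helper: `runLimited`, `Settled`, `makeTask`, and the `active`/`peak` counters are hypothetical names introduced only for illustration. It shows that every thunk produces exactly one settled record and that the number of in-flight tasks stays at or below the limit.

```typescript
// Hypothetical, generic variant of the runner above (all names are illustrative).
type Settled<T> =
  | { status: 'fulfilled'; value: T }
  | { status: 'rejected'; reason: unknown };

async function runLimited<T>(thunks: Array<() => Promise<T>>, limit: number): Promise<Settled<T>[]> {
  const results: Settled<T>[] = [];
  const inFlight = new Set<Promise<void>>();

  for (const thunk of thunks) {
    // Record the outcome either way, then free the slot.
    const slot: Promise<void> = thunk()
      .then(
        (value) => {
          results.push({ status: 'fulfilled', value });
        },
        (reason: unknown) => {
          results.push({ status: 'rejected', reason });
        },
      )
      .then(() => {
        inFlight.delete(slot);
      });

    inFlight.add(slot);

    // Pause the launch loop while every slot is occupied.
    if (inFlight.size >= limit) {
      await Promise.race(inFlight);
    }
  }

  await Promise.all(inFlight);
  return results;
}

// Usage: track peak concurrency across five fake tasks with a limit of 2.
let active = 0;
let peak = 0;

function makeTask(id: number, fail = false): () => Promise<number> {
  return async () => {
    active++;
    peak = Math.max(peak, active);
    await Promise.resolve(); // yield once to simulate asynchronous work
    active--;
    if (fail) throw new Error(`task ${id} failed`);
    return id;
  };
}

const demo = runLimited([makeTask(1), makeTask(2, true), makeTask(3), makeTask(4), makeTask(5)], 2);
```

Because records are pushed as tasks settle, a caller of this sketch should identify each result by a field on its value (as the workflow does with `vulnType`) rather than by array position.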
@@ -0,0 +1,174 @@
#!/usr/bin/env node
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Workspace listing tool for Shannon.
 *
 * Reads workspaces/ directories, parses session.json files, and displays
 * a formatted table of all workspaces with status, duration, and cost.
 *
 * Usage:
 *   node dist/temporal/workspaces.js
 *
 * Environment:
 *   WORKSPACES_DIR - Override workspaces directory (default: ./workspaces)
 */

import fs from 'node:fs/promises';
import path from 'node:path';
import { WORKSPACES_DIR as DEFAULT_WORKSPACES_DIR } from '../paths.js';

interface SessionJson {
  session: {
    id: string;
    webUrl: string;
    status: 'in-progress' | 'completed' | 'failed';
    createdAt: string;
    completedAt?: string;
  };
  metrics: {
    total_cost_usd: number;
  };
}

interface WorkspaceInfo {
  name: string;
  url: string;
  status: 'in-progress' | 'completed' | 'failed';
  createdAt: Date;
  completedAt: Date | null;
  costUsd: number;
}

function formatDuration(ms: number): string {
  const seconds = Math.floor(ms / 1000);
  const minutes = Math.floor(seconds / 60);
  const hours = Math.floor(minutes / 60);

  if (hours > 0) {
    return `${hours}h ${minutes % 60}m`;
  }
  if (minutes > 0) {
    return `${minutes}m`;
  }
  return `${seconds}s`;
}

function getStatusDisplay(status: string): string {
  return status;
}

function truncate(str: string, maxLen: number): string {
  if (str.length <= maxLen) return str;
  return `${str.slice(0, maxLen - 1)}\u2026`;
}

async function listWorkspaces(): Promise<void> {
  const workspacesDir = process.env.WORKSPACES_DIR || DEFAULT_WORKSPACES_DIR;

  let entries: string[];
  try {
    entries = await fs.readdir(workspacesDir);
  } catch {
    console.log('No workspaces directory found.');
    console.log(`Expected: ${workspacesDir}`);
    return;
  }

  const workspaces: WorkspaceInfo[] = [];

  for (const entry of entries) {
    const sessionPath = path.join(workspacesDir, entry, 'session.json');
    try {
      const content = await fs.readFile(sessionPath, 'utf8');
      const data = JSON.parse(content) as SessionJson;

      workspaces.push({
        name: entry,
        url: data.session.webUrl,
        status: data.session.status,
        createdAt: new Date(data.session.createdAt),
        completedAt: data.session.completedAt ? new Date(data.session.completedAt) : null,
        costUsd: data.metrics.total_cost_usd,
      });
    } catch {
      // Skip directories without valid session.json
    }
  }

  if (workspaces.length === 0) {
    console.log('\nNo workspaces found.');
    console.log('Run a pipeline first: ./shannon start -u <url> -r <repo>');
    return;
  }

  // Sort by creation date (most recent first)
  workspaces.sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime());

  console.log('\n=== Shannon Workspaces ===\n');

  // Column widths
  const nameWidth = 30;
  const urlWidth = 30;
  const statusWidth = 14;
  const durationWidth = 10;
  const costWidth = 10;

  // Header
  console.log(
    ' ' +
      'WORKSPACE'.padEnd(nameWidth) +
      'URL'.padEnd(urlWidth) +
      'STATUS'.padEnd(statusWidth) +
      'DURATION'.padEnd(durationWidth) +
      'COST'.padEnd(costWidth),
  );
  console.log(` ${'\u2500'.repeat(nameWidth + urlWidth + statusWidth + durationWidth + costWidth)}`);

  let resumableCount = 0;

  for (const ws of workspaces) {
    const now = new Date();
    const endTime = ws.completedAt || now;
    const durationMs = endTime.getTime() - ws.createdAt.getTime();
    const duration = formatDuration(durationMs);
    const cost = `$${ws.costUsd.toFixed(2)}`;
    const isResumable = ws.status !== 'completed';

    if (isResumable) {
      resumableCount++;
    }

    const resumeTag = isResumable ? ' (resumable)' : '';

    console.log(
      ' ' +
        truncate(ws.name, nameWidth - 2).padEnd(nameWidth) +
        truncate(ws.url, urlWidth - 2).padEnd(urlWidth) +
        getStatusDisplay(ws.status).padEnd(statusWidth) +
        duration.padEnd(durationWidth) +
        cost.padEnd(costWidth) +
        resumeTag,
    );
  }

  console.log();
  const summary = `${workspaces.length} workspace${workspaces.length === 1 ? '' : 's'} found`;
  const resumeSummary = resumableCount > 0 ? ` (${resumableCount} resumable)` : '';
  console.log(`${summary}${resumeSummary}`);

  if (resumableCount > 0) {
    console.log('\nResume with: ./shannon start -u <url> -r <repo> -w <name>');
  }

  console.log();
}

listWorkspaces().catch((err) => {
  console.error('Error listing workspaces:', err);
  process.exit(1);
});
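To make the table formatting concrete, the sketch below copies `formatDuration` and `truncate` from the listing above and runs them on a few sample values. The sample inputs (`durations`, `shortUrl`) are illustrative assumptions, not data from a real workspace.

```typescript
// Copied from the listing above so the example is self-contained.
function formatDuration(ms: number): string {
  const seconds = Math.floor(ms / 1000);
  const minutes = Math.floor(seconds / 60);
  const hours = Math.floor(minutes / 60);

  if (hours > 0) {
    return `${hours}h ${minutes % 60}m`;
  }
  if (minutes > 0) {
    return `${minutes}m`;
  }
  return `${seconds}s`;
}

function truncate(str: string, maxLen: number): string {
  if (str.length <= maxLen) return str;
  return `${str.slice(0, maxLen - 1)}\u2026`;
}

// Illustrative inputs only: 45 s, 2 min 5 s, and 1 h 2 min 5 s.
const durations = [45000, 125000, 3725000].map(formatDuration);
const shortUrl = truncate('https://example.com/a/very/long/path', 12);

console.log(durations.join(', ')); // 45s, 2m, 1h 2m
console.log(shortUrl); // https://exa…
```

Note that the formatter shows only the largest relevant units: seconds are dropped once a run exceeds a minute, so 125 seconds prints as `2m`.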
@@ -0,0 +1,15 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Logger interface for services called from Temporal activities.
 * Keeps services Temporal-agnostic while providing structured logging.
 */
export interface ActivityLogger {
  info(message: string, attrs?: Record<string, unknown>): void;
  warn(message: string, attrs?: Record<string, unknown>): void;
  error(message: string, attrs?: Record<string, unknown>): void;
}
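One way to see why the interface keeps services Temporal-agnostic is to implement it without Temporal at all. The sketch below is an assumption-laden illustration: `createMemoryLogger`, `LogRecord`, and `validateQueue` are hypothetical names that do not appear in the source, and the in-memory logger is just one possible implementation (for example, for unit tests).

```typescript
interface ActivityLogger {
  info(message: string, attrs?: Record<string, unknown>): void;
  warn(message: string, attrs?: Record<string, unknown>): void;
  error(message: string, attrs?: Record<string, unknown>): void;
}

interface LogRecord {
  level: 'info' | 'warn' | 'error';
  message: string;
  attrs?: Record<string, unknown>;
}

// Hypothetical helper (not part of the source): collects records in memory.
function createMemoryLogger(): { logger: ActivityLogger; records: LogRecord[] } {
  const records: LogRecord[] = [];
  const write =
    (level: LogRecord['level']) =>
    (message: string, attrs?: Record<string, unknown>): void => {
      records.push(attrs === undefined ? { level, message } : { level, message, attrs });
    };
  return {
    logger: { info: write('info'), warn: write('warn'), error: write('error') },
    records,
  };
}

// Hypothetical service: it depends only on the interface, never on Temporal.
function validateQueue(count: number, logger: ActivityLogger): boolean {
  if (count === 0) {
    logger.warn('Queue is empty', { count });
    return false;
  }
  logger.info('Queue validated', { count });
  return true;
}

const { logger, records } = createMemoryLogger();
validateQueue(0, logger);
validateQueue(3, logger);
```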
@@ -0,0 +1,67 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Agent type definitions
 */

/**
 * List of all agents in execution order.
 * Used for iteration during resume state checking.
 */
export const ALL_AGENTS = [
  'pre-recon',
  'recon',
  'injection-vuln',
  'xss-vuln',
  'auth-vuln',
  'ssrf-vuln',
  'authz-vuln',
  'injection-exploit',
  'xss-exploit',
  'auth-exploit',
  'ssrf-exploit',
  'authz-exploit',
  'report',
] as const;

/**
 * Agent name type derived from ALL_AGENTS.
 * This ensures type safety and prevents drift between type and array.
 */
export type AgentName = (typeof ALL_AGENTS)[number];

export type PlaywrightSession = 'agent1' | 'agent2' | 'agent3' | 'agent4' | 'agent5';

import type { ActivityLogger } from './activity-logger.js';

export type AgentValidator = (sourceDir: string, logger: ActivityLogger) => Promise<boolean>;

export type AgentStatus = 'pending' | 'in_progress' | 'completed' | 'failed' | 'rolled-back';

export interface AgentDefinition {
  name: AgentName;
  displayName: string;
  prerequisites: AgentName[];
  promptTemplate: string;
  deliverableFilename: string;
  modelTier?: 'small' | 'medium' | 'large';
}

/**
 * Vulnerability types supported by the pipeline.
 */
export type VulnType = 'injection' | 'xss' | 'auth' | 'ssrf' | 'authz';

/**
 * Decision returned by queue validation for exploitation phase.
 */
export interface ExploitationDecision {
  shouldExploit: boolean;
  shouldRetry: boolean;
  vulnerabilityCount: number;
  vulnType: VulnType;
}
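Deriving `AgentName` from the `ALL_AGENTS` array means the same list can drive both compile-time checks and runtime iteration. The sketch below illustrates this under stated assumptions: `isAgentName` and `incompleteAgents` are hypothetical helpers introduced here, not functions from the source, although the second mirrors the filter used when a workflow is resumed.

```typescript
const ALL_AGENTS = [
  'pre-recon',
  'recon',
  'injection-vuln',
  'xss-vuln',
  'auth-vuln',
  'ssrf-vuln',
  'authz-vuln',
  'injection-exploit',
  'xss-exploit',
  'auth-exploit',
  'ssrf-exploit',
  'authz-exploit',
  'report',
] as const;

type AgentName = (typeof ALL_AGENTS)[number];

// Hypothetical helper: narrows an arbitrary string to AgentName at runtime.
function isAgentName(value: string): value is AgentName {
  return (ALL_AGENTS as readonly string[]).includes(value);
}

// Hypothetical helper: agents still to run, in execution order.
function incompleteAgents(completed: readonly string[]): AgentName[] {
  return ALL_AGENTS.filter((agent) => !completed.includes(agent));
}

const remaining = incompleteAgents(['pre-recon', 'recon']);
console.log(isAgentName('recon'), isAgentName('fuzzer'), remaining.length);
```

Adding an agent to the array in this sketch extends the union type automatically, which is the drift-prevention property the source comment describes.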
@@ -0,0 +1,35 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Audit system type definitions
 */

/**
 * Cross-cutting session metadata used by services, temporal, and audit.
 */
export interface SessionMetadata {
  id: string;
  webUrl: string;
  repoPath?: string;
  outputPath?: string;
  [key: string]: unknown;
}

/**
 * Result data passed to audit system when an agent execution ends.
 * Used by both AuditSession and MetricsTracker.
 */
export interface AgentEndResult {
  attemptNumber: number;
  duration_ms: number;
  cost_usd: number;
  success: boolean;
  model?: string | undefined;
  error?: string | undefined;
  checkpoint?: string | undefined;
  isFinalAttempt?: boolean | undefined;
}
@@ -0,0 +1,64 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Configuration type definitions
 */

export type RuleType = 'path' | 'subdomain' | 'domain' | 'method' | 'header' | 'parameter';

export interface Rule {
  description: string;
  type: RuleType;
  url_path: string;
}

export interface Rules {
  avoid?: Rule[];
  focus?: Rule[];
}

export type LoginType = 'form' | 'sso' | 'api' | 'basic';

export interface SuccessCondition {
  type: 'url_contains' | 'element_present' | 'url_equals_exactly' | 'text_contains';
  value: string;
}

export interface Credentials {
  username: string;
  password: string;
  totp_secret?: string;
}

export interface Authentication {
  login_type: LoginType;
  login_url: string;
  credentials: Credentials;
  login_flow?: string[];
  success_condition: SuccessCondition;
}

export interface Config {
  rules?: Rules;
  authentication?: Authentication;
  pipeline?: PipelineConfig;
  description?: string;
}

export type RetryPreset = 'default' | 'subscription';

export interface PipelineConfig {
  retry_preset?: RetryPreset;
  max_concurrent_pipelines?: number;
}

export interface DistributedConfig {
  avoid: Rule[];
  focus: Rule[];
  authentication: Authentication | null;
  description: string;
}
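As an illustration of how these types fit together, here is a small sketch of a typed configuration value and of resolving the pipeline concurrency setting. It is a trimmed copy under stated assumptions: `authentication` is omitted for brevity, the values in `exampleConfig` are invented, and `resolveMaxConcurrent` is a hypothetical helper whose fallback of 5 mirrors the `?? 5` default that appears in the workflow code.

```typescript
type RuleType = 'path' | 'subdomain' | 'domain' | 'method' | 'header' | 'parameter';

interface Rule {
  description: string;
  type: RuleType;
  url_path: string;
}

interface Rules {
  avoid?: Rule[];
  focus?: Rule[];
}

type RetryPreset = 'default' | 'subscription';

interface PipelineConfig {
  retry_preset?: RetryPreset;
  max_concurrent_pipelines?: number;
}

// Trimmed copy of Config: the authentication section is left out of this sketch.
interface Config {
  rules?: Rules;
  pipeline?: PipelineConfig;
  description?: string;
}

// Illustrative values only.
const exampleConfig: Config = {
  description: 'Staging storefront',
  rules: {
    avoid: [{ description: 'Do not test logout', type: 'path', url_path: '/logout' }],
    focus: [{ description: 'Prioritize the API', type: 'path', url_path: '/api' }],
  },
  pipeline: { retry_preset: 'subscription', max_concurrent_pipelines: 2 },
};

// Hypothetical helper: falls back to 5 when the setting is absent.
function resolveMaxConcurrent(config: Config): number {
  return config.pipeline?.max_concurrent_pipelines ?? 5;
}

console.log(resolveMaxConcurrent(exampleConfig), resolveMaxConcurrent({}));
```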
@@ -0,0 +1,94 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Deliverable Type Definitions
 *
 * Maps deliverable types to their filenames and defines validation requirements.
 */

export enum DeliverableType {
  // Pre-recon agent
  CODE_ANALYSIS = 'CODE_ANALYSIS',

  // Recon agent
  RECON = 'RECON',

  // Vulnerability analysis agents
  INJECTION_ANALYSIS = 'INJECTION_ANALYSIS',
  INJECTION_QUEUE = 'INJECTION_QUEUE',

  XSS_ANALYSIS = 'XSS_ANALYSIS',
  XSS_QUEUE = 'XSS_QUEUE',

  AUTH_ANALYSIS = 'AUTH_ANALYSIS',
  AUTH_QUEUE = 'AUTH_QUEUE',

  AUTHZ_ANALYSIS = 'AUTHZ_ANALYSIS',
  AUTHZ_QUEUE = 'AUTHZ_QUEUE',

  SSRF_ANALYSIS = 'SSRF_ANALYSIS',
  SSRF_QUEUE = 'SSRF_QUEUE',

  // Exploitation agents
  INJECTION_EVIDENCE = 'INJECTION_EVIDENCE',
  XSS_EVIDENCE = 'XSS_EVIDENCE',
  AUTH_EVIDENCE = 'AUTH_EVIDENCE',
  AUTHZ_EVIDENCE = 'AUTHZ_EVIDENCE',
  SSRF_EVIDENCE = 'SSRF_EVIDENCE',
}

/**
 * Hard-coded filename mappings from agent prompts
 */
export const DELIVERABLE_FILENAMES: Record<DeliverableType, string> = {
  [DeliverableType.CODE_ANALYSIS]: 'code_analysis_deliverable.md',
  [DeliverableType.RECON]: 'recon_deliverable.md',
  [DeliverableType.INJECTION_ANALYSIS]: 'injection_analysis_deliverable.md',
  [DeliverableType.INJECTION_QUEUE]: 'injection_exploitation_queue.json',
  [DeliverableType.XSS_ANALYSIS]: 'xss_analysis_deliverable.md',
  [DeliverableType.XSS_QUEUE]: 'xss_exploitation_queue.json',
  [DeliverableType.AUTH_ANALYSIS]: 'auth_analysis_deliverable.md',
  [DeliverableType.AUTH_QUEUE]: 'auth_exploitation_queue.json',
  [DeliverableType.AUTHZ_ANALYSIS]: 'authz_analysis_deliverable.md',
  [DeliverableType.AUTHZ_QUEUE]: 'authz_exploitation_queue.json',
  [DeliverableType.SSRF_ANALYSIS]: 'ssrf_analysis_deliverable.md',
  [DeliverableType.SSRF_QUEUE]: 'ssrf_exploitation_queue.json',
  [DeliverableType.INJECTION_EVIDENCE]: 'injection_exploitation_evidence.md',
  [DeliverableType.XSS_EVIDENCE]: 'xss_exploitation_evidence.md',
  [DeliverableType.AUTH_EVIDENCE]: 'auth_exploitation_evidence.md',
  [DeliverableType.AUTHZ_EVIDENCE]: 'authz_exploitation_evidence.md',
  [DeliverableType.SSRF_EVIDENCE]: 'ssrf_exploitation_evidence.md',
};

/**
 * Queue types that require JSON validation
 */
export const QUEUE_TYPES: DeliverableType[] = [
  DeliverableType.INJECTION_QUEUE,
  DeliverableType.XSS_QUEUE,
  DeliverableType.AUTH_QUEUE,
  DeliverableType.AUTHZ_QUEUE,
  DeliverableType.SSRF_QUEUE,
];

/**
 * Checks whether a deliverable type is a queue type.
 * Returns a plain boolean; it does not narrow the argument's type.
 */
export function isQueueType(type: string): boolean {
  return QUEUE_TYPES.includes(type as DeliverableType);
}

/**
 * Vulnerability queue structure
 */
export interface VulnerabilityQueue {
  vulnerabilities: VulnerabilityItem[];
}

export interface VulnerabilityItem {
  [key: string]: unknown;
}
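If callers ever need the compiler to narrow the checked value, a variant with a TypeScript type predicate is one option. The following is only a sketch on a reduced enum: `QueueType` and `isQueueTypeGuard` are hypothetical names, and the source's `isQueueType` deliberately returns a plain `boolean`.

```typescript
// Reduced copy of the enum; the real one has more members.
enum DeliverableType {
  INJECTION_ANALYSIS = 'INJECTION_ANALYSIS',
  INJECTION_QUEUE = 'INJECTION_QUEUE',
  XSS_ANALYSIS = 'XSS_ANALYSIS',
  XSS_QUEUE = 'XSS_QUEUE',
}

const QUEUE_TYPES = [DeliverableType.INJECTION_QUEUE, DeliverableType.XSS_QUEUE] as const;

// Hypothetical union of just the queue members.
type QueueType = (typeof QUEUE_TYPES)[number];

// Hypothetical type-predicate variant: narrows `type` to QueueType when true.
function isQueueTypeGuard(type: string): type is QueueType {
  return (QUEUE_TYPES as readonly string[]).includes(type);
}

function labelFor(type: string): string {
  // Inside the true branch, `type` is known to be a queue member.
  return isQueueTypeGuard(type) ? `queue:${type}` : `other:${type}`;
}

console.log(labelFor('XSS_QUEUE'), labelFor('XSS_ANALYSIS'));
```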
@@ -0,0 +1,88 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Error type definitions
 */

/**
 * Specific error codes for reliable classification.
 *
 * ErrorCode provides precision within the coarse 8-category PentestErrorType.
 * Used by classifyErrorForTemporal for code-based classification (preferred)
 * with string matching as fallback for external errors.
 */
export enum ErrorCode {
  // Config errors (PentestErrorType: 'config')
  CONFIG_NOT_FOUND = 'CONFIG_NOT_FOUND',
  CONFIG_VALIDATION_FAILED = 'CONFIG_VALIDATION_FAILED',
  CONFIG_PARSE_ERROR = 'CONFIG_PARSE_ERROR',

  // Agent execution errors (PentestErrorType: 'validation')
  AGENT_EXECUTION_FAILED = 'AGENT_EXECUTION_FAILED',
  OUTPUT_VALIDATION_FAILED = 'OUTPUT_VALIDATION_FAILED',

  // Billing errors (PentestErrorType: 'billing')
  API_RATE_LIMITED = 'API_RATE_LIMITED',
  SPENDING_CAP_REACHED = 'SPENDING_CAP_REACHED',
  INSUFFICIENT_CREDITS = 'INSUFFICIENT_CREDITS',

  // Git errors (PentestErrorType: 'filesystem')
  GIT_CHECKPOINT_FAILED = 'GIT_CHECKPOINT_FAILED',
  GIT_ROLLBACK_FAILED = 'GIT_ROLLBACK_FAILED',

  // Prompt errors (PentestErrorType: 'prompt')
  PROMPT_LOAD_FAILED = 'PROMPT_LOAD_FAILED',

  // Validation errors (PentestErrorType: 'validation')
  DELIVERABLE_NOT_FOUND = 'DELIVERABLE_NOT_FOUND',

  // Preflight validation errors
  REPO_NOT_FOUND = 'REPO_NOT_FOUND',
  TARGET_UNREACHABLE = 'TARGET_UNREACHABLE',
  AUTH_FAILED = 'AUTH_FAILED',
  BILLING_ERROR = 'BILLING_ERROR',
}

export type PentestErrorType =
  | 'config'
  | 'network'
  | 'tool'
  | 'prompt'
  | 'filesystem'
  | 'validation'
  | 'billing'
  | 'unknown';

export interface PentestErrorContext {
  [key: string]: unknown;
}

export interface LogEntry {
  timestamp: string;
  context: string;
  error: {
    name: string;
    message: string;
    type: PentestErrorType;
    retryable: boolean;
    stack?: string;
  };
}

export interface ToolErrorResult {
  tool: string;
  output: string;
  status: 'error';
  duration: number;
  success: false;
  error: Error;
}

export interface PromptErrorResult {
  success: false;
  error: Error;
}
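The enum comments pair most error codes with a coarse `PentestErrorType`. A lookup table is one way to express that pairing; the sketch below is an assumption, not the source's `classifyErrorForTemporal`. `CATEGORY_BY_CODE` and `categoryFor` are hypothetical names, the enum is a reduced copy, and mapping codes with no stated category (such as `TARGET_UNREACHABLE`) to `'unknown'` is a choice made only for this example.

```typescript
// Reduced copy of the enum; the real one has more members.
enum ErrorCode {
  CONFIG_NOT_FOUND = 'CONFIG_NOT_FOUND',
  API_RATE_LIMITED = 'API_RATE_LIMITED',
  GIT_ROLLBACK_FAILED = 'GIT_ROLLBACK_FAILED',
  PROMPT_LOAD_FAILED = 'PROMPT_LOAD_FAILED',
  DELIVERABLE_NOT_FOUND = 'DELIVERABLE_NOT_FOUND',
  TARGET_UNREACHABLE = 'TARGET_UNREACHABLE',
}

type PentestErrorType =
  | 'config'
  | 'network'
  | 'tool'
  | 'prompt'
  | 'filesystem'
  | 'validation'
  | 'billing'
  | 'unknown';

// Hypothetical table: only codes whose category the enum comments state.
const CATEGORY_BY_CODE: Partial<Record<ErrorCode, PentestErrorType>> = {
  [ErrorCode.CONFIG_NOT_FOUND]: 'config',
  [ErrorCode.API_RATE_LIMITED]: 'billing',
  [ErrorCode.GIT_ROLLBACK_FAILED]: 'filesystem',
  [ErrorCode.PROMPT_LOAD_FAILED]: 'prompt',
  [ErrorCode.DELIVERABLE_NOT_FOUND]: 'validation',
};

// Hypothetical helper: falls back to 'unknown' for unmapped codes (an assumption).
function categoryFor(code: ErrorCode): PentestErrorType {
  return CATEGORY_BY_CODE[code] ?? 'unknown';
}

console.log(categoryFor(ErrorCode.API_RATE_LIMITED), categoryFor(ErrorCode.TARGET_UNREACHABLE));
```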
@@ -0,0 +1,18 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Type definitions barrel export
 */

export * from './activity-logger.js';
export * from './agents.js';
export * from './audit.js';
export * from './config.js';
export * from './deliverables.js';
export * from './errors.js';
export * from './metrics.js';
export * from './result.js';
@@ -0,0 +1,19 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Agent metrics types used across services and activities.
 * Centralized here to avoid temporal/shared.ts import boundary violations.
 */

export interface AgentMetrics {
  durationMs: number;
  inputTokens: number | null;
  outputTokens: number | null;
  costUsd: number | null;
  numTurns: number | null;
  model?: string | undefined;
}
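Because most `AgentMetrics` fields are nullable, any aggregation has to decide how to treat missing values. The sketch below shows one possible approach under stated assumptions: `summarizeCost` is a hypothetical helper (the source's `computeSummary` is not shown and may behave differently), and the sample metrics are invented.

```typescript
interface AgentMetrics {
  durationMs: number;
  inputTokens: number | null;
  outputTokens: number | null;
  costUsd: number | null;
  numTurns: number | null;
  model?: string | undefined;
}

// Hypothetical helper: sums known costs and counts agents that reported none.
function summarizeCost(metrics: Record<string, AgentMetrics>): { totalCostUsd: number; missingCost: number } {
  let totalCostUsd = 0;
  let missingCost = 0;
  for (const m of Object.values(metrics)) {
    if (m.costUsd === null) {
      missingCost++;
    } else {
      totalCostUsd += m.costUsd;
    }
  }
  return { totalCostUsd, missingCost };
}

// Illustrative values only.
const costSummary = summarizeCost({
  recon: { durationMs: 60000, inputTokens: 1000, outputTokens: 200, costUsd: 0.25, numTurns: 4 },
  'xss-vuln': { durationMs: 90000, inputTokens: 2000, outputTokens: 400, costUsd: 0.5, numTurns: 6 },
  report: { durationMs: 5000, inputTokens: null, outputTokens: null, costUsd: null, numTurns: null },
});

console.log(costSummary.totalCostUsd, costSummary.missingCost); // 0.75 1
```

Counting the missing values separately, rather than treating `null` as zero, keeps a partial total from being mistaken for a complete one.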
@@ -0,0 +1,62 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Minimal Result type for explicit error handling.
 *
 * A discriminated union that makes error handling explicit without adding
 * heavy machinery. Used in key modules (config loading, agent execution,
 * queue validation) where callers need to make decisions based on error type.
 */

/**
 * Success variant of Result
 */
export interface Ok<T> {
  readonly ok: true;
  readonly value: T;
}

/**
 * Error variant of Result
 */
export interface Err<E> {
  readonly ok: false;
  readonly error: E;
}

/**
 * Result type - either Ok with a value or Err with an error
 */
export type Result<T, E> = Ok<T> | Err<E>;

/**
 * Create a success Result
 */
export function ok<T>(value: T): Ok<T> {
  return { ok: true, value };
}

/**
 * Create an error Result
 */
export function err<E>(error: E): Err<E> {
  return { ok: false, error };
}

/**
 * Type guard for Ok variant
 */
export function isOk<T, E>(result: Result<T, E>): result is Ok<T> {
  return result.ok === true;
}

/**
 * Type guard for Err variant
 */
export function isErr<T, E>(result: Result<T, E>): result is Err<E> {
  return result.ok === false;
}
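A short usage sketch may help show how the `Result` helpers are meant to be consumed. The definitions are copied from the file above; `parseConcurrency` and `render` are hypothetical functions invented for this example and do not appear in the source.

```typescript
interface Ok<T> {
  readonly ok: true;
  readonly value: T;
}

interface Err<E> {
  readonly ok: false;
  readonly error: E;
}

type Result<T, E> = Ok<T> | Err<E>;

function ok<T>(value: T): Ok<T> {
  return { ok: true, value };
}

function err<E>(error: E): Err<E> {
  return { ok: false, error };
}

function isOk<T, E>(result: Result<T, E>): result is Ok<T> {
  return result.ok === true;
}

// Hypothetical example: parse a positive integer without throwing.
function parseConcurrency(raw: string): Result<number, string> {
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1) {
    return err(`invalid concurrency: ${raw}`);
  }
  return ok(n);
}

// The type guard narrows each branch, so `.value` and `.error` are type-safe.
function render(result: Result<number, string>): string {
  return isOk(result) ? `limit=${result.value}` : `error: ${result.error}`;
}

console.log(render(parseConcurrency('3'))); // limit=3
console.log(render(parseConcurrency('abc'))); // error: invalid concurrency: abc
```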
@@ -0,0 +1,91 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Consolidated billing/spending cap detection utilities.
 *
 * Anthropic's spending cap behavior is inconsistent:
 * - Sometimes a proper SDK error (billing_error)
 * - Sometimes Claude responds with text about the cap
 * - Sometimes partial billing before cutoff
 *
 * This module provides defense-in-depth detection with shared pattern lists
 * to prevent drift between detection points.
 */

/**
 * Text patterns for SDK output sniffing (what Claude says).
 * Used by message-handlers.ts and the behavioral heuristic.
 */
export const BILLING_TEXT_PATTERNS = [
  'spending cap',
  'spending limit',
  'cap reached',
  'budget exceeded',
  'usage limit',
  'resets',
] as const;

/**
 * API patterns for error message classification (what the API returns).
 * Used by classifyErrorForTemporal in error-handling.ts.
 */
export const BILLING_API_PATTERNS = [
  'billing_error',
  'credit balance is too low',
  'insufficient credits',
  'usage is blocked due to insufficient credits',
  'please visit plans & billing',
  'please visit plans and billing',
  'usage limit reached',
  'quota exceeded',
  'daily rate limit',
  'limit will reset',
  'billing limit reached',
] as const;

/**
|
||||
* Checks if text matches any billing text pattern.
|
||||
* Used for sniffing SDK output content for spending cap messages.
|
||||
*/
|
||||
export function matchesBillingTextPattern(text: string): boolean {
|
||||
const lowerText = text.toLowerCase();
|
||||
return BILLING_TEXT_PATTERNS.some((pattern) => lowerText.includes(pattern));
|
||||
}
|
||||
|
||||
/**
|
||||
* Checks if an error message matches any billing API pattern.
|
||||
* Used for classifying API error messages.
|
||||
*/
|
||||
export function matchesBillingApiPattern(message: string): boolean {
|
||||
const lowerMessage = message.toLowerCase();
|
||||
return BILLING_API_PATTERNS.some((pattern) => lowerMessage.includes(pattern));
|
||||
}
|
||||
|
||||
/**
|
||||
* Behavioral heuristic for detecting spending cap.
|
||||
*
|
||||
* When Claude hits a spending cap, it often returns a short message
|
||||
* with $0 cost. Legitimate agent work NEVER costs $0 with only 1-2 turns.
|
||||
*
|
||||
* This combines three signals:
|
||||
* 1. Very low turn count (<=2)
|
||||
* 2. Zero cost ($0)
|
||||
* 3. Text matches billing patterns
|
||||
*
|
||||
* @param turns - Number of turns the agent took
|
||||
* @param cost - Total cost in USD
|
||||
* @param resultText - The result text from the agent
|
||||
* @returns true if this looks like a spending cap hit
|
||||
*/
|
||||
export function isSpendingCapBehavior(turns: number, cost: number, resultText: string): boolean {
|
||||
// Only check if turns <= 2 AND cost is exactly 0
|
||||
if (turns > 2 || cost !== 0) {
|
||||
return false;
|
||||
}
|
||||
|
||||
return matchesBillingTextPattern(resultText);
|
||||
}
|
||||
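To illustrate how the three signals combine, here is a reduced sketch of the heuristic. The pattern list is shortened, the function names are stand-ins, and the sample messages are invented for illustration rather than captured SDK output.

```typescript
// Shortened copy of the text patterns, for illustration only.
const TEXT_PATTERNS = ['spending cap', 'spending limit', 'usage limit'] as const;

function matchesText(text: string): boolean {
  const lower = text.toLowerCase();
  return TEXT_PATTERNS.some((pattern) => lower.includes(pattern));
}

// All three signals must agree: few turns, zero cost, and billing wording.
function looksLikeSpendingCap(turns: number, cost: number, resultText: string): boolean {
  if (turns > 2 || cost !== 0) {
    return false;
  }
  return matchesText(resultText);
}

// Invented sample inputs.
const samples = [
  // Short, free, and mentions the cap: flagged.
  { turns: 1, cost: 0, text: 'Spending cap reached. Try again later.' },
  // Short and free, but no billing wording: not flagged.
  { turns: 1, cost: 0, text: 'No findings for this target.' },
  // Mentions a usage limit, but the run was long and billed: not flagged.
  { turns: 14, cost: 0.42, text: 'Checked the usage limit endpoint.' },
];

const verdicts = samples.map((s) => looksLikeSpendingCap(s.turns, s.cost, s.text));
console.log(verdicts); // [ true, false, false ]
```

Gating the text match behind the turn and cost checks means a long, billed run that merely quotes billing wording in its findings is not misclassified.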
@@ -0,0 +1,60 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Concurrency Control Utilities
 *
 * Provides mutex implementation for preventing race conditions during
 * concurrent session operations.
 */

type UnlockFunction = () => void;

/**
 * SessionMutex - Promise-based mutex for session file operations
 *
 * Prevents race conditions when multiple agents or operations attempt to
 * modify the same session data simultaneously. This is particularly important
 * during parallel execution of vulnerability analysis and exploitation phases.
 *
 * Usage:
 * ```ts
 * const mutex = new SessionMutex();
 * const unlock = await mutex.lock(sessionId);
 * try {
 *   // Critical section - modify session data
 * } finally {
 *   unlock(); // Always release the lock
 * }
 * ```
 */
// Promise-based mutex with chained queue semantics - safe for parallel agents on same session
export class SessionMutex {
  // Map of sessionId -> Promise (tail of the FIFO queue)
  private locks: Map<string, Promise<void>> = new Map();

  // Chain onto the queue tail, then wait for predecessor to release. Guarantees FIFO ordering.
  async lock(sessionId: string): Promise<UnlockFunction> {
    // 1. Capture the current tail of the queue
    const prev = this.locks.get(sessionId) ?? Promise.resolve();

    // 2. Create our lock and immediately become the new tail
    let resolve: () => void;
    const promise = new Promise<void>((r) => (resolve = r));
    this.locks.set(sessionId, promise);

    // 3. Wait for predecessor to release
    await prev;

    // 4. Return unlock that releases the next waiter in the chain
    return () => {
      if (this.locks.get(sessionId) === promise) {
        this.locks.delete(sessionId);
      }
      resolve();
    };
  }
}
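The FIFO behavior described in the comments can be observed with a small demonstration. This is a sketch, not part of the change: the class is a condensed copy so the example runs on its own, and `runDemo`, `sleep`, the task names, and the delays are all hypothetical.

```typescript
type UnlockFunction = () => void;

// Condensed copy of the class above so the sketch is self-contained.
class SessionMutex {
  private locks: Map<string, Promise<void>> = new Map();

  async lock(sessionId: string): Promise<UnlockFunction> {
    // Capture the current tail, then become the new tail before waiting.
    const prev = this.locks.get(sessionId) ?? Promise.resolve();
    let release: () => void = () => {};
    const current = new Promise<void>((resolve) => {
      release = resolve;
    });
    this.locks.set(sessionId, current);
    await prev;
    return () => {
      if (this.locks.get(sessionId) === current) {
        this.locks.delete(sessionId);
      }
      release();
    };
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical workload: three tasks contend for one session id.
async function runDemo(): Promise<string[]> {
  const mutex = new SessionMutex();
  const events: string[] = [];

  const task = async (name: string, workMs: number): Promise<void> => {
    const unlock = await mutex.lock('session-1');
    try {
      events.push(`${name}:start`);
      await sleep(workMs);
      events.push(`${name}:end`);
    } finally {
      unlock(); // Always release, even if the critical section throws
    }
  };

  // Started in order a, b, c; the slowest task is first on purpose.
  await Promise.all([task('a', 30), task('b', 10), task('c', 0)]);
  return events;
}

runDemo().then((events) => console.log(events.join(' ')));
// a:start a:end b:start b:end c:start c:end
```

Even though task `c` has the least work, it runs last because each caller chains onto the queue tail synchronously, before its first `await`.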
@@ -0,0 +1,73 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * File I/O Utilities
 *
 * Core utility functions for file operations including atomic writes,
 * directory creation, and JSON file handling.
 */

import fs from 'node:fs/promises';

/**
 * Ensure directory exists (idempotent, race-safe)
 */
export async function ensureDirectory(dirPath: string): Promise<void> {
  try {
    await fs.mkdir(dirPath, { recursive: true });
  } catch (error) {
    // Ignore EEXIST errors (race condition safe)
    if ((error as NodeJS.ErrnoException).code !== 'EEXIST') {
      throw error;
    }
  }
}

/**
 * Atomic write using temp file + rename pattern.
 * Readers never observe a partially written file: the content goes to a
 * sibling temp file first and is then renamed over the target.
 * The temp path is fixed (`<filePath>.tmp`), so concurrent writers to the
 * same file must be serialized by the caller.
 */
export async function atomicWrite(filePath: string, data: object | string): Promise<void> {
  const tempPath = `${filePath}.tmp`;
  const content = typeof data === 'string' ? data : JSON.stringify(data, null, 2);

  try {
    // Write to temp file
    await fs.writeFile(tempPath, content, 'utf8');

    // Atomic rename (POSIX guarantee: atomic on same filesystem)
    await fs.rename(tempPath, filePath);
  } catch (error) {
    // Clean up temp file on failure
    try {
      await fs.unlink(tempPath);
    } catch {
      // Ignore cleanup errors
    }
    throw error;
  }
}

/**
 * Read and parse JSON file
 */
export async function readJson<T = unknown>(filePath: string): Promise<T> {
  const content = await fs.readFile(filePath, 'utf8');
  return JSON.parse(content) as T;
}

/**
 * Check if file exists
 */
export async function fileExists(filePath: string): Promise<boolean> {
  try {
    await fs.access(filePath);
    return true;
  } catch {
    return false;
  }
}
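A round trip through the temp file + rename pattern might look like the sketch below. The helpers are condensed copies so the example runs on its own; `roundTrip`, the `SessionState` shape, and the file names are hypothetical and not taken from the worker code.

```typescript
import * as fs from 'node:fs/promises';
import * as os from 'node:os';
import * as path from 'node:path';

// Condensed copy of atomicWrite from the module above.
async function atomicWrite(filePath: string, data: object | string): Promise<void> {
  const tempPath = `${filePath}.tmp`;
  const content = typeof data === 'string' ? data : JSON.stringify(data, null, 2);
  try {
    await fs.writeFile(tempPath, content, 'utf8');
    await fs.rename(tempPath, filePath); // Replaces the target in one step
  } catch (error) {
    await fs.unlink(tempPath).catch(() => {}); // Best-effort cleanup
    throw error;
  }
}

async function fileExists(filePath: string): Promise<boolean> {
  try {
    await fs.access(filePath);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical session shape, used only for this example.
interface SessionState {
  id: string;
  completedAgents: string[];
}

async function roundTrip(): Promise<{ state: SessionState; tempLeftBehind: boolean }> {
  // Work in a throwaway directory so the example leaves nothing behind.
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'atomic-write-'));
  const target = path.join(dir, 'session.json');
  try {
    await atomicWrite(target, { id: 'demo', completedAgents: ['recon'] });
    const state = JSON.parse(await fs.readFile(target, 'utf8')) as SessionState;
    // After a successful rename the temp file should no longer exist.
    const tempLeftBehind = await fileExists(`${target}.tmp`);
    return { state, tempLeftBehind };
  } finally {
    await fs.rm(dir, { recursive: true, force: true });
  }
}

roundTrip().then((result) => console.log(result.state.id, result.tempLeftBehind));
```

Because the temp file lives next to the target, both paths are on the same filesystem, which is the condition under which the rename replaces the file in a single step.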
@@ -0,0 +1,60 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Formatting Utilities
 *
 * Generic formatting functions for durations, timestamps, and percentages.
 */

/**
 * Format duration in milliseconds to human-readable string
 */
export function formatDuration(ms: number): string {
  if (ms < 1000) {
    return `${ms}ms`;
  }

  const seconds = ms / 1000;
  if (seconds < 60) {
    return `${seconds.toFixed(1)}s`;
  }

  const minutes = Math.floor(seconds / 60);
  const remainingSeconds = Math.floor(seconds % 60);
  return `${minutes}m ${remainingSeconds}s`;
}

/**
 * Format timestamp to ISO 8601 string
 */
export function formatTimestamp(timestamp: number = Date.now()): string {
  return new Date(timestamp).toISOString();
}

/**
 * Calculate percentage
 */
export function calculatePercentage(part: number, total: number): number {
  if (total === 0) return 0;
  return (part / total) * 100;
}

/**
 * Extract agent type from description string for display purposes
 */
export function extractAgentType(description: string): string {
  if (description.includes('Pre-recon')) {
    return 'pre-reconnaissance';
  }
  if (description.includes('Recon')) {
    return 'reconnaissance';
  }
  if (description.includes('Report')) {
    return 'report generation';
  }
  return 'analysis';
}
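The three branches of `formatDuration` are easiest to see with concrete inputs. The sketch below repeats the function so it runs on its own; the sample values are arbitrary.

```typescript
// Copy of formatDuration from the module above.
function formatDuration(ms: number): string {
  if (ms < 1000) {
    return `${ms}ms`;
  }

  const seconds = ms / 1000;
  if (seconds < 60) {
    return `${seconds.toFixed(1)}s`;
  }

  const minutes = Math.floor(seconds / 60);
  const remainingSeconds = Math.floor(seconds % 60);
  return `${minutes}m ${remainingSeconds}s`;
}

// One input per branch, plus a run longer than an hour.
const formatted = [450, 1500, 61000, 3725000].map(formatDuration);
console.log(formatted); // [ '450ms', '1.5s', '1m 1s', '62m 5s' ]
```

Note that there is no hours unit: a run of 3,725,000 ms is reported as `62m 5s` rather than `1h 2m 5s`.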
@@ -0,0 +1,26 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Functional Programming Utilities
 *
 * Generic functional composition patterns for async operations.
 */

// biome-ignore lint/suspicious/noExplicitAny: pipeline functions need flexible typing for composition
type PipelineFunction = (x: any) => any | Promise<any>;

/**
 * Async pipeline that passes result through a series of functions.
 * Clearer than reduce-based pipe and easier to debug.
 */
export async function asyncPipe<TResult>(initial: unknown, ...fns: PipelineFunction[]): Promise<TResult> {
  let result = initial;
  for (const fn of fns) {
    result = await fn(result);
  }
  return result as TResult;
}
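A short usage sketch for `asyncPipe`, mixing synchronous and asynchronous stages. The stage functions (`trim`, `toNumber`, `doubleLater`) are hypothetical and exist only for this example.

```typescript
// Copy of the helper above so the sketch runs on its own.
type PipelineFunction = (x: any) => any | Promise<any>;

async function asyncPipe<TResult>(initial: unknown, ...fns: PipelineFunction[]): Promise<TResult> {
  let result = initial;
  for (const fn of fns) {
    // await handles both plain values and promises uniformly
    result = await fn(result);
  }
  return result as TResult;
}

// Hypothetical stages: two synchronous, one asynchronous.
const trim = (s: string): string => s.trim();
const toNumber = (s: string): number => Number.parseInt(s, 10);
const doubleLater = async (n: number): Promise<number> => n * 2;

const answer: Promise<number> = asyncPipe<number>(' 21 ', trim, toNumber, doubleLater);
answer.then((value) => console.log(value)); // 42
```

The `TResult` parameter is an unchecked cast: the compiler does not verify that each stage's output matches the next stage's input, so a mismatched stage only surfaces at runtime.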
@@ -0,0 +1,26 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.

/**
 * Wall-clock timer for measuring elapsed milliseconds.
 */
export class Timer {
  name: string;
  startTime: number;
  endTime: number | null = null;

  constructor(name: string) {
    this.name = name;
    this.startTime = Date.now();
  }

  // Freeze the end time and return the elapsed milliseconds.
  stop(): number {
    this.endTime = Date.now();
    return this.duration();
  }

  // Elapsed milliseconds; keeps counting until stop() has been called.
  duration(): number {
    const end = this.endTime ?? Date.now();
    return end - this.startTime;
  }
}