feat: add npx CLI with monorepo, CI/CD, and ephemeral worker architecture (#256)

* feat: integrate npx CLI, CI/CD, and ephemeral worker architecture

Bring in changes from shannon-npx: npx-distributable CLI package (cli/),
semantic-release CI/CD workflows, ephemeral per-scan worker containers,
TOML config support, setup wizard, and workspace management.

Preserves all shannon-only changes: security hardening (localhost-bound
ports, MCP env allowlist, path traversal guard), updated benchmarks
(XBEN 19/31/35/44), README assets, and prompt injection disclaimer.

Applies security hardening to cli/infra/compose.yml as well.

* refactor: migrate to Turborepo + pnpm + Biome monorepo

Restructure into apps/worker, apps/cli, packages/mcp-server with
Turborepo task orchestration, pnpm workspaces, Biome linting/formatting,
and tsdown CLI bundling.

Key changes:
- src/ -> apps/worker/src/, cli/ -> apps/cli/, mcp-server/ -> packages/mcp-server/
- prompts/ and configs/ moved into apps/worker/
- npm replaced with pnpm, package-lock.json replaced with pnpm-lock.yaml
- Dockerfile updated for pnpm-based builds
- CLI logs command rewritten with chokidar for cross-platform reliability
- Router health checking added for auto-detected router mode
- Centralized path resolution via apps/worker/src/paths.ts

* fix: resolve all biome warnings and formatting issues

- Remove unnecessary non-null assertions where values are guaranteed
- Replace array index access with .at() for safer element retrieval
- Use local variables to avoid repeated process.env lookups
- Replace any types with unknown in functional utilities
- Use nullish coalescing for TOTP hash byte access
- Auto-format security patches to match biome config

* fix: pin pnpm to 10.12.1 in Dockerfile for catalog support

* fix: handle Esc cancellation in Bedrock setup flow

Replace p.group() with individual prompts and per-field cancel checks,
matching the pattern used by all other provider setup flows.

* feat: add optional model customization to Anthropic setup

* fix: resolve Docker bind mount permission errors on Linux

Use entrypoint-based UID remapping instead of --user flag so the
container's pentest user matches the host UID/GID, keeping bind-mounted
volumes writable. Git config moved to --system level to survive remapping.

* fix: show resumed workflow ID in splash screen URL

When resuming a workflow, the Temporal Web UI link pointed to the old
(terminated) workflow ID. Now extracts "New Workflow ID" from the resume
header in workflow.log, falling back to the original ID for fresh scans.

* style: fix biome formatting in docker.ts

* fix: align TypeScript config types with JSON Schema

- SuccessCondition.type: use schema values (url_contains,
  element_present, url_equals_exactly, text_contains) instead of
  stale values (url, cookie, element, redirect)
- Authentication.login_flow: mark optional to match schema which
  does not require it

* feat: mark GitHub release as latest during rollback

* fix: use native ARM64 runners for Docker multi-platform builds

Replace QEMU emulation with parallel native builds using a matrix
strategy (ubuntu-latest for amd64, ubuntu-24.04-arm for arm64).
Each platform pushes by digest, then a merge job creates the
multi-arch manifest list before signing with cosign.

* fix: resolve SessionMutex race condition with 3+ concurrent waiters

* fix: skip POSIX permission check on Windows

writeFileSync mode option is ignored on Windows, so config.toml
gets 0o666 and the guard rejects it.
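
A minimal sketch of the guard described above. The helper name `isConfigModeAcceptable` and the exact rule (reject any group or other permission bits) are assumptions for illustration, not the CLI's actual code. The platform is passed in as a parameter so the behavior can be exercised on any host.

```typescript
// Hypothetical helper; the real guard in the CLI may be named and
// structured differently.
function isConfigModeAcceptable(mode: number, platform: string): boolean {
  // Windows ignores the `mode` option of writeFileSync, so the reported
  // bits (typically 0o666) carry no information. Skip the check there.
  if (platform === 'win32') {
    return true;
  }
  // On POSIX systems, reject files that group or others can access.
  // Assumed rule: only the owner bits may be set.
  const groupAndOtherBits = mode & 0o077;
  return groupAndOtherBits === 0;
}
```

A caller would typically pass `fs.statSync(path).mode` and `process.platform`.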

* fix: resolve unsubstituted placeholders in report prompt

Remove unused {{GITHUB_URL}} placeholder and wire up {{AUTH_CONTEXT}}
with structured auth context (login type, username, URL, MFA status).

* fix: remove duplicate environment gate from merge-docker job

Move DOCKERHUB_USERNAME from vars to secrets so merge-docker can access
credentials without its own environment scope. This eliminates the
redundant double approval since build-docker already gates on
release-publish.

* fix: replace POSIX sleep binary with cross-platform async sleep

execFileSync('sleep') is unavailable on Windows. Use node:timers/promises
setTimeout instead, making ensureInfra async.
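
The replacement can be sketched as follows. Only the use of `setTimeout` from `node:timers/promises` comes from the commit message; the `waitUntilReady` helper, its parameters, and the retry numbers are hypothetical and do not reflect the real body of `ensureInfra`.

```typescript
import { setTimeout as delay } from 'node:timers/promises';

// Hypothetical readiness loop. `isReady` stands in for whatever health
// probe the caller uses; the defaults are illustrative only.
async function waitUntilReady(
  isReady: () => Promise<boolean>,
  attempts = 5,
  intervalMs = 100,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    if (await isReady()) {
      return true;
    }
    // Promise-based pause: no external `sleep` binary, so it behaves the
    // same on Windows, macOS, and Linux.
    await delay(intervalMs);
  }
  return false;
}
```

Because the pause is awaited, every caller up the chain has to become async as well, which is why `ensureInfra` changed signature.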

* fix: use session.json for workflow ID on resume instead of parsing workflow.log

On resume, workflow.log already exists with stale headers from the
previous run. The CLI poll found '====' immediately and extracted the
old workflow ID, producing a wrong Temporal Web UI URL.

Read the workflow ID from session.json instead — the worker writes
resume attempts there atomically. For fresh runs, poll until
originalWorkflowId appears. For resumes, poll until a new
resumeAttempts entry is appended.
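
The polling rule above might look like the sketch below. Only `originalWorkflowId` and `resumeAttempts` are named in the commit message; the `workflowId` field on each attempt, the `SessionFile` type, and the `resolveWorkflowId` helper are assumptions made for illustration.

```typescript
// Assumed shape of session.json. The per-attempt `workflowId` field is a
// guess; the real file may store the ID under a different key.
interface SessionFile {
  originalWorkflowId?: string;
  resumeAttempts?: { workflowId: string }[];
}

// Returns the workflow ID to display, or undefined when the worker has not
// written it yet (the caller keeps polling in that case).
function resolveWorkflowId(
  session: SessionFile,
  isResume: boolean,
  attemptsBeforeResume: number,
): string | undefined {
  if (!isResume) {
    // Fresh run: wait for originalWorkflowId to appear.
    return session.originalWorkflowId;
  }
  const attempts = session.resumeAttempts ?? [];
  // Resume: wait until a new entry has been appended for this attempt.
  if (attempts.length <= attemptsBeforeResume) {
    return undefined;
  }
  return attempts[attempts.length - 1]?.workflowId;
}
```

Recording the attempt count before the resume starts is what lets the poll ignore stale entries, the same problem the old `workflow.log` parsing ran into.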

* feat: add custom base URL support for Anthropic-compatible proxies

Support ANTHROPIC_BASE_URL + ANTHROPIC_AUTH_TOKEN to route SDK requests
through LiteLLM or any Anthropic-compatible proxy. Adds TUI wizard
option, TOML config mapping, credential validation, and preflight
endpoint reachability check via SDK query.

* fix: remove environment gates and add NPM_TOKEN to publish step

* feat: add beta release and rollback workflows with cosign signing

* fix: remove redundant checkout and pnpm steps from beta release workflow

* docs: normalize README commands to mode-neutral shorthand

Add a substitution note after Quick Start sections so all subsequent
examples use bare `shannon` instead of mixing `./shannon` and
`npx @keygraph/shannon`. Mode-specific commands (build, update,
uninstall) get inline annotations. Also fixes a broken command in the
Custom Base URL section.

* fix: remove redundant `update` command

Image is already auto-pulled by `ensureImage()` during `start` when the
pinned version tag is missing locally. Manual `update` was unnecessary.

* docs: add CLI package README stub

* docs: update README setup instructions for dual CLI modes

* docs: update announcement banner to npx availability

* feat: migrate from MCP tools to CLI based tools (#252)

* feat: migrate from MCP tools to CLI tools

* fix: restore browser action emoji formatters for CLI output

Adapt formatBrowserAction for playwright-cli commands, replacing the old
mcp__playwright__browser_* tool name matching removed during migration.

* fix: mount credential file to fixed container path for Vertex AI

GOOGLE_APPLICATION_CREDENTIALS was forwarded as-is to the container,
causing the relative host path to resolve against the repo mount
instead of the credentials mount. Now both local and npx modes mount
the resolved file to /app/credentials/google-sa-key.json and rewrite
the env var to match.
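
A rough sketch of the mount-and-rewrite step. The container path is taken from the commit message; the `buildCredentialMount` helper, the `CredentialMount` type, and the read-only `:ro` suffix are assumptions, not the CLI's confirmed behavior.

```typescript
// Fixed in-container location named in the commit message.
const CONTAINER_CREDENTIALS_PATH = '/app/credentials/google-sa-key.json';

interface CredentialMount {
  volumeArg: string; // value to pass to `docker run -v`
  env: Record<string, string>; // env vars to set inside the container
}

// Hypothetical helper. `resolvedHostPath` is expected to be absolute
// already, so nothing resolves against the repo mount by accident.
function buildCredentialMount(resolvedHostPath: string): CredentialMount {
  return {
    // Assumption: mounting read-only is sufficient for a key file.
    volumeArg: `${resolvedHostPath}:${CONTAINER_CREDENTIALS_PATH}:ro`,
    // The env var is rewritten to the container path, not the host path.
    env: { GOOGLE_APPLICATION_CREDENTIALS: CONTAINER_CREDENTIALS_PATH },
  };
}
```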

* feat: add git awareness and optional description field to config

* fix: drop redundant --ipc host flag from worker container

* fix: align announcement banner URL with main branch

* feat: add target URL reachability preflight check (#254)

* Moving asset benchmark graph image to this folder

* Move benchmark results to benchmark repo

Windows Defender falsely flags exploit code in the pentest reports as
malware, forcing every Windows user to add a Defender exclusion just to
clone Shannon.

* Updated README

* fix: case-insensitive grep for semantic-release version probe

* fix: harden supply chain security (#255)

* fix: patch smol-toml and tsdown vulnerabilities

Update smol-toml 1.6.0→1.6.1 (DoS via recursive comment parsing) and
tsdown 0.21.2→0.21.5 (picomatch ReDoS + method injection).

* fix: pin all unpinned dependency versions in Dockerfile

Pins subfinder v2.13.0, WhatWeb v0.6.3 (switched from git clone to
release tarball), schemathesis 4.13.0, addressable 2.8.9,
claude-code 2.1.84, and playwright-cli 0.1.1 for reproducible builds.

* fix: pin GitHub Actions to commit SHAs for supply chain security

* fix: pin GitHub Actions to commit SHAs in beta and rollback workflows
This commit is contained in:
ezl-keygraph
2026-03-27 02:34:29 +05:30
committed by GitHub
parent 0d172f5e32
commit bc8fd203ed
4058 changed files with 7774 additions and 1189080 deletions
+272
@@ -0,0 +1,272 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.
/**
* Agent Execution Service
*
* Handles the full agent lifecycle:
* - Load config via ConfigLoaderService
* - Load prompt template using AGENTS[agentName].promptTemplate
* - Create git checkpoint
* - Start audit logging
* - Invoke Claude SDK via runClaudePrompt
* - Spending cap check using isSpendingCapBehavior
* - Handle failure (rollback, audit)
* - Validate output using AGENTS[agentName].deliverableFilename
* - Commit on success, log metrics
*
* No Temporal dependencies - pure domain logic.
*/
import { type ClaudePromptResult, runClaudePrompt, validateAgentOutput } from '../ai/claude-executor.js';
import type { AuditSession } from '../audit/index.js';
import { AGENTS } from '../session-manager.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import type { AgentName } from '../types/agents.js';
import type { AgentEndResult } from '../types/audit.js';
import { ErrorCode, type PentestErrorType } from '../types/errors.js';
import type { AgentMetrics } from '../types/metrics.js';
import { err, isErr, ok, type Result } from '../types/result.js';
import { isSpendingCapBehavior } from '../utils/billing-detection.js';
import type { ConfigLoaderService } from './config-loader.js';
import { PentestError } from './error-handling.js';
import { commitGitSuccess, createGitCheckpoint, getGitCommitHash, rollbackGitWorkspace } from './git-manager.js';
import { loadPrompt } from './prompt-manager.js';
/**
* Input for agent execution.
*/
export interface AgentExecutionInput {
webUrl: string;
repoPath: string;
configPath?: string | undefined;
pipelineTestingMode?: boolean | undefined;
attemptNumber: number;
}
interface FailAgentOpts {
attemptNumber: number;
result: ClaudePromptResult;
rollbackReason: string;
errorMessage: string;
errorCode: ErrorCode;
category: PentestErrorType;
retryable: boolean;
context: Record<string, unknown>;
}
/**
* Service for executing agents with full lifecycle management.
*
* NOTE: AuditSession is passed per-execution, NOT stored on the service.
* This is critical for parallel agent execution - each agent needs its own
* AuditSession instance because AuditSession uses instance state (currentAgentName)
* to track which agent is currently logging.
*/
export class AgentExecutionService {
private readonly configLoader: ConfigLoaderService;
constructor(configLoader: ConfigLoaderService) {
this.configLoader = configLoader;
}
/**
* Execute an agent with full lifecycle management.
*
* @param agentName - Name of the agent to execute
* @param input - Execution input parameters
* @param auditSession - Audit session for this specific agent execution
* @param logger - ActivityLogger for structured logging
* @returns Result containing AgentEndResult on success, PentestError on failure
*/
async execute(
agentName: AgentName,
input: AgentExecutionInput,
auditSession: AuditSession,
logger: ActivityLogger,
): Promise<Result<AgentEndResult, PentestError>> {
const { webUrl, repoPath, configPath, pipelineTestingMode = false, attemptNumber } = input;
// 1. Load config (if provided)
const configResult = await this.configLoader.loadOptional(configPath);
if (isErr(configResult)) {
return configResult;
}
const distributedConfig = configResult.value;
// 2. Load prompt
const promptTemplate = AGENTS[agentName].promptTemplate;
let prompt: string;
try {
prompt = await loadPrompt(promptTemplate, { webUrl, repoPath }, distributedConfig, pipelineTestingMode, logger);
} catch (error) {
const errorMessage = error instanceof Error ? error.message : String(error);
return err(
new PentestError(
`Failed to load prompt for ${agentName}: ${errorMessage}`,
'prompt',
false,
{ agentName, promptTemplate, originalError: errorMessage },
ErrorCode.PROMPT_LOAD_FAILED,
),
);
}
// 3. Create git checkpoint before execution
try {
await createGitCheckpoint(repoPath, agentName, attemptNumber, logger);
} catch (error) {
const errorMessage = error instanceof Error ? error.message : String(error);
return err(
new PentestError(
`Failed to create git checkpoint for ${agentName}: ${errorMessage}`,
'filesystem',
false,
{ agentName, repoPath, originalError: errorMessage },
ErrorCode.GIT_CHECKPOINT_FAILED,
),
);
}
// 4. Start audit logging
await auditSession.startAgent(agentName, prompt, attemptNumber);
// 5. Execute agent
const result: ClaudePromptResult = await runClaudePrompt(
prompt,
repoPath,
'', // context
agentName, // description
agentName,
auditSession,
logger,
AGENTS[agentName].modelTier,
);
// 6. Spending cap check - defense-in-depth
if (result.success && (result.turns ?? 0) <= 2 && (result.cost || 0) === 0) {
const resultText = result.result || '';
if (isSpendingCapBehavior(result.turns ?? 0, result.cost || 0, resultText)) {
return this.failAgent(agentName, repoPath, auditSession, logger, {
attemptNumber,
result,
rollbackReason: 'spending cap detected',
errorMessage: `Spending cap likely reached: ${resultText.slice(0, 100)}`,
errorCode: ErrorCode.SPENDING_CAP_REACHED,
category: 'billing',
retryable: true,
context: { agentName, turns: result.turns, cost: result.cost },
});
}
}
// 7. Handle execution failure
if (!result.success) {
return this.failAgent(agentName, repoPath, auditSession, logger, {
attemptNumber,
result,
rollbackReason: 'execution failure',
errorMessage: result.error || 'Agent execution failed',
errorCode: ErrorCode.AGENT_EXECUTION_FAILED,
category: 'validation',
retryable: result.retryable ?? true,
context: { agentName, originalError: result.error },
});
}
// 8. Validate output
const validationPassed = await validateAgentOutput(result, agentName, repoPath, logger);
if (!validationPassed) {
return this.failAgent(agentName, repoPath, auditSession, logger, {
attemptNumber,
result,
rollbackReason: 'validation failure',
errorMessage: `Agent ${agentName} failed output validation`,
errorCode: ErrorCode.OUTPUT_VALIDATION_FAILED,
category: 'validation',
retryable: true,
context: { agentName, deliverableFilename: AGENTS[agentName].deliverableFilename },
});
}
// 9. Success - commit deliverables, then capture checkpoint hash
await commitGitSuccess(repoPath, agentName, logger);
const commitHash = await getGitCommitHash(repoPath);
const endResult: AgentEndResult = {
attemptNumber,
duration_ms: result.duration,
cost_usd: result.cost || 0,
success: true,
model: result.model,
...(commitHash && { checkpoint: commitHash }),
};
await auditSession.endAgent(agentName, endResult);
return ok(endResult);
}
private async failAgent(
agentName: AgentName,
repoPath: string,
auditSession: AuditSession,
logger: ActivityLogger,
opts: FailAgentOpts,
): Promise<Result<AgentEndResult, PentestError>> {
await rollbackGitWorkspace(repoPath, opts.rollbackReason, logger);
const endResult: AgentEndResult = {
attemptNumber: opts.attemptNumber,
duration_ms: opts.result.duration,
cost_usd: opts.result.cost || 0,
success: false,
model: opts.result.model,
error: opts.errorMessage,
};
await auditSession.endAgent(agentName, endResult);
return err(new PentestError(opts.errorMessage, opts.category, opts.retryable, opts.context, opts.errorCode));
}
/**
* Execute an agent, throwing PentestError on failure.
*
* This is the preferred method for Temporal activities, which need to
* catch errors and classify them into ApplicationFailure. Avoids requiring
* activities to import Result utilities, keeping the boundary clean.
*
* @param agentName - Name of the agent to execute
* @param input - Execution input parameters
* @param auditSession - Audit session for this specific agent execution
* @param logger - ActivityLogger for structured logging
* @returns AgentEndResult on success
* @throws PentestError on failure
*/
async executeOrThrow(
agentName: AgentName,
input: AgentExecutionInput,
auditSession: AuditSession,
logger: ActivityLogger,
): Promise<AgentEndResult> {
const result = await this.execute(agentName, input, auditSession, logger);
if (isErr(result)) {
throw result.error;
}
return result.value;
}
/**
* Convert AgentEndResult to AgentMetrics for workflow state.
*/
static toMetrics(endResult: AgentEndResult, result: ClaudePromptResult): AgentMetrics {
return {
durationMs: endResult.duration_ms,
inputTokens: null, // Not currently exposed by SDK wrapper
outputTokens: null,
costUsd: endResult.cost_usd,
numTurns: result.turns ?? null,
model: result.model,
};
}
}
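The `execute` / `executeOrThrow` pair above relies on a Result type imported from `../types/result.js`, which is not part of this diff. The sketch below shows one common way such a type can be built; the `ok` discriminant field and the `unwrapOrThrow` helper are assumptions, and the real module may differ.

```typescript
// Assumed Result shape: a discriminated union on an `ok` flag.
type Ok<T> = { ok: true; value: T };
type Err<E> = { ok: false; error: E };
type Result<T, E> = Ok<T> | Err<E>;

function ok<T>(value: T): Result<T, never> {
  return { ok: true, value };
}

function err<E>(error: E): Result<never, E> {
  return { ok: false, error };
}

// Type guard so callers can narrow to the error branch.
function isErr<T, E>(result: Result<T, E>): result is Err<E> {
  return !result.ok;
}

// Hypothetical helper mirroring executeOrThrow: convert a Result into
// either a plain value or a thrown error at the service boundary.
function unwrapOrThrow<T, E>(result: Result<T, E>): T {
  if (isErr(result)) {
    throw result.error;
  }
  return result.value;
}
```

Keeping the throwing variant at the edge means domain code can return errors as values while Temporal activities still see ordinary exceptions.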
+73
@@ -0,0 +1,73 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.
/**
* Config Loader Service
*
* Wraps parseConfig + distributeConfig with Result type for explicit error handling.
* Pure service with no Temporal dependencies.
*/
import { distributeConfig, parseConfig } from '../config-parser.js';
import type { DistributedConfig } from '../types/config.js';
import { ErrorCode } from '../types/errors.js';
import { err, ok, type Result } from '../types/result.js';
import { PentestError } from './error-handling.js';
/**
* Service for loading and distributing configuration files.
*
* Provides a Result-based API for explicit error handling,
* allowing callers to decide how to handle failures.
*/
export class ConfigLoaderService {
/**
* Load and distribute a configuration file.
*
* @param configPath - Path to the YAML configuration file
* @returns Result containing DistributedConfig on success, PentestError on failure
*/
async load(configPath: string): Promise<Result<DistributedConfig, PentestError>> {
try {
const config = await parseConfig(configPath);
const distributed = distributeConfig(config);
return ok(distributed);
} catch (error) {
const errorMessage = error instanceof Error ? error.message : String(error);
// Determine appropriate error code based on error message
let errorCode = ErrorCode.CONFIG_PARSE_ERROR;
if (errorMessage.includes('not found') || errorMessage.includes('ENOENT')) {
errorCode = ErrorCode.CONFIG_NOT_FOUND;
} else if (errorMessage.includes('validation failed')) {
errorCode = ErrorCode.CONFIG_VALIDATION_FAILED;
}
return err(
new PentestError(
`Failed to load config ${configPath}: ${errorMessage}`,
'config',
false,
{ configPath, originalError: errorMessage },
errorCode,
),
);
}
}
/**
* Load config if path is provided, otherwise return null config.
*
* @param configPath - Optional path to the YAML configuration file
* @returns Result containing DistributedConfig (or null) on success, PentestError on failure
*/
async loadOptional(configPath: string | undefined): Promise<Result<DistributedConfig | null, PentestError>> {
if (!configPath) {
return ok(null);
}
return this.load(configPath);
}
}
+114
@@ -0,0 +1,114 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.
/**
* Dependency Injection Container
*
* Provides a per-workflow container for service instances.
* Services are wired with explicit constructor injection.
*
* Usage:
* const container = getOrCreateContainer(workflowId, sessionMetadata);
* const auditSession = new AuditSession(sessionMetadata); // Per-agent
* await auditSession.initialize(workflowId);
* const result = await container.agentExecution.executeOrThrow(agentName, input, auditSession, logger);
*/
import type { SessionMetadata } from '../audit/utils.js';
import { AgentExecutionService } from './agent-execution.js';
import { ConfigLoaderService } from './config-loader.js';
import { ExploitationCheckerService } from './exploitation-checker.js';
/**
* Dependencies required to create a Container.
*
* NOTE: AuditSession is NOT stored in the container.
* Each agent execution receives its own AuditSession instance
* because AuditSession uses instance state (currentAgentName) that
* cannot be shared across parallel agents.
*/
export interface ContainerDependencies {
readonly sessionMetadata: SessionMetadata;
}
/**
* DI Container for a single workflow.
*
* Holds all service instances for the workflow lifecycle.
* Services are instantiated once and reused across agent executions.
*
* NOTE: AuditSession is NOT stored here - it's passed per agent execution
* to support parallel agents each having their own logging context.
*/
export class Container {
readonly sessionMetadata: SessionMetadata;
readonly agentExecution: AgentExecutionService;
readonly configLoader: ConfigLoaderService;
readonly exploitationChecker: ExploitationCheckerService;
constructor(deps: ContainerDependencies) {
this.sessionMetadata = deps.sessionMetadata;
// Wire services with explicit constructor injection
this.configLoader = new ConfigLoaderService();
this.exploitationChecker = new ExploitationCheckerService();
this.agentExecution = new AgentExecutionService(this.configLoader);
}
}
/**
* Map of workflowId to Container instance.
* Each workflow gets its own container scoped to its lifecycle.
*/
const containers = new Map<string, Container>();
/**
* Get or create a Container for a workflow.
*
* If a container already exists for the workflowId, returns it.
* Otherwise, creates a new container with the provided dependencies.
*
* @param workflowId - Unique workflow identifier
* @param sessionMetadata - Session metadata for audit paths
* @returns Container instance for the workflow
*/
export function getOrCreateContainer(workflowId: string, sessionMetadata: SessionMetadata): Container {
let container = containers.get(workflowId);
if (!container) {
container = new Container({ sessionMetadata });
containers.set(workflowId, container);
}
return container;
}
/**
* Remove a Container when a workflow completes.
*
* Should be called in logWorkflowComplete to clean up resources.
*
* @param workflowId - Unique workflow identifier
*/
export function removeContainer(workflowId: string): void {
containers.delete(workflowId);
}
/**
* Get an existing Container for a workflow, if one exists.
*
* Unlike getOrCreateContainer, this does NOT create a new container.
* Returns undefined if no container exists for the workflowId.
*
* Useful for lightweight activities that can benefit from an existing
* container but don't need to create one.
*
* @param workflowId - Unique workflow identifier
* @returns Container instance or undefined
*/
export function getContainer(workflowId: string): Container | undefined {
return containers.get(workflowId);
}
+244
@@ -0,0 +1,244 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.
import { ErrorCode, type PentestErrorContext, type PentestErrorType, type PromptErrorResult } from '../types/errors.js';
import { matchesBillingApiPattern, matchesBillingTextPattern } from '../utils/billing-detection.js';
export class PentestError extends Error {
override name = 'PentestError' as const;
type: PentestErrorType;
retryable: boolean;
context: PentestErrorContext;
timestamp: string;
/** Optional specific error code for reliable classification */
code?: ErrorCode;
constructor(
message: string,
type: PentestErrorType,
retryable: boolean = false,
context: PentestErrorContext = {},
code?: ErrorCode,
) {
super(message);
this.type = type;
this.retryable = retryable;
this.context = context;
this.timestamp = new Date().toISOString();
if (code !== undefined) {
this.code = code;
}
}
}
export function handlePromptError(promptName: string, error: Error): PromptErrorResult {
return {
success: false,
error: new PentestError(`Failed to load prompt '${promptName}': ${error.message}`, 'prompt', false, {
promptName,
originalError: error.message,
}),
};
}
const RETRYABLE_PATTERNS = [
// Network and connection errors
'network',
'connection',
'timeout',
'econnreset',
'enotfound',
'econnrefused',
// Rate limiting
'rate limit',
'429',
'too many requests',
// Server errors
'server error',
'5xx',
'internal server error',
'service unavailable',
'bad gateway',
// Claude API errors
'model unavailable',
'service temporarily unavailable',
'api error',
'terminated',
// Max turns
'max turns',
'maximum turns',
];
// Patterns that indicate non-retryable errors (checked before RETRYABLE_PATTERNS, so they take precedence)
const NON_RETRYABLE_PATTERNS = [
'authentication',
'invalid prompt',
'out of memory',
'permission denied',
'session limit reached',
'invalid api key',
];
// Conservative retry classification - unknown errors don't retry (fail-safe default)
export function isRetryableError(error: Error): boolean {
const message = error.message.toLowerCase();
if (NON_RETRYABLE_PATTERNS.some((pattern) => message.includes(pattern))) {
return false;
}
return RETRYABLE_PATTERNS.some((pattern) => message.includes(pattern));
}
/**
* Classifies errors by ErrorCode for reliable, code-based classification.
* Used when error is a PentestError with a specific ErrorCode.
*/
function classifyByErrorCode(code: ErrorCode, retryableFromError: boolean): { type: string; retryable: boolean } {
switch (code) {
// Billing errors - retryable (wait for cap reset or credits added)
case ErrorCode.SPENDING_CAP_REACHED:
case ErrorCode.INSUFFICIENT_CREDITS:
return { type: 'BillingError', retryable: true };
case ErrorCode.API_RATE_LIMITED:
return { type: 'RateLimitError', retryable: true };
// Config errors - non-retryable (need manual fix)
case ErrorCode.CONFIG_NOT_FOUND:
case ErrorCode.CONFIG_VALIDATION_FAILED:
case ErrorCode.CONFIG_PARSE_ERROR:
return { type: 'ConfigurationError', retryable: false };
// Prompt errors - non-retryable (need manual fix)
case ErrorCode.PROMPT_LOAD_FAILED:
return { type: 'ConfigurationError', retryable: false };
// Git errors - non-retryable (indicates workspace corruption)
case ErrorCode.GIT_CHECKPOINT_FAILED:
case ErrorCode.GIT_ROLLBACK_FAILED:
return { type: 'GitError', retryable: false };
// Validation errors - retryable (agent may succeed on retry)
case ErrorCode.OUTPUT_VALIDATION_FAILED:
case ErrorCode.DELIVERABLE_NOT_FOUND:
return { type: 'OutputValidationError', retryable: true };
// Agent execution - use the retryable flag from the error
case ErrorCode.AGENT_EXECUTION_FAILED:
return { type: 'AgentExecutionError', retryable: retryableFromError };
// Preflight validation errors
case ErrorCode.REPO_NOT_FOUND:
return { type: 'ConfigurationError', retryable: false };
case ErrorCode.AUTH_FAILED:
return { type: 'AuthenticationError', retryable: false };
case ErrorCode.BILLING_ERROR:
return { type: 'BillingError', retryable: true };
default:
// Unknown code - no string matching happens here; report UnknownError and keep the error's own retryable flag
return { type: 'UnknownError', retryable: retryableFromError };
}
}
/**
* Classifies errors for Temporal workflow retry behavior.
* Returns error type and whether Temporal should retry.
*
* Used by activities to wrap errors in ApplicationFailure:
* - Retryable errors: Temporal retries with configured backoff
* - Non-retryable errors: Temporal fails immediately
*
* Classification priority:
* 1. If error is PentestError with ErrorCode, classify by code (reliable)
* 2. Fall through to string matching for external errors (SDK, network, etc.)
*/
export function classifyErrorForTemporal(error: unknown): { type: string; retryable: boolean } {
// === CODE-BASED CLASSIFICATION (Preferred for internal errors) ===
if (error instanceof PentestError && error.code !== undefined) {
return classifyByErrorCode(error.code, error.retryable);
}
// === STRING-BASED CLASSIFICATION (Fallback for external errors) ===
const message = (error instanceof Error ? error.message : String(error)).toLowerCase();
// === BILLING ERRORS (Retryable with long backoff) ===
// Anthropic returns billing as 400 invalid_request_error
// Human can add credits OR wait for spending cap to reset (5-30 min backoff)
// Check both API patterns and text patterns for comprehensive detection
if (matchesBillingApiPattern(message) || matchesBillingTextPattern(message)) {
return { type: 'BillingError', retryable: true };
}
// === PERMANENT ERRORS (Non-retryable) ===
// Authentication (401) - bad API key won't fix itself
if (
message.includes('authentication') ||
message.includes('api key') ||
message.includes('401') ||
message.includes('authentication_error')
) {
return { type: 'AuthenticationError', retryable: false };
}
// Permission (403) - access won't be granted
if (message.includes('permission') || message.includes('forbidden') || message.includes('403')) {
return { type: 'PermissionError', retryable: false };
}
// === OUTPUT VALIDATION ERRORS (Retryable) ===
// Agent didn't produce expected deliverables - retry may succeed
// IMPORTANT: Must come BEFORE generic 'validation' check below
if (message.includes('failed output validation') || message.includes('output validation failed')) {
return { type: 'OutputValidationError', retryable: true };
}
// Invalid Request (400) - malformed request is permanent
// Note: Checked AFTER billing and AFTER output validation
if (message.includes('invalid_request_error') || message.includes('malformed') || message.includes('validation')) {
return { type: 'InvalidRequestError', retryable: false };
}
// Request Too Large (413) - won't fit no matter how many retries
if (message.includes('request_too_large') || message.includes('too large') || message.includes('413')) {
return { type: 'RequestTooLargeError', retryable: false };
}
// Configuration errors - missing files need manual fix
if (message.includes('enoent') || message.includes('no such file') || message.includes('cli not installed')) {
return { type: 'ConfigurationError', retryable: false };
}
// Execution limits - max turns/budget reached
if (
message.includes('max turns') ||
message.includes('budget') ||
message.includes('execution limit') ||
message.includes('error_max_turns') ||
message.includes('error_max_budget')
) {
return { type: 'ExecutionLimitError', retryable: false };
}
// Invalid target URL - bad URL format won't fix itself
if (
message.includes('invalid url') ||
message.includes('invalid target') ||
message.includes('malformed url') ||
message.includes('invalid uri')
) {
return { type: 'InvalidTargetError', retryable: false };
}
// === TRANSIENT ERRORS (Retryable) ===
// Rate limits (429), server errors (5xx), network issues
// Let Temporal retry with configured backoff
return { type: 'TransientError', retryable: true };
}
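The ordering note in `classifyErrorForTemporal` (output validation must be checked before the generic `validation` match) can be seen in a reduced sketch. `classifyMessage` is a hypothetical cut-down version that keeps only the two order-sensitive rules plus the transient default; it is not a replacement for the full function.

```typescript
type Classification = { type: string; retryable: boolean };

// Reduced, illustrative classifier: only the rules whose relative order
// matters are kept, followed by the retryable default.
function classifyMessage(raw: string): Classification {
  const message = raw.toLowerCase();
  // Specific rule first: a missing deliverable may succeed on retry.
  if (message.includes('failed output validation') || message.includes('output validation failed')) {
    return { type: 'OutputValidationError', retryable: true };
  }
  // Generic rule second: any other "validation" text is a permanent 400.
  if (message.includes('invalid_request_error') || message.includes('malformed') || message.includes('validation')) {
    return { type: 'InvalidRequestError', retryable: false };
  }
  return { type: 'TransientError', retryable: true };
}
```

If the two checks were swapped, a message such as `Agent recon failed output validation` would match the generic `validation` substring and be treated as non-retryable.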
@@ -0,0 +1,67 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.
/**
* Exploitation Checker Service
*
* Pure domain logic for determining whether exploitation should run.
* Reads queue file, parses JSON, returns decision.
*
* No Temporal dependencies - this is pure business logic.
*/
import type { ActivityLogger } from '../types/activity-logger.js';
import { isOk } from '../types/result.js';
import { type ExploitationDecision, type VulnType, validateQueueSafe } from './queue-validation.js';
/**
* Service for checking exploitation queue decisions.
*
* Determines whether an exploit agent should run based on
* the vulnerability analysis deliverables and queue files.
*/
export class ExploitationCheckerService {
/**
* Check if exploitation should run for a given vulnerability type.
*
* Reads the vulnerability queue file and returns the decision.
* This is pure domain logic - reads queue file, parses JSON, returns decision.
*
* @param vulnType - Type of vulnerability (injection, xss, auth, ssrf, authz)
* @param repoPath - Path to the repository containing deliverables
* @param logger - ActivityLogger for structured logging
* @returns ExploitationDecision indicating whether to exploit
* @throws PentestError if queue validation fails with a retryable error
*/
async checkQueue(vulnType: VulnType, repoPath: string, logger: ActivityLogger): Promise<ExploitationDecision> {
const result = await validateQueueSafe(vulnType, repoPath);
if (isOk(result)) {
const decision = result.value;
logger.info(
`${vulnType}: ${decision.shouldExploit ? `${decision.vulnerabilityCount} vulnerabilities found` : 'no vulnerabilities, skipping exploitation'}`,
);
return decision;
}
// Validation failed - check if we should retry or skip
const error = result.error;
if (error.retryable) {
// Re-throw retryable errors so caller can handle retry
logger.warn(`${vulnType}: ${error.message} (retryable)`);
throw error;
}
// Non-retryable error - skip exploitation gracefully
logger.warn(`${vulnType}: ${error.message}, skipping exploitation`);
return {
shouldExploit: false,
shouldRetry: false,
vulnerabilityCount: 0,
vulnType,
};
}
}
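Review note: the three outcomes of `checkQueue` (return the decision, re-throw a retryable error, or skip on a non-retryable one) can be sketched in isolation. This is a minimal sketch, not the code in this PR. `Result`, `Decision`, `SketchError`, and `resolveDecision` are hypothetical stand-ins, and the shape of `Result` (an `ok` flag with `value` or `error`) is an assumption based on how the diff uses it.

```typescript
// Hypothetical stand-ins for the types imported from '../types/result.js'
// and './queue-validation.js'; their real definitions are not shown in this diff.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

interface Decision {
  shouldExploit: boolean;
  shouldRetry: boolean;
  vulnerabilityCount: number;
  vulnType: string;
}

class SketchError extends Error {
  retryable: boolean;
  constructor(message: string, retryable: boolean) {
    super(message);
    this.retryable = retryable;
  }
}

// Mirrors the branch order in checkQueue: success, retryable failure, graceful skip.
function resolveDecision(result: Result<Decision, SketchError>, vulnType: string): Decision {
  if (result.ok) {
    return result.value;
  }
  if (result.error.retryable) {
    // The caller (for example a retrying activity) decides when to try again.
    throw result.error;
  }
  // Non-retryable: report "nothing to exploit" instead of failing the pipeline.
  return { shouldExploit: false, shouldRetry: false, vulnerabilityCount: 0, vulnType };
}
```

Throwing only on retryable errors keeps retry policy with the caller, while a malformed queue file simply disables exploitation for that vulnerability type.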
@@ -0,0 +1,304 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.
import { $ } from 'zx';
import type { ActivityLogger } from '../types/activity-logger.js';
import { ErrorCode } from '../types/errors.js';
import { PentestError } from './error-handling.js';
/**
* Check if a directory is a git repository.
* Returns true if the directory contains a .git folder or is inside a git repo.
*/
export async function isGitRepository(dir: string): Promise<boolean> {
try {
await $`cd ${dir} && git rev-parse --git-dir`.quiet();
return true;
} catch {
return false;
}
}
interface GitOperationResult {
success: boolean;
hadChanges?: boolean;
error?: Error;
}
/**
* Get list of changed files from git status --porcelain output
*/
async function getChangedFiles(sourceDir: string, operationDescription: string): Promise<string[]> {
const status = await executeGitCommandWithRetry(['git', 'status', '--porcelain'], sourceDir, operationDescription);
return status.stdout
.trim()
.split('\n')
.filter((line) => line.length > 0);
}
/**
* Log a summary of changed files with truncation for long lists
*/
function logChangeSummary(
changes: string[],
messageWithChanges: string,
messageWithoutChanges: string,
logger: ActivityLogger,
level: 'info' | 'warn' = 'info',
maxToShow: number = 5,
): void {
if (changes.length > 0) {
const msg = messageWithChanges.replace('{count}', String(changes.length));
const fileList = changes
.slice(0, maxToShow)
.map((c) => ` ${c}`)
.join(', ');
const suffix = changes.length > maxToShow ? ` ... and ${changes.length - maxToShow} more files` : '';
logger[level](`${msg} ${fileList}${suffix}`);
} else {
logger[level](messageWithoutChanges);
}
}
/**
* Convert unknown error to GitOperationResult
*/
function toErrorResult(error: unknown): GitOperationResult {
const errMsg = error instanceof Error ? error.message : String(error);
return {
success: false,
error: error instanceof Error ? error : new Error(errMsg),
};
}
// Serializes git operations to prevent index.lock conflicts during parallel agent execution
class GitSemaphore {
private queue: Array<() => void> = [];
private running: boolean = false;
async acquire(): Promise<void> {
return new Promise((resolve) => {
this.queue.push(resolve);
this.process();
});
}
release(): void {
this.running = false;
this.process();
}
private process(): void {
if (!this.running && this.queue.length > 0) {
this.running = true;
const resolve = this.queue.shift();
resolve?.();
}
}
}
const gitSemaphore = new GitSemaphore();
const GIT_LOCK_ERROR_PATTERNS = [
'index.lock',
'unable to lock',
'Another git process',
'fatal: Unable to create',
'fatal: index file',
];
function isGitLockError(errorMessage: string): boolean {
return GIT_LOCK_ERROR_PATTERNS.some((pattern) => errorMessage.includes(pattern));
}
// Retries git commands on lock conflicts with exponential backoff
export async function executeGitCommandWithRetry(
commandArgs: string[],
sourceDir: string,
description: string,
maxRetries: number = 5,
): Promise<{ stdout: string; stderr: string }> {
await gitSemaphore.acquire();
try {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
const [cmd, ...args] = commandArgs;
const result = await $`cd ${sourceDir} && ${cmd} ${args}`;
return result;
} catch (error) {
const errMsg = error instanceof Error ? error.message : String(error);
if (isGitLockError(errMsg) && attempt < maxRetries) {
const delay = 2 ** (attempt - 1) * 1000;
// executeGitCommandWithRetry is also called outside activity context
// (e.g., from resume logic), so we use console.warn as a fallback here
console.warn(
`Git lock conflict during ${description} (attempt ${attempt}/${maxRetries}). Retrying in ${delay}ms...`,
);
await new Promise((resolve) => setTimeout(resolve, delay));
continue;
}
throw error;
}
}
throw new PentestError(
`Git command failed after ${maxRetries} retries`,
'filesystem',
true, // Retryable - transient git lock issues
{ maxRetries, description },
ErrorCode.GIT_CHECKPOINT_FAILED,
);
} finally {
gitSemaphore.release();
}
}
// Two-phase reset: hard reset (tracked files) + clean (untracked files)
export async function rollbackGitWorkspace(
sourceDir: string,
reason: string = 'retry preparation',
logger: ActivityLogger,
): Promise<GitOperationResult> {
// Skip git operations if not a git repository
if (!(await isGitRepository(sourceDir))) {
logger.info('Skipping git rollback (not a git repository)');
return { success: true };
}
logger.info(`Rolling back workspace for ${reason}`);
try {
const changes = await getChangedFiles(sourceDir, 'status check for rollback');
await executeGitCommandWithRetry(['git', 'reset', '--hard', 'HEAD'], sourceDir, 'hard reset for rollback');
await executeGitCommandWithRetry(['git', 'clean', '-fd'], sourceDir, 'cleaning untracked files for rollback');
logChangeSummary(
changes,
'Rollback completed - removed {count} contaminated changes:',
'Rollback completed - no changes to remove',
logger,
'info',
3,
);
return { success: true };
} catch (error) {
const errMsg = error instanceof Error ? error.message : String(error);
logger.error(`Rollback failed after retries: ${errMsg}`);
return {
success: false,
error: new PentestError(
`Git rollback failed: ${errMsg}`,
'filesystem',
false, // Non-retryable - rollback is best-effort cleanup
{ sourceDir, reason },
ErrorCode.GIT_ROLLBACK_FAILED,
),
};
}
}
// Creates checkpoint before each attempt. First attempt preserves workspace; retries clean it.
export async function createGitCheckpoint(
sourceDir: string,
description: string,
attempt: number,
logger: ActivityLogger,
): Promise<GitOperationResult> {
// Skip git operations if not a git repository
if (!(await isGitRepository(sourceDir))) {
logger.info('Skipping git checkpoint (not a git repository)');
return { success: true };
}
logger.info(`Creating checkpoint for ${description} (attempt ${attempt})`);
try {
// 1. On retries, clean workspace to prevent pollution from previous attempt
if (attempt > 1) {
const cleanResult = await rollbackGitWorkspace(sourceDir, `${description} (retry cleanup)`, logger);
if (!cleanResult.success) {
logger.warn(`Workspace cleanup failed, continuing anyway: ${cleanResult.error?.message}`);
}
}
// 2. Detect existing changes
const changes = await getChangedFiles(sourceDir, 'status check');
const hasChanges = changes.length > 0;
// 3. Stage and commit checkpoint
await executeGitCommandWithRetry(['git', 'add', '-A'], sourceDir, 'staging changes');
await executeGitCommandWithRetry(
['git', 'commit', '-m', `📍 Checkpoint: ${description} (attempt ${attempt})`, '--allow-empty'],
sourceDir,
'creating commit',
);
// 4. Log result
if (hasChanges) {
logger.info('Checkpoint created with uncommitted changes staged');
} else {
logger.info('Empty checkpoint created (no workspace changes)');
}
return { success: true };
} catch (error) {
const result = toErrorResult(error);
logger.warn(`Checkpoint creation failed after retries: ${result.error?.message}`);
return result;
}
}
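// Commits the agent's results after a successful run (creates an empty commit when nothing changed)
export async function commitGitSuccess(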
sourceDir: string,
description: string,
logger: ActivityLogger,
): Promise<GitOperationResult> {
// Skip git operations if not a git repository
if (!(await isGitRepository(sourceDir))) {
logger.info('Skipping git commit (not a git repository)');
return { success: true };
}
logger.info(`Committing successful results for ${description}`);
try {
const changes = await getChangedFiles(sourceDir, 'status check for success commit');
await executeGitCommandWithRetry(['git', 'add', '-A'], sourceDir, 'staging changes for success commit');
await executeGitCommandWithRetry(
['git', 'commit', '-m', `${description}: completed successfully`, '--allow-empty'],
sourceDir,
'creating success commit',
);
logChangeSummary(
changes,
'Success commit created with {count} file changes:',
'Empty success commit created (agent made no file changes)',
logger,
);
return { success: true };
} catch (error) {
const result = toErrorResult(error);
logger.warn(`Success commit failed after retries: ${result.error?.message}`);
return result;
}
}
/**
* Get current git commit hash.
* Returns null if not a git repository.
*/
export async function getGitCommitHash(sourceDir: string): Promise<string | null> {
if (!(await isGitRepository(sourceDir))) {
return null;
}
try {
const result = await $`cd ${sourceDir} && git rev-parse HEAD`;
return result.stdout.trim();
} catch {
return null;
}
}
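Review note: the retry loop in `executeGitCommandWithRetry` waits `2 ** (attempt - 1) * 1000` ms after each lock conflict and does not wait after the final attempt. The helper below is a hypothetical illustration (`backoffDelays` is not part of this PR) that lists the resulting waits for a given `maxRetries`.

```typescript
// Hypothetical helper: enumerates the delays the retry loop would sleep for.
// A delay is only taken when `attempt < maxRetries`, so the last attempt has none.
function backoffDelays(maxRetries: number): number[] {
  const delays: number[] = [];
  for (let attempt = 1; attempt < maxRetries; attempt++) {
    delays.push(2 ** (attempt - 1) * 1000);
  }
  return delays;
}

// With the default of 5 attempts: [1000, 2000, 4000, 8000]
const defaultDelays = backoffDelays(5);

// Worst-case time spent sleeping before the final attempt, in milliseconds.
const worstCaseWaitMs = defaultDelays.reduce((sum, d) => sum + d, 0); // 15000
```

Because the semaphore is held for the whole retry loop, other queued git operations would wait for up to that long in addition to the command run time.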
@@ -0,0 +1,22 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.
/**
* Services Module
*
* Exports DI container and service classes for Shannon agent execution.
* Services are pure domain logic with no Temporal dependencies.
*/
export type { AgentExecutionInput } from './agent-execution.js';
export { AgentExecutionService } from './agent-execution.js';
export { ConfigLoaderService } from './config-loader.js';
export type { ContainerDependencies } from './container.js';
export { Container, getOrCreateContainer, removeContainer } from './container.js';
export { ExploitationCheckerService } from './exploitation-checker.js';
export { loadPrompt } from './prompt-manager.js';
export { assembleFinalReport, injectModelIntoReport } from './reporting.js';
@@ -0,0 +1,489 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.
/**
* Preflight Validation Service
*
* Runs cheap, fast checks before any agent execution begins.
* Catches configuration and credential problems early, saving
* time and API costs compared to failing mid-pipeline.
*
* Checks run sequentially, cheapest first:
* 1. Repository path exists and contains .git
* 2. Config file parses and validates (if provided)
* 3. Credentials validate via Claude Agent SDK query (API key, OAuth, Bedrock, Vertex AI, or router mode)
* 4. Target URL is reachable from the container (DNS + HTTP)
*/
import { lookup } from 'node:dns/promises';
import fs from 'node:fs/promises';
import http from 'node:http';
import https from 'node:https';
import type { SDKAssistantMessageError } from '@anthropic-ai/claude-agent-sdk';
import { query } from '@anthropic-ai/claude-agent-sdk';
import { resolveModel } from '../ai/models.js';
import { parseConfig } from '../config-parser.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import { ErrorCode } from '../types/errors.js';
import { err, ok, type Result } from '../types/result.js';
import { isRetryableError, PentestError } from './error-handling.js';
const TARGET_URL_TIMEOUT_MS = 10_000;
function isLoopbackAddress(address: string): boolean {
return address === '127.0.0.1' || address === '::1' || address === '0.0.0.0';
}
// === Repository Validation ===
async function validateRepo(repoPath: string, logger: ActivityLogger): Promise<Result<void, PentestError>> {
logger.info('Checking repository path...', { repoPath });
// 1. Check repo directory exists
try {
const stats = await fs.stat(repoPath);
if (!stats.isDirectory()) {
return err(
new PentestError(
`Repository path is not a directory: ${repoPath}`,
'config',
false,
{ repoPath },
ErrorCode.REPO_NOT_FOUND,
),
);
}
} catch {
return err(
new PentestError(
`Repository path does not exist: ${repoPath}`,
'config',
false,
{ repoPath },
ErrorCode.REPO_NOT_FOUND,
),
);
}
// 2. Check .git directory exists
try {
const gitStats = await fs.stat(`${repoPath}/.git`);
if (!gitStats.isDirectory()) {
return err(
new PentestError(
`Not a git repository (no .git directory): ${repoPath}`,
'config',
false,
{ repoPath },
ErrorCode.REPO_NOT_FOUND,
),
);
}
} catch {
return err(
new PentestError(
`Not a git repository (no .git directory): ${repoPath}`,
'config',
false,
{ repoPath },
ErrorCode.REPO_NOT_FOUND,
),
);
}
logger.info('Repository path OK');
return ok(undefined);
}
// === Config Validation ===
async function validateConfig(configPath: string, logger: ActivityLogger): Promise<Result<void, PentestError>> {
logger.info('Validating configuration file...', { configPath });
try {
await parseConfig(configPath);
logger.info('Configuration file OK');
return ok(undefined);
} catch (error) {
if (error instanceof PentestError) {
return err(error);
}
const message = error instanceof Error ? error.message : String(error);
return err(
new PentestError(
`Configuration validation failed: ${message}`,
'config',
false,
{ configPath },
ErrorCode.CONFIG_VALIDATION_FAILED,
),
);
}
}
// === Credential Validation ===
/** Map SDK error type to a human-readable preflight PentestError. */
function classifySdkError(sdkError: SDKAssistantMessageError, authType: string): Result<void, PentestError> {
switch (sdkError) {
case 'authentication_failed':
return err(
new PentestError(
`Invalid ${authType}. Check your credentials in .env and try again.`,
'config',
false,
{ authType, sdkError },
ErrorCode.AUTH_FAILED,
),
);
case 'billing_error':
return err(
new PentestError(
`Anthropic account has a billing issue. Add credits or check your billing dashboard.`,
'billing',
true,
{ authType, sdkError },
ErrorCode.BILLING_ERROR,
),
);
case 'rate_limit':
return err(
new PentestError(
`Anthropic rate limit or spending cap reached. Wait a few minutes and try again.`,
'billing',
true,
{ authType, sdkError },
ErrorCode.BILLING_ERROR,
),
);
case 'server_error':
return err(
new PentestError(`Anthropic API is temporarily unavailable. Try again shortly.`, 'network', true, {
authType,
sdkError,
}),
);
default:
return err(
new PentestError(
`${authType} validation failed unexpectedly. Check your credentials in .env.`,
'config',
false,
{ authType, sdkError },
ErrorCode.AUTH_FAILED,
),
);
}
}
/** Validate credentials via a minimal Claude Agent SDK query. */
async function validateCredentials(logger: ActivityLogger): Promise<Result<void, PentestError>> {
// 1. Custom base URL — validate endpoint is reachable via SDK query
if (process.env.ANTHROPIC_BASE_URL) {
const baseUrl = process.env.ANTHROPIC_BASE_URL;
logger.info(`Validating custom base URL: ${baseUrl}`);
try {
for await (const message of query({ prompt: 'hi', options: { model: resolveModel('small'), maxTurns: 1 } })) {
if (message.type === 'assistant' && message.error) {
return classifySdkError(message.error, `custom endpoint (${baseUrl})`);
}
if (message.type === 'result') {
break;
}
}
logger.info('Custom base URL OK');
return ok(undefined);
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
return err(
new PentestError(
`Custom base URL unreachable: ${baseUrl}: ${message}`,
'network',
false,
{ baseUrl },
ErrorCode.AUTH_FAILED,
),
);
}
}
// 2. Bedrock mode — validate required AWS credentials are present
if (process.env.CLAUDE_CODE_USE_BEDROCK === '1') {
const required = [
'AWS_REGION',
'AWS_BEARER_TOKEN_BEDROCK',
'ANTHROPIC_SMALL_MODEL',
'ANTHROPIC_MEDIUM_MODEL',
'ANTHROPIC_LARGE_MODEL',
];
const missing = required.filter((v) => !process.env[v]);
if (missing.length > 0) {
return err(
new PentestError(
`Bedrock mode requires the following env vars in .env: ${missing.join(', ')}`,
'config',
false,
{ missing },
ErrorCode.AUTH_FAILED,
),
);
}
logger.info('Bedrock credentials OK');
return ok(undefined);
}
// 3. Vertex AI mode — validate required GCP credentials are present
if (process.env.CLAUDE_CODE_USE_VERTEX === '1') {
const required = [
'CLOUD_ML_REGION',
'ANTHROPIC_VERTEX_PROJECT_ID',
'ANTHROPIC_SMALL_MODEL',
'ANTHROPIC_MEDIUM_MODEL',
'ANTHROPIC_LARGE_MODEL',
];
const missing = required.filter((v) => !process.env[v]);
if (missing.length > 0) {
return err(
new PentestError(
`Vertex AI mode requires the following env vars in .env: ${missing.join(', ')}`,
'config',
false,
{ missing },
ErrorCode.AUTH_FAILED,
),
);
}
// Validate service account credentials file is accessible
const credPath = process.env.GOOGLE_APPLICATION_CREDENTIALS;
if (!credPath) {
return err(
new PentestError(
'Vertex AI mode requires GOOGLE_APPLICATION_CREDENTIALS pointing to a service account key JSON file',
'config',
false,
{},
ErrorCode.AUTH_FAILED,
),
);
}
try {
await fs.access(credPath);
} catch {
return err(
new PentestError(
`Service account key file not found at: ${credPath}`,
'config',
false,
{ credPath },
ErrorCode.AUTH_FAILED,
),
);
}
logger.info('Vertex AI credentials OK');
return ok(undefined);
}
// 4. Check that at least one credential is present
if (!process.env.ANTHROPIC_API_KEY && !process.env.CLAUDE_CODE_OAUTH_TOKEN) {
return err(
new PentestError(
'No API credentials found. Set ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN in .env (or use CLAUDE_CODE_USE_BEDROCK=1 for AWS Bedrock, or CLAUDE_CODE_USE_VERTEX=1 for Google Vertex AI)',
'config',
false,
{},
ErrorCode.AUTH_FAILED,
),
);
}
// 5. Validate via SDK query
const authType = process.env.CLAUDE_CODE_OAUTH_TOKEN ? 'OAuth token' : 'API key';
logger.info(`Validating ${authType} via SDK...`);
try {
for await (const message of query({ prompt: 'hi', options: { model: resolveModel('small'), maxTurns: 1 } })) {
if (message.type === 'assistant' && message.error) {
return classifySdkError(message.error, authType);
}
if (message.type === 'result') {
break;
}
}
logger.info(`${authType} OK`);
return ok(undefined);
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
const retryable = isRetryableError(error instanceof Error ? error : new Error(message));
return err(
new PentestError(
retryable
? `Failed to reach Anthropic API. Check your network connection.`
: `${authType} validation failed: ${message}`,
retryable ? 'network' : 'config',
retryable,
{ authType },
retryable ? undefined : ErrorCode.AUTH_FAILED,
),
);
}
}
// === Target URL Validation ===
/** HTTP HEAD with TLS verification disabled — we check reachability, not certificate validity. */
function httpHead(url: string, timeoutMs: number): Promise<number> {
return new Promise((resolve, reject) => {
const parsed = new URL(url);
const isHttps = parsed.protocol === 'https:';
const transport = isHttps ? https : http;
const req = transport.request(
url,
{
method: 'HEAD',
timeout: timeoutMs,
...(isHttps && { rejectUnauthorized: false }),
},
(res) => {
res.resume();
resolve(res.statusCode ?? 0);
},
);
req.on('timeout', () => {
req.destroy();
reject(new Error(`Connection timed out after ${timeoutMs}ms`));
});
req.on('error', reject);
req.end();
});
}
/** Check that the target URL is reachable from inside the container. */
async function validateTargetUrl(targetUrl: string, logger: ActivityLogger): Promise<Result<void, PentestError>> {
logger.info('Checking target URL reachability...', { targetUrl });
// 1. Parse URL
let parsed: URL;
try {
parsed = new URL(targetUrl);
} catch {
return err(
new PentestError(
`Invalid target URL: ${targetUrl}`,
'config',
false,
{ targetUrl },
ErrorCode.TARGET_UNREACHABLE,
),
);
}
// 2. DNS lookup — detect loopback addresses early for a better hint
const hostname = parsed.hostname;
let resolvedAddress: string | undefined;
try {
const result = await lookup(hostname);
resolvedAddress = result.address;
} catch {
return err(
new PentestError(
`Target URL ${targetUrl} is not reachable. Verify the URL is correct and the site is up.`,
'network',
false,
{ targetUrl, hostname },
ErrorCode.TARGET_UNREACHABLE,
),
);
}
// 3. HTTP reachability check
try {
await httpHead(targetUrl, TARGET_URL_TIMEOUT_MS);
logger.info('Target URL OK');
return ok(undefined);
} catch (error) {
const isLoopback = isLoopbackAddress(resolvedAddress);
const detail = error instanceof Error ? error.message : String(error);
if (isLoopback) {
const suggestion = targetUrl.replace(hostname, 'host.docker.internal');
return err(
new PentestError(
`Target URL ${targetUrl} resolves to ${resolvedAddress} (loopback) and is not reachable. ` +
`For local services, use host.docker.internal instead of ${hostname} (e.g., ${suggestion})`,
'network',
false,
{ targetUrl, resolvedAddress, hostname },
ErrorCode.TARGET_UNREACHABLE,
),
);
}
return err(
new PentestError(
`Target URL ${targetUrl} is not reachable: ${detail}`,
'network',
false,
{ targetUrl, resolvedAddress },
ErrorCode.TARGET_UNREACHABLE,
),
);
}
}
// === Preflight Orchestrator ===
/**
* Run all preflight checks sequentially (cheapest first).
*
* 1. Repository path exists and contains .git
* 2. Config file parses and validates (if configPath provided)
* 3. Credentials validate (API key, OAuth, Bedrock, Vertex AI, or router mode)
* 4. Target URL is reachable from the container
*
* Returns on first failure.
*/
export async function runPreflightChecks(
targetUrl: string,
repoPath: string,
configPath: string | undefined,
logger: ActivityLogger,
): Promise<Result<void, PentestError>> {
// 1. Repository check (free — filesystem only)
const repoResult = await validateRepo(repoPath, logger);
if (!repoResult.ok) {
return repoResult;
}
// 2. Config check (free — filesystem + CPU)
if (configPath) {
const configResult = await validateConfig(configPath, logger);
if (!configResult.ok) {
return configResult;
}
}
// 3. Credential check (cheap — 1 SDK round-trip)
const credResult = await validateCredentials(logger);
if (!credResult.ok) {
return credResult;
}
// 4. Target URL reachability check (cheap — 1 HTTP round-trip)
const urlResult = await validateTargetUrl(targetUrl, logger);
if (!urlResult.ok) {
return urlResult;
}
logger.info('All preflight checks passed');
return ok(undefined);
}
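Review note: the loopback hint in `validateTargetUrl` builds its suggestion with a plain string replace on the hostname. One alternative, sketched here under assumptions, is to rewrite the hostname through the `URL` API so that only the host component changes. `isLoopbackLike` and `dockerHostSuggestion` are hypothetical names, and widening the check to all of 127.0.0.0/8 is a suggestion rather than what the PR does.

```typescript
// Hypothetical variant of the loopback test: accepts any 127.x.x.x address.
// '0.0.0.0' is kept because the code in this diff treats it the same way.
function isLoopbackLike(address: string): boolean {
  return address.startsWith('127.') || address === '::1' || address === '0.0.0.0';
}

// Hypothetical helper: swaps only the hostname, leaving scheme, port, and path intact.
function dockerHostSuggestion(targetUrl: string): string {
  const url = new URL(targetUrl);
  url.hostname = 'host.docker.internal';
  return url.toString();
}

const suggestion = dockerHostSuggestion('http://localhost:3000/app');
// suggestion === 'http://host.docker.internal:3000/app'
```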
@@ -0,0 +1,267 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.
import { fs, path } from 'zx';
import { PROMPTS_DIR } from '../paths.js';
import { PLAYWRIGHT_SESSION_MAPPING } from '../session-manager.js';
import type { ActivityLogger } from '../types/activity-logger.js';
import type { Authentication, DistributedConfig } from '../types/config.js';
import { handlePromptError, PentestError } from './error-handling.js';
interface PromptVariables {
webUrl: string;
repoPath: string;
PLAYWRIGHT_SESSION?: string;
}
interface IncludeReplacement {
placeholder: string;
content: string;
}
// Build complete login instructions from config (reads the shared template from disk)
async function buildLoginInstructions(authentication: Authentication, logger: ActivityLogger): Promise<string> {
try {
// 1. Load the login instructions template
const loginInstructionsPath = path.join(PROMPTS_DIR, 'shared', 'login-instructions.txt');
if (!(await fs.pathExists(loginInstructionsPath))) {
throw new PentestError('Login instructions template not found', 'filesystem', false, { loginInstructionsPath });
}
const fullTemplate = await fs.readFile(loginInstructionsPath, 'utf8');
const getSection = (content: string, sectionName: string): string => {
const regex = new RegExp(`<!-- BEGIN:${sectionName} -->([\\s\\S]*?)<!-- END:${sectionName} -->`, 'g');
const match = regex.exec(content);
return match?.[1]?.trim() ?? '';
};
// 2. Extract sections based on login type
const loginType = authentication.login_type?.toUpperCase();
let loginInstructions = '';
const commonSection = getSection(fullTemplate, 'COMMON');
const authSection = loginType ? getSection(fullTemplate, loginType) : ''; // FORM or SSO
const verificationSection = getSection(fullTemplate, 'VERIFICATION');
// 3. Assemble instructions from sections (fallback to full template if markers missing)
if (!commonSection && !authSection && !verificationSection) {
logger.warn('Section markers not found, using full login instructions template');
loginInstructions = fullTemplate;
} else {
loginInstructions = [commonSection, authSection, verificationSection].filter((section) => section).join('\n\n');
}
// 4. Interpolate login flow and credential placeholders
let userInstructions = (authentication.login_flow ?? []).join('\n');
// Replacer functions insert values literally, so `$` sequences in credentials (e.g. `$$`, `$&`) are not expanded
if (authentication.credentials) {
const { username, password, totp_secret: totpSecret } = authentication.credentials;
if (username) {
userInstructions = userInstructions.replace(/\$username/g, () => username);
}
if (password) {
userInstructions = userInstructions.replace(/\$password/g, () => password);
}
if (totpSecret) {
userInstructions = userInstructions.replace(/\$totp/g, () => `generated TOTP code using secret "${totpSecret}"`);
}
}
loginInstructions = loginInstructions.replace(/{{user_instructions}}/g, () => userInstructions);
// 5. Replace TOTP secret placeholder if present in template
if (authentication.credentials?.totp_secret) {
loginInstructions = loginInstructions.replace(/{{totp_secret}}/g, authentication.credentials.totp_secret);
}
return loginInstructions;
} catch (error) {
if (error instanceof PentestError) {
throw error;
}
const errMsg = error instanceof Error ? error.message : String(error);
throw new PentestError(`Failed to build login instructions: ${errMsg}`, 'config', false, {
authentication,
originalError: errMsg,
});
}
}
// Resolve @include() directives by inlining the referenced files; rejects paths that escape baseDir
async function processIncludes(content: string, baseDir: string): Promise<string> {
const includeRegex = /@include\(([^)]+)\)/g;
const resolvedBase = path.resolve(baseDir);
const replacements: IncludeReplacement[] = await Promise.all(
Array.from(content.matchAll(includeRegex)).map(async (match) => {
const rawPath = match[1] ?? '';
const includePath = path.resolve(baseDir, rawPath);
if (!includePath.startsWith(resolvedBase + path.sep) && includePath !== resolvedBase) {
throw new PentestError(`Path traversal detected in @include(): ${rawPath}`, 'prompt', false, {
includePath,
baseDir: resolvedBase,
});
}
const sharedContent = await fs.readFile(includePath, 'utf8');
return {
placeholder: match[0],
content: sharedContent,
};
}),
);
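for (const replacement of replacements) {
// Replacer function inserts the file content literally, so `$` sequences in it are not expanded
content = content.replace(replacement.placeholder, () => replacement.content);
}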
return content;
}
function buildAuthContext(config: DistributedConfig | null): string {
if (!config?.authentication) {
return 'No authentication configured - unauthenticated testing only';
}
const auth = config.authentication;
const lines = [
`- Login type: ${auth.login_type.toUpperCase()}`,
`- Username: ${auth.credentials.username}`,
`- Login URL: ${auth.login_url}`,
];
if (auth.credentials?.totp_secret) {
lines.push('- MFA: TOTP enabled');
}
return lines.join('\n');
}
// Pure function: Variable interpolation
async function interpolateVariables(
template: string,
variables: PromptVariables,
config: DistributedConfig | null = null,
logger: ActivityLogger,
): Promise<string> {
try {
if (!template || typeof template !== 'string') {
throw new PentestError('Template must be a non-empty string', 'validation', false, {
templateType: typeof template,
templateLength: template?.length,
});
}
if (!variables || !variables.webUrl || !variables.repoPath) {
throw new PentestError('Variables must include webUrl and repoPath', 'validation', false, {
variables: Object.keys(variables || {}),
});
}
let result = template
.replace(/{{WEB_URL}}/g, variables.webUrl)
.replace(/{{REPO_PATH}}/g, variables.repoPath)
.replace(/{{PLAYWRIGHT_SESSION}}/g, variables.PLAYWRIGHT_SESSION || 'agent1')
.replace(/{{AUTH_CONTEXT}}/g, buildAuthContext(config))
.replace(/{{DESCRIPTION}}/g, config?.description ? `Description: ${config.description}` : '');
if (config) {
// Handle rules section - if both are empty, use cleaner messaging
const hasAvoidRules = config.avoid && config.avoid.length > 0;
const hasFocusRules = config.focus && config.focus.length > 0;
if (!hasAvoidRules && !hasFocusRules) {
// Replace the entire rules section with a clean message
const cleanRulesSection = '<rules>\nNo specific rules or focus areas provided for this test.\n</rules>';
result = result.replace(/<rules>[\s\S]*?<\/rules>/g, cleanRulesSection);
} else {
const avoidRules = hasAvoidRules ? config.avoid?.map((r) => `- ${r.description}`).join('\n') : 'None';
const focusRules = hasFocusRules ? config.focus?.map((r) => `- ${r.description}`).join('\n') : 'None';
result = result.replace(/{{RULES_AVOID}}/g, avoidRules).replace(/{{RULES_FOCUS}}/g, focusRules);
}
// Extract and inject login instructions from config
if (config.authentication?.login_flow) {
const loginInstructions = await buildLoginInstructions(config.authentication, logger);
// Replacer function keeps `$` sequences in the instructions (e.g. inside passwords) literal
result = result.replace(/{{LOGIN_INSTRUCTIONS}}/g, () => loginInstructions);
} else {
result = result.replace(/{{LOGIN_INSTRUCTIONS}}/g, '');
}
} else {
// Replace the entire rules section with a clean message when no config provided
const cleanRulesSection = '<rules>\nNo specific rules or focus areas provided for this test.\n</rules>';
result = result.replace(/<rules>[\s\S]*?<\/rules>/g, cleanRulesSection);
result = result.replace(/{{LOGIN_INSTRUCTIONS}}/g, '');
}
// Validate that all placeholders have been replaced (excluding instructional text)
const remainingPlaceholders = result.match(/\{\{[^}]+\}\}/g);
if (remainingPlaceholders) {
logger.warn(`Found unresolved placeholders in prompt: ${remainingPlaceholders.join(', ')}`);
}
return result;
} catch (error) {
if (error instanceof PentestError) {
throw error;
}
const errMsg = error instanceof Error ? error.message : String(error);
throw new PentestError(`Variable interpolation failed: ${errMsg}`, 'prompt', false, { originalError: errMsg });
}
}
// Load a prompt template from disk, resolve its includes, and interpolate variables
export async function loadPrompt(
promptName: string,
variables: PromptVariables,
config: DistributedConfig | null = null,
pipelineTestingMode: boolean = false,
logger: ActivityLogger,
): Promise<string> {
try {
// 1. Resolve prompt file path
const promptsDir = pipelineTestingMode ? path.join(PROMPTS_DIR, 'pipeline-testing') : PROMPTS_DIR;
const promptPath = path.join(promptsDir, `${promptName}.txt`);
if (pipelineTestingMode) {
logger.info(`Using pipeline testing prompt: ${promptPath}`);
}
if (!(await fs.pathExists(promptPath))) {
throw new PentestError(`Prompt file not found: ${promptPath}`, 'prompt', false, { promptName, promptPath });
}
// 2. Assign Playwright session based on agent name
const enhancedVariables: PromptVariables = { ...variables };
const session = PLAYWRIGHT_SESSION_MAPPING[promptName as keyof typeof PLAYWRIGHT_SESSION_MAPPING];
if (session) {
enhancedVariables.PLAYWRIGHT_SESSION = session;
logger.info(`Assigned ${promptName} -> ${enhancedVariables.PLAYWRIGHT_SESSION}`);
} else {
enhancedVariables.PLAYWRIGHT_SESSION = 'agent1';
logger.warn(`Unknown agent ${promptName}, using fallback -> ${enhancedVariables.PLAYWRIGHT_SESSION}`);
}
// 3. Read template file
let template = await fs.readFile(promptPath, 'utf8');
// 4. Process @include directives
template = await processIncludes(template, promptsDir);
// 5. Interpolate variables and return final prompt
return await interpolateVariables(template, enhancedVariables, config, logger);
} catch (error) {
if (error instanceof PentestError) {
throw error;
}
const promptError = handlePromptError(promptName, error as Error);
throw promptError.error;
}
}
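Review note: the path traversal guard in `processIncludes` compares the resolved include path against the resolved base directory plus a trailing separator. The sketch below isolates that check to show why the separator matters. `isInsideBase` is a hypothetical name and the directory names are made up for illustration.

```typescript
import * as path from 'node:path';

// Hypothetical helper: true when rawPath, resolved against baseDir, stays inside baseDir.
function isInsideBase(baseDir: string, rawPath: string): boolean {
  const resolvedBase = path.resolve(baseDir);
  const candidate = path.resolve(resolvedBase, rawPath);
  // Appending path.sep prevents a sibling such as "/srv/prompts-evil"
  // from passing a bare startsWith("/srv/prompts") prefix test.
  return candidate === resolvedBase || candidate.startsWith(resolvedBase + path.sep);
}

const allowed = isInsideBase('/srv/prompts', 'shared/login-instructions.txt'); // true
const escaped = isInsideBase('/srv/prompts', '../secrets.txt'); // false
const sibling = isInsideBase('/srv/prompts', '../prompts-evil/x.txt'); // false
```

Note that this is a lexical check only: it does not follow symlinks, so a symlink inside the base directory could still point elsewhere.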
@@ -0,0 +1,307 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.
import { fs, path } from 'zx';
import type { ExploitationDecision, VulnType } from '../types/agents.js';
import { ErrorCode } from '../types/errors.js';
import { err, ok, type Result } from '../types/result.js';
import { asyncPipe } from '../utils/functional.js';
import { PentestError } from './error-handling.js';
export type { ExploitationDecision, VulnType } from '../types/agents.js';
interface VulnTypeConfigItem {
deliverable: string;
queue: string;
}
type VulnTypeConfig = Record<VulnType, VulnTypeConfigItem>;
type ErrorMessageResolver = string | ((existence: FileExistence) => string);
interface ValidationRule {
predicate: (existence: FileExistence) => boolean;
errorMessage: ErrorMessageResolver;
retryable: boolean;
}
interface FileExistence {
deliverableExists: boolean;
queueExists: boolean;
}
interface PathsBase {
vulnType: VulnType;
deliverable: string;
queue: string;
sourceDir: string;
}
interface PathsWithExistence extends PathsBase {
existence: FileExistence;
}
interface PathsWithQueue extends PathsWithExistence {
queueData: QueueData;
}
interface PathsWithError {
error: PentestError;
}
interface QueueData {
vulnerabilities: unknown[];
[key: string]: unknown;
}
interface QueueValidationResult {
valid: boolean;
data: QueueData | null;
error: string | null;
}
/**
* Result type for safe validation - explicit error handling.
*/
export type SafeValidationResult = Result<ExploitationDecision, PentestError>;
// Vulnerability type configuration as immutable data
const VULN_TYPE_CONFIG: VulnTypeConfig = Object.freeze({
injection: Object.freeze({
deliverable: 'injection_analysis_deliverable.md',
queue: 'injection_exploitation_queue.json',
}),
xss: Object.freeze({
deliverable: 'xss_analysis_deliverable.md',
queue: 'xss_exploitation_queue.json',
}),
auth: Object.freeze({
deliverable: 'auth_analysis_deliverable.md',
queue: 'auth_exploitation_queue.json',
}),
ssrf: Object.freeze({
deliverable: 'ssrf_analysis_deliverable.md',
queue: 'ssrf_exploitation_queue.json',
}),
authz: Object.freeze({
deliverable: 'authz_analysis_deliverable.md',
queue: 'authz_exploitation_queue.json',
}),
}) as VulnTypeConfig;
// Pure function to create validation rule
function createValidationRule(
predicate: (existence: FileExistence) => boolean,
errorMessage: ErrorMessageResolver,
retryable: boolean = true,
): ValidationRule {
return Object.freeze({ predicate, errorMessage, retryable });
}
// Symmetric deliverable rules: queue and deliverable must exist together (prevents partial analysis from triggering exploitation)
const fileExistenceRules: readonly ValidationRule[] = Object.freeze([
createValidationRule(
({ deliverableExists, queueExists }) => deliverableExists && queueExists,
getExistenceErrorMessage,
),
]);
// Generate appropriate error message based on which files are missing
function getExistenceErrorMessage(existence: FileExistence): string {
const { deliverableExists, queueExists } = existence;
if (!deliverableExists && !queueExists) {
return 'Analysis failed: Neither deliverable nor queue file exists. Analysis agent must create both files.';
}
if (!queueExists) {
return 'Analysis incomplete: Deliverable exists but queue file missing. Analysis agent must create both files.';
}
return 'Analysis incomplete: Queue exists but deliverable file missing. Analysis agent must create both files.';
}
// Pure function to create file paths
const createPaths = (vulnType: VulnType, sourceDir: string): PathsBase | PathsWithError => {
const config = VULN_TYPE_CONFIG[vulnType];
if (!config) {
return {
error: new PentestError(`Unknown vulnerability type: ${vulnType}`, 'validation', false, { vulnType }),
};
}
return Object.freeze({
vulnType,
deliverable: path.join(sourceDir, 'deliverables', config.deliverable),
queue: path.join(sourceDir, 'deliverables', config.queue),
sourceDir,
});
};
// Check whether the deliverable and queue files exist (reads the filesystem, so not pure)
const checkFileExistence = async (paths: PathsBase | PathsWithError): Promise<PathsWithExistence | PathsWithError> => {
if ('error' in paths) return paths;
const [deliverableExists, queueExists] = await Promise.all([
fs.pathExists(paths.deliverable),
fs.pathExists(paths.queue),
]);
return Object.freeze({
...paths,
existence: Object.freeze({ deliverableExists, queueExists }),
});
};
// Validates that both the deliverable and the queue exist; any missing file is an error
const validateExistenceRules = (
pathsWithExistence: PathsWithExistence | PathsWithError,
): PathsWithExistence | PathsWithError => {
if ('error' in pathsWithExistence) return pathsWithExistence;
const { existence, vulnType } = pathsWithExistence;
// Find the first rule that fails
const failedRule = fileExistenceRules.find((rule) => !rule.predicate(existence));
if (failedRule) {
const message =
typeof failedRule.errorMessage === 'function' ? failedRule.errorMessage(existence) : failedRule.errorMessage;
return {
error: new PentestError(
`${message} (${vulnType})`,
'validation',
failedRule.retryable,
{
vulnType,
deliverablePath: pathsWithExistence.deliverable,
queuePath: pathsWithExistence.queue,
existence,
},
ErrorCode.DELIVERABLE_NOT_FOUND,
),
};
}
return pathsWithExistence;
};
// Pure function to validate queue structure
const validateQueueStructure = (content: string): QueueValidationResult => {
try {
const parsed = JSON.parse(content) as unknown;
const isValid =
typeof parsed === 'object' &&
parsed !== null &&
'vulnerabilities' in parsed &&
Array.isArray((parsed as QueueData).vulnerabilities);
return Object.freeze({
valid: isValid,
data: isValid ? (parsed as QueueData) : null,
error: null,
});
} catch (parseError) {
return Object.freeze({
valid: false,
data: null,
error: parseError instanceof Error ? parseError.message : String(parseError),
});
}
};
// Queue parse failures are retryable - agent can fix malformed JSON on retry
const validateQueueContent = async (
pathsWithExistence: PathsWithExistence | PathsWithError,
): Promise<PathsWithQueue | PathsWithError> => {
if ('error' in pathsWithExistence) return pathsWithExistence;
try {
const queueContent = await fs.readFile(pathsWithExistence.queue, 'utf8');
const queueValidation = validateQueueStructure(queueContent);
if (!queueValidation.valid) {
// Rule 6: Both exist, queue invalid
return {
error: new PentestError(
queueValidation.error
? `Queue validation failed for ${pathsWithExistence.vulnType}: Invalid JSON structure. Analysis agent must fix queue format.`
: `Queue validation failed for ${pathsWithExistence.vulnType}: Missing or invalid 'vulnerabilities' array. Analysis agent must fix queue structure.`,
'validation',
true, // retryable
{
vulnType: pathsWithExistence.vulnType,
queuePath: pathsWithExistence.queue,
originalError: queueValidation.error,
queueStructure: queueValidation.data ? Object.keys(queueValidation.data) : [],
},
),
};
}
return Object.freeze({
...pathsWithExistence,
queueData: queueValidation.data as QueueData,
});
} catch (readError) {
return {
error: new PentestError(
`Failed to read queue file for ${pathsWithExistence.vulnType}: ${readError instanceof Error ? readError.message : String(readError)}`,
'filesystem',
false,
{
vulnType: pathsWithExistence.vulnType,
queuePath: pathsWithExistence.queue,
originalError: readError instanceof Error ? readError.message : String(readError),
},
),
};
}
};
// Final decision: skip if queue says no vulns, proceed if vulns found, error otherwise
const determineExploitationDecision = (validatedData: PathsWithQueue | PathsWithError): ExploitationDecision => {
if ('error' in validatedData) {
throw validatedData.error;
}
const hasVulnerabilities = validatedData.queueData.vulnerabilities.length > 0;
// Rule 4: Both exist, queue valid and populated
// Rule 5: Both exist, queue valid but empty
return Object.freeze({
shouldExploit: hasVulnerabilities,
shouldRetry: false,
vulnerabilityCount: validatedData.queueData.vulnerabilities.length,
vulnType: validatedData.vulnType,
});
};
// Main functional validation pipeline
export async function validateQueueAndDeliverable(
vulnType: VulnType,
sourceDir: string,
): Promise<ExploitationDecision> {
return asyncPipe<ExploitationDecision>(
createPaths(vulnType, sourceDir),
checkFileExistence,
validateExistenceRules,
validateQueueContent,
determineExploitationDecision,
);
}
/**
* Safely validate queue and deliverable files.
* Returns Result<ExploitationDecision, PentestError> for explicit error handling.
*/
export async function validateQueueSafe(vulnType: VulnType, sourceDir: string): Promise<SafeValidationResult> {
try {
const result = await validateQueueAndDeliverable(vulnType, sourceDir);
return ok(result);
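} catch (error) {
// Wrap unexpected throwables so the error channel always carries a PentestError
if (error instanceof PentestError) {
return err(error);
}
const message = error instanceof Error ? error.message : String(error);
return err(
new PentestError(`Queue validation failed unexpectedly: ${message}`, 'validation', false, {
vulnType,
sourceDir,
originalError: message,
}),
);
}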
}
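The validation pipeline above threads a value through a series of steps, and each step passes an existing `error` object along untouched until the final step throws it. The following is a minimal sketch of that pattern under stated assumptions: `pipeSketch` is a hypothetical stand-in, not the actual `asyncPipe` from `../utils/functional.js` (whose implementation is not shown here), and the `State` types and step functions are invented for illustration.

```typescript
interface Failed {
  error: string;
}
interface Counted {
  count: number;
}
type State = Counted | Failed;
type Step = (state: State) => State | Promise<State>;

// Thread a value through each step in order, awaiting async steps
async function pipeSketch(initial: State, ...steps: Step[]): Promise<State> {
  let state = initial;
  for (const step of steps) {
    state = await step(state);
  }
  return state;
}

// Each step returns a carried error unchanged, mirroring the `'error' in paths` guards
const requirePositive = (state: State): State => {
  if ('error' in state) return state;
  return state.count > 0 ? state : { error: 'count must be positive' };
};

const double = async (state: State): Promise<State> => {
  if ('error' in state) return state;
  return { count: state.count * 2 };
};

// Only the final step turns a carried error into a thrown exception
const unwrap = (state: State): State => {
  if ('error' in state) throw new Error(state.error);
  return state;
};
```

For example, `await pipeSketch({ count: 2 }, requirePositive, double, unwrap)` resolves to an object with `count` doubled, while starting from `{ count: 0 }` rejects at `unwrap` because `double` skipped its work once the error was present. Keeping the throw in one place is what lets a wrapper such as `validateQueueSafe` convert failures into a `Result` with a single `try`/`catch`.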
@@ -0,0 +1,154 @@
// Copyright (C) 2025 Keygraph, Inc.
//
// This program is free software: you can redistribute it and/or modify
// it under the terms of the GNU Affero General Public License version 3
// as published by the Free Software Foundation.
import { fs, path } from 'zx';
import type { ActivityLogger } from '../types/activity-logger.js';
import { ErrorCode } from '../types/errors.js';
import { PentestError } from './error-handling.js';
interface DeliverableFile {
name: string;
path: string;
required: boolean;
}
// Assemble the final report from specialist deliverables and write it to disk
export async function assembleFinalReport(sourceDir: string, logger: ActivityLogger): Promise<string> {
const deliverableFiles: DeliverableFile[] = [
{ name: 'Injection', path: 'injection_exploitation_evidence.md', required: false },
{ name: 'XSS', path: 'xss_exploitation_evidence.md', required: false },
{ name: 'Authentication', path: 'auth_exploitation_evidence.md', required: false },
{ name: 'SSRF', path: 'ssrf_exploitation_evidence.md', required: false },
{ name: 'Authorization', path: 'authz_exploitation_evidence.md', required: false },
];
const sections: string[] = [];
for (const file of deliverableFiles) {
const filePath = path.join(sourceDir, 'deliverables', file.path);
try {
if (await fs.pathExists(filePath)) {
const content = await fs.readFile(filePath, 'utf8');
sections.push(content);
logger.info(`Added ${file.name} findings`);
} else if (file.required) {
throw new PentestError(
`Required deliverable file not found: ${file.path}`,
'filesystem',
false,
{ deliverableFile: file.path, sourceDir },
ErrorCode.DELIVERABLE_NOT_FOUND,
);
} else {
logger.info(`No ${file.name} deliverable found`);
}
} catch (error) {
if (file.required) {
throw error;
}
const err = error as Error;
logger.warn(`Could not read ${file.path}: ${err.message}`);
}
}
const finalContent = sections.join('\n\n');
const deliverablesDir = path.join(sourceDir, 'deliverables');
const finalReportPath = path.join(deliverablesDir, 'comprehensive_security_assessment_report.md');
try {
// Ensure deliverables directory exists
await fs.ensureDir(deliverablesDir);
await fs.writeFile(finalReportPath, finalContent);
logger.info(`Final report assembled at ${finalReportPath}`);
} catch (error) {
const err = error as Error;
throw new PentestError(`Failed to write final report: ${err.message}`, 'filesystem', false, {
finalReportPath,
originalError: err.message,
});
}
return finalContent;
}
/**
* Inject model information into the final security report.
* Reads session.json to get the model(s) used, then injects a "Model:" line
* into the Executive Summary section of the report.
*/
export async function injectModelIntoReport(
repoPath: string,
outputPath: string,
logger: ActivityLogger,
): Promise<void> {
// 1. Read session.json to get model information
const sessionJsonPath = path.join(outputPath, 'session.json');
if (!(await fs.pathExists(sessionJsonPath))) {
logger.warn('session.json not found, skipping model injection');
return;
}
interface SessionData {
metrics?: {
agents?: Record<string, { model?: string }>;
};
}
const sessionData: SessionData = await fs.readJson(sessionJsonPath);
// 2. Extract unique models from all agents (tolerates a session.json without metrics)
const models = new Set<string>();
for (const agent of Object.values(sessionData.metrics?.agents ?? {})) {
if (agent.model) {
models.add(agent.model);
}
}
if (models.size === 0) {
logger.warn('No model information found in session.json');
return;
}
const modelStr = Array.from(models).join(', ');
logger.info(`Injecting model info into report: ${modelStr}`);
// 3. Read the final report
const reportPath = path.join(repoPath, 'deliverables', 'comprehensive_security_assessment_report.md');
if (!(await fs.pathExists(reportPath))) {
logger.warn('Final report not found, skipping model injection');
return;
}
let reportContent = await fs.readFile(reportPath, 'utf8');
// 4. Find and inject model line after "Assessment Date" in Executive Summary
// Pattern: a whole line of the form "- Assessment Date: <date>" (multiline match)
const assessmentDatePattern = /^(- Assessment Date: .+)$/m;
const match = reportContent.match(assessmentDatePattern);
if (match) {
// Inject model line after Assessment Date
const modelLine = `- Model: ${modelStr}`;
reportContent = reportContent.replace(assessmentDatePattern, `$1\n${modelLine}`);
logger.info('Model info injected into Executive Summary');
} else {
// If no Assessment Date line found, try to add after Executive Summary header
const execSummaryPattern = /^## Executive Summary$/m;
if (reportContent.match(execSummaryPattern)) {
// Add model as first item in Executive Summary
reportContent = reportContent.replace(execSummaryPattern, `## Executive Summary\n- Model: ${modelStr}`);
logger.info('Model info added to Executive Summary header');
} else {
logger.warn('Could not find Executive Summary section');
return;
}
}
// 5. Write modified report back
await fs.writeFile(reportPath, reportContent);
}
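The string manipulation in step 4 can be separated from the file I/O and tested on its own. The following is a minimal sketch, assuming the same two patterns as `injectModelIntoReport`; `injectModelLine` is a hypothetical helper name, not part of the codebase. It swaps the string replacement for a replacer function, so a model name containing `$` sequences would be inserted literally instead of being read as a replacement pattern.

```typescript
const ASSESSMENT_DATE_PATTERN = /^(- Assessment Date: .+)$/m;
const EXEC_SUMMARY_PATTERN = /^## Executive Summary$/m;

// Returns the updated report, or null when neither anchor is present
function injectModelLine(report: string, modelStr: string): string | null {
  const modelLine = `- Model: ${modelStr}`;

  // Preferred anchor: directly after the "Assessment Date" line
  if (ASSESSMENT_DATE_PATTERN.test(report)) {
    return report.replace(ASSESSMENT_DATE_PATTERN, (line) => `${line}\n${modelLine}`);
  }

  // Fallback anchor: first item under the Executive Summary header
  if (EXEC_SUMMARY_PATTERN.test(report)) {
    return report.replace(EXEC_SUMMARY_PATTERN, (header) => `${header}\n${modelLine}`);
  }

  return null;
}
```

A `null` return corresponds to the "Could not find Executive Summary section" branch, where the report is left unwritten. Note that, like the original, the sketch is not idempotent: running it twice on the same report adds a second `- Model:` line.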