feat: use structured outputs for vuln agent exploitation queues (#267)

* feat: add structured outputs for vuln agent exploitation queues

Use the Claude Agent SDK's native outputFormat option to get
schema-validated JSON queue data from vulnerability analysis agents,
instead of relying on save-deliverable tool calls for queue files.

- Add Zod schemas for all 5 vuln types (injection, xss, auth, ssrf, authz)
- Thread outputFormat through SDK call chain (executor → message handlers)
- Write structured_output to disk as queue JSON before validation
- Handle error_max_structured_output_retries as retryable failure
- Update vuln prompts to use structured output for queues
- Keep save-deliverable for markdown deliverables (unchanged)
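
A rough sketch of how the per-agent lookup behind the bullets above might look. Only the function names `getOutputFormat` and `getQueueFilename` come from this change. The `<type>-vuln` agent names, the `<type>_exploitation_queue.json` filenames, the `vulnerabilities` field, and the hand-written draft-07 schema (standing in for the Zod-generated one) are assumptions for illustration, as is the `{ type: 'json_schema', schema }` shape of the output format.

```typescript
// Hypothetical sketch: names, filenames, and schema fields are illustrative only.
type VulnType = 'injection' | 'xss' | 'auth' | 'ssrf' | 'authz';

interface JsonSchemaOutputFormat {
  type: 'json_schema';
  schema: Record<string, unknown>;
}

const VULN_TYPES: readonly VulnType[] = ['injection', 'xss', 'auth', 'ssrf', 'authz'];

// Assumed naming convention: vuln agents are called "<type>-vuln".
function vulnTypeOf(agentName: string): VulnType | undefined {
  return VULN_TYPES.find((t) => agentName === `${t}-vuln`);
}

// Returns the queue filename for vuln agents, undefined for every other agent.
function getQueueFilename(agentName: string): string | undefined {
  const vulnType = vulnTypeOf(agentName);
  return vulnType ? `${vulnType}_exploitation_queue.json` : undefined;
}

// Returns an output format for vuln agents, undefined for every other agent.
function getOutputFormat(agentName: string): JsonSchemaOutputFormat | undefined {
  if (!vulnTypeOf(agentName)) return undefined;
  return {
    type: 'json_schema',
    schema: {
      // Placeholder schema; the real one would be generated from a Zod schema.
      $schema: 'http://json-schema.org/draft-07/schema#',
      type: 'object',
      properties: {
        vulnerabilities: { type: 'array', items: { type: 'object' } },
      },
      required: ['vulnerabilities'],
      additionalProperties: false,
    },
  };
}
```

Returning `undefined` for non-vuln agents lets the caller pass the result straight through the call chain without branching on agent type.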

* fix: correct structured output schema conversion for Claude Agent SDK

Use draft-07 target for z.toJSONSchema() instead of the default
draft-2020-12, which the SDK's AJV validator doesn't support. Update
pipeline-testing prompts to use structured output instead of raw JSON
responses.

* refactor: remove save-deliverable references for queues in vuln prompts

Queues are now captured via structured outputs, so vuln agents no longer
need to use save-deliverable for queue JSON. Removes references to
"structured response/output" phrasing and aligns all prompts to use
consistent "exploitation queue" terminology.

* refactor: remove queue support from save-deliverable

Queues are now produced via structured outputs, so save-deliverable no
longer needs queue-related code. Removes queue enum values, filename
mappings, JSON validation, and updates all prompt tool descriptions to
match the simplified CLI interface.

* fix: instruct vuln agents to save deliverable before exploitation queue

The structured output tool terminates the agent session when called.
Agents were calling it before saving their deliverable markdown,
causing output validation failures and unnecessary retries.

* refactor: remove explicit exploitation queue output instructions from vuln prompts

The Claude Agent SDK automatically captures structured output on the
last turn when outputFormat is set. Prompts explicitly telling agents
to produce the queue caused them to call StructuredOutput mid-session,
conflicting with the SDK mechanism and silently dropping the output.

Removed exploitation_queue_requirements sections and queue references
from conclusion triggers. Added note that the queue is captured
automatically. Updated Your Output to point to the deliverable markdown.
ezl-keygraph
2026-04-02 01:12:00 +05:30
committed by GitHub
parent 6a0c8ce710
commit 2a433f090f
28 changed files with 273 additions and 236 deletions
+16 -2
@@ -21,7 +21,9 @@
* No Temporal dependencies - pure domain logic.
*/
import { fs, path } from 'zx';
import { type ClaudePromptResult, runClaudePrompt, validateAgentOutput } from '../ai/claude-executor.js';
+import { getOutputFormat, getQueueFilename } from '../ai/queue-schemas.js';
import type { AuditSession } from '../audit/index.js';
import { AGENTS } from '../session-manager.js';
import type { ActivityLogger } from '../types/activity-logger.js';
@@ -134,6 +136,7 @@ export class AgentExecutionService {
await auditSession.startAgent(agentName, prompt, attemptNumber);
// 5. Execute agent
+const outputFormat = getOutputFormat(agentName);
const result: ClaudePromptResult = await runClaudePrompt(
prompt,
repoPath,
@@ -143,6 +146,7 @@ export class AgentExecutionService {
auditSession,
logger,
AGENTS[agentName].modelTier,
+outputFormat,
);
// 6. Spending cap check - defense-in-depth
@@ -176,7 +180,17 @@ export class AgentExecutionService {
});
}
-// 8. Validate output
+// 8. Write structured output to disk (vuln agents only)
+const queueFilename = getQueueFilename(agentName);
+if (result.structuredOutput !== undefined && queueFilename) {
+const deliverablesDir = path.join(repoPath, 'deliverables');
+await fs.ensureDir(deliverablesDir);
+const queuePath = path.join(deliverablesDir, queueFilename);
+await fs.writeFile(queuePath, JSON.stringify(result.structuredOutput, null, 2), 'utf8');
+logger.info(`Wrote structured output queue to ${queueFilename}`);
+}
+// 9. Validate output
const validationPassed = await validateAgentOutput(result, agentName, repoPath, logger);
if (!validationPassed) {
return this.failAgent(agentName, repoPath, auditSession, logger, {
@@ -191,7 +205,7 @@ export class AgentExecutionService {
});
}
-// 9. Success - commit deliverables, then capture checkpoint hash
+// 10. Success - commit deliverables, then capture checkpoint hash
await commitGitSuccess(repoPath, agentName, logger);
const commitHash = await getGitCommitHash(repoPath);
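
The new step 8 above can be sketched in isolation as follows. This is a simplified stand-in, not the project's code: `writeQueueIfPresent` is a hypothetical helper, it uses Node's built-in synchronous `fs` calls rather than the async `fs` re-exported by `zx`, and the filename in the usage lines is made up. The point it illustrates is the ordering: the queue file is written before validation runs, so a validator that checks for the queue file on disk can find it.

```typescript
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper mirroring step 8: persist the structured output as
// pretty-printed JSON under <repoPath>/deliverables, or do nothing.
function writeQueueIfPresent(
  repoPath: string,
  queueFilename: string | undefined,
  structuredOutput: unknown,
): string | undefined {
  // Skip when the agent has no queue filename or produced no structured output.
  if (structuredOutput === undefined || !queueFilename) return undefined;

  const deliverablesDir = join(repoPath, 'deliverables');
  mkdirSync(deliverablesDir, { recursive: true });

  const queuePath = join(deliverablesDir, queueFilename);
  writeFileSync(queuePath, JSON.stringify(structuredOutput, null, 2), 'utf8');
  return queuePath;
}

// Usage against a throwaway directory (illustrative filename and payload).
const demoRepo = mkdtempSync(join(tmpdir(), 'queue-demo-'));
const written = writeQueueIfPresent(demoRepo, 'xss_exploitation_queue.json', { vulnerabilities: [] });
const skipped = writeQueueIfPresent(demoRepo, undefined, { vulnerabilities: [] });
```

Here `written` holds the path of the new queue file, while `skipped` is `undefined` because no queue filename was supplied.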
+3 -3
@@ -114,12 +114,12 @@ function getExistenceErrorMessage(existence: FileExistence): string {
const { deliverableExists, queueExists } = existence;
if (!deliverableExists && !queueExists) {
-return 'Analysis failed: Neither deliverable nor queue file exists. Analysis agent must create both files.';
+return 'Analysis failed: Neither deliverable nor queue file exists. Both are required.';
}
if (!queueExists) {
-return 'Analysis incomplete: Deliverable exists but queue file missing. Analysis agent must create both files.';
+return 'Analysis incomplete: Deliverable exists but queue file missing. Both are required.';
}
-return 'Analysis incomplete: Queue exists but deliverable file missing. Analysis agent must create both files.';
+return 'Analysis incomplete: Queue exists but deliverable file missing. Both are required.';
}
// Pure function to create file paths