
AI Agent

Every request that passes through Ghost is stored with full detail — headers, bodies, timing, TLS info, browser interactions. The AI agent can read all of this data, reason about it, and take action. Instead of manually scrolling through hundreds of flows looking for bugs or security issues, you describe what you want in plain English, and the agent does the work: searching traffic, analyzing patterns, generating test code, writing bug reports, or running security scans.

The agent is not a simple chatbot. It follows a structured plan → execute → reflect → terminate cycle. It creates a step-by-step plan for your request, executes each step using specialized tools (searching traffic, calling external scanners, writing files), reflects on what it found after each step, and stops when the plan is complete. You can watch the plan progress in real-time and redirect the agent mid-run if it goes off track.

  1. Open the Chat Panel — click the chat icon in the toolbar or press Ctrl+4
  2. Type a question or instruction in natural language
  3. The agent creates a plan, then starts executing it — you’ll see tool calls and results streaming in real-time
  4. The Plan Panel above the chat shows live progress: which step is active, which are done, which are pending

Example requests:
  • “Find all API errors in this session and generate a bug report”
  • “Generate Playwright tests for the checkout flow”
  • “Compare this session against the previous one — did anything regress?”
  • “Generate API documentation from the captured traffic”
  • “Run a load test against the search endpoint with 50 virtual users”
  • “Highlight all product prices on the page” (if the browser extension is connected)
  • “Scan all endpoints for SQL injection”
  • “Analyze the JWT tokens in the traffic for weaknesses”
  • “Check all endpoints for missing security headers”
  • “Run a full penetration test against api.example.com”
  • “Find any hardcoded secrets in the JavaScript bundles”

The agent’s iteration loop:

  1. Dynamic system prompt — Every LLM call includes a system prompt that Ghost builds dynamically. It contains the current session’s traffic statistics (total flows, error rate, top hosts, method distribution), which tools are available (different in QA vs Security mode), whether the browser extension is connected, whether a mobile device is connected, and which addons are active. This means the agent always knows the current state of your session.

  2. LLM call — The prompt and conversation history are sent to whichever LLM provider you’ve configured. The agent supports three providers (detailed below). The LLM responds either with text (displayed to you) or with tool calls (executed by Ghost).

  3. Tool routing — Ghost doesn’t send all tools to the LLM every time. Instead, it looks at what step of the plan the agent is currently on and sends only the 15-25 tools relevant to that step’s category (reconnaissance, analysis, active testing, exploitation, or reporting). This saves tokens and keeps the LLM focused. If no plan exists yet, all tools are sent so the agent can decide what to plan.

  4. Parallel execution — When the LLM requests multiple tool calls in one response, read-only tools (like search_traffic, get_flow, list_sessions) run concurrently — up to 4 at a time. Tools that modify state (like tag_flows, replay_request, fs_write) run one at a time to avoid conflicts.

  5. Output compression — Each tool’s output is capped at 16,000 characters (roughly 4,000 tokens). For tools that return large results (like run_nuclei or search_traffic), Ghost intelligently compresses the output — keeping critical/high-severity findings and truncating lower-priority content rather than blindly cutting off at a character limit.

  6. Context management — The full conversation (system prompt + all messages + all tool results) is tracked for token usage. When it approaches the provider’s context window limit, older messages are mechanically summarized — tool names and key findings are preserved, but raw outputs are dropped. This is a deterministic process (no LLM call for summarization), so it’s fast and predictable.

  7. Reflection — After completing each plan step, the agent receives a reflection prompt asking it to assess: Was the step complete? Is the finding well-evidenced? Did anything change the plan? Should the next step be adjusted? This prevents the agent from rushing through steps without thinking about what it found.

  8. Termination check — Six signals can stop the agent (detailed below). If none fire, the loop continues with the next LLM call.

  9. Mid-run steering — While the agent is running, you can type a message in the chat input (which shows an amber border during active runs). Your message is injected into the conversation between iterations. The agent has a buffer of up to 5 steering messages — if you type more than 5 before the agent processes them, the oldest are dropped.
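
The steering buffer from step 9 can be sketched with a bounded deque. This is an illustrative sketch, not Ghost’s actual implementation; only the 5-message cap and drop-oldest behavior come from the text above.

```python
from collections import deque

class SteeringBuffer:
    """Holds at most 5 pending steering messages; pushing a 6th
    silently evicts the oldest (deque's maxlen semantics)."""

    def __init__(self, capacity: int = 5) -> None:
        self._buf: deque[str] = deque(maxlen=capacity)

    def push(self, message: str) -> None:
        self._buf.append(message)  # evicts the oldest entry when full

    def drain(self) -> list[str]:
        # Between iterations the agent consumes all pending messages at once.
        messages = list(self._buf)
        self._buf.clear()
        return messages
```

Using `deque(maxlen=...)` makes the overflow policy a one-liner: appends past capacity drop from the opposite end automatically.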

Ghost supports three LLM providers. You configure which one to use in Settings → AI or in ~/.ghost/config.toml. API keys are encrypted at rest with AES-256-GCM — they’re never stored in plaintext on disk.

| Provider | Default Model | Available Models | Context Window | What It’s Best For |
|---|---|---|---|---|
| Anthropic | Claude Sonnet 4.6 | Opus 4.6, Sonnet 4.6, Sonnet 4.5, Haiku 4.5 | 200,000 tokens | Primary recommended provider. Ghost uses Anthropic’s prompt caching (CacheControl: ephemeral on the system prompt and last tool definition) for significant cost savings on repeated calls. Opus 4.6 is the most capable for complex analysis. |
| OpenAI | GPT-4o | GPT-5.4, o3, o3 Pro, o4-mini, GPT-4.1/mini/nano, GPT-4o/mini | 128,000 tokens | Alternative cloud provider. Uses the openai-go SDK with streaming via ChatCompletionAccumulator for assembling tool calls from streamed chunks. o3/o3 Pro excel at reasoning-heavy tasks. |
| Ollama | llama3.2 | Llama 4, 3.3, 3.2, Qwen 3/2.5, DeepSeek R1/Coder V2, Mistral, Mixtral, Phi-4, Gemma 2 | 32,000 tokens | Fully offline/local operation. Connects to Ollama’s OpenAI-compatible API at http://localhost:11434/v1. No data leaves your machine. Any Ollama model works — these are pre-configured options. |

All providers share the same retry behavior: up to 3 retries on rate limit errors (HTTP 429), with wait time parsed from the error response. Max output tokens per LLM response: 8,192.
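
The shared retry behavior might look like this sketch. `RateLimitError` and its `retry_after` field are hypothetical stand-ins for whatever the provider SDK raises; only the 3-retry cap and parsed wait time come from the text above.

```python
import time

class RateLimitError(Exception):
    """Hypothetical: wraps an HTTP 429 with the wait parsed from the response."""
    def __init__(self, retry_after: float) -> None:
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after

def call_with_retry(call, max_retries: int = 3, sleep=time.sleep):
    """Retry a provider call up to 3 times on rate limit errors,
    sleeping for the wait time parsed from the error response."""
    attempt = 0
    while True:
        try:
            return call()
        except RateLimitError as err:
            attempt += 1
            if attempt > max_retries:
                raise  # retries exhausted, surface the 429
            sleep(err.retry_after)
```

Injecting `sleep` as a parameter keeps the backoff testable without real delays.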

The agent’s first action on any task must be creating a plan via the create_plan tool. This is enforced — if the agent tries to produce a text response without creating a plan first, Ghost injects a nudge: “You MUST create a plan before proceeding.” (up to 2 nudges before giving up).

Plans have 1–15 steps. Each step has a description, a category (recon, analysis, active_test, exploit, or report), and a status (pending, in_progress, completed, skipped, or failed). The plan is visible in the UI’s Plan Panel and looks like this:

<current_plan progress="2/5" phase="analysis">
Goal: Test authentication security on api.example.com
Scope: api.example.com — auth endpoints, JWT, IDOR
[1] Map API endpoints ............... DONE — Found 12 endpoints
[2] Analyze auth patterns ........... IN PROGRESS
[3] Test JWT security ............... PENDING
[4] Test rate limiting .............. PENDING
[5] Generate security report ........ PENDING
</current_plan>

The agent can revise the plan mid-execution (up to 5 revisions). If it discovers something unexpected in step 2 that requires additional testing, it calls revise_plan to add new steps before continuing. After completing each step, the agent calls complete_step with a result summary (capped at 500 characters).

After the agent completes a plan step, Ghost injects a reflection prompt asking:

  1. Completeness — Did you test everything relevant to this step?
  2. Quality — Is your finding well-evidenced? Would a senior reviewer accept it?
  3. Discovery — Did you find anything that changes your plan?
  4. Next step — Is the next planned step still the right action?

When all steps are complete, a final reflection is injected that asks the agent to walk through the entire plan, review all findings, assess confidence (low/medium/high), and identify anything critical that was missed. If the agent finds gaps during final reflection, it can call revise_plan to add more steps.

Six signals can stop the agent, checked in this priority order every iteration:

| Signal | Condition | What Happens |
|---|---|---|
| Plan complete + reflected | All plan steps are marked completed or skipped, AND final reflection is done | Agent stops immediately — work is complete. |
| Plan completed + text-only | Plan status is “completed” and the agent produced a text response with zero tool calls | Agent stops — it’s writing its final summary. |
| Loop detected | The same tool with the same input (checked by FNV-1a hash) was called 3 or more times within the last 10 tool call records | Agent is asked to wrap up and write a report. This prevents infinite loops where the agent keeps retrying the same failing action. |
| Diminishing returns | After iteration 8: the current step is still “in progress” and the last 6 tool calls are all the same tool name | Agent is asked to wrap up. This catches cases where the agent is stuck calling the same tool repeatedly without advancing. |
| Budget reservation | Iteration reaches maxIterations - 3 (iteration 22 of 25) | Agent is asked to write its final report with the remaining 3 iterations. This ensures the agent always has room to produce a summary rather than being cut off mid-analysis. |
| User stop | User clicks the stop button in the UI | Agent wraps up the current step and writes a summary. |

The hard cap is 25 iterations (LLM calls) per run. If none of the above signals trigger and the agent reaches 25, the run ends with an error.
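
The loop-detection signal (same tool with the same input called 3 or more times within the last 10 records, checked by FNV-1a hash) can be sketched as follows. The 64-bit FNV-1a constants are standard; the `tool:input` serialization format is an assumption.

```python
FNV_OFFSET = 0xcbf29ce484222325  # standard 64-bit FNV-1a offset basis
FNV_PRIME = 0x100000001b3        # standard 64-bit FNV prime

def fnv1a(data: bytes) -> int:
    h = FNV_OFFSET
    for b in data:
        h = ((h ^ b) * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF  # keep 64 bits
    return h

def loop_detected(history: list[tuple[str, str]]) -> bool:
    """True when the same (tool, input) pair appears 3+ times
    within the last 10 tool call records."""
    recent = history[-10:]
    hashes = [fnv1a(f"{tool}:{arg}".encode()) for tool, arg in recent]
    return any(hashes.count(h) >= 3 for h in set(hashes))
```

Hashing the pair rather than storing raw inputs keeps the sliding window cheap even for large tool payloads.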

When an agent is asked to “wrap up” (signals 3-6), it receives a termination prompt:

“You are being asked to wrap up. Call complete_step for any in-progress step, write a final summary of your work, findings, and recommendations. Do NOT start new tool calls.”

Ghost registers different tools depending on the active mode (QA or Security) and what’s connected (extension, device, Frida). The lists below include only tools that are actually registered — tools that exist in the codebase but are commented out of registration are not listed.

Core Tools (Both Modes — Always Available)


| Tool | What It Does |
|---|---|
| search_traffic | Search flows using GQL queries with pagination (max 50 results per page, supports offset for paging). |
| get_flow | Get full detail of a single flow — request/response headers, status, timing, tags, annotations. |
| get_flow_body | Get the request or response body of a flow (up to 16 KB). Supports grep_pattern to search within large bodies instead of returning everything. |
| find_endpoints | Discover all unique endpoints (method + host + path combinations) across captured traffic. |
| get_traffic_stats | Aggregate statistics — total flows, error rate, top hosts, method distribution, average response time. |
| tag_flows | Add or remove tags on flows (e.g., tag flows as “bug”, “suspicious”, “auth”). |
| replay_request | Re-send a captured request exactly as it was originally captured, or with modifications. |
| annotate_flow | Add a text annotation/note to a flow — useful for documenting findings directly on the relevant traffic. |
| list_sessions | List all sessions (up to 100). |
| list_journeys | List recorded browser journeys (up to 100). |
| get_journey_steps | Get the recorded steps of a journey — interactions, flows, selectors, element details (up to 500 steps). |
| journey_export | Export a journey as deterministic test code (Cypress UI/API, Playwright UI/API, k6, Postman, cURL, HAR). Uses the same reliable exporter as the Ghost UI — stable selectors, no hallucination. Writes to workspace and returns the full path. Preferred over generate_test for journey-based tests. |
| list_ws_messages | Query captured WebSocket frames — filter by direction, message type, content. |
| list_console_errors | List browser console errors captured by the extension. |
| list_navigations | List page navigation events captured by the extension. |
| list_storage_changes | List localStorage/sessionStorage/cookie changes captured by the extension. |
| analyze_requirements | Cross-reference uploaded requirements documents against captured traffic to assess test coverage. |
| create_plan | Create a structured execution plan (1-15 steps). Single-step plans are allowed for focused tasks like “export journey as Cypress UI”. |
| revise_plan | Modify the current plan — add, remove, or reorder steps (max 5 revisions). |
| complete_step | Mark a plan step as completed with a result summary. |
| think | Private reasoning scratchpad — the agent writes its thoughts without displaying them as chat text. Renders as a purple “Thinking” indicator with bouncing dots in the UI. |
| present_options | Present 2-6 interactive choices to the user with a question, labeled options (with descriptions), and a free-text input. Pauses the agent loop until the user responds. See Interactive Choices below. |

Browser Extension Tools (When Extension Connected)


| Tool | What It Does |
|---|---|
| highlight_element | Highlight a DOM element in the browser with a colored outline — useful for pointing out elements to the user. |
| show_toast | Show a notification toast in the browser page. |
| annotate_element | Add a persistent label/badge to a DOM element in the browser. |
| browser_read_page | Read the full DOM content of the current browser page. |
| browser_query_all | Query the page DOM using CSS selectors — returns matching elements with their attributes and text content. |
| browser_click | Click a DOM element by CSS selector. |
| browser_fill | Fill a form field by CSS selector with a value. |
| browser_screenshot | Take a screenshot of the page or a specific element. |
| browser_inject | Inject arbitrary JavaScript into the page context. |

Proxy Injection (When Script Injector Available)


| Tool | What It Does |
|---|---|
| proxy_inject_script | Inject JavaScript into HTML responses matching a URL pattern — works on ALL devices including mobile (unlike browser extension tools which only work in the browser). Scripts can use window.__ghost.observe(selector, callback) for SPA-aware DOM watching, window.__ghost.fetch(url) for cross-origin requests (active modes only), and window.__ghost.analyze(data, prompt) for AI analysis (active modes only). |

| Tool | What It Does |
|---|---|
| generate_test | Generate test code from captured traffic. Supports Playwright, Cypress, pytest, k6, and other frameworks. |
| generate_bug_report | Create a structured bug report with traffic evidence — includes reproduction steps, request/response details, and screenshots. |
| detect_regression | Compare two sessions to find regressions — new errors, changed response structures, performance degradation. |
| export_as | Export a flow as cURL, fetch, Python requests, Go http, or other formats. |
| generate_api_docs | Generate API documentation from captured traffic — endpoint inventory, request/response schemas, example payloads. |
| generate_mock_server | Generate a mock server from captured traffic — replay recorded responses for offline testing. |
| generate_test_scenarios | Generate test scenario descriptions from captured traffic patterns. |
| generate_session_report | Generate a comprehensive session summary report. |
| fuzz_endpoint | Systematic API fuzzing — send variations of a request to discover edge cases and errors. |
| test_form | Discover form fields on a page and test them with various inputs. |
| record_journey | Start or stop recording a user journey with correlated browser interactions and HTTP flows. Requires extension. |
| replay_journey | Replay a recorded journey’s HTTP flows and compare responses with the originals (SSE streaming). |

Performance Testing Tools (Require External Tools Installed)


| Tool | What It Does |
|---|---|
| run_k6 | Run a k6 load test against an endpoint. Capped at 100 virtual users and 5-minute duration. |
| run_hey | Run an HTTP benchmark with hey. Capped at 1,000 requests, 50 concurrency, 60-second duration. |

Mobile Inspector Tools (When Device Connected)


| Tool | What It Does |
|---|---|
| get_device_screen | Capture the current device screen as a screenshot (resized to max 400px width, JPEG quality 60 to save tokens). |
| get_element_tree | Get the UI element hierarchy (max depth 10 levels). |
| find_elements | Search for elements by text or accessibility properties (max 20 results). |
| get_element_selectors | Generate automation selectors for an element — Appium, Espresso, XCUITest, and Maestro formats. |
| correlate_element_traffic | Find which API calls were triggered by interacting with a UI element (checks a 5-second time window, max 200 flows). |
| tap_device | Tap an element on the device screen. Rate-limited to 10 taps per 5 seconds to prevent accidental rapid-fire. |
| type_device | Type text into the currently focused field (max 500 characters). |

TestRail Integration Tools (When TestRail Configured)


| Tool | What It Does |
|---|---|
| testrail_list_projects | List TestRail projects. |
| testrail_get_cases | Get test cases from a TestRail project. |
| testrail_push_results | Push test results to TestRail. |
| testrail_suggest_cases | Suggest which TestRail test cases are relevant to the current traffic. |

| Tool | What It Does |
|---|---|
| list_findings | List security findings detected by Ghost’s passive security interceptor — these are findings that were automatically detected during traffic capture, before the agent even starts. |
| send_http_request | Send a custom HTTP request with full control over method, URL, headers, and body. SSRF-protected: Ghost validates that the resolved IP is not private/loopback before connecting. |
| get_page_resources | Map all resources loaded by a page — JavaScript, CSS, images, fonts. Useful for finding third-party scripts that might contain secrets. |
| request_approval | Request user approval before performing a destructive or data-modifying operation. Required in active-safe mode for write operations. |
| attack_request | Launch an automated payload attack against a captured request (detailed below). Requires the Attacker engine. 5-minute timeout. |
| list_wordlists | List available payload wordlists for use with attack_request. |

Frida Tools (When Frida Connected — Security Mode Only)


| Tool | What It Does |
|---|---|
| frida_check | Verify Frida connection is alive. Must be called before other Frida tools. |
| frida_list_apps | List installed apps on the connected device. |
| frida_bypass_ssl | Bypass SSL certificate pinning to intercept HTTPS traffic from apps that would otherwise refuse the proxy certificate. |
| frida_root_bypass | Bypass root/jailbreak detection so the app runs normally on a rooted device. |
| frida_trace | Hook functions in the running app and log their arguments and return values. 60-second timeout. |
| frida_inject | Inject a custom Frida script into the running app for deep runtime inspection. 3-minute timeout. |

External Scanner Tools (Security Mode — Require Installation)


| Tool | Timeout | What It Does |
|---|---|---|
| run_nuclei | 5 min 10 sec | Run Nuclei vulnerability scanner templates against target endpoints. |
| run_dalfox | 5 min 10 sec | Run DalFox XSS scanner. |
| run_ffuf | 5 min 10 sec | Fuzz paths, parameters, or headers with ffuf. |
| run_sqlmap | 5 min 10 sec | Run sqlmap SQL injection testing against a captured flow. |
| run_trufflehog | 5 min 10 sec | Scan for hardcoded secrets in captured JavaScript and responses. |
| run_katana | 5 min 10 sec | Crawl and discover endpoints beyond what was captured. |
| run_semgrep | 5 min 10 sec | Static analysis of captured JavaScript for client-side vulnerabilities. |
| run_nmap | 5 min 10 sec | Port scan and service detection. |
| run_ssl_scan | 5 min 10 sec | TLS/SSL configuration analysis. |
| run_hydra | 5 min 10 sec | Password brute-force testing. |

External scanner stdout is capped at 2 MB, stderr at 64 KB. If a scanner isn’t installed on the system, its run_* tool simply won’t be registered.

| Tool | What It Does |
|---|---|
| fs_write | Write a file to the agent’s workspace directory (max 1 MB per file). Used for saving evidence, PoC scripts, reports. |
| fs_read | Read a file from the workspace (max 1 MB, 500 lines). |
| fs_list | List files in the workspace directory. |

The attack_request tool launches Ghost’s built-in request attacker engine — a payload fuzzer inspired by Burp Suite’s Intruder. It takes a captured flow as a base request and systematically replaces parts of it with payloads from wordlists.

  1. Pick a base request — The agent selects a captured flow to use as the template
  2. Define insertion points — Where payloads should be injected: header, query_param, body_json, body_form, body_raw, cookie, path_segment, or method
  3. Choose payloads — From built-in wordlists or custom values
  4. Send a baseline — The original request is sent first to establish a “normal” response (status code, body length, response time)
  5. Launch the attack — Payloads are injected and sent in parallel, each response compared against the baseline

| Mode | How Payloads Are Combined | Use Case |
|---|---|---|
| Sniper | One insertion point at a time, others keep original values. If you have 2 points and 100 payloads, that’s 200 requests. | Testing each parameter individually for a specific vulnerability class. |
| Battering Ram | Same payload in all insertion points simultaneously. 100 payloads = 100 requests regardless of point count. | Testing if the same input causes issues across multiple parameters. |
| Pitchfork | Payload lists are walked in parallel — payload 1 from list A goes with payload 1 from list B. Length = shortest list. | Paired data like username/password lists. |
| Cluster Bomb | Cartesian product — every combination of every payload across all points. 100 payloads × 2 points = 10,000 requests. | Exhaustive testing of all parameter combinations. |
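
The four attack modes can be expressed as small generators over insertion points and payload lists. This is an illustrative sketch; function and parameter names are not Ghost’s.

```python
from itertools import product

def sniper(points, payloads):
    """One insertion point at a time, others keep original values:
    2 points x 3 payloads -> 6 requests."""
    for point in points:
        for p in payloads:
            yield {point: p}

def battering_ram(points, payloads):
    """Same payload in all insertion points simultaneously."""
    for p in payloads:
        yield {point: p for point in points}

def pitchfork(points, payload_lists):
    """Walk the lists in parallel; length = shortest list."""
    for combo in zip(*payload_lists):
        yield dict(zip(points, combo))

def cluster_bomb(points, payloads):
    """Cartesian product of the payload list across all points."""
    for combo in product(payloads, repeat=len(points)):
        yield dict(zip(points, combo))
```

Each generator yields one mapping of insertion point to payload per request, so the request counts in the table fall out of the iteration structure directly.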

Ghost embeds these payload files directly in the binary (no external files needed):

| Wordlist | Purpose |
|---|---|
| sqli-generic | SQL injection payloads (UNION, error-based, time-based, boolean blind) |
| xss-reflected | Cross-site scripting payloads for reflection testing |
| command-injection | OS command injection payloads (;id, \|whoami, $(command)) |
| path-traversal | Directory traversal payloads (../../../etc/passwd variants) |
| ssrf | Server-side request forgery payloads (internal IPs, cloud metadata URLs) |
| ssti | Server-side template injection payloads ({{7*7}}, ${7*7}) |
| nosql-injection | NoSQL injection payloads for MongoDB, CouchDB etc. |
| open-redirect | Open redirect payloads for testing URL redirect parameters |
| auth-bypass-headers | Authentication bypass headers (X-Original-URL, X-Forwarded-For) |
| http-methods | HTTP method tampering (PUT, DELETE, PATCH, TRACE, etc.) |

Additionally, api-endpoints and common-paths wordlists are available for path fuzzing with run_ffuf, and a dynamic numeric-ids list generates numbers 1–1000 for IDOR testing.

| Parameter | Default | Maximum |
|---|---|---|
| Max requests per attack | 500 | 2,000 |
| Parallel threads | 5 | 10 |
| Delay between requests | 50 ms | — |
| Top interesting results kept | 20 | — |
| Response body read limit | 1 MB | — |

The attacker automatically flags results that differ significantly from the baseline:

  • Status code changed — baseline returned 200, this payload returned 500 (could indicate injection)
  • Response length changed by >20% — significantly different body could mean data leak or error
  • Response time 3× longer — could indicate time-based SQL injection or resource exhaustion

You can also define custom match rules — regex patterns on the response body, specific status codes, header values, or response time thresholds.
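
The built-in baseline-comparison rules can be sketched like this; the dict field names (`status`, `length`, `time_ms`) are assumptions for illustration.

```python
def interesting(baseline: dict, result: dict) -> list[str]:
    """Flag responses that differ significantly from the baseline:
    status code changed, body length changed by >20%, or response
    time at least 3x longer."""
    flags = []
    if result["status"] != baseline["status"]:
        flags.append("status-changed")
    if baseline["length"] and abs(result["length"] - baseline["length"]) / baseline["length"] > 0.20:
        flags.append("length-changed")
    if result["time_ms"] >= 3 * baseline["time_ms"]:
        flags.append("slow-response")
    return flags
```

A result carrying any flag would be kept among the “top interesting results”; custom match rules would add further predicates to the same list.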

The attacker engine validates every outbound request at dial time. Before connecting, it resolves the hostname to an IP address and blocks connections to loopback (127.0.0.1), private (10.x.x.x, 192.168.x.x, 172.16-31.x.x), link-local, or unspecified addresses. This prevents the agent from being tricked into attacking internal services.
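
Python’s `ipaddress` module can express the same check. This sketch applies after DNS resolution (it takes the already-resolved IP); Ghost’s actual guard runs at dial time.

```python
import ipaddress

def ip_allowed(resolved_ip: str) -> bool:
    """Dial-time SSRF guard sketch: reject loopback, private (RFC 1918),
    link-local, and unspecified addresses after resolving the hostname."""
    ip = ipaddress.ip_address(resolved_ip)
    return not (
        ip.is_loopback
        or ip.is_private
        or ip.is_link_local
        or ip.is_unspecified
    )
```

Checking the resolved IP rather than the hostname is what defeats DNS-rebinding-style tricks: an attacker-controlled name that resolves to 127.0.0.1 is rejected regardless of how it is spelled.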

In Security mode, the agent operates under one of three scan modes that control what tools it can use:

| Mode | What’s Allowed | What Requires Approval | What’s Forbidden |
|---|---|---|---|
| Passive (default) | All traffic analysis tools, fs_read/fs_write/fs_list, run_trufflehog (with --no-verification), run_semgrep, read-only browser tools, proxy_inject_script (DOM annotation only — no __ghost.fetch or __ghost.analyze) | Nothing — no approval mechanism needed | All outbound requests (replay_request, send_http_request), all active scanners (run_nuclei, run_katana, run_ffuf, run_dalfox, run_sqlmap), all interactive browser tools (browser_click, browser_fill), all Frida tools |
| Active-Safe | Everything in passive, plus: GET requests via send_http_request/replay_request, scanners with safe defaults, attack_request, frida_trace (read-only) | POST/PUT/PATCH/DELETE that create/modify/delete data, actions that could trigger lockouts, frida_bypass_ssl/frida_root_bypass/frida_inject, browser_inject | Brute force attacks, destructive payloads |
| Active-Full | Everything in passive and active-safe, plus: all scanners at full power, all Frida tools, browser_inject | Only data-modifying requests (POST/PUT/PATCH/DELETE that change state on the target) | Nothing beyond the approval gate |

The scan mode is set server-side — the agent cannot escalate its own permissions.

In Security mode, the agent tracks an engagement state that follows the PTES (Penetration Testing Execution Standard) methodology with automatic phase progression:

Phases: traffic_analysis → passive_detection → active_scanning → exploitation → reporting → done

The agent auto-advances phases based on tool usage — for example, after 3+ reconnaissance tool calls, it advances from traffic_analysis to passive_detection. The state also tracks discovered endpoints (up to 200), confirmed findings (up to 100), active injection rules (up to 50), and evidence file paths.
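
A minimal sketch of the auto-advance rule: only the documented transition (3+ reconnaissance calls moves traffic_analysis to passive_detection) is implemented here; the remaining transitions would follow the same pattern, and the class name is illustrative.

```python
PHASES = ["traffic_analysis", "passive_detection", "active_scanning",
          "exploitation", "reporting", "done"]

class Engagement:
    """Tracks the PTES-style phase and advances it based on tool usage."""

    def __init__(self) -> None:
        self.phase = PHASES[0]
        self.recon_calls = 0

    def record_tool_call(self, category: str) -> None:
        if category == "recon":
            self.recon_calls += 1
        # Documented rule: 3+ recon calls advance past traffic_analysis.
        if self.phase == "traffic_analysis" and self.recon_calls >= 3:
            self.phase = "passive_detection"
```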

In QA mode, a simpler phase progression is used: qa_recon → qa_functional → qa_edge_cases → qa_errors → qa_performance → qa_reporting.

The agent’s system prompt is not static — it’s assembled dynamically on every run based on context:

QA Mode prompt includes:

  • Agent identity and available tool list (conditional on extension/device connectivity)
  • Current session stats (flow count, error rate, top hosts, method distribution)
  • Active addons
  • Strategic directives (observe first, don’t assume, use annotations)
  • Planning protocol (create plan first, revise when needed)
  • Data parsing protocol (locale-aware number parsing for Turkish formats like “1.250,50 TL”)
  • Injection rules (if extension connected — color palette, URL patterns, isolation)
  • Inspector context (if device connected — screen capture → element tree → selector → correlate strategy)
  • GQL reference (search syntax for search_traffic)
  • Few-shot examples (bug finding, test generation, visual annotation)

Security Mode prompt includes:

  • Authorization context (pre-authorized engagement — no disclaimers, no permission-asking)
  • Pentester identity (methodical, persistent, tries 5-10 approaches before moving on)
  • Operator rules (execute don’t advise, chain findings, prove impact, respect scope, document evidence)
  • Session context with target hosts and scan mode
  • Scan mode rules (exactly what’s allowed/forbidden/needs approval)
  • Tool strategy (phase-ordered tool usage with output chaining rules)
  • External tool availability (only shows tools actually installed on the system)
  • Vulnerability taxonomy — API vulns (SQLi, RCE, auth bypass, SSRF, IDOR, etc.) and web-specific vulns (DOM XSS, open redirect, clickjacking, etc.)
  • Mobile vulnerability taxonomy (if device/Frida connected — cert pinning, insecure storage, root detection, etc.)
  • Workflow protocol (evidence file naming: findings/VULN-{NNN}-{type}.md, PoC scripts: poc/{type}-exploit.py, final report: report.md)

Agent responses stream to the frontend via Server-Sent Events (SSE), not WebSocket. The endpoint is POST /api/v1/agent/chat. POST was chosen over GET (which the browser’s native EventSource API requires) because the chat message needs to be sent in the request body.
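
Because EventSource is GET-only, a client reads the POST response body as a stream and parses the SSE frames itself. A minimal parser sketch (ignores `id:`, `retry:`, and comment lines, which real SSE also allows):

```python
def parse_sse(stream: str):
    """Yield (event, data) pairs from a Server-Sent Events body.
    Frames are separated by blank lines; multiple data: lines join
    with newlines per the SSE format."""
    event, data = "message", []
    for line in stream.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "":
            if data:
                yield event, "\n".join(data)
            event, data = "message", []  # reset for the next frame
    if data:
        yield event, "\n".join(data)
```

Fed Ghost’s stream, this would yield pairs like `("chunk", ...)` and finally `("done", ...)`, matching the event types in the table below.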

SSE event types sent to the frontend:

| Event | What It Contains |
|---|---|
| chunk | A text token from the LLM — these arrive one at a time and build up the agent’s message as you watch. |
| tool_call | The agent is calling a tool — includes tool name and input parameters. |
| tool_result | A tool finished executing — includes the tool’s output. |
| plan_created | The agent created its plan — includes all steps. |
| step_started | A plan step is now in progress. |
| step_completed | A plan step finished — includes the result summary. |
| plan_revised | The agent revised its plan — includes updated steps. |
| plan_completed | All plan steps are done. |
| options | The agent is presenting interactive choices — renders an OptionsPanel with clickable option cards and a free-text input. The agent loop pauses until the user responds. |
| steer | A user steering message was injected. |
| metrics | Run metrics emitted at the end. |
| error | An error occurred. |
| done | The agent run is finished. |

Sometimes the agent needs your input before it can proceed — for example, “Which test framework should I use?” or “Which finding should I investigate first?” Instead of requiring you to type a response, the agent uses present_options to show a structured choice panel.

  1. The agent calls present_options with a question and 2-6 labeled options (each with an optional description)
  2. The agent loop pauses — it emits metrics with termination reason "waiting for user choice" and exits cleanly
  3. The frontend renders an OptionsPanel: a 2-column card grid with clickable option buttons, plus a free-text input if you want to type something custom
  4. When you click an option (or type a custom response and press Enter), the selected value is sent as a new chat message
  5. A new agent run starts with your choice in the conversation history, and the agent continues from where it left off
  • Pending state: While the present_options tool is executing, a spinning indicator with “Preparing options…” appears
  • Active state: Option cards are arranged in a 2-column grid. Each card shows a bold label and an optional description underneath. Cards have hover effects and focus rings for keyboard navigation.
  • Answered state: After you select an option, the entire panel fades to 50% opacity. The selected option shows a cyan check icon and highlighted border. The free-text input disappears.
  • Run Summary suppression: The Run Summary card is hidden when the termination reason is "waiting for user choice" — the agent isn’t actually done, it’s just pausing for input.
  • OptionsPanel (chat/options-panel.tsx) — Card grid with option buttons and free-text input
  • ToolCallCard (chat/tool-call-card.tsx) — Special-cases present_options to show a spinner while pending and nothing when done (the OptionsPanel handles the UI)
  • ThinkIndicator (inside tool-call-card.tsx) — Special-cases the think tool to show a purple card with bouncing dots animation while pending, then “Reasoned about the approach” when done

After each run completes, the agent emits a metrics event with:

| Metric | What It Measures |
|---|---|
| Duration | Total wall-clock time from start to finish. |
| Iterations | Number of LLM calls made (max 25). |
| Tool calls | Total number of tool invocations across all iterations. |
| Reflections | Number of step reflection points where the agent paused to assess progress. |
| Termination reason | Why the agent stopped — plan complete, loop detected, budget reservation, user stop, waiting for user choice, etc. |
| Cost estimate | Estimated LLM cost based on input + output token counts and the provider’s pricing. |

These appear in the Run Summary card at the bottom of the agent’s response in the chat panel. The summary is hidden when the termination reason is "waiting for user choice" — the agent isn’t done, it’s waiting for input via present_options.

Tool Call Cards show contextual one-line summaries next to the tool label — for example, create_plan shows the first 50 characters of the goal, send_http_request shows GET https://api.example.com, and journey_export shows the export format. Special tool rendering:

  • think → Purple “Thinking” card with bouncing dots animation (no accordion)
  • present_options → Spinner while pending, hidden when done (OptionsPanel handles display)
  • fs_write → FileCard showing the relative path, line count, and byte size
  • journey_export → FileCard showing the export path, format, and size

Conversation persistence:
  • Conversations are stored in SQLite — each conversation belongs to a session. Messages (user, assistant, tool calls, tool results) are persisted immediately as they happen, not just at the end.
  • Multiple conversations per session — start a new conversation for a different topic without losing the old one.
  • Conversation list in the chat panel sidebar — click to switch between conversations.
  • Message IDs are ULIDs — time-ordered, so conversations are naturally sorted chronologically.
  • Delete conversations to clean up — removes the conversation and all its messages from the database.

When a tool returns more than 16,000 characters, Ghost compresses it intelligently based on the tool type:

| Tool | Compression Strategy |
|---|---|
| run_nuclei | Keep all CRITICAL and HIGH findings, cap MEDIUM at 20, LOW at 10. |
| run_sqlmap | Keep injection confirmations, cap databases at 20, tables at 30, data rows at 20. |
| run_katana | Prioritize URLs with parameters (cap 100), then plain paths (cap 50). |
| run_ffuf | Prioritize non-200 responses (cap 50), then 200 responses (cap 50). |
| search_traffic | Parse JSON, cap the results array at 50 items, re-serialize. |
| get_flow_body | Simple truncation at the character limit. |
| get_device_screen | Remove the base64 screenshot data, keep metadata. |
| All others | Truncate with a summary indicating total length and how much was kept. |
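
The `search_traffic` strategy can be sketched as follows. The 16,000-character threshold and the 50-item cap come from this page; the `results` key and the plain-truncation fallback are assumptions for illustration.

```python
import json

MAX_CHARS = 16_000  # per-tool output cap from the docs

def compress_search_traffic(raw: str) -> str:
    """If the output exceeds the cap, parse the JSON, cap the results
    array at 50 items, and re-serialize. Falls back to plain truncation
    when the payload isn't valid JSON."""
    if len(raw) <= MAX_CHARS:
        return raw
    try:
        doc = json.loads(raw)
        doc["results"] = doc.get("results", [])[:50]
        return json.dumps(doc)
    except (json.JSONDecodeError, TypeError):
        return raw[:MAX_CHARS]
```

Structure-aware capping like this keeps every surviving item intact, whereas blind truncation at a character boundary would leave the last item as broken JSON.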