Browser Extension

Ghost’s browser extension adds three capabilities on top of proxy traffic interception: observing what users do (Capture), driving browser actions (Action), and overlaying intelligence on pages (Inject). It also powers journey recording by injecting X-Ghost-Interaction headers via Chrome’s declarativeNetRequest API for interaction-flow correlation.

The content script runs 7 capture subsystems that observe DOM events and send structured data to Ghost via the service worker:

Clicks:

  • CSS selector path for the clicked element (generated automatically)
  • Element text, type, and coordinates
  • Page URL, page title, and timestamp
  • Sent as capture.interaction with type click

Form inputs:

  • Fires on the change event (when the user leaves a field after modifying it)
  • Captures: field selector, element type (input/textarea/select)
  • Input metadata: field name, type, label (resolved via <label for>, parent <label>, aria-label, or placeholder), validation attributes (required, pattern, min, max, minlength, maxlength)
  • Privacy: only field structure and metadata are captured, never the actual input value
  • Sent as capture.interaction with type change

Form submissions:

  • Fires on the submit event (form submission)
  • Captures: form selector, form action URL
  • Sent as capture.interaction with type submit

Navigation:

  • Detects URL changes via three mechanisms: popstate event, monkey-patched History.pushState and History.replaceState, and a MutationObserver on the <title> element
  • Captures: from URL → to URL, page title
  • This covers both traditional page loads and SPA route changes (React Router, Next.js, etc.)
  • Sent as capture.navigation

Hover and focus:

  • Fires on mouseover and focusin events on interactive elements
  • Captures: element selector, ARIA labels
  • Used for test coverage gap detection: Ghost can identify interactive elements that were never clicked
  • Sent as capture.interaction with type hover or focus

Console errors:

  • Captures error events (JavaScript exceptions) and unhandledrejection events (unhandled Promise rejections)
  • Includes error message, stack trace, page URL, and timestamp
  • Sent as capture.console_error

Storage changes:

  • Monitors localStorage and sessionStorage via the storage event plus periodic diffing (every 5 seconds)
  • Detects key additions, removals, and value changes
  • Useful for tracking auth token lifecycle, session state, and feature flag changes
  • Sent as capture.storage_change
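
The periodic diffing behind capture.storage_change can be pictured as a comparison of two key/value snapshots taken a few seconds apart. The sketch below is illustrative only and is not Ghost's actual code: `Snapshot`, `StorageChange`, and `diffSnapshots` are hypothetical names, and the real payload shape may differ.

```typescript
// Hypothetical shapes for illustration; the real capture.storage_change payload may differ.
type Snapshot = Record<string, string>;

interface StorageChange {
  key: string;
  kind: "added" | "removed" | "changed";
  oldValue?: string;
  newValue?: string;
}

// Own-property check so inherited keys such as "toString" are never treated as stored keys.
function hasKey(snapshot: Snapshot, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(snapshot, key);
}

// Compare two snapshots and report key additions, value changes, and removals.
function diffSnapshots(prev: Snapshot, next: Snapshot): StorageChange[] {
  const changes: StorageChange[] = [];
  for (const key of Object.keys(next)) {
    if (!hasKey(prev, key)) {
      changes.push({ key, kind: "added", newValue: next[key] });
    } else if (prev[key] !== next[key]) {
      changes.push({ key, kind: "changed", oldValue: prev[key], newValue: next[key] });
    }
  }
  for (const key of Object.keys(prev)) {
    if (!hasKey(next, key)) {
      changes.push({ key, kind: "removed", oldValue: prev[key] });
    }
  }
  return changes;
}
```

In a content script, the snapshots would presumably be built by iterating over localStorage or sessionStorage on each 5-second tick, with any resulting changes forwarded to the service worker.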

Note: Screenshots are NOT a passive capture event — they’re an explicit action (see Action Layer below). The 7 subsystems above are all passive observers that run automatically.

  • Click/change events: Per-selector throttle of 200ms — repeated interactions on the same element within 200ms are deduplicated
  • Hover/focus events: Per-element throttle of 500ms, plus a 50ms rate limit in the service worker to prevent flooding
  • Throttle map: Capped at 1,000 entries. When the cap is reached, the entire map is cleared (prevents unbounded memory growth on pages with many interactive elements)
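
The throttling rules above can be sketched as a small map keyed by selector. This is a minimal illustration under stated assumptions rather than the extension's real code: the `ThrottleMap` class and its method names are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```typescript
// Hypothetical helper; only the 200ms/500ms windows and the 1,000-entry cap come from the text.
class ThrottleMap {
  private lastSeen = new Map<string, number>();
  private windowMs: number;
  private maxEntries: number;

  constructor(windowMs: number, maxEntries: number = 1000) {
    this.windowMs = windowMs;
    this.maxEntries = maxEntries;
  }

  // Returns true if the event should be forwarded, false if it falls inside the throttle window.
  shouldEmit(selector: string, nowMs: number): boolean {
    const prev = this.lastSeen.get(selector);
    if (prev !== undefined && nowMs - prev < this.windowMs) {
      return false;
    }
    // When the cap is reached, drop everything instead of tracking eviction order.
    if (prev === undefined && this.lastSeen.size >= this.maxEntries) {
      this.lastSeen.clear();
    }
    this.lastSeen.set(selector, nowMs);
    return true;
  }

  get size(): number {
    return this.lastSeen.size;
  }
}

// Example: one map per event family, using the windows described above.
const clickThrottle = new ThrottleMap(200);
const hoverThrottle = new ThrottleMap(500);
```

Clearing the whole map is cruder than an LRU eviction, but it bounds memory with almost no bookkeeping. The cost is that a few events may slip past the throttle immediately after a reset.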

Ghost can drive browser actions via the extension. 15 action types:

| Action | Description |
| --- | --- |
| click | Click element by selector (mousedown → mouseup → click chain) |
| fill | Set input/textarea value + trigger input/change/blur events |
| check | Check/uncheck checkbox, toggle radio button |
| select | Select dropdown option by value or visible text |
| scroll_to | Scroll element into view |
| navigate | Navigate to URL |
| wait | Wait for condition (element visible/hidden, URL match, text appears, network idle) |
| read_page | Get page info: URL, title, all forms, all interactive elements with selectors |
| read_form | Get form fields: name, type, label, validation rules, current values |
| screenshot | Capture visible tab |
| get_text | Get element text content |
| get_attribute | Get element attribute value |
| check_state | Check element state (visible, disabled, checked) |
| execute_steps | Execute ordered step sequence with waits between steps |
| query_all | Query all matching elements |

All actions include visibility checks and scroll-into-view before interaction. The content script enforces a 30-second timeout per action. The Go backend uses a 60-second timeout for the full round-trip (which includes WebSocket message delivery and network latency on top of the action execution).
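
A per-action timeout like the one described above is commonly built by racing the action against a timer. The following is a generic sketch of that pattern, not Ghost's implementation. `withTimeout` and `ACTION_TIMEOUT_MS` are hypothetical names, and only the 30-second figure comes from the text.

```typescript
// Per-action limit in the content script, taken from the text above.
const ACTION_TIMEOUT_MS = 30_000;

// Reject if `work` does not settle within `ms` milliseconds (hypothetical helper).
function withTimeout<T>(work: Promise<T>, label: string, ms: number = ACTION_TIMEOUT_MS): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared so it cannot keep the worker busy.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

The backend's longer 60-second limit would wrap the whole request/response exchange in the same way, which leaves headroom for the inner 30-second limit to fire first and return a descriptive error.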

The AI agent has 6 registered browser tools that invoke extension actions:

| Agent Tool | Extension Action | What It Does |
| --- | --- | --- |
| browser_click | click | Clicks an element by CSS selector |
| browser_fill | fill | Types text into an input field |
| browser_read_page | read_page | Gets page URL, title, all forms, and all interactive elements with selectors |
| browser_query_all | query_all | Queries all elements matching a CSS selector |
| browser_screenshot | screenshot | Captures the visible tab |
| browser_inject | (inject command) | Injects UI overlays onto the page (highlights, annotations, toasts) |

Several extension actions exist that do NOT have dedicated agent tools — the agent achieves those results through its registered tools instead. For example, read_page covers what read_form does, query_all covers get_text/get_attribute/check_state, and the agent calls tools individually rather than using execute_steps.
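
The relationship between agent tools and extension actions can be written down as a lookup table. The sketch below only restates the two tables in code. The constant and function names are hypothetical, and browser_inject is left out because it sends an inject command rather than an action.

```typescript
// The 15 extension action types listed above.
const EXTENSION_ACTIONS = [
  "click", "fill", "check", "select", "scroll_to", "navigate", "wait", "read_page",
  "read_form", "screenshot", "get_text", "get_attribute", "check_state",
  "execute_steps", "query_all",
] as const;

type ExtensionAction = (typeof EXTENSION_ACTIONS)[number];

// Agent tool name -> extension action (browser_inject omitted: it is not an action).
const TOOL_TO_ACTION: Record<string, ExtensionAction> = {
  browser_click: "click",
  browser_fill: "fill",
  browser_read_page: "read_page",
  browser_query_all: "query_all",
  browser_screenshot: "screenshot",
};

// Actions that exist in the extension but have no dedicated agent tool.
function actionsWithoutTool(): ExtensionAction[] {
  const covered = new Set<ExtensionAction>(Object.values(TOOL_TO_ACTION));
  return EXTENSION_ACTIONS.filter((action) => !covered.has(action));
}
```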

Ghost can overlay UI elements onto the web page through the content script:

| Command | Description |
| --- | --- |
| highlight | Colored border + label on an element |
| annotate | Label text near an element (e.g., API endpoint triggered) |
| toast | Notification overlay (slow response, schema change, error) |
| test_result | Green check / red X on tested elements |
| panel | Multi-section floating overlay |
| clear | Remove all injected overlays |

All injected UI uses Shadow DOM to prevent style leakage between Ghost’s overlays and the page’s CSS. ResizeObserver and MutationObserver keep overlays positioned correctly as the page changes.

Overlays are automatically cleaned up when SPA navigation is detected (URL change without page reload).

The extension connects to Ghost via WebSocket:

ws://localhost:5565/ws/extension?token={bearer_token}
  • Host: localhost (default)
  • Port: 5565 (Ghost API port)
  • Token: Bearer token from Ghost (stored in extension’s chrome.storage)

Configure via the extension popup or via the extension setup panel in Ghost.
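
As an illustration of how the connection settings combine into the WebSocket URL shown above, here is a small sketch. `ConnectionSettings` and `buildExtensionUrl` are hypothetical names, and URL-encoding the token is an assumption about what the server accepts.

```typescript
// Hypothetical settings shape; host and port fall back to the documented defaults.
interface ConnectionSettings {
  token: string;
  host?: string;
  port?: number;
}

const DEFAULT_HOST = "localhost";
const DEFAULT_PORT = 5565; // Ghost API port

function buildExtensionUrl(settings: ConnectionSettings): string {
  const host = settings.host ?? DEFAULT_HOST;
  const port = settings.port ?? DEFAULT_PORT;
  // Assumption: the token is percent-encoded so special characters survive the query string.
  const token = encodeURIComponent(settings.token);
  return `ws://${host}:${port}/ws/extension?token=${token}`;
}
```

With only a token supplied, `buildExtensionUrl({ token: "abc123" })` yields `ws://localhost:5565/ws/extension?token=abc123`.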

  • Reconnection uses exponential backoff: 1s initial delay, 1.5x multiplier, 30s maximum
  • A 24-second keepalive alarm prevents Chrome MV3 from suspending the service worker
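
The backoff parameters above translate into a simple delay schedule. The sketch below assumes no jitter and counts attempts from zero. `reconnectDelayMs` is a hypothetical name.

```typescript
const INITIAL_DELAY_MS = 1000;
const BACKOFF_MULTIPLIER = 1.5;
const MAX_DELAY_MS = 30_000;

// Delay before reconnect attempt number `attempt` (0 = first retry), capped at the maximum.
function reconnectDelayMs(attempt: number): number {
  let delay = INITIAL_DELAY_MS;
  for (let i = 0; i < attempt; i++) {
    delay = Math.min(delay * BACKOFF_MULTIPLIER, MAX_DELAY_MS);
  }
  return delay;
}

// First few delays: 1000, 1500, 2250, 3375 ms; the cap is reached on the tenth retry.
```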

Extension → Ghost messages:

  • ext.hello — initial handshake (announces the extension and its capabilities)
  • ext.tab_switch — the user switched to a different browser tab
  • ext.ping — keepalive ping (sent every ~24 seconds to prevent Chrome from suspending the service worker)
  • capture.interaction — a DOM event was captured (click, change, submit, hover, focus)
  • capture.console_error — a JavaScript error or unhandled rejection was observed
  • capture.navigation — a page navigation was detected (URL change)
  • capture.storage_change — a localStorage/sessionStorage mutation was detected
  • action.result — the result of an executed action (success/failure + data)

Ghost → Extension messages:

  • ghost.welcome — handshake response with session ID
  • action.request — request to execute a browser action (one of the 15 action types)
  • inject.command — request to inject a UI overlay (one of the 6 inject commands)
  • journey.recording_start — start injecting X-Ghost-Interaction headers via declarativeNetRequest for journey recording
  • journey.recording_stop — stop injecting headers and remove the dynamic rule
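
The message catalogue above maps naturally onto string-literal unions. The sketch below shows one way a client might validate incoming messages. The envelope shape (`type` plus an optional `payload`) is an assumption, since the wire format is not specified here, and all identifiers are hypothetical.

```typescript
const TO_GHOST_TYPES = [
  "ext.hello", "ext.tab_switch", "ext.ping", "capture.interaction",
  "capture.console_error", "capture.navigation", "capture.storage_change", "action.result",
] as const;

const TO_EXTENSION_TYPES = [
  "ghost.welcome", "action.request", "inject.command",
  "journey.recording_start", "journey.recording_stop",
] as const;

type ToGhostType = (typeof TO_GHOST_TYPES)[number];
type ToExtensionType = (typeof TO_EXTENSION_TYPES)[number];

// Assumed envelope: a type tag plus an opaque payload.
interface OutgoingMessage { type: ToGhostType; payload?: unknown; }
interface IncomingMessage { type: ToExtensionType; payload?: unknown; }

// The compiler rejects any outgoing type that is not in the extension-to-Ghost list.
function encodeOutgoing(msg: OutgoingMessage): string {
  return JSON.stringify(msg);
}

// Returns the parsed message, or null for malformed JSON or an unexpected type.
function parseIncoming(raw: string): IncomingMessage | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const candidate = parsed as { type?: unknown; payload?: unknown };
  if (typeof candidate.type !== "string") return null;
  const known = (TO_EXTENSION_TYPES as readonly string[]).includes(candidate.type);
  return known ? { type: candidate.type as ToExtensionType, payload: candidate.payload } : null;
}
```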

The popup provides:

  • Connection status — connected/disconnected with host:port
  • Toggle capture per tab
  • Quick actions: screenshot, read page, clear overlays
  • Recent interactions — last 10 captured events
  • Settings link — configure connection

| Shortcut | Action |
| --- | --- |
| Alt+G | Open extension popup |
| Alt+Shift+C | Toggle capture for current tab |
| Alt+Shift+S | Take screenshot |

When the extension captures an interaction (click, change, submit), Ghost’s correlation engine matches it with proxy flows captured around the same time. The engine queries flows within a ±500ms window of the interaction timestamp (limited to 50 candidate flows per window) and scores each match using four weighted criteria:

| Criterion | Weight | How It Works |
| --- | --- | --- |
| Host match | 0.4 | Whether the flow’s request host matches the page’s domain |
| Timing proximity | 0.3 | Linear decay: flows closer in time to the interaction score higher |
| Path similarity | 0.2 | How closely the flow’s URL path relates to the page URL |
| HTTP method heuristic | 0.1 | POST/PUT/DELETE score higher than GET (user actions are more likely to mutate data) |

Flows scoring below a 0.3 minimum confidence threshold are discarded. The remaining matches power the interaction indicator in the flow inspector — showing which UI action triggered each API call.
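
To make the weighting concrete, here is a sketch of how such a score could be computed. The weights, the ±500ms window, the 50-candidate limit, and the 0.3 threshold come from the text. Everything else is an assumption made for illustration: the field names, exact-host matching, the shared-prefix path similarity, and the all-or-nothing method score. Ghost's engine may compute these components differently.

```typescript
// Hypothetical shapes; field names are assumptions made for this sketch.
interface Interaction { timestampMs: number; pageHost: string; pagePath: string; }
interface Flow { timestampMs: number; host: string; path: string; method: string; }

const WEIGHTS = { host: 0.4, timing: 0.3, path: 0.2, method: 0.1 };
const WINDOW_MS = 500;
const MAX_CANDIDATES = 50;
const MIN_CONFIDENCE = 0.3;

// Assumption: similarity is the fraction of leading path segments the two paths share.
function pathSimilarity(a: string, b: string): number {
  const sa = a.split("/").filter((s) => s.length > 0);
  const sb = b.split("/").filter((s) => s.length > 0);
  const longest = Math.max(sa.length, sb.length);
  if (longest === 0) return 1;
  let shared = 0;
  while (shared < sa.length && shared < sb.length && sa[shared] === sb[shared]) shared++;
  return shared / longest;
}

function scoreFlow(interaction: Interaction, flow: Flow): number {
  const dt = Math.abs(flow.timestampMs - interaction.timestampMs);
  const host = flow.host === interaction.pageHost ? 1 : 0; // assumption: exact match only
  const timing = Math.max(0, 1 - dt / WINDOW_MS); // linear decay to 0 at the window edge
  const path = pathSimilarity(flow.path, interaction.pagePath);
  // Assumption: mutating methods get full credit, everything else none.
  const method = ["POST", "PUT", "DELETE"].includes(flow.method.toUpperCase()) ? 1 : 0;
  return WEIGHTS.host * host + WEIGHTS.timing * timing + WEIGHTS.path * path + WEIGHTS.method * method;
}

// Keep flows inside the window, cap the candidates, score, drop weak matches, best first.
function correlate(interaction: Interaction, flows: Flow[]): { flow: Flow; score: number }[] {
  return flows
    .filter((f) => Math.abs(f.timestampMs - interaction.timestampMs) <= WINDOW_MS)
    .slice(0, MAX_CANDIDATES)
    .map((flow) => ({ flow, score: scoreFlow(interaction, flow) }))
    .filter((match) => match.score >= MIN_CONFIDENCE)
    .sort((a, b) => b.score - a.score);
}
```

Under these assumptions, a same-host POST to /orders/123 that arrives 100ms after a click on /orders scores about 0.84, while a GET to a different host 400ms later scores 0.06 and is discarded.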

During journey recording, the extension uses a more deterministic correlation mechanism alongside the heuristic scoring above. It injects an X-Ghost-Interaction header into every outgoing request via Chrome’s declarativeNetRequest API (rule ID 9999). When the user performs an interaction (click, change, submit — not hover or focus), the header value is updated with a unique ID. The proxy reads and strips this header, storing the ID on the flow. The backend recorder uses this ID to create "action" steps (correlated interaction + flow) rather than relying purely on timing-based heuristics.

The extension requires the declarativeNetRequest permission to create dynamic header injection rules. The rule targets xmlhttprequest (covers both XHR and fetch), main_frame, and sub_frame resource types. The correlation window is 600ms — after which the header resets to an ambient value so unrelated requests aren’t misattributed. Stale rules from previous sessions or crashes are cleaned up automatically on service worker startup.
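
The header rule and the 600ms reset described above could look roughly like the following. The rule object follows the documented shape of a declarativeNetRequest `modifyHeaders` rule. The function names, the `priority` value, and the `"ambient"` placeholder are assumptions, since the text does not state the actual ambient header value.

```typescript
const RULE_ID = 9999;
const HEADER_NAME = "X-Ghost-Interaction";
const CORRELATION_WINDOW_MS = 600;
const AMBIENT_VALUE = "ambient"; // assumption: the real ambient value is not documented here

interface LastInteraction { id: string; atMs: number; }

// Header value to use at `nowMs`: the interaction ID inside the window, ambient afterwards.
function headerValueAt(nowMs: number, last: LastInteraction | null): string {
  if (last !== null && nowMs - last.atMs <= CORRELATION_WINDOW_MS) {
    return last.id;
  }
  return AMBIENT_VALUE;
}

// Dynamic rule that sets the header on XHR/fetch, top-level, and sub-frame requests.
function buildHeaderRule(value: string) {
  return {
    id: RULE_ID,
    priority: 1, // assumption: any positive priority works for a single rule
    action: {
      type: "modifyHeaders",
      requestHeaders: [{ header: HEADER_NAME, operation: "set", value }],
    },
    condition: {
      resourceTypes: ["xmlhttprequest", "main_frame", "sub_frame"],
    },
  };
}

// In the service worker, the rule would be swapped in place on every update, e.g.:
//   chrome.declarativeNetRequest.updateDynamicRules({
//     removeRuleIds: [RULE_ID],
//     addRules: [buildHeaderRule(value)],
//   });
```

Removing the rule by the same fixed ID on service worker startup is also a simple way to clear anything left behind by an earlier session or crash.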