
Testing Guide

Ghost’s testing infrastructure focuses on the Go backend, which is where the critical logic lives — the MITM proxy, SQLite database, API handlers, WebSocket hub, AI agent, and security interceptor. The frontend and extension rely on TypeScript type checking and linting rather than runtime tests.


| Layer | Test infrastructure | Quality gate |
| --- | --- | --- |
| Go backend | 31 test files, ~10,500 lines, 357+ test functions, 1 benchmark | go test -race -cover |
| React frontend | No test runner installed | tsc --noEmit (TypeScript check) + ESLint |
| Chrome extension | No test files | tsc --noEmit (TypeScript check) |

The Go tests cover 9 of the 16 backend packages, with the heaviest coverage on the agent system (~4,300 lines of tests), proxy (~1,740 lines), API (~1,180 lines), and store (~1,170 lines).

| Package | Test files | Test lines | What’s tested |
| --- | --- | --- | --- |
| internal/agent | 11 | ~4,300 | Tool execution, plan lifecycle, termination signals, context management, parallel execution, reflections, QA tools, browser tools, attacker engine |
| internal/proxy | 8 | ~1,740 | Proxy server integration, MITM certificate issuance, noise detection, CA generation, protobuf decoding, interceptor pipeline, flow model, path normalization |
| internal/api | 2 | ~1,180 | API server integration (full request/response testing), WebSocket hub (broadcast, client lifecycle) |
| internal/store | 1 | ~1,170 | SQLite store: flow CRUD, session management, FTS search, migrations, pagination, purging |
| internal/inspector | 4 | ~830 | Traffic-UI correlation, selector generation, WebView detection, noise domain filtering |
| internal/extension | 2 | ~670 | Extension WebSocket hub (connect/disconnect, message routing), interaction correlation |
| internal/addon | 1 | ~325 | Addon engine: VM lifecycle, handler execution, ghost.* API, action dispatching |
| internal/device | 1 | ~190 | Android ADB interaction parsing |
| internal/config | 1 | ~140 | AES-256-GCM encryption/decryption round-trip, nonce uniqueness, tamper detection |

These packages currently have no test files:

| Package | Why | Risk |
| --- | --- | --- |
| internal/certinstall | Platform-specific OS commands; hard to test portably | Low (small, well-defined behavior) |
| internal/sysproxy | Platform-specific OS commands | Low (same) |
| internal/frida | Requires Frida runtime + connected device | Medium (subprocess management) |
| internal/sectools | Requires external tool binaries | Low (thin wrapper) |
| internal/testrail | Requires TestRail server | Low (HTTP client) |
| cmd/ghost | Entry point wiring; tested indirectly by integration tests | Low |

# Run all tests with race detector, coverage, and no caching
make test # go test -race -cover -count=1 ./...
# Same with verbose output (shows each test function name)
make test-v # go test -race -cover -count=1 -v ./...
# Adds -short and drops -cover. No tests currently check testing.Short(),
# so this runs the same tests as make test (the flag exists for future use)
make test-short # go test -race -short -count=1 ./...
# Run tests for a specific package
go test -race ./internal/proxy/...
go test -race ./internal/store/...
go test -race ./internal/agent/...

The -count=1 flag disables Go’s test result caching. This is important because Ghost’s tests create SQLite databases, bind network ports, and start HTTP servers — cached results from a previous run might not reflect the current state.

make cover

This runs all tests with coverage profiling enabled, generates a coverage.out file, and opens an interactive HTML report (coverage.html) in your browser. The report shows which lines of code were executed during tests and which were not, colored green and red respectively.

The make test targets and the per-package commands above all pass -race. Go’s race detector instruments the compiled binary to detect concurrent access to shared variables without proper synchronization. This is critical for Ghost because many subsystems are accessed concurrently:

  • The WebSocket hub broadcasts events to multiple clients simultaneously
  • The proxy handles many connections in parallel, each potentially modifying shared state
  • The SQLite store uses a single-writer model with concurrent readers
  • Background goroutines (WAL checkpoint, auto-purge, device discovery) access shared data structures

The race detector typically increases execution time by 2-20x and memory usage by 5-10x, which is why it’s only used during development and testing, not in production builds.


Ghost’s tests follow consistent patterns within each package. There is no shared testutil package — each package defines its own local helpers in _test.go files. This keeps test dependencies explicit and avoids coupling between packages.

Store tests create isolated database instances so each test starts with a clean slate:

func newTestStore(t *testing.T) *SQLiteStore {
	dir := t.TempDir()
	store, err := OpenSQLite(context.Background(), filepath.Join(dir, "test.db"), slog.Default())
	require.NoError(t, err)
	t.Cleanup(func() { store.Close() })
	return store
}

Key points:

  • Uses t.TempDir() which creates a temporary directory that’s automatically cleaned up when the test finishes
  • Each test gets its own SQLite file, so tests can run in parallel without interference
  • t.Cleanup ensures the database is properly closed even if the test panics
  • Helper functions like newTestSession(t, store, name) and newTestFlow(id, sessionID) create test fixtures with minimal boilerplate

API tests create a complete server environment with a real HTTP server, database, and all subsystems wired together:

func setupTestEnv(t *testing.T) *testEnv {
	// Creates: httptest.Server, API Server, :memory: SQLite store,
	// generated CA, config.NewTestManager, Frida manager, hub
	// ...
}

The testEnv struct provides:

  • httptest.Server — a real HTTP server on a random port
  • SQLite store (:memory: — in-memory database, no disk I/O)
  • Generated CA certificate and key
  • config.NewTestManager(cfg) — config manager that skips disk persistence
  • Auth token for authenticated requests
  • Started WebSocket hub with automatic cleanup

SSRF override: Test environments replace the SSRF-safe replay transport with http.DefaultTransport. This is necessary because httptest.NewServer binds to 127.0.0.1, which the SSRF protection would normally block (it rejects requests to private/loopback addresses). This override is safe because it only applies to test code.

Proxy tests start a real proxy server and upstream targets:

func startTestProxy(t *testing.T, pipeline []Interceptor) (*Server, int) {
	// Creates real proxy on random port with generated CA
	// Returns the server and its port number
}

Tests create httptest.NewServer instances as upstream targets, configure the proxy to intercept traffic to those targets, and verify that the proxy correctly handles, modifies, and forwards requests. A collectingInterceptor records all flows for assertions:

type collectingInterceptor struct {
	mu    sync.Mutex
	flows []*Flow
}

The mutex is necessary because the proxy processes requests concurrently, and the test interceptor must be thread-safe.

Agent tests use a comprehensive mockStore that implements the full store.Store interface with in-memory maps. This avoids any database dependency and makes tests fast and deterministic:

type mockStore struct {
	flows        map[string]*proxy.Flow
	sessions     map[string]*store.Session
	interactions []store.Interaction
	// ... more fields for each store interface
}

Tool tests verify that each agent tool produces the correct output given specific inputs, without calling any real LLM. The mock store is pre-populated with test flows, sessions, and other data.

Config tests use config.NewTestManager(cfg) which creates a manager with noSave: true. This prevents tests from writing to the filesystem, avoiding interference between tests and ensuring no test artifacts are left behind.

Both the main hub and extension hub are tested using httptest.Server + gorilla/websocket:

func setupHubTestServer() (*httptest.Server, *Hub) {
	hub := NewHub(slog.Default())
	go hub.Run()
	server := httptest.NewServer(http.HandlerFunc(hub.HandleWebSocket))
	return server, hub
}

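Tests connect WebSocket clients, send messages, and verify that broadcasts reach all connected clients. The hub runs in its own goroutine, and tests shut it down together with the server when they finish (for example via t.Cleanup); the simplified helper shown here omits that step.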


Ghost includes one benchmark for URL path normalization — the function that replaces dynamic segments (UUIDs, numeric IDs, hex hashes) with {id} for session comparison:

go test -bench=. ./internal/proxy/...

The BenchmarkNormalizePath benchmark measures the cost of this function, which runs on every flow during session comparison and therefore needs to be fast.


make lint # golangci-lint run ./...
make vet # go vet ./...
make fmt # gofmt -s -w + goimports -w

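golangci-lint runs with its built-in default configuration (no .golangci.yml config file exists in the repository). In current releases the default linter set is small and covers:

  • Unchecked errors (errcheck)
  • Suspicious constructs such as Printf format mismatches (govet)
  • Assignments whose values are never used (ineffassign)
  • Likely bugs, deprecated API use, and simplifiable code (staticcheck)
  • Unused constants, variables, functions, and types (unused)

Style, security (for example gosec), and variable-shadowing checks are not part of the default set and would need to be enabled explicitly in a config file.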

go vet is Go’s built-in static analysis tool. It catches issues like incorrect Printf format strings, unreachable code, and suspicious constructs that compile but are almost certainly bugs.

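gofmt -s formats code with the -s (simplify) flag, which applies safe rewrites such as dropping redundant element types in composite literals, shortening slice expressions like s[a:len(s)] to s[a:], and removing unneeded blank identifiers in range clauses. goimports additionally adds missing imports, removes unused ones, and sorts the import block, keeping standard library packages in their own group.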


cd frontend
npx tsc --noEmit # Type check without producing output files
npm run build # Runs tsc -b first, then Vite build
npm run lint # ESLint

The frontend has no runtime test runner (no Vitest, Jest, or Playwright). The quality gate is zero TypeScript errors from tsc --noEmit. This catches type mismatches, missing properties, incorrect function signatures, and other static issues at compile time.

ESLint provides additional checks for React hooks rules, React refresh compatibility, and general code quality.


Before declaring any change complete, verify that the entire codebase compiles cleanly:

# Go: all packages compile without errors
go build ./...
# Go: static analysis passes
go vet ./...
# Frontend: TypeScript type check passes
cd frontend && npx tsc --noEmit

These three commands catch most compile-time and static-analysis issues. They do not exercise runtime behavior, so changes to Go code should also pass make test.


Ghost maintains strict coding standards that are enforced during code review. These rules exist because these exact bugs appeared repeatedly in quality gates — the goal is to prevent them from being written in the first place.

| Rule | Why it matters |
| --- | --- |
| Always check json.Unmarshal errors | _ = json.Unmarshal() silently leaves zero-value structs behind on failure, leading to mysterious nil pointer dereferences later |
| Always check url.Parse errors | parsed, _ := url.Parse() returns a nil URL on failure, and the next line accessing parsed.Host panics |
| Always handle io.ReadAll errors | When the body matters (not just draining), an error means the data is incomplete |
| SSRF protection on user-provided URLs | Any handler that sends HTTP to a user-provided URL needs isPrivateHost() validation and scheme checking to prevent internal network scanning |
| io.LimitReader on every request body | Use a named constant for each limit. Prevents denial-of-service via oversized payloads |
| Binary content detection for JSON responses | Any response body going into a JSON field needs base64 encoding if it contains binary data, because raw binary breaks JSON |
| Use strconv.Atoi for parsing | Don’t hand-roll integer parsers; they miss edge cases |

| Rule | Why it matters |
| --- | --- |
| No empty catch {} blocks | Every catch must set error state, show a toast, or at least console.error. Silent failures are invisible failures. |
| Escape special characters in output | Markdown tables need pipe characters escaped, shell commands need space quoting, code generators need language-specific escaping |
| AbortController for HTTP calls in send/submit actions | Without it, navigating away while a request is in-flight causes a state update on an unmounted component |
| Reset view state when data changes | Zoom level, filters, collapse state, and scroll position all need resetting when the underlying data source changes |
| Cmd/Ctrl+Enter for send actions | Consistent keyboard shortcut across all send buttons |

| Rule | Why it matters |
| --- | --- |
| Bidirectional comparison | If comparing A to B, always also verify B to A. Asymmetric bugs are common in diff/comparison features. |
| Non-standard input handling | Test with HTTP methods beyond GET/POST, binary content types, unicode in filenames, pipes in header values |