
Session Comparison

Session comparison answers the big question: “what changed between these two testing sessions?” Instead of comparing individual requests one by one, Ghost analyzes all traffic from two sessions at once, groups it by endpoint, and tells you exactly which APIs broke, slowed down, appeared, or disappeared.

This is your primary tool for regression testing. Capture a session before a deployment, capture another session after the deployment with the same test flow, and let Ghost show you every difference. It’s like a “git diff” but for API behavior instead of code.

Common use cases:

  • Before/after deployment — Did the new code break any existing API endpoints? Are there new errors that weren’t there before?
  • Environment comparison — Staging vs. production: are all APIs behaving the same way?
  • Test run validation — Run the same test flow twice and verify consistent behavior
  • Performance monitoring — Did any endpoints get slower after the latest release?
  • API surface tracking — Are there new endpoints that weren’t there last week? Did any endpoints disappear?

Three ways to start:

  1. Keyboard shortcut: Press Cmd+Shift+K (macOS) or Ctrl+Shift+K (Windows) to toggle the comparison view on or off
  2. Toolbar button: Click the diff icon (two stacked rectangles) in the command bar — it has a tooltip “Compare Sessions (Cmd+Shift+K)”
  3. Drag-and-drop: Drop a .har or .json file onto Ghost — it imports the file into a new session and automatically opens comparison between the imported session and your active session

When the comparison opens, it replaces the main traffic area with a dedicated comparison view. Your regular traffic list comes back when you close the comparison.

At the top of the comparison view, two dropdown selectors let you pick Source (baseline — what you expect) and Target (what you’re comparing against). All your sessions appear in both dropdowns. A swap button lets you quickly flip source and target.

Once both sessions are selected, Ghost automatically fetches and analyzes the comparison. A loading indicator shows progress, and if anything goes wrong, an error message with a retry button appears.

Limits: Each session can contain up to 50,000 flows for comparison. If a session exceeds this limit, Ghost returns a “too large” error (HTTP 413). The comparison analysis has a 30-second timeout — for very large sessions, this ensures the UI doesn’t hang indefinitely.

After both sessions load, a banner at the top shows key statistics for both sessions side by side:

  • Flow count — Total number of HTTP flows captured in each session
  • Error count — Flows with HTTP status 400+ or connection errors
  • Average duration — Mean response time across all flows in the session
  • Total size — Sum of all response body sizes
  • Unique hosts — How many distinct servers/domains appeared (case-insensitive)
  • Unique endpoints — How many distinct API endpoints were called

Between the two session cards, a delta strip shows the key differences:

  • New endpoints — API endpoints that exist only in the target session (not in the source)
  • Removed endpoints — Endpoints that existed in the source but are missing from the target
  • Changed endpoints — Endpoints that exist in both but show meaningful differences
  • Unchanged endpoints — Endpoints that behaved identically in both sessions
  • Regressions — Endpoints where the error rate increased significantly
  • Performance issues — Endpoints where response time got significantly worse

Ghost doesn’t compare individual flows directly — that would be overwhelming with thousands of requests. Instead, it groups flows into endpoints by combining the HTTP method, hostname, and normalized URL path.

Path normalization is key: Ghost automatically replaces dynamic segments in URLs with {id} placeholders so that requests to the same logical endpoint get grouped together even when the IDs differ. For example:

  • /api/products/12345 → /api/products/{id} — numeric IDs (pure digits) are replaced
  • /api/users/550e8400-e29b-41d4-a716-446655440000 → /api/users/{id} — UUIDs (36 hex chars with dashes) are replaced
  • /api/orders/01HWRFQP3K5M9TGWQX7Z → /api/orders/{id} — ULIDs (26-char Crockford base32) are replaced
  • /api/files/a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4 → /api/files/{id} — hex hashes (32+ hex characters) are replaced
  • /api/products?page=2&sort=name → /api/products — query strings are stripped

This means GET /api/products/123 and GET /api/products/456 are treated as the same endpoint (GET /api/products/{id}), so their statistics are aggregated together for comparison.

Additional normalization: trailing slashes are stripped (except root /), double slashes are collapsed, and hostnames are lowercased.
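
The normalization rules above can be sketched roughly as follows. This is an illustrative reimplementation based on this page's description, not Ghost's actual source, and the exact regular expressions are assumptions:

```python
import re

# ID-like segment patterns, per the rules described above (assumed regexes).
ID_PATTERNS = [
    re.compile(r"^\d+$"),                       # pure numeric IDs
    re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-"
               r"[0-9a-f]{4}-[0-9a-f]{12}$", re.I),  # UUIDs (36 chars with dashes)
    re.compile(r"^[0-9A-HJKMNP-TV-Z]{26}$", re.I),   # ULIDs (Crockford base32)
    re.compile(r"^[0-9a-f]{32,}$", re.I),            # hex hashes, 32+ chars
]

def normalize_path(path: str) -> str:
    path = path.split("?", 1)[0]           # strip query string
    path = re.sub(r"/{2,}", "/", path)     # collapse double slashes
    if len(path) > 1:
        path = path.rstrip("/")            # strip trailing slash (except root)
    segments = ["{id}" if any(p.match(s) for p in ID_PATTERNS) else s
                for s in path.split("/")]
    return "/".join(segments) or "/"

def endpoint_key(method: str, host: str, path: str) -> str:
    """Group flows by method + lowercased host + normalized path."""
    return f"{method.upper()} {host.lower()}{normalize_path(path)}"
```

With this, `endpoint_key("get", "API.Example.com", "/api/products/123")` and the same call with `/api/products/456` yield the same key, so their flows aggregate into one endpoint.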

Each endpoint that exists in both sessions is analyzed and classified by comparing its statistics. Ghost uses specific thresholds to avoid false positives:

An endpoint is flagged as a regression when ALL of these conditions are true:

  • The source session’s error rate was below 50% (it was mostly working)
  • The target session’s error rate increased by at least 10 percentage points (e.g., from 5% to 15%)
  • The dominant status code class changed from success (2xx) to error (4xx or 5xx)

This catches the most critical problem: an API that was working fine is now returning errors.
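
The three conditions combine as a simple predicate. A minimal sketch, assuming each session's statistics expose an error rate (0.0 to 1.0) and a dominant (most common) status code:

```python
def is_regression(src_error_rate: float, tgt_error_rate: float,
                  src_dominant: int, tgt_dominant: int) -> bool:
    """All three conditions described above must hold."""
    was_mostly_working = src_error_rate < 0.5
    error_rate_jumped = (tgt_error_rate - src_error_rate) >= 0.10
    class_flipped = 200 <= src_dominant < 300 and tgt_dominant >= 400
    return was_mostly_working and error_rate_jumped and class_flipped
```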

An endpoint is flagged as a performance issue when BOTH of these conditions are true:

  • The target’s average response time is at least 2× the source (doubled or worse)
  • The absolute increase is at least 100 milliseconds

The dual threshold prevents false positives: a 1ms → 3ms change is 3× but insignificant in practice. A 500ms → 600ms change is only 1.2× and within normal variation. But a 200ms → 500ms change is 2.5× AND 300ms — that’s a real problem.
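
The dual threshold is easy to express directly. A sketch (the zero-baseline branch is an assumption, since the doc doesn't say how a 0ms source average is handled):

```python
def is_performance_issue(src_avg_ms: float, tgt_avg_ms: float) -> bool:
    """Flag only when the target is both >= 2x the source AND
    at least 100 ms slower in absolute terms."""
    if src_avg_ms <= 0:
        return tgt_avg_ms >= 100  # assumption: absolute check only
    return tgt_avg_ms >= 2 * src_avg_ms and (tgt_avg_ms - src_avg_ms) >= 100
```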

An endpoint is flagged as a status code change when the dominant status code (the most common status) changed between sessions. For example, an endpoint that mostly returned 200 in the source now mostly returns 301 in the target. This is only flagged when the endpoint isn’t already classified as a regression.

An endpoint is flagged as a size change when the average response body size changed by at least 20% AND by at least 1,024 bytes (1 KB). This catches APIs that started returning significantly more or less data — which could indicate a schema change, missing pagination, or broken serialization.
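
Both of these secondary checks follow the same dual-threshold shape. A sketch (the caller is assumed to apply the regression check first, and the zero-baseline branch is an assumption):

```python
def status_changed(src_dominant: int, tgt_dominant: int) -> bool:
    # Only applied when the endpoint isn't already classified as a regression.
    return src_dominant != tgt_dominant

def size_changed(src_avg_bytes: float, tgt_avg_bytes: float) -> bool:
    """Flag when the average body size moved by >= 20% AND >= 1 KB."""
    delta = abs(tgt_avg_bytes - src_avg_bytes)
    if src_avg_bytes <= 0:
        return delta >= 1024  # assumption for a zero baseline
    return delta >= 0.2 * src_avg_bytes and delta >= 1024
```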

Endpoints are sorted by severity (most critical first):

  1. Regression (highest priority)
  2. Removed (endpoint disappeared)
  3. Performance degradation
  4. Status code changed
  5. New (endpoint appeared)
  6. Size changed
  7. Unchanged (lowest priority)

Within the same severity level, endpoints are sorted alphabetically by their path pattern.
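
The two-level ordering amounts to a compound sort key. A sketch, assuming each endpoint is represented as a (classification, path pattern) pair:

```python
SEVERITY = ["regression", "removed", "performance", "status_changed",
            "new", "size_changed", "unchanged"]  # most critical first
RANK = {name: i for i, name in enumerate(SEVERITY)}

def sort_endpoints(endpoints):
    """Sort by severity rank first, then alphabetically by path pattern."""
    return sorted(endpoints, key=lambda e: (RANK[e[0]], e[1]))
```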

The main area of the comparison view is a virtualized table listing every endpoint found across both sessions. Endpoints are grouped under collapsible host headers (e.g., “api.example.com”), with aggregate status for each host.

  • Status icon — Color-coded indicator: green circle for new, red minus for removed, amber dot for changed, red down-arrow for regression, orange clock for performance, grey check for unchanged
  • Method — HTTP method badge (GET, POST, PUT, DELETE, etc.)
  • Endpoint — The normalized path pattern (e.g., /api/products/{id})
  • Source — Status code distribution from the source session (e.g., “200 ×5, 404 ×1”)
  • Target — Status code distribution from the target session
  • Duration — Average duration for both sessions with delta and multiplier (e.g., “142ms → 340ms (+198ms, 2.4×)”)
  • Size — Average response size for both sessions with delta
  • Count — Number of flows in each session

Hovering over any endpoint row shows a detailed tooltip with the full statistics for both source and target side by side:

  • Call count
  • Status code distribution (every code and its count)
  • Average and p95 duration
  • Average response size
  • Up to 5 sample paths (original, before normalization) — so you can see the actual URLs that were grouped together

A segmented control bar above the table lets you filter which endpoints are shown:

  • All — Every endpoint from both sessions (count badge: total endpoint count)
  • Changed — Endpoints with any detected change (status, size, regression, performance) (count badge: number of changed endpoints)
  • Regressions — Only endpoints classified as regressions (error rate increased) (count badge: number of regressions)
  • Performance — Only endpoints with performance degradation (2×+ slower and 100ms+ increase) (count badge: number of performance issues)
  • New — Endpoints that exist only in the target session (count badge: number of new endpoints)
  • Removed — Endpoints that exist only in the source session (count badge: number of removed endpoints)

The default filter when you first open the comparison is Changed — showing you only what’s different, which is usually what you care about most.

A search input below the filters lets you type to find specific endpoints. It supports:

  • Substring matching — type checkout to find endpoints with “checkout” in the host or path
  • Wildcard patterns — type api.*.com or /products/* for glob-style matching
  • Sample path matching — also searches against the original (pre-normalization) sample paths
  • Debounced — waits 300ms after you stop typing before filtering, to avoid laggy UI while typing
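
Leaving the UI debounce aside, the matching semantics above can be sketched with the standard library's glob matcher. This is an assumption-based reimplementation, not Ghost's actual code:

```python
from fnmatch import fnmatch

def matches(query: str, endpoint: str, sample_paths: list[str]) -> bool:
    """Substring match by default; glob-style when the query contains
    wildcards; also searches pre-normalization sample paths."""
    haystacks = [endpoint, *sample_paths]
    q = query.lower()
    if "*" in q or "?" in q:
        # Wrap in '*' so the pattern may match anywhere in the string.
        return any(fnmatch(h.lower(), f"*{q}*") for h in haystacks)
    return any(q in h.lower() for h in haystacks)
```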

Click any endpoint row to open a detailed side-by-side view in the bottom panel. This shows the actual request-response data for a matched flow pair from that endpoint.

When an endpoint has multiple flows in each session (which is common — you might have 5 requests to /api/products in session A and 7 in session B), Ghost matches them into pairs and lets you navigate between them:

  • Prev/Next arrows to step through pairs
  • “X of Y” counter showing which pair you’re viewing (up to 100 pairs per endpoint)
  • Unmatched flow counts — if one session has more flows than the other, the drill-down shows “3 extra in Source” or “2 extra in Target” so you know there’s a volume difference
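
The pairing and the extra-flow counts can be sketched as below. Pairing flows by capture order is an assumption here; Ghost's actual matching strategy may be smarter:

```python
def match_pairs(source_flows: list, target_flows: list, limit: int = 100):
    """Pair flows in order, cap at 100 pairs per endpoint, and report
    how many flows on each side were left unmatched."""
    n = min(len(source_flows), len(target_flows), limit)
    pairs = list(zip(source_flows[:n], target_flows[:n]))
    extra_source = len(source_flows) - n  # also counts beyond-limit flows
    extra_target = len(target_flows) - n
    return pairs, extra_source, extra_target
```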

Each flow pair is displayed with the source flow on the left and target flow on the right:

URL Diff — Full URLs side-by-side. Highlights if different (useful when the same normalized endpoint had different query parameters).

Response Summary — Status code (highlighted if changed), content length, and duration with delta (green for faster, red for slower).

Timing Breakdown — A phase-by-phase comparison of the request lifecycle:

  • DNS — Domain name resolution time; reveals DNS server issues and caching differences
  • TCP Connect — Network connection establishment; reveals network latency and routing changes
  • TLS Handshake — HTTPS encryption setup; reveals certificate chain issues and crypto overhead
  • TTFB — Time to First Byte (server thinking time); the most important metric — how fast the server processes the request
  • Transfer — Response body download time; reveals large responses and bandwidth issues
  • Total — End-to-end duration; overall request performance

Each phase shows the source time, target time, and delta. Phases where the target is significantly slower (50ms+ difference) are highlighted as bottlenecks — drawing your eye to exactly where the slowdown occurred.
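
The per-phase delta and the 50ms bottleneck flag reduce to a small computation. A sketch; the phase names in the dictionaries are illustrative:

```python
BOTTLENECK_MS = 50  # highlight threshold described above

def timing_deltas(source: dict, target: dict) -> dict:
    """For each phase, compute the target-minus-source delta (ms) and
    flag it as a bottleneck when the target is 50ms+ slower."""
    return {
        phase: {
            "delta_ms": target[phase] - source[phase],
            "bottleneck": (target[phase] - source[phase]) >= BOTTLENECK_MS,
        }
        for phase in source
    }
```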

Request Headers Diff — All request headers compared key-by-key with color coding: green (added in target), red (removed in target), amber (value changed), no highlight (identical).

Request Body Diff — For JSON content types, a structural field-by-field comparison showing added, removed, and changed fields at every depth. For non-JSON content, raw side-by-side display. Bodies larger than 1 MB show a “too large to diff” message to prevent browser memory issues.

Response Headers Diff — Same format as request headers.

Response Body Diff — Same structural diffing as request body. A legend shows counts of differences: “N only in Source, N only in Comparison, N changed.” This is where you’ll find the most important changes — new fields in the API response, changed values, missing data.

Press Escape to close the drill-down and return to the endpoint table. Press Escape again to close the entire comparison view.

For each endpoint in each session, Ghost calculates comprehensive statistics that power the comparison:

  • Count — Total number of flows matching this endpoint
  • Status code distribution — Map of each status code to its occurrence count (e.g., {200: 45, 404: 3, 500: 2})
  • Average duration — Mean of all flow durations in milliseconds
  • Min/Max duration — Fastest and slowest response times
  • p95 duration — 95th percentile: 95% of requests were faster than this. Calculated by sorting all durations and picking the value at the 95% position. This is the industry-standard metric for “real-world worst case” performance.
  • Average response size — Mean response body size in bytes
  • Error rate — Fraction of flows with HTTP status 400+ or connection errors (0.0 to 1.0)
  • Flow IDs — Up to 100 flow IDs for drill-down pair matching
  • Sample paths — Up to 5 original (pre-normalization) paths for reference
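
The two derived statistics can be sketched as follows. The nearest-rank p95 shown here matches the "sort and pick the value at the 95% position" description, though Ghost's exact indexing is an assumption, and connection errors are not modeled in this simplified error-rate helper:

```python
def p95(durations_ms: list[float]) -> float:
    """95th percentile via nearest rank: sort, then index at 95%."""
    ordered = sorted(durations_ms)
    index = min(len(ordered) - 1, int(len(ordered) * 0.95))
    return ordered[index]

def error_rate(status_codes: list[int]) -> float:
    """Fraction of flows with HTTP status 400+ (connection errors
    would also count per the doc, but aren't modeled here)."""
    return sum(1 for s in status_codes if s >= 400) / len(status_codes)
```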