Device Management

Ghost can connect to mobile devices — iOS Simulators, Android emulators, and physical phones — to provide a full mobile testing experience. You can see the device’s screen in real time, tap on elements to inspect them, browse the UI hierarchy (every button, text field, and container), generate test automation selectors, and correlate screen interactions with network traffic. This is what turns Ghost from a network proxy into a complete mobile testing tool.

The device manager handles the entire lifecycle: discovering devices on the system, connecting to them (setting up communication channels), continuously capturing screenshots, monitoring touch interactions, caching the element hierarchy for fast lookups, and cleaning up when devices disconnect.

What this diagram shows — the two-platform device management architecture:

The Device Manager is the central coordinator with five subsystems:

  • Discovery Loops periodically scan for available devices. On iOS, Ghost runs xcrun simctl list devices to find booted simulators; on Android, it runs adb devices to find connected emulators and USB devices.
  • The Connection Manager establishes communication channels. For iOS, it connects to WebDriverAgent (WDA), an Appium component that provides HTTP APIs for UI automation; for Android, it connects to atx-agent, a similar HTTP service running on port 7912.
  • Screenshot Ring Buffers capture the device screen at ~1 FPS. iOS simulators use the fast simctl io screenshot command (bypassing WDA’s slower screenshot API), while Android uses atx-agent’s JPEG endpoint.
  • The Hierarchy Cache stores the current UI element tree (every button, label, and text field on screen) for fast hit-testing when the user taps on the screen viewer.
  • Touch Monitors detect real interactions. On Android, Ghost reads the Linux kernel’s getevent stream for actual touch events, which it converts to pixel coordinates; on iOS, Ghost polls the WDA hierarchy and detects changes between snapshots.

Ghost runs two discovery loops that continuously scan for available devices:

Android Device Discovery (every 5 seconds)

  1. Run adb devices and parse the output — each line has a serial number and state (device, offline, or unauthorized)
  2. Create Device entries with state "discovered" for each new serial
  3. Get device info via adb shell getprop — queries ro.product.model, ro.product.brand, ro.build.version.release, ro.build.version.sdk, ro.serialno
  4. Get screen dimensions via adb shell wm size — parses “Physical size: 1080x1920”
  5. Detect emulator vs physical device by serial format — serials starting with "emulator-" are emulators, everything else is treated as USB
  6. Remove devices that disappeared from the adb devices output (broadcasts device.removed)

Each ADB call has a 5-second timeout.
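Steps 1 and 5 of the Android loop can be sketched as a small parser. This is a minimal illustration, not Ghost's actual code: the adbDevice type and parseADBDevices function are hypothetical names, and the sketch assumes the usual adb devices layout of one "serial, whitespace, state" pair per line.

```go
package main

import (
	"fmt"
	"strings"
)

// adbDevice is a hypothetical record for one line of `adb devices` output.
type adbDevice struct {
	Serial   string
	State    string // e.g. "device", "offline", or "unauthorized"
	Emulator bool   // true when the serial starts with "emulator-"
}

// parseADBDevices extracts serial/state pairs from `adb devices` output.
func parseADBDevices(output string) []adbDevice {
	var devices []adbDevice
	for _, line := range strings.Split(output, "\n") {
		fields := strings.Fields(line)
		// Device lines have exactly two fields; this skips the
		// "List of devices attached" header and blank lines.
		if len(fields) != 2 {
			continue
		}
		devices = append(devices, adbDevice{
			Serial:   fields[0],
			State:    fields[1],
			Emulator: strings.HasPrefix(fields[0], "emulator-"),
		})
	}
	return devices
}

func main() {
	out := "List of devices attached\nemulator-5554\tdevice\nR58M123ABC\tunauthorized\n"
	for _, d := range parseADBDevices(out) {
		fmt.Printf("%s state=%s emulator=%v\n", d.Serial, d.State, d.Emulator)
	}
}
```

In a real loop the output would come from running adb with a 5-second timeout (for example via exec.CommandContext), which is omitted here to keep the sketch self-contained.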

iOS Simulator Discovery (every 15 seconds, macOS only)

  1. Run xcrun simctl list devices booted --json — returns JSON listing all simulators with their boot state
  2. Filter for devices with State: "Booted" — only running simulators are discovered
  3. Extract OS version from the runtime identifier — "com.apple.CoreSimulator.SimRuntime.iOS-17-4" becomes "17.4"
  4. Create Device entries with connection type "simulator", client IP 127.0.0.1
  5. Remove simulators that are no longer booted

This loop only runs on macOS (runtime.GOOS == "darwin") since Xcode simulators don’t exist on other platforms.
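The version extraction in step 3 can be illustrated with a short helper. This is a sketch under the assumption that iOS runtime identifiers always end in "SimRuntime.iOS-<major>-<minor>"; the function name iosVersionFromRuntime is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// iosVersionFromRuntime turns a CoreSimulator runtime identifier into a
// dotted version string. It returns "" for non-iOS runtimes.
func iosVersionFromRuntime(runtimeID string) string {
	const marker = "SimRuntime.iOS-"
	idx := strings.LastIndex(runtimeID, marker)
	if idx < 0 {
		return ""
	}
	// "17-4" becomes "17.4".
	return strings.ReplaceAll(runtimeID[idx+len(marker):], "-", ".")
}

func main() {
	fmt.Println(iosVersionFromRuntime("com.apple.CoreSimulator.SimRuntime.iOS-17-4"))
}
```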

| Platform | Connection | Inspector Agent | Screenshot | Touch Monitor | Coordinate System |
|---|---|---|---|---|---|
| Android USB | adb forward to port 7912 | atx-agent HTTP | atx-agent JPEG | getevent kernel stream | Screen pixels directly |
| Android Emulator | adb forward to port 7912 | atx-agent HTTP | atx-agent JPEG | getevent kernel stream | Screen pixels directly |
| iOS Simulator | localhost | WDA (auto-launched if needed) | simctl io PNG (fast path) | Hierarchy polling + diff | WDA coords → screen pixels (scale transform) |
| iOS Physical | USB | WDA (must be pre-installed) | WDA HTTP PNG | Hierarchy polling + diff | WDA coords → screen pixels (scale transform) |

When you click “Connect” on an Android device:

  1. Allocate a local port — starting from port 17912, Ghost assigns a unique local port for this device. Ports from disconnected devices are recycled before incrementing.
  2. Forward the port: adb forward tcp:{localPort} tcp:7912 creates a tunnel from your computer to the atx-agent service on the device.
  3. Check if atx-agent is running — Ghost sends a health check (GET /info with 2-second timeout) to the forwarded port.
  4. Start atx-agent if needed: adb shell /data/local/tmp/atx-agent server -d launches the agent in daemon mode on the device.
  5. Wait for readiness — Ghost polls the health endpoint up to 10 times, 500ms apart (5 seconds total). If atx-agent doesn’t respond, the connection fails.
  6. Get foreground app: adb shell dumpsys activity activities identifies the currently active app (parses mResumedActivity or mFocusedActivity lines).
  7. Start capture loop — begins taking screenshots at ~1 FPS.
  8. Start touch monitor — begins reading getevent for real-time touch detection.
  9. Start hierarchy cache warmer — every 3 seconds, fetches the UI element tree and caches it for fast hit-testing.
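The port allocation in step 1 (start at 17912, recycle ports from disconnected devices before incrementing) could look roughly like the following. The portAllocator type and its method names are illustrative assumptions, not Ghost's real identifiers.

```go
package main

import (
	"fmt"
	"sync"
)

// portAllocator hands out local ports, reusing released ones first.
type portAllocator struct {
	mu   sync.Mutex
	next int   // next never-used port
	free []int // ports released by disconnected devices
}

func newPortAllocator(start int) *portAllocator {
	return &portAllocator{next: start}
}

// Acquire returns a recycled port if one is available, otherwise the
// next port in sequence.
func (p *portAllocator) Acquire() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	if n := len(p.free); n > 0 {
		port := p.free[n-1]
		p.free = p.free[:n-1]
		return port
	}
	port := p.next
	p.next++
	return port
}

// Release returns a port to the free pool.
func (p *portAllocator) Release(port int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.free = append(p.free, port)
}

func main() {
	alloc := newPortAllocator(17912)
	a := alloc.Acquire()
	b := alloc.Acquire()
	alloc.Release(a)
	fmt.Println(a, b, alloc.Acquire())
}
```

A plain Mutex is enough here because every operation mutates the allocator; there is no read-only path that would benefit from an RWMutex.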

When you click “Connect” on an iOS simulator or physical device:

  1. Find a running WDA — Ghost scans ports 8100 through 8109 looking for a WebDriverAgent HTTP server. It checks each port and verifies it’s not already claimed by another connected device.
  2. Auto-launch WDA if needed (simulator only) — if no WDA is found and the device is a simulator:
    • Clone WebDriverAgent source from https://github.com/appium/WebDriverAgent.git into ~/.ghost/wda/WebDriverAgent/ (shallow clone, depth 1)
    • Build for the simulator: xcodebuild build-for-testing with CODE_SIGNING_ALLOWED=NO
    • Launch: xcodebuild test-without-building with the generated .xctestrun file
    • Wait up to 30 seconds for WDA to respond (polling every 1 second)
  3. Create a WDA session — POST /session with empty capabilities. WDA returns a session ID used for all subsequent requests.
  4. Get coordinate spaces — Ghost needs two coordinate systems:
    • WDA coordinate space — GET /session/{id}/window/size returns the logical size (e.g., 390×844 for iPhone 14)
    • Screenshot pixel space — actual PNG dimensions from a screenshot (e.g., 1170×2532 at 3x Retina)
    • Both are stored so Ghost can translate between them. When you tap at pixel coordinate (585, 1266), Ghost converts it to WDA coordinate (195, 422) using the scale factor.
  5. Configure WDA for fast polling — sets snapshotMaxDepth: 30 and customSnapshotTimeout: 0 for faster hierarchy responses.
  6. Start capture and monitoring — screenshots (using simctl io for simulators, WDA HTTP for physical) and hierarchy polling.

iOS has a complication that Android doesn’t: WDA uses its own coordinate system (logical points) that differs from the actual screenshot pixel dimensions. Ghost handles this transparently:

Screen pixel → WDA coordinate (for taps):

wdaX = round(pixelX × WDAWidth / ScreenWidth)
wdaY = round(pixelY × WDAHeight / ScreenHeight)

WDA coordinate → Screen pixel (for hierarchy bounds):

pixelX = wdaBoundsX × ScreenWidth / WDAWidth
pixelY = wdaBoundsY × ScreenHeight / WDAHeight

Android doesn’t need translation — adb shell input tap uses screen pixel coordinates directly, and the atx-agent hierarchy reports bounds in screen pixels.
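The two iOS formulas above can be checked with a small sketch. The coordSpace type and method names are hypothetical; the pixel-to-WDA direction rounds as the formula states, while the WDA-to-pixel direction uses integer division, which is an assumption since the source formula does not specify rounding.

```go
package main

import (
	"fmt"
	"math"
)

// coordSpace holds both coordinate systems for one iOS device.
type coordSpace struct {
	WDAWidth, WDAHeight       int // logical points reported by WDA
	ScreenWidth, ScreenHeight int // screenshot size in pixels
}

// pixelToWDA converts a screen pixel to a WDA coordinate (used for taps).
func (c coordSpace) pixelToWDA(px, py int) (int, int) {
	x := math.Round(float64(px) * float64(c.WDAWidth) / float64(c.ScreenWidth))
	y := math.Round(float64(py) * float64(c.WDAHeight) / float64(c.ScreenHeight))
	return int(x), int(y)
}

// wdaToPixel converts a WDA coordinate to a screen pixel (used for bounds).
func (c coordSpace) wdaToPixel(wx, wy int) (int, int) {
	return wx * c.ScreenWidth / c.WDAWidth, wy * c.ScreenHeight / c.WDAHeight
}

func main() {
	// iPhone 14 example from the text: 390x844 points, 1170x2532 pixels.
	cs := coordSpace{WDAWidth: 390, WDAHeight: 844, ScreenWidth: 1170, ScreenHeight: 2532}
	wx, wy := cs.pixelToWDA(585, 1266)
	px, py := cs.wdaToPixel(wx, wy)
	fmt.Println(wx, wy, px, py)
}
```

The sketch does not guard against zero dimensions; a real implementation would need to handle a device whose sizes have not been fetched yet.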

Each connected device gets a fixed-size circular buffer that stores the most recent screenshots:

| Property | Value |
|---|---|
| Buffer size | 30 frames |
| Capture rate | ~1 FPS (1-second interval) |
| Frame size | ~100 KB each (JPEG for Android, PNG for iOS) |
| Memory per device | ~3 MB |
| Thread safety | sync.RWMutex (concurrent reads allowed, exclusive write) |

A ring buffer is a fixed-size array that wraps around — when it’s full, the oldest frame is overwritten by the newest. This means Ghost always keeps the last 30 seconds of screen captures without growing memory usage.

| Method | Description |
|---|---|
| Push(data, contentType) | Add a new frame, overwriting the oldest if the buffer is full. Each frame gets a sequence number. |
| Latest() | Get the newest frame (calls At(0)) |
| At(offset) | Random access from newest: At(0) is the latest frame, At(29) is the oldest. Used by the scrub slider in the UI. |
| Frames() | All frames in chronological order (oldest first); used for GIF generation in bug reports |
| Meta() | Returns count, oldest timestamp, and newest timestamp (as Unix milliseconds); used by the frontend scrub slider to show the time range |

The ring buffer supports three features:

  • Live screen mirroring: the frontend polls Latest() to show the device screen in real time
  • Screenshot scrubbing: a slider in the UI lets you browse the last 30 frames, seeing what the screen looked like seconds ago
  • GIF generation: bug reports can include an animated GIF showing the last 30 seconds of screen activity (capped at 5 MB)
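A minimal ring buffer with the Push, At, Latest, and Frames semantics described above might look like this. It is a sketch, not Ghost's implementation: the frame and ringBuffer types are assumptions, Meta() and timestamps are omitted, and the capacity must be greater than zero.

```go
package main

import (
	"fmt"
	"sync"
)

// frame is one captured screenshot plus its sequence number.
type frame struct {
	Seq         uint64
	Data        []byte
	ContentType string
}

// ringBuffer keeps the most recent frames in a fixed-size slice.
type ringBuffer struct {
	mu     sync.RWMutex
	frames []frame
	head   int    // slot the next Push will write to
	count  int    // number of valid frames (at most len(frames))
	seq    uint64 // last assigned sequence number
}

func newRingBuffer(capacity int) *ringBuffer {
	return &ringBuffer{frames: make([]frame, capacity)}
}

// Push stores a frame, overwriting the oldest one once the buffer is full.
func (r *ringBuffer) Push(data []byte, contentType string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.seq++
	r.frames[r.head] = frame{Seq: r.seq, Data: data, ContentType: contentType}
	r.head = (r.head + 1) % len(r.frames)
	if r.count < len(r.frames) {
		r.count++
	}
}

// At returns the frame `offset` steps back from the newest (0 = newest).
func (r *ringBuffer) At(offset int) (frame, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if offset < 0 || offset >= r.count {
		return frame{}, false
	}
	n := len(r.frames)
	return r.frames[(r.head-1-offset+n)%n], true
}

// Latest returns the newest frame.
func (r *ringBuffer) Latest() (frame, bool) { return r.At(0) }

// Frames returns all stored frames, oldest first.
func (r *ringBuffer) Frames() []frame {
	r.mu.RLock()
	defer r.mu.RUnlock()
	n := len(r.frames)
	out := make([]frame, 0, r.count)
	for offset := r.count - 1; offset >= 0; offset-- {
		out = append(out, r.frames[(r.head-1-offset+n)%n])
	}
	return out
}

func main() {
	rb := newRingBuffer(3)
	for i := 1; i <= 5; i++ {
		rb.Push([]byte{byte(i)}, "image/jpeg")
	}
	latest, _ := rb.Latest()
	fmt.Println(latest.Seq, len(rb.Frames()))
}
```

Frames() computes indices under a single read lock rather than calling At() in a loop, which avoids taking the RWMutex recursively.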
| Device Type | Method | Format | Speed |
|---|---|---|---|
| Android (all) | GET /screenshot/0 on atx-agent | JPEG | ~200ms |
| iOS Simulator | xcrun simctl io {udid} screenshot --type=png - | PNG (stdout) | ~450ms |
| iOS Physical | GET /session/{id}/screenshot on WDA | PNG (base64 in JSON) | ~4.9s |

iOS simulators use simctl io instead of WDA for screenshots because it’s ~10x faster — WDA’s screenshot endpoint takes nearly 5 seconds on physical devices.

The element tree (also called the UI hierarchy or view hierarchy) is a tree structure representing every visible element on the device’s screen — buttons, text fields, labels, containers, scroll views, everything. Ghost parses this tree from platform-specific formats into a unified Element structure.

Each element in the tree has these fields:

| Field | Type | Description |
|---|---|---|
| ID | string | Ghost-assigned identifier (a_0, a_1 for Android; i_0, i_1 for iOS) |
| ClassName | string | The widget type (android.widget.Button, XCUIElementTypeButton) |
| Text | string | Visible text content |
| ResourceID | string | Android resource identifier (com.app:id/login_button) |
| AccessibilityID | string | Accessibility identifier, the most reliable selector for automation |
| ContentDesc | string | Android content description (for screen readers) |
| Label | string | iOS accessibility label |
| Name | string | iOS element name |
| Value | string | Current value (text fields, switches, sliders) |
| Bounds | object | Position and size on screen: {X, Y, Width, Height} |
| Visible | bool | Whether the element is visible to the user |
| Clickable | bool | Whether the element responds to taps |
| Enabled | bool | Whether the element is interactive (not grayed out) |
| Focused | bool | Whether the element has keyboard focus |
| Scrollable | bool | Whether the element can scroll |
| Selected | bool | Whether the element is in a selected state |
| Checked | bool | Whether a checkbox/switch is on |
| Traits | string | iOS accessibility traits |
| PackageName | string | Android app package name |
| Children | list | Child elements (nested tree structure) |
| Depth | int | How deep in the tree (root = 0) |
| ElementCount | int | Total elements in this subtree |

Ghost fetches the hierarchy from atx-agent (GET /dump/hierarchy, max 2 MB) and parses it as XML:

  • Each <node> element becomes an Element
  • Bounds are parsed from the bounds attribute format: [left,top][right,bottom] → converted to {X: left, Y: top, Width: right-left, Height: bottom-top}
  • content-desc attribute maps to AccessibilityID
  • Clickable elements: determined by the clickable attribute in the XML
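The bounds conversion in the second bullet can be sketched with fmt.Sscanf. The bounds type and parseBounds function are hypothetical names; rejecting inverted rectangles is an added assumption.

```go
package main

import "fmt"

// bounds is a rectangle in screen pixels.
type bounds struct {
	X, Y, Width, Height int
}

// parseBounds converts "[left,top][right,bottom]" into {X, Y, Width, Height}.
func parseBounds(attr string) (bounds, error) {
	var left, top, right, bottom int
	if _, err := fmt.Sscanf(attr, "[%d,%d][%d,%d]", &left, &top, &right, &bottom); err != nil {
		return bounds{}, fmt.Errorf("parse bounds %q: %w", attr, err)
	}
	if right < left || bottom < top {
		return bounds{}, fmt.Errorf("inverted bounds %q", attr)
	}
	return bounds{X: left, Y: top, Width: right - left, Height: bottom - top}, nil
}

func main() {
	b, err := parseBounds("[0,63][1080,210]")
	fmt.Println(b, err)
}
```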

Ghost fetches the hierarchy from WDA (GET /session/{id}/source, depth 30) and parses it as streaming XML:

  • The XML element name IS the class name (<XCUIElementTypeButton> → ClassName: "XCUIElementTypeButton")
  • name attribute maps to both AccessibilityID and Name
  • label attribute maps to Label and Text (if text is empty)
  • Bounds are in WDA coordinate space — Ghost transforms them to screen pixel space using the stored scale factors
  • Clickable elements: determined by class name — Button, Link, Switch, Toggle, Slider, Stepper, SegmentedControl, TextField, SecureTextField, TextView, Picker, DatePicker, PageIndicator, Tab, TabBar, Cell, MenuItem are all considered clickable
  • Scrollable: ScrollView, Table, CollectionView

When you tap on the screen viewer in Ghost’s UI, Ghost needs to find which element is at that screen coordinate. ElementAtPoint(x, y) walks the cached hierarchy tree and finds the deepest visible element whose bounds contain the point (children are checked in reverse order so overlapping elements resolve to the topmost one).

The hierarchy cache is warmed every 3 seconds on Android — a background goroutine fetches the full hierarchy and stores it. This means hit-testing is nearly instant (it reads from cache, never blocks on a network call). iOS uses the same cache, updated from the hierarchy polling loop.
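The hit-testing walk described above can be sketched as a short recursive function. The element and bounds types are simplified stand-ins for Ghost's Element structure, and pruning a whole subtree when the parent is invisible or does not contain the point is an assumption of this sketch.

```go
package main

import "fmt"

type bounds struct {
	X, Y, Width, Height int
}

// contains reports whether the point lies inside the rectangle.
func (b bounds) contains(x, y int) bool {
	return x >= b.X && x < b.X+b.Width && y >= b.Y && y < b.Y+b.Height
}

// element is a reduced version of the unified Element structure.
type element struct {
	ID       string
	Bounds   bounds
	Visible  bool
	Children []*element
}

// elementAtPoint returns the deepest visible element containing (x, y),
// or nil if none does. Children are visited last-to-first so that the
// topmost of several overlapping siblings wins.
func elementAtPoint(root *element, x, y int) *element {
	if root == nil || !root.Visible || !root.Bounds.contains(x, y) {
		return nil
	}
	for i := len(root.Children) - 1; i >= 0; i-- {
		if hit := elementAtPoint(root.Children[i], x, y); hit != nil {
			return hit
		}
	}
	return root
}

func main() {
	root := &element{ID: "root", Bounds: bounds{0, 0, 100, 100}, Visible: true, Children: []*element{
		{ID: "a", Bounds: bounds{0, 0, 50, 50}, Visible: true},
		{ID: "b", Bounds: bounds{25, 25, 50, 50}, Visible: true},
	}}
	fmt.Println(elementAtPoint(root, 30, 30).ID)
}
```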

Ghost can detect WebView containers (embedded browsers) in the element hierarchy:

Android WebView classes detected:

  • android.webkit.WebView
  • android.webkit.WebViewClient
  • com.tencent.smtt.sdk.WebView (Tencent X5 engine)
  • com.uc.webview.export.WebView (UCBrowser engine)

Additionally, on Android, Ghost discovers debuggable WebView sockets by reading /proc/net/unix and finding entries matching webview_devtools_remote_{PID} or chrome_devtools_remote. These sockets can potentially be used for Chrome DevTools Protocol inspection.

iOS WebView classes detected:

  • XCUIElementTypeWebView
  • Any class containing “WebView” or “SafariViewController”

Detected WebViews are tagged with webview=true in the element’s attributes and returned by the ListWebViews API.

When you select an element in Ghost’s UI, Ghost generates test automation selectors — code snippets you can copy directly into your test framework. Selectors are ordered by reliability (how likely they are to keep working when the UI changes):

Android Selectors (5 types, priority order)

| Type | Reliability | Example | Frameworks |
|---|---|---|---|
| accessibility_id | Excellent | content_desc: "Login" | Raw, Appium (Java/Python), Maestro |
| resource_id | Good | com.app:id/login_btn | Raw, Appium (Java/Python), Espresso, Maestro |
| content_desc | OK | content_desc: "Submit order" | Raw, Appium (Java/Python), Espresso |
| ui_selector | OK | text("Login") + className("Button") | Raw, Appium (Java/Python), Espresso, Maestro |
| xpath | Fragile | //android.widget.Button[@text='Login'] | Raw, Appium (Java/Python) |

iOS Selectors (4 types, priority order)

| Type | Reliability | Example | Frameworks |
|---|---|---|---|
| accessibility_id | Excellent | name: "loginButton" | Raw, Appium (Java/Python), XCUITest, Maestro |
| predicate_string | Good | label == "Login" AND type == "Button" | Raw, Appium (Java/Python), XCUITest |
| class_chain | OK | **/XCUIElementTypeButton[label == "Login"] | Raw, Appium (Java/Python) |
| xpath | Fragile | //XCUIElementTypeButton[@name='Login'] | Raw, Appium (Java/Python) |

Accessibility ID is always preferred because it’s explicitly set by developers for automation and rarely changes. XPath is always last because it depends on the exact tree structure — adding a wrapper <View> anywhere in the hierarchy breaks XPath selectors.

Ghost monitors how the user interacts with the device — detecting taps, scrolls, long presses, and text input. This data is used for traffic correlation (linking a tap on “Login” to the resulting API call) and bug report generation (showing what the user did).

Android’s Linux kernel exposes raw input events through /dev/input/. Ghost reads these events in real time via adb shell getevent -lt, which outputs human-readable event lines with timestamps.

Touch device discovery: Ghost first runs adb shell getevent -p to find the input device that supports ABS_MT_POSITION_X (multitouch X coordinate). This is the touchscreen.

Event stream parsing: Each line from getevent -lt is parsed for:

  • ABS_MT_TRACKING_ID — touch start (new tracking ID) or touch end (0xFFFFFFFF)
  • ABS_MT_POSITION_X — horizontal touch coordinate (in device-specific units)
  • ABS_MT_POSITION_Y — vertical touch coordinate

Coordinates are converted to screen pixels: screenX = rawX × screenWidth / maxX

Gesture classification:

| Gesture | Condition |
|---|---|
| Tap | Duration < 500ms AND pixel drift < 20px |
| Long press | Duration ≥ 800ms AND pixel drift < 20px |
| Scroll | Pixel drift ≥ 50px |
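The coordinate conversion and the classification table can be sketched together as follows. Function names are hypothetical. The table leaves some combinations unspecified (for example, a 600ms press or a 30px drift); this sketch returns an empty string for those, which is an assumption rather than documented behavior.

```go
package main

import "fmt"

// rawToScreen scales a raw getevent coordinate to screen pixels:
// screen = raw × screenSize / maxRaw.
func rawToScreen(raw, maxRaw, screenSize int) int {
	return raw * screenSize / maxRaw
}

// classifyGesture applies the thresholds from the table. It returns
// "tap", "long_press", "scroll", or "" when no row matches.
func classifyGesture(durationMs int, driftPx float64) string {
	switch {
	case driftPx >= 50:
		return "scroll"
	case driftPx < 20 && durationMs < 500:
		return "tap"
	case driftPx < 20 && durationMs >= 800:
		return "long_press"
	default:
		return ""
	}
}

func main() {
	fmt.Println(rawToScreen(16384, 32767, 1080))
	fmt.Println(classifyGesture(120, 4), classifyGesture(900, 3), classifyGesture(300, 180))
}
```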

After classifying the gesture, Ghost looks up the element at the touch coordinates using the cached hierarchy tree — adding element details (class, text, ID) to the interaction event.

Retry policy: If the getevent process dies (device disconnects momentarily, process killed), Ghost retries up to 10 times with increasing backoff: 3s, 6s, 12s, 30s (clamped at 30s).

iOS doesn’t expose raw touch events the way Android does. Instead, Ghost polls the WDA hierarchy repeatedly and detects changes between snapshots:

Polling rate: 50ms minimum gap between polls, plus WDA response latency (100-300ms), resulting in ~3-7 polls per second.

Change detection (via AnalyzeHierarchyHistory):

  • Each snapshot is flattened into a map of element paths → element states
  • Two-pass matching: first by tree path, then by accessibility ID (handles elements that moved in the tree)
  • Detected changes:
    • Focus gained → classified as a tap interaction
    • Selected state gained → classified as a tap interaction
    • Text/value changed → classified as text input (with 2-second coalescing window for typing, 400ms for other value changes)
    • Navigation bar label changed → classified as screen change
    • Tab bar selection changed → classified as screen change

Post-processing:

  • mergeTapNavigation — if a tap and screen change happen within 500ms, they’re merged into a single interaction (tap caused navigation)
  • deduplicateEvents — same interaction type on the same element within 200ms is deduplicated

Noise filtering: Ghost ignores changes to keyboard keys, elements with the UpdatesFrequently trait, and generic container types (XCUIElementTypeOther, Group, Cell, LayoutArea, LayoutItem).

Each detected interaction is stored in the InteractionLog ring buffer (capacity: 100 events):

| Field | Values | Description |
|---|---|---|
| Action | tap, scroll, long_press, text_change, value_change, screen_change, text_input, key | What happened |
| X, Y | integers | Screen coordinates where it happened |
| ElementClass | string | The tapped element’s class name |
| ElementText | string | The tapped element’s text content |
| ElementLabel | string | Accessibility label |
| ElementID | string | Resource ID (Android) or accessibility ID (iOS) |
| Confidence | high, medium, injected | high for Android getevent (actual kernel events), medium for iOS hierarchy diff (inferred), injected for programmatic actions |

What this diagram shows — the five device states:

A device starts as Discovered when the discovery loop first finds it (an Android device appearing in adb devices or an iOS simulator booting). When you click “Connect” in Ghost’s UI, it moves to Connecting while Ghost sets up the communication channel (port forwarding, WDA session, etc.). If everything succeeds, it becomes Connected — screenshots, hierarchy, and touch monitoring are all active. If the connection setup fails (WDA won’t start, atx-agent crashes, etc.), it moves to Error with a message explaining what went wrong. When you disconnect or the device disappears (USB unplugged, simulator shut down), it moves to Disconnected and resources are cleaned up.

When a device disconnects (explicitly or because it disappeared):

  1. Stop touch monitor — cancel the getevent process (Android) or hierarchy polling goroutine (iOS), clear the hierarchy cache and hierarchy log
  2. Stop screenshot capture — cancel the capture goroutine, delete the ring buffer (freeing ~3 MB of memory)
  3. Platform-specific cleanup:
    • Android: Remove the ADB port forward (adb forward --remove), release the local port back to the free pool
    • iOS: Delete the WDA session (DELETE /session/{id}), kill the WDA process if Ghost launched it (sends Kill signal, waits for exit, closes log file)
  4. Update state — set state to "disconnected", clear error and session ID
  5. Broadcast — send device.disconnected WebSocket event to the frontend

Ghost can correlate device interactions with network traffic — linking a tap on “Add to Cart” to the resulting POST /api/cart/add API call.

The CorrelateFlows function:

  1. Takes a reference timestamp (when the interaction happened) and a time window (default: 3 seconds, max: 30 seconds)
  2. Filters flows that occurred within ±window of the timestamp
  3. Sorts by proximity to the reference time (closest first)
  4. Classifies each flow as "primary" (likely triggered by the interaction) or "noise" (background traffic like analytics, health checks)
  5. Returns correlation details including method, host, path, status, timing delta, body previews (first 2,048 bytes), and SDK category

Noise detection uses domain lists (known analytics/tracking domains), path prefix lists, and path substring lists to identify background traffic that wasn’t triggered by the user’s action.
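Steps 1 through 4 of the correlation can be sketched as below. This is a simplified illustration: the flow and correlated types are hypothetical, noise detection is reduced to a host lookup (the real function also uses path prefix and substring lists), and treating a non-positive window as the 3-second default is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// flow is a reduced stand-in for a captured network flow.
type flow struct {
	Host, Path  string
	TimestampMs int64
}

// correlated pairs a flow with its distance from the interaction.
type correlated struct {
	Flow    flow
	DeltaMs int64  // flow time minus reference time
	Class   string // "primary" or "noise"
}

func absInt64(v int64) int64 {
	if v < 0 {
		return -v
	}
	return v
}

// correlateFlows keeps flows within ±windowMs of refMs, classifies them,
// and sorts them by proximity to the reference time (closest first).
func correlateFlows(flows []flow, refMs, windowMs int64, noiseHosts map[string]bool) []correlated {
	if windowMs <= 0 {
		windowMs = 3000 // default window
	}
	if windowMs > 30000 {
		windowMs = 30000 // maximum window
	}
	var out []correlated
	for _, f := range flows {
		delta := f.TimestampMs - refMs
		if absInt64(delta) > windowMs {
			continue
		}
		class := "primary"
		if noiseHosts[f.Host] {
			class = "noise"
		}
		out = append(out, correlated{Flow: f, DeltaMs: delta, Class: class})
	}
	sort.SliceStable(out, func(i, j int) bool {
		return absInt64(out[i].DeltaMs) < absInt64(out[j].DeltaMs)
	})
	return out
}

func main() {
	ref := int64(1_700_000_000_000)
	flows := []flow{
		{Host: "api.example.com", Path: "/api/cart/add", TimestampMs: ref + 200},
		{Host: "analytics.example.com", Path: "/collect", TimestampMs: ref - 50},
		{Host: "api.example.com", Path: "/health", TimestampMs: ref + 5000},
	}
	for _, c := range correlateFlows(flows, ref, 0, map[string]bool{"analytics.example.com": true}) {
		fmt.Println(c.Flow.Host, c.Flow.Path, c.DeltaMs, c.Class)
	}
}
```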

The device manager uses 11 mutexes to protect concurrent access — this is necessary because discovery loops, capture goroutines, touch monitors, hierarchy warmers, and API handlers all run simultaneously:

| Mutex | Type | Protects |
|---|---|---|
| mu | RWMutex | The devices map (device add/remove/lookup) |
| ringMu | RWMutex | Screenshot ring buffers (one per device) |
| captureMu | Mutex | Capture loop cancellation functions |
| interactionMu | RWMutex | Interaction log ring buffers |
| hierarchyLogMu | RWMutex | Hierarchy snapshot logs |
| touchMu | Mutex | Touch monitor cancellation functions |
| hierarchyCacheMu | RWMutex | Cached element trees for hit-testing |
| portMu | Mutex | Port allocator (next port counter + free pool) |
| wdaMu | Mutex | WDA process tracking (iOS only) |
| actionLockMu | Mutex | Per-device action serialization locks |
| broadcastMu | RWMutex | WebSocket broadcast function reference |

RWMutex is used where reads are much more frequent than writes (screenshot polling, hierarchy lookups). Regular Mutex is used where operations are always exclusive (port allocation, process management).

The device manager broadcasts these WebSocket events to the frontend:

| Event | When | Payload |
|---|---|---|
| device.discovered | New device found in discovery loop | DeviceInfo snapshot |
| device.updated | State change during connection, WDA process death | DeviceInfo snapshot |
| device.connected | Connection setup completed successfully | DeviceInfo snapshot |
| device.disconnected | Explicit disconnect or device disappeared | DeviceInfo snapshot |
| device.removed | Device disappeared from discovery after disconnect | DeviceInfo snapshot |

All endpoints are under /api/v1/inspector/devices:

| Method | Path | Description |
|---|---|---|
| GET | / | List all discovered and connected devices |
| GET | /{id} | Get a single device by ID |
| POST | /{id}/connect | Start async connection (returns 202, 120-second timeout) |
| POST | /{id}/disconnect | Disconnect and clean up |
| GET | /{id}/screenshot | Binary JPEG/PNG. ?offset=N for scrub slider (0=latest, 29=oldest) |
| GET | /{id}/screenshot/meta | Ring buffer state (count, oldest/newest timestamps) for scrub slider UI |
| GET | /{id}/hierarchy | Current UI element tree (full JSON) |
| GET | /{id}/element | Deepest element at screen coordinates. ?x=N&y=N |
| GET | /{id}/selectors/{nodeId} | Test automation selectors for a specific element |
| GET | /{id}/correlation | Correlated network traffic. ?ts=<unix_ms>&window=<ms> (default 3000, max 30000) |
| GET | /{id}/interactions | Interaction history. ?since=300 (seconds, max 3600), ?limit=50 (max 200) |
| POST | /{id}/bug-report | Generate AI-enhanced bug report with GIF. Body: {node_id, flow_ts, window_ms, offset, session_id} (max 64 KB) |
| POST | /{id}/tap | Tap at coordinates. Body: {"x":N,"y":N} (max 1 KB) |
| POST | /{id}/input | Type text or send key. Body: {"text":"hello"} or {"key":"enter"} (max 4 KB). Valid keys: enter, backspace, tab |
| GET | /{id}/interaction-context | Interactions with selectors and correlated flows. ?since=300&limit=20 (max 100) |
| GET | /{id}/webviews | WebView elements from hierarchy + debuggable sockets (Android) |

The device manager uses the following constants:

| Constant | Value | Purpose |
|---|---|---|
| Android discovery interval | 5 seconds | How often adb devices is polled |
| iOS discovery interval | 15 seconds | How often simctl list is polled |
| Screenshot interval | 1 second | Capture rate (~1 FPS) |
| Screenshot buffer size | 30 frames | Ring buffer capacity |
| Starting port | 17912 | First port for ADB forwarding |
| atx-agent port | 7912 | Fixed port on Android devices |
| WDA port range | 8100-8109 | Ports scanned for running WDA |
| WDA launch timeout | 30 seconds | Max wait for WDA to start on simulator |
| Hierarchy cache refresh | 3 seconds | Android hierarchy warmer interval |
| iOS hierarchy poll gap | 50 milliseconds | Minimum delay between WDA hierarchy fetches |
| Interaction log capacity | 100 events | Ring buffer for touch interactions |
| Hierarchy log capacity | 100 snapshots | Ring buffer for hierarchy snapshots (~30-35 seconds at polling rate) |
| Max text input length | 500 characters | Safety cap on typed text |
| Max screenshot response | 20 MB | atx-agent screenshot size cap |
| Max hierarchy response | 2 MB | atx-agent hierarchy size cap |
| Hierarchy flatten depth | 30 | Maximum tree depth for element flattening |
| Tap max duration | 500 ms | Gesture: duration threshold for tap vs long press |
| Tap max drift | 20 px | Gesture: movement threshold for tap vs scroll |
| Long press minimum | 800 ms | Gesture: minimum duration for long press |
| Scroll min drift | 50 px | Gesture: minimum movement for scroll detection |
| Touch monitor retries | 10 | Max retries for getevent process |