# Why adb screencap is slow
adb shell screencap -p is the default way to grab an Android screen from
a host machine. On a stock Android 15 emulator at 1440×3120, it takes a
median of 2.12 seconds per call — with individual calls ranging from
660 ms to over 3 seconds.
That's fine for a debugging screenshot. It's a disaster if you're a UI automation loop or an LLM agent that takes a screenshot between every step.
The same screen, captured through hs see at the default settings, takes
12 ms — about 170× faster. But "170× faster" is a fragile claim if
the comparison isn't honest. Here's where the time actually goes, and
what Handsets does about it.
## The numbers
All measurements: Android 15 emulator (sdk_gphone64_arm64, 1440×3120,
SDK 35). Each variant warmed (3 calls) then sampled 20 times. Wall-clock
end-to-end from the host's perspective.

| variant | median | p10 | p90 | output |
|---|---|---|---|---|
| `adb exec-out screencap -p > x.png` | 2122 ms | 661 ms | 3007 ms | 1973 KB |
| `adb exec-out screencap > x.raw` | 675 ms | 664 ms | 683 ms | 17.5 MB |
| `hs see x.png` (PNG, full res) | 584 ms | 569 ms | 594 ms | 1978 KB |
| `hs see x.jpg` (JPEG q80, full) | 24 ms | 23 ms | 25 ms | 138 KB |
| `hs do 'screenshot'` (JPEG q80, 768 long) | 12 ms | 11 ms | 14 ms | 22 KB |

The first row is the apples-to-apples adb baseline most people compare against. The last row is what an agent loop actually uses. The middle rows let us isolate where each saving comes from.
A few observations even before any explanation:
- `screencap -p` has a huge variance: about 4.5× between p10 and p90.
- `screencap` without `-p` (raw RGBA) is consistent at ~675 ms even though it ships 17.5 MB instead of 2 MB. Whatever causes the variance is not transport bandwidth.
- All the `hs` paths are tight: p10 and p90 are within 3 ms of each other on the JPEG paths and within about 4% on the PNG path. Whatever's happening in the warm daemon is deterministic.
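
The median, p10 and p90 columns come from 20 samples per variant. The post doesn't say which percentile method the harness used, so this is a minimal sketch that assumes Python's `statistics.quantiles` with its default interpolation. `summarize` is a name made up for the example, and the sample values are placeholders, not measurements.

```python
import statistics

def summarize(samples_ms):
    """Return (median, p10, p90) for a list of wall-clock samples in ms."""
    # quantiles(n=10) returns the 9 cut points between deciles:
    # index 0 is the 10th percentile, index 8 is the 90th.
    deciles = statistics.quantiles(samples_ms, n=10)
    return statistics.median(samples_ms), deciles[0], deciles[8]

# Placeholder samples (1.0 .. 20.0 ms), purely illustrative.
placeholder = [float(x) for x in range(1, 21)]
med, p10, p90 = summarize(placeholder)
print(f"median={med:.1f} ms  p10={p10:.1f} ms  p90={p90:.1f} ms")
```

With 20 samples and this method, p90 is interpolated between the 18th and 19th sorted values, so a single slow outlier does not move it.
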
## Where does adb screencap -p spend its time?
To break it apart, run screencap on the device only — capturing to a
file rather than piping to host stdout — so we can see what's actually
happening down there.

| phase | median |
|---|---|
| `screencap /sdcard/x.raw` (capture + raw write) | 75 ms |
| `screencap -p /sdcard/x.png` (capture + PNG encode) | 624 ms |
| `adb pull /sdcard/x.raw` (17.5 MB transport) | 60 ms |
| `adb pull /sdcard/x.png` (2 MB transport) | 25 ms |

An on-device capture takes about 75 ms — that's the cost of SurfaceFlinger
producing a frame. Adding -p adds ~549 ms of pure PNG encoding on
the device's CPU. Transport is in the noise.
In other words, almost 90% of adb screencap -p's on-device time is
PNG encoding — single-threaded zlib-style compression on the slowest
CPU in the system, for an image you're probably going to delete in five
seconds.
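
The percentages here are plain subtraction on the phase table. As a quick check, assuming the raw-file write in the first row costs roughly the same as the non-encode work in the `-p` row (the variable names are mine):

```python
# Medians from the on-device phase table, in milliseconds.
capture_raw_ms = 75    # screencap /sdcard/x.raw
capture_png_ms = 624   # screencap -p /sdcard/x.png
pull_png_ms = 25       # adb pull /sdcard/x.png

# What -p adds on top of a plain capture.
encode_ms = capture_png_ms - capture_raw_ms
encode_share = encode_ms / capture_png_ms

# Best case for getting a PNG onto the host: capture + encode + transport.
floor_ms = capture_png_ms + pull_png_ms

print(f"PNG encode: {encode_ms} ms ({encode_share:.0%} of on-device time)")
# PNG encode: 549 ms (88% of on-device time)
print(f"capture + encode + pull: {floor_ms} ms")
# capture + encode + pull: 649 ms
```

The 649 ms sum lands within about 2% of the 661 ms p10 measured end-to-end.
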
That accounts for the 660-ms best case. The other ~1500 ms of variance
in adb exec-out screencap -p shows up on top of that, and is a
combination of:
- `screencap` being spawned cold each call (fresh process, fresh SurfaceFlinger client connection).
- `adb exec-out` piping the PNG byte-by-byte through several buffering layers (the on-device adbd, the USB transport, the host adb server, your shell's stdout).
- The emulator's VM doing whatever else it does.

On a physical device the variance is smaller but the PNG floor doesn't move.
## Three reasons it's slow
1. PNG encoding is wildly mismatched to the use case. PNG is a great archive format. It's a terrible "snapshot a frame for an LLM" format. JPEG q80 of the same image is ~14× smaller, and orders of magnitude faster to encode (Skia's JPEG path uses libjpeg-turbo with NEON SIMD; the PNG path is software zlib with no comparable acceleration). No agent loop cares about lossless screenshots — they care about "what's on the screen right now."
2. screencap is a fresh process every call.
The binary starts, opens the SurfaceFlinger client, captures, encodes,
exits. For one screenshot that's fine. For an agent doing 20 screenshots
a minute, you're paying process startup and SurfaceFlinger handshake
every single time.
3. adb exec-out pipes through several layers that don't love 2 MB.
The PNG comes out of screencap in one fwrite, gets chunked by adbd
into TCP/USB frames, goes through your host adb server, and arrives at
your shell as stdout. Each hop has buffering that under load doesn't
combine well. This is the variance source: best case 660 ms, worst 3 s.
## What Handsets does differently
Three things, each of which fixes one of the problems above.
### 1. A warm VirtualDisplay mirror in a long-running daemon
hs use spawns a small JVM process on the device under app_process
(shell UID, hidden-API restrictions lifted). The daemon creates a
VirtualDisplay
that mirrors the default display into an
ImageReader
at a configurable resolution, and keeps it open between commands.
When you call hs see x.jpg, the latest frame is already sitting in
memory. There's no SurfaceFlinger snapshot to wait for, no screencap
process to start. The daemon acquires the most recent Image from the
ImageReader (already produced asynchronously by the listener thread on
the previous frame), JPEG-encodes it, and ships the bytes back.
The relevant detail in the mirror code
is that the listener thread does the expensive
copyPixelsFromBuffer — the GPU-fence-blocking call — without holding
the capture lock, then briefly takes the lock just to swap pointers.
Capture threads only ever read the most recent fully-written bitmap and
never wait on GPU work. A first call at a new resolution pays a one-time
~50 ms cost to create the mirror; the cache holds the four most-recent
sizes.
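
The locking pattern is easier to see in miniature. This is a Python analogue, not the daemon's Java code: `LatestFrame`, `publish` and `latest` are names invented for the sketch, and `bytes(source)` stands in for the expensive `copyPixelsFromBuffer` call.

```python
import threading

class LatestFrame:
    """Holds the most recent fully-written frame for any number of readers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None

    def publish(self, source):
        # Expensive step (stand-in for copyPixelsFromBuffer): runs WITHOUT
        # the lock, so readers are never blocked while the copy is in progress.
        fresh = bytes(source)
        # Cheap step: hold the lock only long enough to swap the reference.
        with self._lock:
            self._frame = fresh

    def latest(self):
        # Readers hold the lock just long enough to grab a reference to a
        # complete frame; they never see a half-written one.
        with self._lock:
            return self._frame

holder = LatestFrame()
holder.publish(bytearray(b"frame-1"))
print(holder.latest())  # b'frame-1'
```

The point of the design is that the lock protects only a pointer swap, so the time a reader can be blocked does not depend on frame size.
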
### 2. JPEG is the default; PNG is opt-in
hs see x.jpg is JPEG. hs see x.png is PNG. The file extension picks
the format. Agents get JPEG by default because we know what they're
going to do with it: ship it to a model. Debugging screenshots can ask
for PNG.
This single change accounts for most of the win: 24 ms (hs see x.jpg)
vs 584 ms (hs see x.png). Both go through the same warm mirror at full
resolution; the only difference is the encoder.
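
Picking the encoder from the file extension can be sketched in a few lines. This is not Handsets' actual code: `pick_format` and the `FORMATS` table are hypothetical, and treating `.jpeg` as JPEG and falling back to JPEG for unknown extensions are my assumptions (the text only names `.jpg` and `.png`).

```python
from pathlib import Path

# Hypothetical mapping; only .jpg and .png are named in the text.
FORMATS = {".jpg": "jpeg", ".jpeg": "jpeg", ".png": "png"}

def pick_format(path, default="jpeg"):
    """Choose an encoder name from the output path's extension."""
    # Assumption: unknown or missing extensions fall back to the default.
    return FORMATS.get(Path(path).suffix.lower(), default)

print(pick_format("x.jpg"))  # jpeg
print(pick_format("x.png"))  # png
```
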
### 3. Default to 768-long-edge for the agent loop
Most LLM agents don't need 1440×3120 pixels. They need "enough to see
the screen." The raw wire command screenshot (without max=1) defaults
to a 768-long-edge JPEG, which is 22 KB on disk and 12 ms end-to-end.
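
To make "768-long-edge" concrete, here is a minimal sketch of the size calculation. `fit_long_edge` is a made-up helper, and the rounding and no-upscale behavior are assumptions; the daemon may round differently (for example to even dimensions).

```python
def fit_long_edge(width, height, long_edge=768):
    """Scale (width, height) so the longer side equals long_edge, keeping aspect ratio."""
    scale = long_edge / max(width, height)
    if scale >= 1:
        return width, height  # assumption: never upscale
    return max(1, round(width * scale)), max(1, round(height * scale))

w, h = fit_long_edge(1440, 3120)
kept = (w * h) / (1440 * 3120)
print(w, h, f"{kept:.1%} of the original pixels")
# 354 768 6.1% of the original pixels
```
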
The downscale happens inside the mirror itself: the VirtualDisplay is
created at the output resolution, so we're not allocating a 1440×3120
bitmap just to throw about 94% of its pixels away. Bigger sizes have their own warm
mirror cached separately (`hs see x.jpg` triggers the full-res mirror),
so you can mix and match without paying for the largest one every time.
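
A per-size cache like this can be sketched with an `OrderedDict`. This illustrates the idea rather than the daemon's implementation: `MirrorCache` and its `create`/`close` callbacks are hypothetical, and least-recently-used eviction is my reading of "the four most-recent sizes".

```python
from collections import OrderedDict

class MirrorCache:
    """Keeps the N most recently used mirrors, keyed by (width, height)."""

    def __init__(self, create, capacity=4, close=lambda mirror: None):
        self._create = create
        self._close = close
        self._capacity = capacity
        self._mirrors = OrderedDict()

    def get(self, size):
        if size in self._mirrors:
            self._mirrors.move_to_end(size)  # mark as most recently used
            return self._mirrors[size]
        mirror = self._create(size)  # the one-time setup cost is paid here
        self._mirrors[size] = mirror
        if len(self._mirrors) > self._capacity:
            # Drop the least recently used mirror and release it.
            _, evicted = self._mirrors.popitem(last=False)
            self._close(evicted)
        return mirror

cache = MirrorCache(create=lambda size: f"mirror@{size[0]}x{size[1]}")
print(cache.get((354, 768)))    # mirror@354x768 (created)
print(cache.get((354, 768)))    # mirror@354x768 (reused)
```
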
## Layer-by-layer wins
Adding up the changes:

| change | saves |
|---|---|
| Warm VirtualDisplay vs cold `screencap` | ~75 ms per call (skips the capture) |
| JPEG q80 vs PNG | ~550 ms per call (encode dominates) |
| TCP forward vs `exec-out` pipe | ~1500 ms when adb is in a bad mood (variance) |
| 768-long-edge default | another ~10 ms (smaller encode + smaller transport) |

The first three together get you to hs see x.jpg's 24 ms. The last
shaves the agent default to 12 ms.
## When this matters (and when it doesn't)
Matters if you're:
- An LLM agent that screenshots after every action (typical loop: act, screenshot, dump UI, decide, repeat).
- A test framework that wants to record a frame every X ms.
- A monitoring system polling for visual changes.
- Anything where screenshot latency is on the user-perceived path.
Probably doesn't matter if you're:
- Taking one screenshot a day for a bug report.
- Recording a video. `hs see` (the bare GUI viewer) uses MediaCodec H.264 streaming via `H264Streamer.java`, a separate path.
- Working over a slow remote `adb tcpip` link where the wire, not the encode, is the bottleneck.

For the agent case specifically: a 12-ms screenshot lets you treat screenshots as free relative to the rest of the loop (UI dump is ~150 ms, an LLM round-trip is seconds). Two-second screenshots make screenshotting the dominant cost, and you start skipping them — at which point your agent gets flakier.
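
A back-of-the-envelope version of that budget, using the 150 ms UI dump from above and assuming a 2-second LLM round-trip (the text only says "seconds", so that number is a placeholder):

```python
def screenshot_share(screenshot_ms, ui_dump_ms=150, llm_ms=2000):
    """Fraction of one agent step spent taking the screenshot."""
    # llm_ms=2000 is an assumed round-trip time, not a measurement.
    total = screenshot_ms + ui_dump_ms + llm_ms
    return screenshot_ms / total

print(f"hs default: {screenshot_share(12):.1%} of each step")
# hs default: 0.6% of each step
print(f"adb -p:     {screenshot_share(2122):.1%} of each step")
# adb -p:     49.7% of each step
```

Under that assumption the adb path roughly doubles the length of every step, while the 12 ms path disappears into the noise.
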
## Caveats
- These numbers are from an emulator. Physical devices have lower variance on the `adb exec-out` path (typically 600–900 ms instead of 2–3 s) and faster on-device JPEG encoding. The relative ordering doesn't change.
- The first call after a resolution change pays a one-time ~50 ms cost to create the new VirtualDisplay mirror.
- If a foreground window has `FLAG_SECURE`, both adb and Handsets produce an all-black frame. Handsets detects this and returns a named error pointing at the offending window instead of silently handing you a black PNG.
- The daemon runs as the shell UID via `app_process`, with hidden-API restrictions lifted. That's necessary because `createVirtualDisplay` rejects the system Context's op-package on Android 14+; we have to forge a `com.android.shell` package context. The init comments in `Screenshot.java` explain the three-tier strategy.
## Reproducing

```bash
# Set up
curl -fsSL https://raw.githubusercontent.com/elliotgao2/handsets/main/install.sh | bash
hs use

# Headline benchmark
python3 - <<'PY'
import statistics, subprocess, time

def t(cmd):
    """Warm up with 3 calls, then return the median of 20 timed calls in ms."""
    for _ in range(3):
        subprocess.call(cmd, shell=True, stdout=subprocess.DEVNULL)
    s = []
    for _ in range(20):
        t0 = time.perf_counter_ns()
        subprocess.call(cmd, shell=True, stdout=subprocess.DEVNULL)
        s.append((time.perf_counter_ns() - t0) / 1e6)
    return statistics.median(s)

# Kept in a variable: a backslash-escaped quote inside an f-string
# expression is a syntax error.
hs_default = "hs do 'screenshot' > /tmp/h2.jpg"
print(f"adb -p: {t('adb exec-out screencap -p > /tmp/a.png'):.1f} ms")
print(f"hs jpg: {t('hs see /tmp/h.jpg'):.1f} ms")
print(f"hs default: {t(hs_default):.1f} ms")
PY
```

If you reproduce something materially different, open an issue with your device model and Android version — the numbers above are honest but I'm curious where they hold and where they don't.
Handsets is a CLI for driving Android devices, built for LLM agents and shell scripts. MIT.