Architecture - 8 min read - 2026
How Pedulli's best-of-N racing dispatcher actually works
By Francesco Pedulli
When you drop a file at pedulli.io, the system makes a quick decision: which of several compression engines should race, and in what order. The decision uses a few cheap signals - filetype magic bytes, extension, and an entropy probe of the file's head. Below I walk through the dispatcher, the engine pool and the container format, and why racing a handful of candidates beats trying to pick a single "smartest" codec up front.
The engine pool (in-browser)
Each engine is a separately-compiled WebAssembly binary loaded on demand. They share no code. Some are specialists (periodic / constant data, log-like text, container-structured media); the broad-spectrum fallback is a strong general-purpose LZMA-class codec. Crucially, well-known codecs - xz, zstd and brotli - are also candidates in the race, alongside the SRD math transforms. Pedulli is a best-of-N racer that keeps the smallest verified output, so its result is at most 1 byte (the engine tag) larger than the output of the best standard codec it races. On structured data a specialist can win outright; on already-compressed or random data the racer simply selects the best standard codec and matches that codec's size plus the tag byte.
The router
The router returns a ranked candidate list from the cheap signals. Schematically:
if (entropy is very low || head is all-zero) rank periodic specialists first
if (head looks like an MP4 / container) rank container specialist first
if (entropy is low || text-like extension) rank log/text specialists first
if (entropy is near-maximal) short-list: data is likely incompressible
otherwise full race over all candidates
The router never commits to a single engine. It picks the top-K and races them in parallel, returning the smallest output. Even when it is confident about a log file, it still runs the other shortlisted candidates - the extra CPU is cheap on the user's machine, and the cost of guessing wrong is missed compression.
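The schematic rules above can be sketched in Python. This is a minimal illustration, not the actual router: the thresholds, the engine names (`periodic`, `container`, `logtext`), the extension list and the two-codec short-list are all assumptions made for the example. The only format fact it relies on is that MP4-family files carry the box type `ftyp` at byte offset 4.

```python
import math
from collections import Counter

# Hypothetical thresholds and engine names, chosen only for illustration.
VERY_LOW_ENTROPY = 1.0   # bits per byte
LOW_ENTROPY = 5.0
NEAR_MAX_ENTROPY = 7.9
ALL_CANDIDATES = ["periodic", "container", "logtext", "xz", "zstd", "brotli"]
TEXT_EXTENSIONS = {".log", ".txt", ".json", ".csv"}


def head_entropy(head: bytes) -> float:
    """Shannon entropy of the probe window, in bits per byte (0 to 8)."""
    if not head:
        return 0.0
    total = len(head)
    counts = Counter(head)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def looks_like_mp4(head: bytes) -> bool:
    """ISO base media files carry the box type 'ftyp' at byte offset 4."""
    return head[4:8] == b"ftyp"


def rank_candidates(head: bytes, extension: str) -> list[str]:
    """Return engines in race order; the preferred specialist goes first."""
    h = head_entropy(head)
    if h <= VERY_LOW_ENTROPY or (head and not any(head)):
        first = ["periodic"]
    elif looks_like_mp4(head):
        first = ["container"]
    elif h <= LOW_ENTROPY or extension.lower() in TEXT_EXTENSIONS:
        first = ["logtext"]
    elif h >= NEAR_MAX_ENTROPY:
        return ["xz", "zstd"]  # short-list: data is likely incompressible
    else:
        first = []
    # Never commit to one engine: the rest of the pool still races.
    return first + [e for e in ALL_CANDIDATES if e not in first]
```

Note that the function only reorders the pool, except in the near-maximal-entropy case where it trims it. That matches the idea that the router ranks and the race decides.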
The race
In-browser, each engine runs in a dedicated Web Worker. The dispatcher posts the input buffer (transferred, not copied), and awaits a result. The key behaviour is pick-smallest with early termination: once a candidate produces an output that is already tiny relative to the input, the dispatcher can cancel the remaining workers, because a heavier codec finishing later has almost no room left to improve on it. Server-side, the same logic runs natively under a concurrency semaphore; each codec is a separately-compiled native binary, and the same pick-smallest rule applies.
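The pick-smallest rule with early termination can be sketched as follows. This is an illustration under stated assumptions, not the production dispatcher: Python threads stand in for Web Workers, the standard-library `lzma`, `zlib` and `bz2` modules stand in for the real engine pool, and the 1% cut-off `EARLY_EXIT_RATIO` is a made-up value.

```python
import bz2
import lzma
import zlib
from concurrent.futures import ThreadPoolExecutor, as_completed

# Stand-in engines from the Python standard library; the real pool differs.
ENGINES = {
    "lzma": (lzma.compress, lzma.decompress),
    "zlib": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
}

# Hypothetical cut-off: stop once an output is under 1% of the input.
EARLY_EXIT_RATIO = 0.01


def race(data: bytes, candidates: list[str]) -> tuple[str, bytes]:
    """Run candidates in parallel and keep the smallest verified output."""
    best_name, best_out = "identity", data  # raw bytes are the baseline
    with ThreadPoolExecutor(max_workers=max(1, len(candidates))) as pool:
        futures = {pool.submit(ENGINES[n][0], data): n for n in candidates}
        for fut in as_completed(futures):
            name, out = futures[fut], fut.result()
            # Only a candidate that round-trips byte-exactly may win.
            if len(out) < len(best_out) and ENGINES[name][1](out) == data:
                best_name, best_out = name, out
            if len(best_out) <= len(data) * EARLY_EXIT_RATIO:
                for other in futures:
                    other.cancel()  # only stops work that has not started
                break
    return best_name, best_out
```

One difference from the browser setting is worth noting: a Python thread that is already running cannot be interrupted, so `cancel()` only drops queued work, whereas a Web Worker can be stopped outright with `terminate()`. Because early exit skips slower candidates, the winner on highly compressible input may vary from run to run in this sketch.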
The container format
Every Pedulli output starts with a 1-byte engine tag:
- 0x00 -> identity / raw: no engine produced something smaller; the original bytes follow.
- other tag values -> one of the in-browser or server-side engines.
This has three consequences: the decoder knows immediately which engine ran without inspecting the payload; a .pdli file roundtrips through future versions of the dispatcher; and the overhead on incompressible data is just that 1 byte, less than the framing most formats add (gzip, for example, spends 10 bytes on its header and 8 on its trailer).
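A minimal sketch of the framing, assuming a single stand-in engine: only the 0x00 identity tag comes from the format described here, while `TAG_ZLIB = 0x01` and the helper names `frame` and `pack` are hypothetical and chosen for illustration.

```python
import zlib

TAG_IDENTITY = 0x00  # from the format: the original bytes follow
TAG_ZLIB = 0x01      # hypothetical tag value, for illustration only


def frame(tag: int, payload: bytes) -> bytes:
    """Prepend the 1-byte engine tag to an engine's payload."""
    return bytes([tag]) + payload


def pack(data: bytes) -> bytes:
    """Keep the compressed payload only if it is strictly smaller than raw."""
    compressed = zlib.compress(data, 9)
    if len(compressed) < len(data):
        return frame(TAG_ZLIB, compressed)
    return frame(TAG_IDENTITY, data)
```

For example, `pack(bytes(range(256)))` falls back to the identity tag because 256 distinct bytes give zlib nothing to exploit, so the result is 257 bytes: the input plus the tag. A run such as `pack(b"\x07" * 1000)` comes back with the stand-in engine's tag and a much shorter payload.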
Why not just run "the best" engine?
People ask: why not pick the best engine ahead of time and skip the race? Because the real data shape is often not what the extension claims - an .mp4 that is really raw bytes, a .log that is actually JSON, a .png that is a screenshot of one flat colour. The router gets it right most of the time, and the race is the safety net for the rest. The marginal CPU cost is justified by the marginal win, especially for cold archives you compress once and keep forever.
What about decompression?
Decompression is simpler: the 1-byte tag says which engine to use, we feed it the rest, and out comes the original. No race, no router, no ambiguity - and the result is always byte-exact, verified by a SHA-256 roundtrip. Compression is the search step; decompression is just the replay.
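The replay step can be sketched as a table lookup on the tag, followed by a digest comparison. The tag table is hypothetical apart from 0x00, `zlib` again stands in for a real engine, and the sketch assumes the check runs where the original bytes are still available (for example at compression time), since where a digest would be stored is not specified here.

```python
import hashlib
import zlib

# Hypothetical tag table; only 0x00 (identity) comes from the format itself.
DECODERS = {
    0x00: lambda payload: payload,
    0x01: zlib.decompress,
}


def unpack(blob: bytes) -> bytes:
    """Read the tag, then replay the one engine it names."""
    if not blob:
        raise ValueError("empty input: missing engine tag")
    tag, payload = blob[0], blob[1:]
    if tag not in DECODERS:
        raise ValueError(f"unknown engine tag 0x{tag:02x}")
    return DECODERS[tag](payload)


def verified_roundtrip(original: bytes, blob: bytes) -> bool:
    """Compare SHA-256 digests of the original and the decoded output."""
    decoded = unpack(blob)
    return hashlib.sha256(decoded).digest() == hashlib.sha256(original).digest()
```

There is no branching on file type or entropy here: the tag alone selects the decoder, which is why decoding needs neither the router nor the race.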
What's public and what stays closed
The dispatcher, the router, the WASM bindings and the container format are public, because an interface should be auditable. The specialist codecs themselves stay closed - that is the protected runtime, and it is how the project pays for itself. This is the self-sealed posture: proof without disclosure. You can verify the byte-exact result and request the exact samples behind any number, while the engine internals remain closed. The project attaches no absolute-security or "unbeatable" claims to it.
Watch the race on your own file
The browser trial shows which candidate won and the byte-exact result.
Built in Forli, Italy. EU-sovereign, GDPR by design.