The archive scanning subsystem provides streaming archive parsing with deterministic budget enforcement for zip-bomb protection. It handles tar, gzip, bzip2, and zip formats, enforces nesting depth limits, controls decompressed output budgets at entry/archive/root scopes, and constructs virtual paths for files inside archives.
crates/scanner-scheduler/src/archive/ defines the archive scanning contract: configuration, format detection, budget tracking, virtual path construction, outcome taxonomy, and the streaming scan loop. Format-specific handlers live in archive/formats/ (low-level parsers) and scheduler/local_fs_*.rs (blocking-path integration). The io_uring path delegates to archive/scan.rs via the ArchiveEntrySink trait.
The architecture emphasizes:
- Streaming only: archives are parsed sequentially without materializing to disk
- Deterministic budgets: every resource cap is enforced through byte and count accounting alone; the only non-deterministic cap, the wall-clock deadline, is opt-in
- Zero allocation after startup: all buffers are preallocated and reused via reset()/clear()
- Sink-driven decoupling: the scan core delivers structured events via ArchiveEntrySink, allowing different consumers (pipeline, io_uring workers, simulation harness)
| Format | Kind | Extension(s) | Access Pattern | Entry Model |
|---|---|---|---|---|
| gzip | Gzip | .gz | Sequential read | Single decompressed blob |
| bzip2 | Bzip2 | .bz2 | Sequential read | Single decompressed blob |
| tar | Tar | .tar | Sequential 512-byte blocks | Multiple named entries |
| tar+gzip | TarGz | .tar.gz, .tgz | Sequential gzip → tar | Multiple named entries |
| tar+bzip2 | TarBz2 | .tar.bz2, .tbz2 | Sequential bzip2 → tar | Multiple named entries |
| zip | Zip | .zip | Random access via EOCD | Multiple named entries |
Detection uses a two-phase algorithm (archive/detect.rs):
- Extension-based (detect_kind_from_path / detect_kind_from_name_bytes): pure byte-suffix match using case-insensitive ASCII comparison (| 0x20). Single-byte dispatch on the last character gives O(1) common-case rejection. Extension detection always takes precedence; this is the only way to distinguish .tar.gz from plain .gz.
- Magic-byte sniffing (sniff_kind_from_header): probes the first bytes when the extension is unrecognized. Probe order: gzip (1f 8b) → zip (PK..) → bzip2 (BZh + digit 1–9) → tar (ustar at offset 257 in a 512-byte header). First match wins.
The combined function detect_kind tries extension first, then falls back to magic bytes.
| Format | Minimum bytes | Signature |
|---|---|---|
| gzip | 2 | 0x1f 0x8b |
| zip | 4 | PK + (03,04), (01,02), (05,06), or (07,08) |
| bzip2 | 4 | BZh + ASCII digit '1'–'9' |
| tar (ustar) | 512 | "ustar" at offset 257 |
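The two-phase detection above can be sketched as follows. This is a minimal illustration, not the code in archive/detect.rs: the names Kind, kind_from_name, kind_from_magic, and detect are hypothetical, and the suffix table is a linear scan rather than the single-byte dispatch the crate is described as using.

```rust
/// Illustrative archive kinds (hypothetical stand-in for the crate's ArchiveKind).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind { Gzip, Bzip2, Tar, TarGz, TarBz2, Zip }

/// Case-insensitive ASCII suffix match. OR-ing a byte with 0x20 lowercases
/// A-Z and leaves '.', digits, and lowercase letters unchanged. Some control
/// bytes alias (0x0E | 0x20 == b'.'), which this sketch accepts.
fn ends_with_ignore_case(name: &[u8], suffix: &[u8]) -> bool {
    name.len() >= suffix.len()
        && name[name.len() - suffix.len()..]
            .iter()
            .zip(suffix)
            .all(|(a, b)| (*a | 0x20) == *b)
}

/// Phase 1: extension match. Compound suffixes come first so that
/// ".tar.gz" is not misread as plain ".gz".
pub fn kind_from_name(name: &[u8]) -> Option<Kind> {
    const TABLE: &[(&str, Kind)] = &[
        (".tar.gz", Kind::TarGz),
        (".tgz", Kind::TarGz),
        (".tar.bz2", Kind::TarBz2),
        (".tbz2", Kind::TarBz2),
        (".tar", Kind::Tar),
        (".zip", Kind::Zip),
        (".gz", Kind::Gzip),
        (".bz2", Kind::Bzip2),
    ];
    TABLE
        .iter()
        .find(|(suffix, _)| ends_with_ignore_case(name, suffix.as_bytes()))
        .map(|(_, kind)| *kind)
}

/// Phase 2: magic-byte sniffing in the documented probe order.
pub fn kind_from_magic(h: &[u8]) -> Option<Kind> {
    if h.starts_with(&[0x1f, 0x8b]) {
        return Some(Kind::Gzip);
    }
    if h.len() >= 4 && h.starts_with(b"PK")
        && matches!((h[2], h[3]), (3, 4) | (1, 2) | (5, 6) | (7, 8))
    {
        return Some(Kind::Zip);
    }
    if h.len() >= 4 && h.starts_with(b"BZh") && (b'1'..=b'9').contains(&h[3]) {
        return Some(Kind::Bzip2);
    }
    if h.len() >= 512 && h[257..].starts_with(b"ustar") {
        return Some(Kind::Tar);
    }
    None
}

/// Combined detection: extension first, magic bytes as the fallback.
pub fn detect(name: &[u8], header: &[u8]) -> Option<Kind> {
    kind_from_name(name).or_else(|| kind_from_magic(header))
}
```

With this sketch, a file named x.gz whose header starts with PK is still classified as gzip, because the extension takes precedence.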
Archive scanning integrates at two levels:
Blocking path (scheduler/local_fs_owner.rs): the worker thread handles archives inline via dispatch_archive_scan() (scheduler/local_fs_archive_ctx.rs), which routes to format-specific handlers (local_fs_gzip.rs, local_fs_bzip2.rs, local_fs_tar.rs, local_fs_zip.rs). Blocking workers process one file at a time, so decompression blocking is acceptable.
io_uring path (scheduler/local_fs_uring.rs): archives are offloaded to dedicated archive worker threads to avoid stalling the io_uring completion loop. Detection happens at two points:
- Discovery time: files with known archive extensions are routed directly to the archive channel
- First-chunk classification: I/O threads sniff magic bytes after the first read completes
Both paths use the same archive subsystem types (ArchiveConfig, ArchiveBudgets, ArchiveScratch) and produce the same outcomes (ArchiveEnd).
The archive scan core (archive/scan.rs) drives an ArchiveEntrySink trait (scan.rs) that decouples parsing from downstream processing:
on_entry_start(&meta) // exactly once per entry
on_entry_chunk(chunk) // zero or more payload windows
on_entry_chunk(chunk)
...
on_entry_end() // exactly once, even on truncation
The start/end pair is always balanced. The io_uring path implements ArchiveEntrySink via UringArchiveSink, which calls scan_chunk_into → drop_prefix_findings → dedupe → emit. The blocking path uses ArchiveScanCtx::scan_and_emit_chunk (local_fs_archive_ctx.rs).
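To make the callback contract concrete, here is a minimal sketch of a sink and a driver that honors the balanced start/end rule. The trait and struct shapes (EntrySink, Meta, Chunk, CountingSink, deliver_entry) are hypothetical simplifications; the real ArchiveEntrySink, EntryMeta, and EntryChunk signatures may differ.

```rust
/// Hypothetical, reduced stand-ins for the crate's EntryMeta / EntryChunk.
pub struct Meta<'a> {
    pub display_path: &'a [u8],
}
pub struct Chunk<'a> {
    /// Overlap prefix followed by new bytes.
    pub data: &'a [u8],
    /// Index in `data` where the new bytes begin.
    pub new_bytes_start: usize,
}

/// Assumed shape of the sink contract: start, zero or more chunks, end.
pub trait EntrySink {
    fn on_entry_start(&mut self, meta: &Meta);
    fn on_entry_chunk(&mut self, chunk: &Chunk);
    fn on_entry_end(&mut self);
}

/// A sink that only counts, and asserts that calls arrive in a legal order.
#[derive(Default)]
pub struct CountingSink {
    pub open: bool,
    pub entries: u32,
    pub new_bytes: usize,
}

impl EntrySink for CountingSink {
    fn on_entry_start(&mut self, _meta: &Meta) {
        assert!(!self.open, "start without a matching end");
        self.open = true;
        self.entries += 1;
    }
    fn on_entry_chunk(&mut self, chunk: &Chunk) {
        assert!(self.open, "chunk outside an entry");
        self.new_bytes += chunk.data.len() - chunk.new_bytes_start;
    }
    fn on_entry_end(&mut self) {
        assert!(self.open, "end without a matching start");
        self.open = false;
    }
}

/// Delivers one in-memory entry to a sink (no overlap, for brevity).
pub fn deliver_entry<S: EntrySink>(sink: &mut S, name: &[u8], payload: &[u8], chunk_size: usize) {
    sink.on_entry_start(&Meta { display_path: name });
    for piece in payload.chunks(chunk_size.max(1)) {
        sink.on_entry_chunk(&Chunk { data: piece, new_bytes_start: 0 });
    }
    // Always reached, including for empty payloads.
    sink.on_entry_end();
}
```

Because the parser only talks to the trait, the same driver can feed a pipeline sink, a worker-thread sink, or a test double like CountingSink.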
Budget tracking (archive/budget.rs) prevents resource exhaustion from zip bombs, deeply nested archives, and adversarial metadata. All accounting is deterministic and reproducible.
Three nested scopes, each with independent caps:
Root (per-source-file)
└─ Archive (per-container: zip, tar, tar.gz, …)
└─ Entry (per-file inside the container)
When charging decompressed output, the tightest remaining allowance across all three scopes wins. The binding constraint determines the BudgetHit variant, which tells callers whether to skip the entry, mark the archive partial, or stop the entire root.
| Cap | Config Field | Default | Scope |
|---|---|---|---|
| Nesting depth | max_archive_depth | 3 | Per root |
| Entry count | max_entries_per_archive | 4096 | Per archive |
| Entry output bytes | max_uncompressed_bytes_per_entry | 64 MiB | Per entry |
| Archive output bytes | max_total_uncompressed_bytes_per_archive | 256 MiB | Per archive |
| Root output bytes | max_total_uncompressed_bytes_per_root | 512 MiB | Per root |
| Metadata bytes | max_archive_metadata_bytes | 16 MiB | Per archive |
| Inflation ratio | max_inflation_ratio | 128x | Per entry + per archive |
| Wall-clock deadline | max_wall_clock_secs_per_root | None (opt-in) | Per root |
The nesting invariant entry <= archive <= root is enforced by ArchiveConfig::validate() (config.rs).
Ratio tracking runs at both archive and entry scopes to prevent a credit-accumulation attack. Per-entry ratio is tracked independently: entry_out <= entry_in * R. This prevents a pattern where many small well-compressed entries build up archive-level headroom that a single malicious entry later exploits (budget.rs).
When compressed input is zero (unknown), the ratio check is skipped to avoid false positives. The remaining_decompressed_allowance_with_ratio_probe(true) method applies a conservative 1-byte compressed-input assumption to cap the first read.
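The credit-accumulation argument can be checked with a little arithmetic. The sketch below is an assumption-laden illustration: Counters, ratio_allowance, and entry_allowance are hypothetical names, and the real accounting in budget.rs also folds in the byte caps.

```rust
/// Compressed input and decompressed output observed so far in one scope.
pub struct Counters {
    pub compressed_in: u64,
    pub decompressed_out: u64,
}

/// Remaining output permitted by `out <= in * ratio`.
/// Returns None when no compressed input has been observed, mirroring the
/// rule that the ratio check is skipped when the input size is unknown.
pub fn ratio_allowance(compressed_in: u64, decompressed_out: u64, ratio: u64) -> Option<u64> {
    if compressed_in == 0 {
        return None;
    }
    Some(compressed_in.saturating_mul(ratio).saturating_sub(decompressed_out))
}

/// Tightest ratio allowance across the archive and entry scopes.
pub fn entry_allowance(archive: &Counters, entry: &Counters, ratio: u64) -> Option<u64> {
    let a = ratio_allowance(archive.compressed_in, archive.decompressed_out, ratio);
    let e = ratio_allowance(entry.compressed_in, entry.decompressed_out, ratio);
    match (a, e) {
        (Some(a), Some(e)) => Some(a.min(e)),
        (a, e) => a.or(e),
    }
}
```

Suppose 100 benign entries each consumed 1000 compressed bytes and produced 1000 bytes, and a hostile entry has consumed 10 compressed bytes so far. At a ratio of 128 the archive scope alone would still allow 12,701,280 output bytes, while the per-entry scope limits the hostile entry to 1,280.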
reset() // arm deadline, zero root counters
enter_archive() // push frame, enforce depth cap
note_entry() / begin_entry() // count + open entry scope
charge_compressed_in(n) // raw bytes consumed
charge_decompressed_out(n) // payload bytes delivered
charge_discarded_out(n) // payload bytes read but dropped
end_entry(scanned) // close entry scope
exit_archive() // pop frame
enter_archive/exit_archive and begin_entry/end_entry must be balanced. The frame stack is preallocated to max_archive_depth and never grows — no Vec push/pop on hot paths.
BudgetHit variants are ordered by increasing blast radius (budget.rs):
| Variant | Scope | Effect |
|---|---|---|
| SkipEntry | Current entry only | Archive continues with next entry |
| SkipArchive | Entire archive | Discarded (no scan progress yet) |
| PartialArchive | Entire archive | Stops; bytes already scanned are kept |
| StopRoot | All archives under this root | Everything stops |
ChargeResult::Clamp { allowed, hit } tells the caller the exact number of bytes it may still process before the limit takes effect.
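A minimal sketch of the clamp behaviour across the three byte scopes follows. It is not the budget.rs implementation: Hit, Charge, Remaining, and charge are hypothetical, the mapping from scope to variant is an assumption, SkipArchive and the ratio caps are omitted, and ties are resolved toward the narrowest scope.

```rust
/// Reduced stand-in for BudgetHit (SkipArchive omitted in this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Hit { SkipEntry, PartialArchive, StopRoot }

/// Reduced stand-in for ChargeResult.
#[derive(Debug, PartialEq, Eq)]
pub enum Charge {
    Ok,
    Clamp { allowed: u64, hit: Hit },
}

/// Remaining decompressed-byte allowance in each scope.
pub struct Remaining {
    pub entry: u64,
    pub archive: u64,
    pub root: u64,
}

/// Charges `n` output bytes. The tightest scope decides both how many bytes
/// are granted and which Hit is reported.
pub fn charge(rem: &mut Remaining, n: u64) -> Charge {
    let mut allowed = rem.entry;
    let mut hit = Hit::SkipEntry;
    if rem.archive < allowed {
        allowed = rem.archive;
        hit = Hit::PartialArchive;
    }
    if rem.root < allowed {
        allowed = rem.root;
        hit = Hit::StopRoot;
    }
    // `granted` never exceeds any scope's remainder, so no underflow below.
    let granted = n.min(allowed);
    rem.entry -= granted;
    rem.archive -= granted;
    rem.root -= granted;
    if granted == n {
        Charge::Ok
    } else {
        Charge::Clamp { allowed: granted, hit }
    }
}
```

A caller that receives Clamp can still process the first `allowed` bytes of its buffer before acting on `hit`.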
The wall-clock deadline is opt-in (max_wall_clock_secs_per_root). When configured:
- reset() arms an Instant-based deadline (the only place Instant::now() is called)
- is_deadline_expired() is polled at natural loop boundaries
- The deadline does not affect byte or count accounting
- Maximum allowed value is 86,400 seconds (24 hours), enforced by MAX_WALL_CLOCK_SECS_PER_ROOT
In test/sim-harness builds, a deterministic countdown (set_deadline_check_countdown) replaces the real clock.
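One way to picture the clock-versus-countdown split is the sketch below. It is an assumption about shape only: the Deadline enum is hypothetical, the crate is described as switching behaviour by build configuration rather than by enum variant, and this version reads the clock when polled.

```rust
use std::time::{Duration, Instant};

/// Upper bound from the text above (24 hours).
const MAX_WALL_CLOCK_SECS_PER_ROOT: u64 = 86_400;

/// Hypothetical deadline state for one root.
pub enum Deadline {
    /// No deadline configured (the default).
    None,
    /// Real clock: expires once `Instant::now()` reaches the stored instant.
    Clock(Instant),
    /// Deterministic stand-in for tests: expires after N polls.
    Countdown(u32),
}

impl Deadline {
    /// Arms a real-clock deadline, or none when `secs` is None.
    pub fn arm(secs: Option<u64>) -> Self {
        match secs {
            Some(s) => {
                let s = s.min(MAX_WALL_CLOCK_SECS_PER_ROOT);
                Deadline::Clock(Instant::now() + Duration::from_secs(s))
            }
            None => Deadline::None,
        }
    }

    /// Polled at loop boundaries; never touches byte or count accounting.
    pub fn is_expired(&mut self) -> bool {
        match self {
            Deadline::None => false,
            Deadline::Clock(at) => Instant::now() >= *at,
            Deadline::Countdown(left) => {
                if *left == 0 {
                    true
                } else {
                    *left -= 1;
                    false
                }
            }
        }
    }
}
```

The countdown variant makes expiry a function of how many times the loop polled, so a simulation run reproduces the same partial outcome every time.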
ArchiveConfig (config.rs) — shared archive scanning configuration. All limits are hard bounds. Archives are treated as hostile input.
| Field | Type | Purpose |
|---|---|---|
| enabled | bool | Master enable switch |
| max_archive_depth | u8 | Max nested archive depth |
| max_entries_per_archive | u32 | Max entries per container |
| max_uncompressed_bytes_per_entry | u64 | Per-entry decompressed byte cap |
| max_total_uncompressed_bytes_per_archive | u64 | Per-archive decompressed byte cap |
| max_total_uncompressed_bytes_per_root | u64 | Root-level cross-archive byte cap |
| max_archive_metadata_bytes | u64 | Metadata parsing cap (headers, CD) |
| max_inflation_ratio | u32 | Decompressed/compressed ratio cap |
| max_virtual_path_len_per_entry | usize | Max display path bytes per entry |
| max_virtual_path_bytes_per_archive | usize | Total path bytes per archive |
| max_wall_clock_secs_per_root | Option<u64> | Optional CPU-exhaustion deadline |
| encrypted_policy | EncryptedPolicy | How to handle encrypted content |
| unsupported_policy | UnsupportedPolicy | How to handle unsupported formats |
EncryptedPolicy / UnsupportedPolicy (config.rs) — escalation ladders from SkipWithTelemetry → FailArchive → FailRun.
ArchiveBudgets (budget.rs) — deterministic budget tracker. Holds immutable caps from config and mutable counters. A fixed-size frame stack tracks per-archive state without allocation.
ArchiveFrame (budget.rs) — per-archive accounting frame (48 bytes, #[repr(C)]). Tracks entries_seen, entries_scanned, metadata_bytes, compressed_in, decompressed_out, entry_compressed_in, and entry_decompressed_out. Entry-open state uses a u64::MAX sentinel instead of a separate bool to avoid 7 bytes of padding.
BudgetHit (budget.rs) — classification of which budget was the binding constraint.
ChargeResult (budget.rs) — result of charging a byte quantity: Ok (full amount fits) or Clamp { allowed, hit } (partial).
ArchiveKind (detect.rs) — #[repr(u8)] enum: Gzip(0), Tar(1), Zip(2), TarGz(3), Bzip2(4), TarBz2(5). The is_container() method distinguishes multi-entry formats from single-stream formats.
ArchiveEntrySink (scan.rs) — trait decoupling archive parsing from downstream. Methods: on_entry_start, on_entry_chunk, on_entry_end.
EntryMeta (scan.rs) — metadata for a single entry: display_path, size_hint, flags.
EntryChunk (scan.rs) — one iteration of the sliding-window read loop: data (overlap prefix + new bytes), base_offset, new_bytes_start, new_bytes_len.
ArchiveScratch<Z> (scan.rs) — reusable scratch state. Contains EntryPathCanonicalizer, per-depth VirtualPathBuilders, ArchiveBudgets, per-depth TarCursors, ZipCursor, gzip header/name buffers, and the stream_buf. Preallocated to max_archive_depth + 2 depth slots.
ArchiveScanCtx (scan.rs, crate-private) — borrow-split view that decomposes ArchiveScratch into independent mutable borrows for recursive nesting via split_first_mut. Not part of the public API.
ArchiveEnd (scan.rs) — terminal outcome: Scanned, Skipped(ArchiveSkipReason), Partial(PartialReason).
ArchiveSkipReason (outcome.rs) — 14 variants for why an entire archive was skipped before any payload bytes were scanned. #[repr(u8)] with stable discriminants used as array indices.
EntrySkipReason (outcome.rs) — 10 variants for why a specific entry was skipped. Entry skips do not abort the archive.
PartialReason (outcome.rs) — 12 variants for why an archive was only partially scanned. Partial outcomes retain results for bytes already processed.
ArchiveStats (outcome.rs) — per-worker aggregate with scalar counters, per-reason breakdown arrays, and a bounded sample ring (ArchiveSampleRing). All record_* methods are gated behind cfg!(all(feature = "perf-stats", debug_assertions)) for zero production overhead.
EntryPathCanonicalizer (path.rs) — sanitizes raw archive entry names into bounded, printable-ASCII display bytes. Resolves ./.., escapes non-printable bytes as %HH, enforces length and component caps.
VirtualPathBuilder (path.rs) — joins parent and entry display paths with :: separator. Truncation appends ~#<16-hex-digit> FNV-1a hash suffix.
CanonicalPath (path.rs) — result of canonicalization: bytes, had_traversal, truncated, component_cap_exceeded, hash64.
VirtualPath (path.rs) — result of virtual path construction: bytes, truncated, hash64.
GzipStream<R> (formats/gzip.rs) — streaming gzip decoder wrapping flate2::MultiGzDecoder<CountedRead<R>>. Handles concatenated members. Reports compressed-byte deltas via take_compressed_delta().
Bzip2Stream<R> (formats/bzip2.rs) — streaming bzip2 decoder wrapping bzip2::MultiBzDecoder<CountedRead<R>>. Same delta-reporting interface as GzipStream.
CompressedStream trait (formats/mod.rs) — abstracts GzipStream and Bzip2Stream for generic scanning functions.
TarCursor (formats/tar.rs) — stateful tar header parser. Walks 512-byte header blocks, handles GNU longname (L) and PAX extended-header (x/g) records internally, yields TarEntryMeta. Zero allocation after startup.
TarRead trait (formats/tar.rs) — Read + optional take_compressed_delta() for compressed-byte accounting.
ZipCursor<R> (formats/zip.rs) — streaming cursor over a zip central directory. Parses EOCD, iterates CDFH entries, validates bounds. Supports Zip32 only; Zip64 sentinel values trigger UnsupportedFeature.
ZipSource trait (formats/zip.rs) — Read + Seek source with len() and try_clone(). Implemented for File, Cursor<Arc<[u8]>>, Cursor<Vec<u8>>.
ZipEntryReader (formats/zip.rs) — decompressed entry reader: Stored(CountedRead<LimitedRead>) or Deflate(DeflateDecoder<CountedRead<LimitedRead>>).
LimitedRead (formats/zip.rs) — bounds reads to a fixed byte count (compressed entry size).
CountedRead (util.rs) — Read wrapper that counts bytes consumed, driving inflation-ratio enforcement.
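The byte-counting wrapper idea behind CountedRead and take_compressed_delta() can be sketched with the standard Read trait. Counted and take_delta are hypothetical names; the real type in util.rs may track its state differently.

```rust
use std::io::{self, Read};

/// Read adapter that counts bytes pulled from the inner reader.
pub struct Counted<R> {
    inner: R,
    /// Total bytes read so far.
    count: u64,
    /// Portion of `count` already handed out via `take_delta`.
    reported: u64,
}

impl<R: Read> Counted<R> {
    pub fn new(inner: R) -> Self {
        Counted { inner, count: 0, reported: 0 }
    }

    /// Bytes consumed since the previous call; feeds compressed-input charging.
    pub fn take_delta(&mut self) -> u64 {
        let delta = self.count - self.reported;
        self.reported = self.count;
        delta
    }
}

impl<R: Read> Read for Counted<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.count += n as u64;
        Ok(n)
    }
}
```

Placing the counter beneath the decoder means the delta reflects raw compressed bytes, which is the denominator the inflation-ratio check needs.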
1. Discovery
├─ Extension match (detect_kind_from_path)
│ └─ Route directly to archive workers (bypass I/O threads)
└─ First-chunk magic sniff (sniff_kind_from_header)
└─ I/O thread routes to archive channel
2. Archive Open
├─ reset() → arm deadline, zero root counters
├─ enter_archive() → push frame, enforce depth cap
└─ Format-specific init:
├─ gzip: GzipStream::new_with_header (parse FNAME)
├─ bzip2: Bzip2Stream::new
├─ tar: TarCursor::reset
├─ tar.gz: GzipStream wrapping → tar iteration
├─ tar.bz2: Bzip2Stream wrapping → tar iteration
└─ zip: ZipCursor::open (EOCD → central directory)
3. Entry Iteration
For each entry:
├─ Canonicalize name (EntryPathCanonicalizer)
├─ Build virtual path with locator suffix (@t/@z/@c)
├─ Check path budget
├─ Skip non-regular entries (dirs, symlinks)
├─ Check for nested archive (detect_kind_from_name_bytes)
│ ├─ If nestable and depth allows → recurse
│ └─ If zip-in-tar → unsupported (no random access)
├─ begin_entry() → open entry budget scope
└─ Sliding-window read loop:
├─ Check deadline
├─ Copy overlap carry to buffer front
├─ Probe remaining budget allowance
├─ Read up to min(chunk_size, allowance, buf capacity)
├─ charge_compressed_in() + charge_decompressed_out()
├─ Deliver EntryChunk to sink (or scan_and_emit_chunk)
└─ Update offset/carry; break if budget clamped
4. Entry Close
├─ on_entry_end() / end_entry(scanned)
├─ Drain unconsumed payload (tar alignment)
└─ Record entry stats (scanned/skipped/partial)
5. Archive Close
├─ exit_archive() → pop frame
└─ Return ArchiveEnd (Scanned/Skipped/Partial)
Every entry payload uses the same read pattern:
stream_buf layout on each iteration:
|<-- carry (overlap) -->|<--- new read (up to chunk_size) --->|
^ ^
buf[0] buf[carry]
carry = overlap.min(bytes_emitted_so_far)
Before each read, the last carry bytes of the previous chunk are copied to the buffer front so downstream pattern matchers see a sliding window with overlap bytes of look-behind context. Budget checks happen after the read returns: bytes beyond the budget are truncated and the loop exits.
The upper bound on a single read is ARCHIVE_STREAM_READ_MAX (256 KiB), keeping per-iteration work bounded even with large chunk_size.
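The overlap-carry pattern above can be sketched as a small generic loop. This is an illustration under assumptions: scan_windows is a hypothetical name, budget checks and the deadline poll are left out, and the callback stands in for the sink or scanner.

```rust
use std::io::{self, Read};

/// Reads `src` in chunks and calls `emit(window, new_start)` for each one,
/// where `window[..new_start]` is look-behind overlap from the previous
/// window and `window[new_start..]` is newly read data.
/// Returns the total number of new bytes delivered.
pub fn scan_windows<R: Read, F: FnMut(&[u8], usize)>(
    mut src: R,
    buf: &mut [u8],
    chunk_size: usize,
    overlap: usize,
    mut emit: F,
) -> io::Result<u64> {
    assert!(chunk_size > 0 && overlap + chunk_size <= buf.len());
    let mut emitted: u64 = 0; // new bytes delivered so far
    let mut prev_len: usize = 0; // length of the previous window
    loop {
        // carry = overlap.min(bytes_emitted_so_far)
        let carry = (overlap as u64).min(emitted) as usize;
        // Move the tail of the previous window to the buffer front.
        buf.copy_within(prev_len - carry..prev_len, 0);
        let n = src.read(&mut buf[carry..carry + chunk_size])?;
        if n == 0 {
            return Ok(emitted);
        }
        emit(&buf[..carry + n], carry);
        emitted += n as u64;
        prev_len = carry + n;
    }
}
```

With a 4-byte chunk and 2 bytes of overlap over the input "abcdefghij", the callback sees "abcd", then "cdefgh", then "ghij", so a pattern straddling a chunk boundary is still visible in one window.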
gzip:
- Uses flate2::read::MultiGzDecoder to handle concatenated gzip members as one stream
- Parses the optional gzip FNAME header field for the virtual entry name; falls back to <gunzip> when absent
- Header parsing uses a bounded peek buffer (PeekRead) that is moved into the decoder and recovered afterward for reuse
- Compressed-byte deltas tracked via CountedRead wrapping the raw reader
- Inflation-ratio pre-clamping is always active (ratio_active = true)
bzip2:
- Uses bzip2::read::MultiBzDecoder to handle concatenated bzip2 members
- Virtual entry name is always <bunzip2> (bzip2 has no standard filename field)
- Same CountedRead delta reporting as gzip
- CPU exhaustion note: bzip2 block decompression can buffer up to 900 KiB internally per read() call. The deadline check fires between read iterations, not during a single decompression call, so a single block decode can run uninterrupted. Production deployments should set max_wall_clock_secs_per_root.
tar:
- Sequential 512-byte block parsing via TarCursor
- Handles GNU longname (L) and PAX extended-header (x/g) records internally
- PAX path= override applies per-file only (global PAX path is parsed but not applied to avoid misattribution)
- Name resolution priority: PAX path > GNU longname > header name (with ustar prefix/name joining)
- End-of-archive: two consecutive zero blocks or clean EOF at header boundary
- Size fields parsed as NUL/space-padded ASCII octal; overflow (>21 digits) is rejected
- Entries with tar typeflag 0 (NUL) or '0' (ASCII 0x30) are treated as regular files; everything else is skipped
- is_zero_block uses word-wide (u64) unaligned reads with early exit for fast detection
- After each entry's payload, any unconsumed bytes are drained and tar padding is consumed to maintain 512-byte alignment
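Two of the tar details above, octal size parsing and zero-block detection, lend themselves to a short sketch. Both functions are hypothetical illustrations: parse_octal_field rejects on u64 overflow instead of the 21-digit rule, treats an all-padding field as malformed (an assumption), and the zero-block check uses 8-byte chunks rather than unaligned pointer reads.

```rust
use std::convert::TryInto;

/// Parses a NUL/space-padded ASCII octal field.
/// Returns None for stray bytes, digits after trailing padding, an
/// all-padding field, or a value that does not fit in u64.
pub fn parse_octal_field(field: &[u8]) -> Option<u64> {
    let mut value: u64 = 0;
    let mut seen_digit = false;
    let mut done = false;
    for &b in field {
        match b {
            b' ' | 0 if !seen_digit => {} // leading padding
            b' ' | 0 => done = true,      // trailing padding
            b'0'..=b'7' if !done => {
                seen_digit = true;
                value = value.checked_mul(8)?.checked_add((b - b'0') as u64)?;
            }
            _ => return None,
        }
    }
    if seen_digit { Some(value) } else { None }
}

/// Word-wide zero check over one 512-byte block; `all` stops at the first
/// non-zero word.
pub fn is_zero_block(block: &[u8; 512]) -> bool {
    block
        .chunks_exact(8)
        .all(|w| u64::from_ne_bytes(w.try_into().unwrap()) == 0)
}
```

For example, the 12-byte size field "00000001750" followed by a NUL parses to 1000 bytes, after which a reader would skip 24 bytes of padding to reach the next 512-byte boundary.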
tar+gzip:
- Wraps the reader in GzipStream and delegates to scan_tar_stream with ratio_active = true
- Inflation-ratio enforcement applies to the decompressed tar payload
tar+bzip2:
- Wraps the reader in Bzip2Stream and delegates to scan_tar_stream with ratio_active = true
- Same inflation-ratio enforcement as tar+gzip
zip:
- Requires random access (Read + Seek) via the ZipSource trait
- Locates the end-of-central-directory (EOCD) record by scanning backward from the file end (up to 66 KiB window)
- Validates: single-disk only, no Zip64 sentinel values (0xFFFF / 0xFFFFFFFF)
- Iterates central directory file headers (CDFH) sequentially
- For each entry: reads the local file header (LFH) to locate the payload start
- Supported compression methods: stored (method 0) and deflate (method 8)
- Encrypted entries (flag bit 0) are handled per EncryptedPolicy
- Compressed-byte deltas tracked manually (cumulative compressed bytes diffed between reads) because the zip reader does not expose per-read deltas like TarRead
- Ratio pre-clamping active only for deflate entries; stored entries have 1:1 ratio
- No recursive nesting: zip entries inside tar cannot be descended (no random access), handled per UnsupportedPolicy
- Filename storage is bounded; oversized names are truncated with a streaming FNV-1a hash for the suffix
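The backward EOCD search can be sketched over an in-memory tail of the file. This is a hypothetical helper, not ZipCursor::open: it assumes the caller has already read the last bytes of the file (at most the search window) into `tail`, and it adds a comment-length consistency check that the text above does not mention.

```rust
/// EOCD signature "PK\x05\x06".
const EOCD_SIG: [u8; 4] = [b'P', b'K', 0x05, 0x06];
/// Fixed EOCD size without the trailing comment.
const EOCD_MIN_LEN: usize = 22;

/// Scans `tail` backward and returns the offset (within `tail`) of the last
/// EOCD record whose comment-length field matches the bytes that follow it.
pub fn find_eocd(tail: &[u8]) -> Option<usize> {
    if tail.len() < EOCD_MIN_LEN {
        return None;
    }
    let mut pos = tail.len() - EOCD_MIN_LEN;
    loop {
        if tail[pos..pos + 4] == EOCD_SIG {
            // The 2-byte little-endian comment length sits at offset 20.
            let comment_len = u16::from_le_bytes([tail[pos + 20], tail[pos + 21]]) as usize;
            if pos + EOCD_MIN_LEN + comment_len == tail.len() {
                return Some(pos);
            }
        }
        if pos == 0 {
            return None;
        }
        pos -= 1;
    }
}
```

Checking the comment length guards against a stray "PK\x05\x06" sequence inside the comment or entry data being mistaken for the real record.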
Tar entries whose names match a known archive extension are recursively descended up to max_archive_depth. The recursion uses split_first_mut to peel per-depth scratch slices (vpaths, path_budget_used, tar_cursors) without allocation. Each nesting level gets its own independent state while sharing the budget tracker and stream buffer.
Supported nesting paths:
- tar → gzip, bzip2, tar, tar.gz, tar.bz2 (sequential streams)
- tar → zip: not supported (zip requires random access; handled by UnsupportedPolicy)
Virtual paths are display-only identifiers for files inside archives. They are not filesystem paths and are never used to open files.
The full virtual path is assembled as: <parent_display>::<canonicalized_entry_name><locator_suffix>
Example: /tmp/outer.tar::inner.zip::dir/file.txt@t000000000000002a
The :: separator is chosen to be visually distinct from filesystem separators.
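The join-and-truncate rule can be illustrated as below. This is a sketch under assumptions: build_virtual_path is a hypothetical allocating helper (the crate is described as allocation-free), it hashes the full joined path, it omits the locator suffix, and it does not guard against cutting a %HH escape in half.

```rust
/// 64-bit FNV-1a over a byte slice (standard offset basis and prime).
fn fnv1a64(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Joins parent and entry with "::". If the result exceeds `max_len`, the
/// tail is replaced by "~#" plus 16 lowercase hex digits of the hash.
pub fn build_virtual_path(parent: &[u8], entry: &[u8], max_len: usize) -> Vec<u8> {
    const SUFFIX_LEN: usize = 18; // "~#" + 16 hex digits
    let mut full = Vec::with_capacity(parent.len() + 2 + entry.len());
    full.extend_from_slice(parent);
    full.extend_from_slice(b"::");
    full.extend_from_slice(entry);
    if full.len() <= max_len {
        return full;
    }
    assert!(max_len >= SUFFIX_LEN, "max_len too small for the hash suffix");
    let hash = fnv1a64(&full);
    full.truncate(max_len - SUFFIX_LEN);
    full.extend_from_slice(format!("~#{:016x}", hash).as_bytes());
    full
}
```

Because the hash covers the untruncated path, two long paths that share a prefix still end up with distinct display identifiers in all but the rare collision case.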
- Normalize separators (\ → /) and split into components
- Drop . components; resolve .. components via a stack, clamping at root (traversal sets had_traversal flag)
- Emit escaped display bytes: non-printable bytes → %HH (uppercase hex)
- Stream FNV-1a hash over full (unbounded) output while storing only up to max_len bytes
- If truncated, replace tail with ~#<16-hex-digit> hash suffix (avoids splitting %HH escapes at the boundary)
- Component count capped at DEFAULT_MAX_COMPONENTS (256)
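The first three canonicalization steps can be sketched as follows. Canon and canonicalize are hypothetical; the sketch allocates, omits hashing, truncation, and the component cap, and assumes had_traversal is set only when a .. component would climb above the root.

```rust
/// Reduced stand-in for the canonicalization result.
pub struct Canon {
    pub bytes: Vec<u8>,
    pub had_traversal: bool,
}

/// Normalizes separators, resolves "." and "..", and escapes bytes outside
/// printable ASCII (0x20..=0x7E) as %HH with uppercase hex.
pub fn canonicalize(raw: &[u8]) -> Canon {
    let mut stack: Vec<&[u8]> = Vec::new();
    let mut had_traversal = false;
    // Splitting on both separators is equivalent to normalizing '\' to '/'.
    for comp in raw.split(|&b| b == b'/' || b == b'\\') {
        if comp.is_empty() || comp == b"." {
            continue;
        }
        if comp == b".." {
            // Clamp at root: popping an empty stack is a traversal attempt.
            if stack.pop().is_none() {
                had_traversal = true;
            }
            continue;
        }
        stack.push(comp);
    }
    let mut bytes = Vec::new();
    for (i, comp) in stack.iter().enumerate() {
        if i > 0 {
            bytes.push(b'/');
        }
        for &b in comp.iter() {
            if (0x20..0x7f).contains(&b) {
                bytes.push(b);
            } else {
                bytes.extend_from_slice(format!("%{:02X}", b).as_bytes());
            }
        }
    }
    Canon { bytes, had_traversal }
}
```

An entry named ../../etc/passwd therefore displays as etc/passwd with had_traversal set, which keeps hostile names from looking like paths outside the archive.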
Each virtual path is suffixed with a fixed-length locator for downstream re-seeking:
| Suffix | Format | Value |
|---|---|---|
| @t<16hex> | tar | Header block index |
| @z<16hex> | zip | Local file header offset (when valid) |
| @c<16hex> | zip | CDFH offset (fallback when LFH offset invalid) |
Gzip entries omit the locator because gzip contains exactly one decompressed stream.
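The fixed-length locator format in the table can be produced with a single format string. Locator and locator_suffix are hypothetical names used only for this sketch; the crate is described as writing hex digits without allocation.

```rust
/// Hypothetical locator kinds matching the table above.
#[derive(Clone, Copy)]
pub enum Locator {
    /// tar: index of the entry's header block.
    TarHeaderBlock(u64),
    /// zip: offset of the local file header.
    ZipLocalHeader(u64),
    /// zip: offset of the central directory file header (fallback).
    ZipCentralHeader(u64),
}

/// Formats "@" + kind letter + 16 lowercase hex digits (18 bytes total).
pub fn locator_suffix(loc: Locator) -> String {
    let (kind, value) = match loc {
        Locator::TarHeaderBlock(v) => ('t', v),
        Locator::ZipLocalHeader(v) => ('z', v),
        Locator::ZipCentralHeader(v) => ('c', v),
    };
    format!("@{}{:016x}", kind, value)
}
```

Zero-padding to 16 digits keeps every suffix the same length, so a consumer can strip or parse it by slicing the last 18 bytes.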
Per-archive path byte usage is tracked in path_budget_used to prevent unbounded growth from archives with many entries having long paths. Exceeding max_virtual_path_bytes_per_archive triggers PartialReason::PathBudgetExceeded.
- Truncated headers: read_exact_or_eof returns UnexpectedEof with format-labeled messages (e.g., "tar truncated", "zip truncated")
- Bad magic: format detection returns None; the archive is treated as a regular file
- Malformed size fields: tar parse_tar_size_octal returns None → PartialReason::MalformedTar
- ZIP EOCD not found: ZipOpen::Stop(MalformedZip)
- ZIP Zip64 sentinels: ZipOpen::Skip(UnsupportedFeature) or ZipNext::Stop(UnsupportedFeature)
- Mid-stream corruption: compressed stream read errors → PartialReason::CompressedStreamCorrupt (gzip/bzip2) or PartialReason::MalformedTar / MalformedZip
Partial outcomes retain results for bytes already scanned. Skipped outcomes discard nothing (no bytes were scanned).
- File open failures → ArchiveEnd::Skipped(IoError)
- Read errors during header/payload → ArchiveEnd::Partial with the appropriate format reason
- No retry logic: each error is treated as fatal for that archive
EncryptedPolicy and UnsupportedPolicy provide three levels:
- SkipWithTelemetry: skip and record (default)
- FailArchive: abort the current archive
- FailRun: set abort_run flag, abort the entire scan
Every archive encounter records exactly one top-level outcome via ArchiveStats:
- record_archive_scanned(): fully processed
- record_archive_skipped(reason, path, sample): rejected before payload
- record_archive_partial(reason, path, sample): stopped mid-scan
Entry-level outcomes:
- record_entry_scanned(): at least one payload byte scanned
- record_entry_skipped(reason, path, sample): rejected before payload
- record_entry_partial(reason, path, sample): stopped mid-entry (budget/corruption)
The bounded ArchiveSampleRing (32 samples, 192-byte path prefix each) captures the first N skip/partial events for diagnostics.
| Parameter | Default | Rationale |
|---|---|---|
| max_archive_depth | 3 | Covers .tar.gz containing a .zip; deeper nesting is adversarial |
| max_entries_per_archive | 4096 | Generous for real archives, bounds CPU in entry-counting loops |
| max_uncompressed_bytes_per_entry | 64 MiB | Limits peak memory per entry |
| max_total_uncompressed_bytes_per_archive | 256 MiB | Limits total archive output |
| max_total_uncompressed_bytes_per_root | 512 MiB | Limits cross-archive output under 1 GiB |
| max_archive_metadata_bytes | 16 MiB | Bounds header/CD parsing |
| max_inflation_ratio | 128x | Accommodates high-compression formats; catches classic zip bombs |
| max_virtual_path_len_per_entry | 1024 bytes | Bounds display path storage |
| max_virtual_path_bytes_per_archive | 1 MiB | Bounds total path arena per archive |
| max_wall_clock_secs_per_root | None | Keeps defaults deterministic; production should opt in (e.g., 30s) |
| DEFAULT_WALL_CLOCK_SECS_PER_ROOT | 30s | Suggested production value |
| MAX_WALL_CLOCK_SECS_PER_ROOT | 86,400s | Upper bound to prevent Instant overflow |
| Constant | Value | Location | Purpose |
|---|---|---|---|
| ARCHIVE_STREAM_READ_MAX | 256 KiB | scan.rs | Upper bound on single decompressed read |
| LOCATOR_LEN | 18 bytes | scan.rs | @ + kind + 16 hex digits |
| TAR_BLOCK_LEN | 512 bytes | formats/tar.rs | Tar header/data block size |
| USTAR_MAGIC_OFFSET | 257 | formats/tar.rs | Offset of "ustar" magic in tar header |
| EOCD_MIN_LEN | 22 bytes | formats/zip.rs | Minimum end-of-central-directory size |
| EOCD_SEARCH_MAX | 66 KiB | formats/zip.rs | Backward search window for EOCD |
| CDFH_LEN | 46 bytes | formats/zip.rs | Central directory fixed header length |
| LFH_LEN | 30 bytes | formats/zip.rs | Local file header fixed length |
| DEFAULT_MAX_COMPONENTS | 256 | path.rs | Max path components during canonicalization |
| TRUNC_SUFFIX_LEN | 18 bytes | path.rs | ~# + 16 hex digits |
| ARCHIVE_SAMPLE_MAX | 32 | outcome.rs | Max samples in bounded ring |
| ARCHIVE_SAMPLE_PATH_PREFIX_MAX | 192 bytes | outcome.rs | Max path prefix per sample |
| ENTRY_NOT_OPEN | u64::MAX | budget.rs | Sentinel for entry-not-open state |
| VIRTUAL_FILE_ID_BASE | 0x8000_0000 | local_fs_archive_ctx.rs | High-bit namespace for virtual IDs |
ArchiveFrame is exactly 48 bytes (2 × u32 + 5 × u64, #[repr(C)], compile-time asserted). The frame stack is preallocated to max_archive_depth elements at construction and never grows.
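The size claim and the sentinel trick can be checked with a small layout sketch. Frame and FrameWithBool are hypothetical: the field order is an assumption, as is the choice of entry_decompressed_out as the field that carries the ENTRY_NOT_OPEN sentinel.

```rust
use std::mem::size_of;

/// Sentinel meaning "no entry is currently open".
const ENTRY_NOT_OPEN: u64 = u64::MAX;

/// Sketch of a per-archive frame: 5 x u64 + 2 x u32 = 48 bytes, no padding.
#[repr(C)]
pub struct Frame {
    pub metadata_bytes: u64,
    pub compressed_in: u64,
    pub decompressed_out: u64,
    pub entry_compressed_in: u64,
    /// Holds ENTRY_NOT_OPEN while no entry is open (assumed placement).
    pub entry_decompressed_out: u64,
    pub entries_seen: u32,
    pub entries_scanned: u32,
}

/// Same data plus a separate flag: 49 bytes of fields, padded to 56.
#[repr(C)]
pub struct FrameWithBool {
    pub base: Frame,
    pub entry_open: bool,
}

// Compile-time layout check, in the spirit of the assertion mentioned above.
const _: () = assert!(size_of::<Frame>() == 48);

impl Frame {
    pub fn new() -> Self {
        Frame {
            metadata_bytes: 0,
            compressed_in: 0,
            decompressed_out: 0,
            entry_compressed_in: 0,
            entry_decompressed_out: ENTRY_NOT_OPEN,
            entries_seen: 0,
            entries_scanned: 0,
        }
    }
    pub fn entry_open(&self) -> bool {
        self.entry_decompressed_out != ENTRY_NOT_OPEN
    }
    pub fn begin_entry(&mut self) {
        self.entry_compressed_in = 0;
        self.entry_decompressed_out = 0;
    }
    pub fn end_entry(&mut self) {
        self.entry_decompressed_out = ENTRY_NOT_OPEN;
    }
}
```

The bool variant grows to 56 bytes because the struct must stay 8-byte aligned, which is the 7 bytes of padding the sentinel avoids.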
| File | Purpose |
|---|---|
| archive/mod.rs | Module root, re-exports |
| archive/config.rs | ArchiveConfig struct + defaults + validation |
| archive/budget.rs | ArchiveBudgets struct |
| archive/budget.rs | BudgetHit enum |
| archive/budget.rs | ChargeResult enum |
| archive/budget.rs | ArchiveFrame struct |
| archive/budget.rs | enter_archive() |
| archive/budget.rs | charge_decompressed_out(): five-cap minimum logic |
| archive/budget.rs | charge_discarded_out(): bypasses per-entry output cap |
| archive/detect.rs | ArchiveKind enum |
| archive/detect.rs | detect_kind_from_path() |
| archive/detect.rs | sniff_kind_from_header() |
| archive/detect.rs | detect_kind(): combined detection |
| archive/detect.rs | detect_kind_from_name_bytes(): byte-level suffix matcher |
| archive/outcome.rs | ArchiveSkipReason (14 variants) |
| archive/outcome.rs | EntrySkipReason (10 variants) |
| archive/outcome.rs | PartialReason (12 variants) |
| archive/outcome.rs | ArchiveStats struct |
| archive/outcome.rs | ArchiveSampleRing struct |
| archive/path.rs | EntryPathCanonicalizer struct |
| archive/path.rs | canonicalize() method |
| archive/path.rs | VirtualPathBuilder struct |
| archive/path.rs | build() method |
| archive/path.rs | apply_hash_suffix_truncation() |
| archive/scan.rs | ArchiveEnd enum |
| archive/scan.rs | EntryMeta struct |
| archive/scan.rs | EntryChunk struct |
| archive/scan.rs | ArchiveEntrySink trait |
| archive/scan.rs | ArchiveScratch<Z> struct |
| archive/scan.rs | ArchiveScanCtx struct |
| archive/scan.rs | scan_gzip_stream() |
| archive/scan.rs | scan_compressed_entry_stream(): shared inner loop |
| archive/scan.rs | scan_bzip2_stream() |
| archive/scan.rs | scan_tar_stream() |
| archive/scan.rs | scan_tar_stream_nested(): recursive tar iteration |
| archive/scan.rs | scan_targz_stream() |
| archive/scan.rs | scan_tarbz2_stream() |
| archive/scan.rs | scan_zip_source() |
| archive/util.rs | CountedRead struct |
| archive/util.rs | FNV-1a hash functions |
| archive/util.rs | write_u64_hex_lower() |
| archive/util.rs | read_exact_n() |
| archive/util.rs | budget_hit_to_partial() |
| archive/formats/mod.rs | CompressedStream trait |
| archive/formats/gzip.rs | GzipStream struct |
| archive/formats/gzip.rs | new_with_header(): header parsing |
| archive/formats/bzip2.rs | Bzip2Stream struct |
| archive/formats/tar.rs | TarRead trait |
| archive/formats/tar.rs | TarEntryMeta struct |
| archive/formats/tar.rs | TarNext enum |
| archive/formats/tar.rs | TarCursor struct |
| archive/formats/tar.rs | next_entry(): header parsing loop |
| archive/formats/zip.rs | ZipSource trait |
| archive/formats/zip.rs | ZipEntryMeta struct |
| archive/formats/zip.rs | ZipCursor struct |
| archive/formats/zip.rs | open(): EOCD parsing |
| archive/formats/zip.rs | next_entry(): CDFH iteration |
| archive/formats/zip.rs | open_entry_reader(): LFH validation + reader construction |
| archive/formats/zip.rs | ZipEntryReader enum |
| archive/formats/zip.rs | LimitedRead struct |
| scheduler/local_fs_archive_ctx.rs | alloc_virtual_file_id() |
| scheduler/local_fs_archive_ctx.rs | dispatch_archive_scan() |
| scheduler/local_fs_archive_ctx.rs | ArchiveEnd (scheduler variant) |
| scheduler/local_fs_archive_ctx.rs | ArchiveScanCtx (blocking-path variant) |
| scheduler/local_fs_archive_ctx.rs | scan_and_emit_chunk() |
| scheduler/local_fs_archive_ctx.rs | apply_entry_budget_clamp() |
| scheduler/local_fs_archive_ctx.rs | discard_remaining_payload() |
| scheduler/local_fs_archive_ctx.rs | scan_compressed_stream_nested() |
| scheduler/local_fs_gzip.rs | process_gzip_file() |
| scheduler/local_fs_bzip2.rs | process_bzip2_file() |
| scheduler/local_fs_tar.rs | process_tar_file(), process_targz_file(), process_tarbz2_file(), recursive tar iteration |
| scheduler/local_fs_zip.rs | process_zip_file() |
| scheduler/local_fs_extract.rs | extract_and_scan_file(): binary extraction (non-archive) |