|
Thanks a lot. This is interesting because nobody should be using this pure Python object database. But I guess for those who are, having this fixed is valuable. Let's see if Copilot finds anything, but if not I think this should be merged, even though I only did an armchair review and admittedly don't understand a thing, trusting the test-suite fully here. |
Sorry, something went wrong.
There was a problem hiding this comment.
Fixes DecompressMemMapReader.read(N) returning b'' before true EOF for small N, which could prematurely terminate standard chunked read loops (issue #120). The change replaces a recursive refill strategy with an iterative decompression loop that continues as long as zlib is making forward progress, while still preserving the compressed_bytes_read() “scrub” behavior and avoiding recursion-depth failures on pathological inputs.
Changes:
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| gitdb/stream.py | Reworks decompression refill logic to avoid empty reads before EOF and removes recursion to prevent stack overflows on adversarial streams. |
| gitdb/test/test_stream.py | Adds a regression test covering chunked reads that previously could terminate early. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Sorry, something went wrong.
Summary
Closes #120
DecompressMemMapReader.read(N) could return b'' before _br == _s when N was small enough that the underlying zlib.decompress(indata, N) consumed input bytes without producing output (e.g. while ingesting the zlib header / dictionary frames). Callers using the standard while chunk := stream.read(N) idiom terminated at the first empty chunk -- before reaching the actual end of the uncompressed object. Reproduced with chunk sizes 1, 4, 16 against a 13 KB stream of compressible data.
Why the original guard existed
The previous if dcompdat and ... guard at the recursion site was added so that compressed_bytes_read() could drive read() in a scrub-loop that intentionally manipulates _br = 0 while the inner zlib object is already past EOF. Without that guard, the scrub loop would recurse forever because _br < _s stays true.
Fix
Convert the recursive "refill to size" branch into an iterative loop around the existing decompress block. The loop terminates when:
Condition 3 preserves the compressed_bytes_read() scrub safety: once the inner zip is at EOF, the loop breaks instead of looping forever. It also bounds the truncated-stream case (zip.eof never becomes true) by the input-exhaustion guard, and bounds the crafted "many empty deflate blocks" attack -- previously this could blow Python's recursion depth at ~1500+ stored-block headers because each recursion only consumed a handful of input bytes; the iterative form walks the stream forward without growing the stack.
Tests
Manual smoke
(Pre-fix this fails for chunk_size in (1, 4, 16).)
Truncated input is handled the same as before: read() returns whatever was decoded so far, then b'', instead of looping or raising. The crafted recursion-depth attack (~5000 empty deflate blocks ahead of one valid block) now decodes correctly instead of raising RecursionError.