Conversation

@197g (Member) commented Sep 11, 2025

Draft to be benchmarked.

Ensure that the buffer is fully flushed before checking for the end-of-file marker via the bool flag in fdeflate. We first pass an empty input without setting the flag, and we repeat the whole flush cycle as long as that still yields data. Previously, such data could be masked by the missing end-of-file error.
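The flush cycle described above can be sketched as follows. This is a minimal model against a mock decompressor, not the crate's actual code; fdeflate's real `Decompressor` API differs, and `MockDecompressor` and its `read` signature are illustrative assumptions.

```rust
/// Mock stand-in for a decompressor that may hold back output internally.
struct MockDecompressor {
    /// Data still buffered internally, drained a few bytes per call.
    pending: Vec<u8>,
}

impl MockDecompressor {
    /// Hypothetical decompression step; `end_of_input` mirrors fdeflate's
    /// bool flag for the end-of-file marker check. Returns bytes produced.
    fn read(&mut self, _input: &[u8], out: &mut Vec<u8>, end_of_input: bool) -> usize {
        if end_of_input {
            // Once the flag is set, no further data may be produced.
            assert!(self.pending.is_empty());
            return 0;
        }
        let n = self.pending.len().min(3);
        out.extend(self.pending.drain(..n));
        n
    }
}

fn flush_then_check_eof(dec: &mut MockDecompressor, out: &mut Vec<u8>) {
    // Repeat the flush cycle with empty input until no more data appears...
    while dec.read(&[], out, false) > 0 {}
    // ...and only then check for the end-of-file marker via the flag.
    let produced = dec.read(&[], out, true);
    assert_eq!(produced, 0);
}

fn main() {
    let mut dec = MockDecompressor { pending: vec![1, 2, 3, 4, 5, 6, 7] };
    let mut out = Vec::new();
    flush_then_check_eof(&mut dec, &mut out);
    assert_eq!(out, vec![1, 2, 3, 4, 5, 6, 7]);
}
```

The point of the ordering is that setting the end-of-input flag too early would let a missing end-of-file error mask data that was still buffered.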

See: image-rs/fdeflate#58 (comment)

@197g (Member, Author) commented Sep 11, 2025

I was testing this locally in combination with #640, and that combination seems to be important. I was quite defensive in the implementation of the flush behavior: it won't indicate all available data as available, but instead performs the same as the usual decoding, due to potential internal buffering in zlib. The unfiltering via read-ahead makes this difference inconsequential: if the data was decoded, it can be unfiltered. But without that change there is still a difference in the amount of available data. I think it should be fine to also detect whether there is outstanding committable data just before the end-of-file enforcement, once no more data is produced.

This should eliminate the need for a loop as well; the eof: true call must not produce any more data, as far as I understand. @fintelia That is my interpretation of the issue in fdeflate. Right?

@197g (Member, Author) commented Sep 11, 2025

Turns out the comprehensive fix is even harder than the above comment suggests. It goes like this: the currently published code detects whether the IDAT (or fdAT) sequence is finished by comparing the next chunk type. For this, it consumes the input stream to fill the bytes that go into parse_u32. When we flush image data, however, we return from this without changing our state to the new chunk, since we yield two events, ImageDataFlushed and ChunkBegin, but only have one return value, so this must be done in two calls. Both events are crucial: the former terminates the loop in finish_decoding_image_data, the latter drives the chunk sequence itself.

Now, this code relies on the caller retrying the update call with the flushed image-data buffer to get both status events, and the end of the image data as well. But when the input stream ends (temporarily) exactly on the boundary, right after a new chunk type is indicated, this won't be retried, and the state now differs based on the total sequence of buffers passed to update. This would not be a problem on its own, but it becomes critical when we try to introduce a third event into the sequence: an ImageData event before the flush event, for any pending data available for unfiltering. Then we wouldn't even get the ImageDataFlushed event without trying to update with an empty input buffer.

So the idea for the fix is now to delay the state change entirely until we can do it without flushing, i.e. until the image data itself has ended.
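The three-events-in-three-calls boundary can be illustrated with a toy state machine. All names below (State, Event, Decoder, update) are illustrative stand-ins, not the crate's actual types; the sketch only shows that delaying the state change until the image data has ended guarantees the event order ImageData, ImageDataFlushed, ChunkBegin.

```rust
#[derive(Clone, Copy, PartialEq)]
enum State {
    ImageData,
    ChunkHeader,
}

#[derive(Debug, PartialEq)]
enum Event {
    ImageData(usize),    // pending decoded bytes ready for unfiltering
    ImageDataFlushed,    // terminates the image-data decoding loop
    ChunkBegin([u8; 4]), // the next chunk type
}

struct Decoder {
    state: State,
    buffered: usize, // decoded-but-unreported bytes
    done: bool,      // the image data stream has ended
}

impl Decoder {
    /// One update call yields at most one event, so the three events at a
    /// boundary take three calls. The state changes only once nothing is
    /// left to flush, so no event can be lost to an unretried call.
    fn update(&mut self, next_chunk: [u8; 4]) -> Event {
        if self.state == State::ImageData {
            if self.buffered > 0 {
                let n = std::mem::take(&mut self.buffered);
                return Event::ImageData(n);
            }
            if self.done {
                // Only now change state: the flush event has necessarily
                // been observed before the chunk sequence resumes.
                self.state = State::ChunkHeader;
                return Event::ImageDataFlushed;
            }
            // (real code would decompress further input here)
        }
        Event::ChunkBegin(next_chunk)
    }
}

fn main() {
    let mut d = Decoder { state: State::ImageData, buffered: 42, done: true };
    let next = *b"IEND";
    assert_eq!(d.update(next), Event::ImageData(42));
    assert_eq!(d.update(next), Event::ImageDataFlushed);
    assert_eq!(d.update(next), Event::ChunkBegin(next));
}
```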

@197g (Member, Author) commented Sep 11, 2025

This seems to improve performance, some cases quite significantly. (I think it mostly avoids the weird forced buffer allocations we previously had in the flush call.) Alas, it's almost suspicious, so no benchmark numbers for now.

197g marked this pull request as ready for review September 11, 2025 15:32
@fintelia (Contributor) commented

I've been thinking about this more, and I believe that the only reason we need to produce more data in finish_compressed_chunks at all is because of an edge case in the ImageData state. Specifically, if right at the end of decompressing an IDAT chunk we've completely filled all the space in the unfiltering buffer, then we'll advance to the next state even though there might be a tiny amount more output data buffered inside the decoder.

Instead, it might be possible to remain in the ImageData state until the remaining input is empty, and we either perform a decompress call that doesn't completely fill the unfiltering buffer or the stream reaches a done state.
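The proposed exit condition can be written down as a predicate. This is a hedged sketch with illustrative names (none of these are the crate's actual identifiers): stay in the ImageData state until no input remains and we can be sure the decompressor holds no buffered output, which we know either because the last call left slack in the unfiltering buffer or because the stream reached its done state.

```rust
/// May we leave the ImageData state? `out_written` and `out_capacity`
/// describe the last decompress call's use of the unfiltering buffer.
fn can_leave_image_data(
    input_remaining: usize,
    out_written: usize,
    out_capacity: usize,
    stream_done: bool,
) -> bool {
    input_remaining == 0 && (out_written < out_capacity || stream_done)
}

fn main() {
    // Buffer not completely filled: the decoder cannot be holding back data.
    assert!(can_leave_image_data(0, 10, 16, false));
    // The edge case: buffer exactly filled, so output may still be buffered.
    assert!(!can_leave_image_data(0, 16, 16, false));
    // Stream reached its done state: safe to leave regardless.
    assert!(can_leave_image_data(0, 16, 16, true));
    // Input remaining: not at the boundary yet.
    assert!(!can_leave_image_data(5, 10, 16, true));
}
```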

> This should eliminate the need for a loop as well, the eof: true call must not produce data anymore as far as I understand. @fintelia That is my interpretation of the issue in fdeflate. Right?

Yes, the only time you need to call fdeflate decompression in a loop is if you provide more input data or output space on subsequent iterations.

Also for the record: the comment that finish_compressed_chunks "enables some cleanup within the decompressor and flushing additional data which may have been kept back in case more data were passed to it" is wrong. It was true back when we used flate2, but fdeflate has no such requirement.
