Allow full unfiltering for partial data #640

Status: Open. 197g wants to merge 2 commits into `image-rs:master` from `197g:issue-639-partial-readahead`.
```diff
@@ -34,10 +34,16 @@ impl UnfilteringBuffer {
     /// Asserts in debug builds that all the invariants hold. No-op in release
     /// builds. Intended to be called after creating or mutating `self` to
     /// ensure that the final state preserves the invariants.
+    #[track_caller]
     fn debug_assert_invariants(&self) {
+        // The previous row pointer is always behind the current row pointer.
         debug_assert!(self.prev_start <= self.current_start);
         debug_assert!(self.current_start <= self.available);
+        // The current row pointer is always into filled bytes (but sometimes may point to
+        // the read-only back reference buffer, see `curr_row_len_with_unfilter_ahead`).
         debug_assert!(self.current_start <= self.filled);
+        // The mut-available region is always well-defined.
         debug_assert!(self.available <= self.filled);
+        // The logically filled region is always within bounds of the data stream.
         debug_assert!(self.filled <= self.data_stream.len());
     }
```
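The ordering these assertions document can be sketched in isolation. The struct below is a hypothetical stand-in with the same field names, not the crate's actual type, and it checks only the commented subset of the invariant chain:

```rust
// Hypothetical stand-in for the buffer's index fields. The checked chain is:
// prev_start <= current_start <= filled <= len, with available <= filled.
struct Indices {
    prev_start: usize,
    current_start: usize,
    available: usize,
    filled: usize,
    len: usize,
}

impl Indices {
    fn invariants_hold(&self) -> bool {
        self.prev_start <= self.current_start
            && self.current_start <= self.filled
            && self.available <= self.filled
            && self.filled <= self.len
    }
}

fn main() {
    // A well-formed state: all indices ordered within the allocation.
    let ok = Indices { prev_start: 0, current_start: 4, available: 4, filled: 12, len: 16 };
    assert!(ok.invariants_hold());
    // A broken state: the previous row pointer ran past the current one.
    let bad = Indices { prev_start: 5, current_start: 4, available: 8, filled: 12, len: 16 };
    assert!(!bad.invariants_hold());
}
```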
```diff
@@ -111,6 +117,7 @@ impl UnfilteringBuffer {
         self.current_start = 0;
         self.filled = 0;
         self.available = 0;
+        self.debug_assert_invariants();
     }

     /// Returns the previous (already `unfilter`-ed) row.
```
```diff
@@ -123,6 +130,20 @@ impl UnfilteringBuffer {
         self.available - self.current_start
     }

+    /// Returns how many bytes of the current row are available in the buffer.
+    ///
+    /// Why negative? There's a backreference section in the buffer that is reserved for the zlib /
+    /// deflate decoder (or any other, for that matter). We must not modify that part of the data,
+    /// but if we only *read* the row then `current_start` is allowed to run _into_ that section.
+    /// So, as a less efficient fallback where partial data is crucial, we can unfilter outside the
+    /// read area instead of in-place, while still indicating how many bytes we have consumed. In
+    /// that case, this number is negative.
+    pub(crate) fn curr_row_len_with_unfilter_ahead(&self) -> isize {
+        // Both are indices into the buffer, which is an allocation of bytes. Allocations are at
+        // most `isize::MAX` in size, hence this is valid.
+        self.available as isize - self.current_start as isize
+    }
+
     /// Returns a `&mut Vec<u8>` suitable for passing to
     /// `ReadDecoder.decode_image_data` or `StreamingDecoder.update`.
     ///
```
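The signed-length trick can be illustrated standalone. This is a hypothetical free-function sketch of the same computation, not the crate's code: when `current_start` has run ahead into the read-only region, `available - current_start` would underflow as `usize`, so the subtraction is done in `isize`:

```rust
// Hypothetical stand-in for the signed row-length computation.
// Allocations are at most isize::MAX bytes, so the casts cannot overflow.
fn curr_row_len_with_unfilter_ahead(available: usize, current_start: usize) -> isize {
    available as isize - current_start as isize
}

fn main() {
    // Normal case: the current row lies fully inside the mutable region.
    assert_eq!(curr_row_len_with_unfilter_ahead(100, 40), 60);
    // Read-ahead case: current_start has run past `available`; the result is
    // negative, signalling that the in-place fast path cannot be used.
    assert_eq!(curr_row_len_with_unfilter_ahead(40, 100), -60);
}
```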
```diff
@@ -132,7 +153,9 @@ impl UnfilteringBuffer {
     /// invariants by returning an append-only view of the vector
     /// (`FnMut(&[u8])`??? or maybe `std::io::Write`???).
     pub fn as_unfilled_buffer(&mut self) -> UnfilterBuf<'_> {
-        if self.prev_start >= self.shift_back_limit
+        let shift_back = self.prev_start.min(self.available);
+
+        if shift_back >= self.shift_back_limit
             // Avoid the shift back if the buffer is still very empty. Consider how we got here: a
             // previous decompression filled the buffer, then we unfiltered, we're now refilling
             // the buffer again. The condition implies, the previous decompression filled at most
```
```diff
@@ -147,18 +170,21 @@ impl UnfilteringBuffer {
             // question if we could be a little smarter and avoid crossing page boundaries when
             // that is not required. Alas, microbenchmarking TBD.
             if let Some(16..) = self.data_stream.len().checked_sub(self.filled) {
-                self.data_stream
-                    .copy_within(self.prev_start..self.filled, 0);
+                self.data_stream.copy_within(shift_back..self.filled, 0);
             } else {
-                self.data_stream.copy_within(self.prev_start.., 0);
+                self.data_stream.copy_within(shift_back.., 0);
             }

+            self.debug_assert_invariants();
+
             // The data kept its relative position to `filled` which now lands exactly at
             // the distance between prev_start and filled.
-            self.current_start -= self.prev_start;
-            self.available -= self.prev_start;
-            self.filled -= self.prev_start;
-            self.prev_start = 0;
+            self.current_start -= shift_back;
+            self.available -= shift_back;
+            self.filled -= shift_back;
+            self.prev_start -= shift_back;
+
+            self.debug_assert_invariants();
         }

         if self.filled + Self::GROWTH_BYTES > self.data_stream.len() {
```
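The shift-back step copies the live region to the front of the buffer and then rebases every index by the same amount, so all offsets keep pointing at the same bytes. A minimal standalone sketch of that rebasing, with hypothetical names and a fixed four-index array standing in for `prev_start`/`current_start`/`available`/`filled`:

```rust
// Hypothetical sketch of the shift-back: move the bytes starting at `shift`
// down to index 0 with copy_within, then subtract `shift` from every index.
fn shift_back(data: &mut [u8], indices: &mut [usize; 4], shift: usize) {
    let filled = indices[3]; // by convention here, the last index is `filled`
    data.copy_within(shift..filled, 0);
    for idx in indices.iter_mut() {
        *idx -= shift;
    }
}

fn main() {
    let mut data = [0u8, 1, 2, 3, 4, 5, 6, 7];
    // prev_start = 2, current_start = 4, available = 5, filled = 7
    let mut indices = [2usize, 4, 5, 7];
    shift_back(&mut data, &mut indices, 2);
    // Indices dropped by two; the live bytes moved down by two positions.
    assert_eq!(indices, [0, 2, 3, 5]);
    assert_eq!(data[..5], [2, 3, 4, 5, 6]);
}
```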
```diff
@@ -175,17 +201,26 @@ impl UnfilteringBuffer {
     /// Runs `unfilter` on the current row, and then shifts rows so that the current row becomes the previous row.
     ///
     /// Will panic if `self.curr_row_len() < rowlen`.
     ///
+    /// For correctness this also assumes that `curr_row_len_with_unfilter_ahead` is greater than
+    /// `rowlen`.
     pub fn unfilter_curr_row(
         &mut self,
         rowlen: usize,
         bpp: BytesPerPixel,
     ) -> Result<(), DecodingError> {
         debug_assert!(rowlen >= 2); // 1 byte for `FilterType` and at least 1 byte of pixel data.
+        debug_assert_eq!(rowlen as isize as usize, rowlen);
+        debug_assert!(self.curr_row_len_with_unfilter_ahead() >= rowlen as isize);

         let (prev, row) = self.data_stream.split_at_mut(self.current_start);
         let prev: &[u8] = &prev[self.prev_start..];

+        debug_assert!(prev.is_empty() || prev.len() == (rowlen - 1));
+        let prev = if prev.is_empty() {
+            prev
+        } else {
+            &prev[..rowlen - 1]
+        };

         // Get the filter type.
         let filter = RowFilter::from_u8(row[0]).ok_or(DecodingError::Format(
```
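The `split_at_mut` pattern above borrows a shared previous row and a mutable current row out of one buffer, which is what lets the row be unfiltered in place. The sketch below applies the same pattern with a simplified Sub-style filter (each byte adds the byte `bpp` positions earlier in the same row); the real `RowFilter`/`unfilter` machinery handles several filter types and is more involved:

```rust
// Hypothetical sketch: borrow prev (shared, unused here) and row (mutable)
// from one buffer, then unfilter the row in place with a Sub-style filter.
fn unfilter_sub_in_place(data: &mut [u8], current_start: usize, bpp: usize) {
    let (_prev, row) = data.split_at_mut(current_start);
    // row[0] holds the filter byte in the real layout; pixel data starts at
    // row[1], and the first `bpp` pixel bytes have no left neighbour.
    for i in bpp + 1..row.len() {
        row[i] = row[i].wrapping_add(row[i - bpp]);
    }
}

fn main() {
    // Three bytes of "previous row", then filter byte 1, then filtered pixels.
    let mut data = [9u8, 9, 9, 1, 10, 20, 5, 5];
    unfilter_sub_in_place(&mut data, 3, 2);
    // 5 + 10 = 15 and 5 + 20 = 25; the first two pixel bytes are unchanged.
    assert_eq!(data[3..], [1, 10, 20, 15, 25]);
}
```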
```diff
@@ -202,6 +237,106 @@ impl UnfilteringBuffer {

         Ok(())
     }

+    /// Unfilter but allow `current_start` to exceed `available` at the cost of some compute.
+    /// This will be called when we encounter an `UnexpectedEof` and want to push out all
+    /// interlaced rows that we can, i.e. it is not in the usual critical path of decoding.
+    pub(crate) fn unfilter_ahead_row(
+        &mut self,
+        rowlen: usize,
+        bpp: BytesPerPixel,
+    ) -> Result<bool, DecodingError> {
+        if self.filled - self.current_start < rowlen {
+            return Ok(false);
+        }
+
+        // We cannot really mutate the row data to unfilter it! So where do we put the unfiltered
+        // row then? In our interface it should occur at `self.prev_start` after this method is
+        // finished. So, really simple, we just copy it back there.
+        //
+        // There is of course subtlety. First we need to make sure that the buffer of the previous
+        // row does not overlap with the data for the decoder, as we will overwrite it. That is
+        // usually trivial: when it was previously unfiltered we had already mutated it, so just
+        // reuse it. Except that at the first line of each interlace pass we start with an _empty_
+        // previous row and consequently need to potentially move all our data further back.
+
+        // The fixup discussed. Make space for a previous row. It does not matter where we put it,
+        // so just put it right before the minimum of `current_start` and `available`. In this case,
+        // however, we should also pass an empty row to `unfilter`.
+        if self.prev_start == self.current_start {
+            let potential_end_of_new_prev = self.current_start.min(self.available);
+            // Insert free space between what we treat as the previous row and the current row.
+            // NOTE: this is because of the decoder's `commit` function, which will take a fixed
+            // window of data back from its filled state before it is done. But we move that region
+            // upwards, so it may erroneously point backwards by more than necessary. That just
+            // freezes data, but it might also freeze a portion that we are using as the unfilter
+            // buffer / the space to put the 'previous row'.
+            let padding = rowlen - 1;
+
+            let start_of_new_prev = potential_end_of_new_prev.saturating_sub(padding);
+            let end_of_new_prev = start_of_new_prev + padding;
+
+            // Shift everything up as required.
+            let current_shift = end_of_new_prev - potential_end_of_new_prev;
+
+            self.data_stream.splice(
+                start_of_new_prev..start_of_new_prev,
+                core::iter::repeat(0u8).take(current_shift),
+            );
+
+            self.current_start += current_shift;
+            // Temporary, in case of error.
+            self.prev_start = self.current_start;
+            self.available += current_shift;
+            self.filled += current_shift;
+
+            self.debug_assert_invariants();
+
+            let (prev, row) = self.data_stream.split_at_mut(self.current_start);
+            let prev = &mut prev[start_of_new_prev..][..rowlen - 1];
+
+            let filter = RowFilter::from_u8(row[0]).ok_or(DecodingError::Format(
+                FormatErrorInner::UnknownFilterMethod(row[0]).into(),
+            ))?;
+
+            prev.copy_from_slice(&row[1..rowlen]);
+            unfilter(filter, bpp, &[], prev);
+
+            self.prev_start = start_of_new_prev;
+            self.current_start += rowlen;
+            self.debug_assert_invariants();
+
+            Ok(true)
+        } else {
+            let (prev, row) = self.data_stream.split_at_mut(self.current_start);
+
+            assert!(
+                self.available - self.prev_start >= rowlen - 1,
+                "prev {prev}, cur {cur}, avail {avail}, fill {fill}, rowlen {rowlen}",
+                prev = self.prev_start,
+                cur = self.current_start,
+                avail = self.available,
+                fill = self.filled,
+            );
+
+            let prev = &mut prev[self.prev_start..][..rowlen - 1];
+
+            let filter = RowFilter::from_u8(row[0]).ok_or(DecodingError::Format(
+                FormatErrorInner::UnknownFilterMethod(row[0]).into(),
+            ))?;
+
+            // Unfilter this in a temporary buffer.
+            let mut row = row[1..rowlen].to_vec();
+
+            unfilter(filter, bpp, prev, &mut row);
+            prev.copy_from_slice(&row);
+
+            // Do NOT modify prev_start.
+            self.current_start += rowlen;
+            self.debug_assert_invariants();
+
+            Ok(true)
+        }
+    }
 }

 fn checked_next_multiple_of(val: usize, factor: usize) -> Option<usize> {
```
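The fallback branch cannot mutate the row where it lies, so it unfilters a copy and writes the result into the previous-row slot. A reduced standalone sketch of that copy, unfilter, write-back dance, again with a hypothetical Sub-style filter in place of the crate's `unfilter` (here the slice holds only pixel bytes, without the leading filter byte):

```rust
// Hypothetical sketch of the ahead path: the filtered row bytes are
// read-only, so copy them out, unfilter the copy against its own earlier
// bytes, then store the result where the previous row lives.
fn unfilter_ahead(prev: &mut [u8], filtered_row: &[u8], bpp: usize) {
    let mut row = filtered_row.to_vec();
    // Sub-style unfilter: add the byte `bpp` positions earlier.
    for i in bpp..row.len() {
        let left = row[i - bpp];
        row[i] = row[i].wrapping_add(left);
    }
    // The unfiltered row becomes the next iteration's previous row.
    prev.copy_from_slice(&row);
}

fn main() {
    let mut prev = vec![0u8; 4];
    let filtered = [10u8, 20, 5, 5];
    unfilter_ahead(&mut prev, &filtered, 2);
    // 5 + 10 = 15 and 5 + 20 = 25.
    assert_eq!(prev, [10, 20, 15, 25]);
}
```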
A new 34-line test file is added (`@@ -0,0 +1,34 @@`):

```rust
use png::{Decoder, DecodingError};

#[test]
fn test_partial_decode() {
    // The first 0x8D bytes from the following test image from Skia:
    // resources/images/apng-test-suite--dispose-ops--none-basic.png
    let partial_png: &[u8] = &[
        0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00, 0x00, 0x00, 0x0d, 0x49, 0x48, 0x44,
        0x52, 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x40, 0x08, 0x06, 0x00, 0x00, 0x00, 0xd2,
        0xd6, 0x7f, 0x7f, 0x00, 0x00, 0x00, 0x08, 0x61, 0x63, 0x54, 0x4c, 0x00, 0x00, 0x00, 0x03,
        0x00, 0x00, 0x00, 0x01, 0xb9, 0xea, 0x8a, 0x56, 0x00, 0x00, 0x00, 0x1a, 0x66, 0x63, 0x54,
        0x4c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0a, 0x00, 0x64, 0x00, 0x01, 0x26, 0x12, 0x2f,
        0xe0, 0x00, 0x00, 0x00, 0x93, 0x49, 0x44, 0x41, 0x54, 0x78, 0x9c, 0xed, 0xd2, 0xa1, 0x01,
        0x00, 0x30, 0x10, 0x84, 0xb0, 0xdb, 0x7f, 0xe9, 0xef, 0x18, 0x15, 0x44, 0xc4, 0x23, 0xd8,
        0x6d, 0x47, 0xd7, 0x7e, 0x07, 0x60, 0x00, 0x0c, 0x80, 0x01, 0x30, 0x00, 0x06, 0xc0, 0x00,
        0x18, 0x00, 0x03, 0x60, 0x00, 0x0c,
    ];

    let mut reader = Decoder::new(std::io::Cursor::new(partial_png))
        .read_info()
        .unwrap();
    let mut row = vec![0; reader.output_buffer_size().unwrap()];
    for i in 0..10 {
        let result = reader.read_row(&mut row);
        assert!(matches!(result, Ok(_)), "{result:?} at {i}");
    }

    let result = reader.read_row(&mut row);
    let DecodingError::IoError(err) = result.unwrap_err() else {
        panic!("Unexpected error variant");
    };
    assert_eq!(err.kind(), std::io::ErrorKind::UnexpectedEof);
}
```
Review discussion:

**Reviewer:** I wonder if, instead of reusing/shifting `self.data_stream`, it may be a bit easier/clearer if a separate buffer (or two) were introduced as a field of `UnfilteringBuffer`? Say `copied_curr_row: Vec<u8>` and `copied_prev_row: Option<Vec<u8>>`, or something like that?

**Reviewer:** Oh, and if we do that, then I guess the type of the `current_start` and `prev_start` fields would need to change from `usize` to something like `Either<usize>` (where `Either::Left` means the old behavior, data is in `data_stream`, and `Either::Right` means that the rows are in the new buffers).

**Author (197g):** The type change in `current_start` was the main concern that drove me away from that concept. I wanted to keep influence on the main path as low as possible; having an `Either` would be an extra match / conditional read on every single path accessing `current_start`, even the happy one. I can benchmark it, but intuitively it will be quite visible.
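The alternative the reviewer floated, and the per-access cost the author wants to avoid, can be made concrete with a small enum. This is a hypothetical sketch of the proposed design, not code from the PR:

```rust
// Hypothetical sketch of the reviewer's proposal: an index that is either an
// offset into data_stream (the fast path) or a position in a side buffer
// holding a copied-out row (the partial-data fallback).
#[derive(Debug, PartialEq)]
enum RowPos {
    InStream(usize),     // old behavior: the row lives in data_stream
    InSideBuffer(usize), // fallback: the row was copied to a separate Vec
}

// Every read of current_start would now go through a match like this; that
// extra conditional on the hot path is the overhead the author objects to.
fn stream_offset(pos: &RowPos) -> Option<usize> {
    match pos {
        RowPos::InStream(off) => Some(*off),
        RowPos::InSideBuffer(_) => None,
    }
}

fn main() {
    assert_eq!(stream_offset(&RowPos::InStream(42)), Some(42));
    assert_eq!(stream_offset(&RowPos::InSideBuffer(0)), None);
}
```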