Skip to content

Conversation

@stephenh
Copy link

@stephenh stephenh commented Sep 8, 2018

The watchman docs note that, due to file system APIs being
encoding-less, we can't assume the bser strings are UTF-8.

This changes the bser string deserializer to fallback to returning
a byte[] when UTF-8 deserialization fails.

This lets client to a cast/instanceof check on String or byte[]
and decide how they want to handle the value.

Fixes #643.

@facebook-github-bot
Copy link
Contributor

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@facebook-github-bot
Copy link
Contributor

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@mkillianey
Copy link

@stephenh - How is this working out for you in practice? Once you have the byte array, are you finding a clear way to do something useful with it from Java, or was that a dead-end? Do you still think it's worth taking this change, or does it cause other headaches elsewhere?

@stephenh
Copy link
Author

@mkillianey seems good so far, but I am technically checking for "if byte[]" and then skipping those files, and not doing anything with them. This is mostly out of laziness/lack of time to update my app's wire protocol + file system callpaths/etc. to handle "oh right this is a string or a byte[]".

So, no, I don't have a concrete data point that I followed pulled the byte[] thread all the way back into file system APIs and had it work out. (Also the Java file sytems APIs I'm using all take strings, of course, so I'd have to use jnr-poxis equivalents, which I currently do for a few things/special cases like symlinks, but generally try to avoid.)

That said, I did have a user with non-utf-8 paths (based in the UK) that would have triggered my app's previous "blow up w/exception" behavior, so while silently skipping the paths is not necessarily the best right now, it's at least better than failing the entire program.

The watchman docs note that, due to file system APIs being
encoding-less, we can't assume the bser strings are UTF-8.

This changes the bser string deserializer to fallback to returning
a byte[] when UTF-8 deserialization fails.

This lets client to a cast/instanceof check on String or byte[]
and decide how they want to handle the value.
@stephenh stephenh force-pushed the return-byte-array-for-non-utf-8 branch 2 times, most recently from 5ab5c24 to 26bdadc Compare September 6, 2019 23:05
@stephenh stephenh force-pushed the return-byte-array-for-non-utf-8 branch from 26bdadc to f63e60a Compare September 6, 2019 23:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants