Return byte[]s for non-UTF-8 file paths. #645

stephenh · 2018-09-08T01:43:47Z

The watchman docs note that, due to file system APIs being
encoding-less, we can't assume the bser strings are UTF-8.

This changes the bser string deserializer to fallback to returning
a byte[] when UTF-8 deserialization fails.

This lets client to a cast/instanceof check on String or byte[]
and decide how they want to handle the value.

Fixes #643.

facebook-github-bot · 2018-09-08T01:43:54Z

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign up at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need the corporate CLA signed.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

facebook-github-bot · 2018-09-08T02:20:18Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

mkillianey · 2018-12-20T05:11:26Z

@stephenh - How is this working out for you in practice? Once you have the byte array, are you finding a clear way to do something useful with it from Java, or was that a dead-end? Do you still think it's worth taking this change, or does it cause other headaches elsewhere?

stephenh · 2018-12-21T00:36:15Z

@mkillianey seems good so far, but I am technically checking for "if byte[]" and then skipping those files, and not doing anything with them. This is mostly out of laziness/lack of time to update my app's wire protocol + file system callpaths/etc. to handle "oh right this is a string or a byte[]".

So, no, I don't have a concrete data point that I followed pulled the byte[] thread all the way back into file system APIs and had it work out. (Also the Java file sytems APIs I'm using all take strings, of course, so I'd have to use jnr-poxis equivalents, which I currently do for a few things/special cases like symlinks, but generally try to avoid.)

That said, I did have a user with non-utf-8 paths (based in the UK) that would have triggered my app's previous "blow up w/exception" behavior, so while silently skipping the paths is not necessarily the best right now, it's at least better than failing the entire program.

The watchman docs note that, due to file system APIs being encoding-less, we can't assume the bser strings are UTF-8. This changes the bser string deserializer to fallback to returning a byte[] when UTF-8 deserialization fails. This lets client to a cast/instanceof check on String or byte[] and decide how they want to handle the value.

facebook-github-bot added the CLA Signed label Sep 8, 2018

stephenh force-pushed the return-byte-array-for-non-utf-8 branch 2 times, most recently from 5ab5c24 to 26bdadc Compare September 6, 2019 23:05

Add jitpack.

f63e60a

stephenh force-pushed the return-byte-array-for-non-utf-8 branch from 26bdadc to f63e60a Compare September 6, 2019 23:07

Guava fix.

d01789a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Return byte[]s for non-UTF-8 file paths. #645

Return byte[]s for non-UTF-8 file paths. #645

Uh oh!

stephenh commented Sep 8, 2018

Uh oh!

facebook-github-bot commented Sep 8, 2018

Uh oh!

facebook-github-bot commented Sep 8, 2018

Uh oh!

mkillianey commented Dec 20, 2018

Uh oh!

stephenh commented Dec 21, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Return byte[]s for non-UTF-8 file paths. #645

Are you sure you want to change the base?

Return byte[]s for non-UTF-8 file paths. #645

Uh oh!

Conversation

stephenh commented Sep 8, 2018

Uh oh!

facebook-github-bot commented Sep 8, 2018

Uh oh!

facebook-github-bot commented Sep 8, 2018

Uh oh!

mkillianey commented Dec 20, 2018

Uh oh!

stephenh commented Dec 21, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants