Skip to content
This repository was archived by the owner on Dec 15, 2022. It is now read-only.

Normalize Unicode characters#340

Open
winstliu wants to merge 1 commit into
masterfrom
wl-unicode-normalization
Open

Normalize Unicode characters#340
winstliu wants to merge 1 commit into
masterfrom
wl-unicode-normalization

Conversation

@winstliu

@winstliu winstliu commented Feb 19, 2018

Copy link
Copy Markdown
Contributor

Requirements

  • Filling out the template is required. Any pull request that does not include enough information to be reviewed in a timely manner may be closed at the maintainers' discretion.
  • All new code requires tests to ensure against regressions

Description of the Change

Background: different Unicode sequences can be regarded as equivalent. On Windows and Linux, filenames are converted to NFC, or where characters like ñ are coded as one character. On macOS, filenames are converted to NFD, where ñ is coded as n + ◌̃. Both forms are considered to refer to the same character.

Fuzzy Finder was not taking Unicode normalization into account when searching for files. With this PR, NFD normalization for the filter query is now performed on macOS, while NFC normalization is performed for other platforms. This should allow for proper searching of filenames containing unicode characters.

Alternate Designs

NFKD and NFKC could be used instead, which would mean in addition the above, would also be treated as identical to ff.

Benefits

Fixes edge cases when searching for filenames with unicode characters.

Possible Drawbacks

None.

Applicable Issues

Fixes #69.

/cc @gjtorikian I think this should fix the issue you're seeing. If you'd like to test this PR but don't know how, I can provide guidance.

@winstliu winstliu force-pushed the wl-unicode-normalization branch from 6f24f26 to e925dbc Compare February 19, 2018 16:36
@winstliu

winstliu commented Feb 19, 2018

Copy link
Copy Markdown
Contributor Author

Test plan:

  1. Create a new file called erstiebegrüßung.html.
  2. Open Fuzzy Finder using the fuzzy-finder:toggle-file-finder command.

macOS:

  • Paste in erstiebegrüßung. The newly-added file should appear.
  • Paste in erstiebegrüßung. The newly-added file should appear.

Windows:

  • Paste in erstiebegrüßung. The newly-added file should appear.
  • Paste in erstiebegrüßung. The newly-added file should appear.

Linux:

  • Paste in erstiebegrüßung. The newly-added file should appear.
  • Paste in erstiebegrüßung. The newly-added file should appear.

@lee-dohm

lee-dohm commented Mar 8, 2018

Copy link
Copy Markdown
Contributor

Testing on macOS the first test case passes, the second one fails 😞

@lee-dohm

lee-dohm commented Mar 8, 2018

Copy link
Copy Markdown
Contributor

Hmmmmm ... now I can't get either one to pass 😡

@rsese

rsese commented Mar 8, 2018

Copy link
Copy Markdown

Had a chance to test on Linux (Ubuntu 16.04) - off of this branch, both tests passed for me (off of fuzzy-finder master, the first test passed and the second test failed).

@lee-dohm

lee-dohm commented Mar 9, 2018

Copy link
Copy Markdown
Contributor

Tested this with the help of @50Wliu. Confirmed on Mac OS X 10.13.3 using the APFS file system, both tests fail. According to our research, APFS does not do Unicode file name normalization at all.

@lee-dohm

lee-dohm commented Mar 9, 2018

Copy link
Copy Markdown
Contributor

@gjtorikian Can you confirm which OS version and filesystem you tested this on?

@lee-dohm

lee-dohm commented Mar 9, 2018

Copy link
Copy Markdown
Contributor

Another helpful article on APFS file system normalization.

@gjtorikian

Copy link
Copy Markdown
Contributor

I'm also on 10.13.3 and APFS.

@ghost ghost left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    // Normalize to backslashes on Windows	
    if (process.platform === 'win32') {	

@ghost ghost left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    // Normalize to backslashes on Windows	
    if (process.platform === 'win32') {	

Commit 05b0b51
#340: Normalize Unicode characters

@ghost ghost left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Paste in erstiebegrüßung. The newly-added file should appear.
Paste in erstiebegrüßung. The newly-added file should appear.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Filenames with UTF-8 characters are not found

4 participants