-
-
Notifications
You must be signed in to change notification settings - Fork 34.4k
src: improve StringBytes::Encode perf on UTF8 #61131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
5b2b040 to
aee5408
Compare
aee5408 to
118db5f
Compare
|
As #61119 landed, this is now ready. Rebased. |
118db5f to
f1d3a0e
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #61131 +/- ##
==========================================
- Coverage 88.52% 88.52% -0.01%
==========================================
Files 704 704
Lines 208802 208895 +93
Branches 40318 40334 +16
==========================================
+ Hits 184842 184924 +82
+ Misses 15947 15940 -7
- Partials 8013 8031 +18
🚀 New features to boost your workflow:
|
05e960e to
7a1aaa5
Compare
Co-authored-by: Gürgün Dayıoğlu <[email protected]>
7a1aaa5 to
d4f0460
Compare
gurgunday
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
mcollina
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
mertcanaltin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Tracking: #61041
Most data is valid utf-8, no need to wait for v8 optimizations or for simdutf implementing fast replacement.
We can just check + simdutf in fast case.
This is a 2x-10x speedup according to https://github.com/lemire/jstextdecoderbench bench (+ I added extra cases)
There is still room for improvement here (e.g. avoiding triple scans), but this change alone improves results significantly
We can improve further iteratively
This performs mallocs only for valid strings, instead of optimistically malloc-ing and decoding until error
Switching that behavior to optimistic would be a separate PR (perf needs to be checked against this, not main or #61119)
Buffer#toString() - utf8
pre-#61119:
main with #61119 (landed):
PR:
TextDecoder, loose
pre-#61119:
main with #61119 (landed):
PR:
TextDecoder, fatal
pre-#61119:
main with #61119 (landed):
PR:
cc @nodejs/performance