src: improve StringBytes::Encode perf on UTF8 #61131

ChALkeR · 2025-12-20T00:03:59Z

Tracking: #61041

Most data is valid UTF-8, so there is no need to wait for V8 optimizations or for simdutf to implement fast replacement.
We can simply validate first and use simdutf in the fast case.

This is a 2x-10x speedup according to the https://github.com/lemire/jstextdecoderbench benchmark (with extra cases I added).

There is still room for improvement here (e.g. avoiding triple scans), but this change alone improves results significantly.
We can improve further iteratively.
This performs mallocs only for valid strings, instead of optimistically malloc-ing and decoding until an error is hit.
Switching to the optimistic behavior would be a separate PR (its perf needs to be checked against this PR, not main or #61119).
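The "check first, then decode" idea described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: the names `IsValidUtf8`, `DecodeValidUtf8` and `TryFastDecode` are hypothetical, and the scalar loops are only stand-ins for the SIMD routines that simdutf provides. The sketch assumes the caller falls back to a slower, replacement-aware decoder when the fast path declines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical scalar stand-in for a SIMD validator.
// Returns true only for well-formed UTF-8 (no overlong forms, no surrogates,
// nothing above U+10FFFF, no truncated sequences).
bool IsValidUtf8(std::string_view in) {
  static constexpr uint32_t kMinCodePoint[] = {0, 0, 0x80, 0x800, 0x10000};
  size_t i = 0;
  while (i < in.size()) {
    const uint8_t lead = static_cast<uint8_t>(in[i]);
    if (lead < 0x80) { ++i; continue; }  // ASCII byte
    size_t len;
    uint32_t cp;
    if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
    else return false;  // stray continuation byte or invalid lead byte
    if (len > in.size() - i) return false;  // truncated sequence
    for (size_t k = 1; k < len; ++k) {
      const uint8_t c = static_cast<uint8_t>(in[i + k]);
      if ((c & 0xC0) != 0x80) return false;
      cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < kMinCodePoint[len] || cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF)) {
      return false;
    }
    i += len;
  }
  return true;
}

// Decodes input already known to be valid UTF-8 into UTF-16.
// No error handling is needed, which is what keeps this path simple.
std::u16string DecodeValidUtf8(std::string_view in) {
  std::u16string out;
  out.reserve(in.size());  // UTF-16 code units never outnumber UTF-8 bytes
  size_t i = 0;
  while (i < in.size()) {
    const uint8_t lead = static_cast<uint8_t>(in[i]);
    const size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    assert(len <= in.size() - i);  // guaranteed by prior validation
    uint32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
    for (size_t k = 1; k < len; ++k) {
      cp = (cp << 6) | (static_cast<uint8_t>(in[i + k]) & 0x3F);
    }
    i += len;
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {  // encode as a surrogate pair
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
  }
  return out;
}

// Fast path: validate first, allocate and decode only when the input is valid.
// std::nullopt tells the caller to use its slower, replacement-aware decoder.
std::optional<std::u16string> TryFastDecode(std::string_view in) {
  if (!IsValidUtf8(in)) return std::nullopt;  // nothing allocated yet
  return DecodeValidUtf8(in);
}
```

In this sketch the output buffer is only allocated after validation succeeds, which mirrors the "mallocs only for valid strings" behavior described above. The cost is an extra scan over the input compared with an optimistic decode-until-error approach.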

Buffer#toString() - utf8

pre-#61119:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	18.21 GiB/s	0.005 ms
Arabic lipsum	79.771 KiB	0.29 GiB/s	0.266 ms
Chinese lipsum	68.203 KiB	0.34 GiB/s	0.192 ms
Arabic + 2 * ASCII	249.575 KiB	0.73 GiB/s	0.329 ms

main with #61119 (landed):

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	36.75 GiB/s	0.002 ms
Arabic lipsum	79.771 KiB	0.28 GiB/s	0.273 ms
Chinese lipsum	68.203 KiB	0.33 GiB/s	0.197 ms
Arabic + 2 * ASCII	249.575 KiB	0.69 GiB/s	0.344 ms

PR:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	36.84 GiB/s	0.002 ms
Arabic lipsum	79.771 KiB	2.03 GiB/s	0.038 ms
Chinese lipsum	68.203 KiB	4.06 GiB/s	0.016 ms
Arabic + 2 * ASCII	249.577 KiB	3.42 GiB/s	0.072 ms
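
As a rough cross-check of the Buffer#toString() tables above, the sketch below shows how throughput follows from size and mean time, and how a speedup ratio can be read off two throughput figures. The helper names `ThroughputGiBps` and `Speedup` are hypothetical, and the benchmark itself may compute and round its numbers differently.

```cpp
// Throughput in GiB/s from a payload size in KiB and a mean time in ms.
constexpr double ThroughputGiBps(double size_kib, double mean_ms) {
  const double gib = size_kib / (1024.0 * 1024.0);
  const double seconds = mean_ms / 1000.0;
  return gib / seconds;
}

// Speedup expressed as the ratio of two throughput figures.
constexpr double Speedup(double before_gibps, double after_gibps) {
  return after_gibps / before_gibps;
}
```

For example, `ThroughputGiBps(79.771, 0.038)` gives roughly 2.0 GiB/s, consistent with the reported 2.03 GiB/s for Arabic lipsum given that the mean time is rounded to three decimals. Comparing the PR against main with #61119, `Speedup(0.28, 2.03)` is about 7.3x for Arabic, `Speedup(0.33, 4.06)` about 12.3x for Chinese, and `Speedup(0.69, 3.42)` about 5.0x for the mixed case.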

TextDecoder, loose

pre-#61119:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	17.99 GiB/s	0.005 ms
Arabic lipsum	79.771 KiB	0.28 GiB/s	0.270 ms
Chinese lipsum	68.203 KiB	0.34 GiB/s	0.194 ms
Arabic + 2 * ASCII	249.577 KiB	0.71 GiB/s	0.333 ms

main with #61119 (landed):

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	36.59 GiB/s	0.002 ms
Arabic lipsum	79.771 KiB	0.28 GiB/s	0.271 ms
Chinese lipsum	68.203 KiB	0.34 GiB/s	0.192 ms
Arabic + 2 * ASCII	249.577 KiB	0.70 GiB/s	0.340 ms

PR:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	36.78 GiB/s	0.002 ms
Arabic lipsum	79.771 KiB	2.03 GiB/s	0.038 ms
Chinese lipsum	68.203 KiB	4.01 GiB/s	0.016 ms
Arabic + 2 * ASCII	249.577 KiB	3.42 GiB/s	0.072 ms

TextDecoder, fatal

pre-#61119:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	15.31 GiB/s	0.006 ms
Arabic lipsum	79.771 KiB	0.27 GiB/s	0.279 ms
Chinese lipsum	68.203 KiB	0.34 GiB/s	0.194 ms
Arabic + 2 * ASCII	249.577 KiB	0.71 GiB/s	0.338 ms

main with #61119 (landed):

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	36.63 GiB/s	0.002 ms
Arabic lipsum	79.771 KiB	0.28 GiB/s	0.272 ms
Chinese lipsum	68.203 KiB	0.33 GiB/s	0.197 ms
Arabic + 2 * ASCII	249.577 KiB	0.68 GiB/s	0.351 ms

PR:

Test	Size	Throughput	Mean Time
Latin lipsum (ASCII)	84.902 KiB	36.71 GiB/s	0.002 ms
Arabic lipsum	79.771 KiB	1.70 GiB/s	0.046 ms
Chinese lipsum	68.203 KiB	2.97 GiB/s	0.022 ms
Arabic + 2 * ASCII	249.577 KiB	3.01 GiB/s	0.082 ms

cc @nodejs/performance

ChALkeR · 2026-01-17T11:01:41Z

As #61119 landed, this is now ready. Rebased.

codecov · 2026-01-17T12:04:14Z

Codecov Report

❌ Patch coverage is 50.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.52%. Comparing base (955d347) to head (d4f0460).
⚠️ Report is 13 commits behind head on main.

Files with missing lines	Patch %	Lines
src/string_bytes.cc	50.00%	4 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #61131      +/-   ##
==========================================
- Coverage   88.52%   88.52%   -0.01%     
==========================================
  Files         704      704              
  Lines      208802   208895      +93     
  Branches    40318    40334      +16     
==========================================
+ Hits       184842   184924      +82     
+ Misses      15947    15940       -7     
- Partials     8013     8031      +18

Files with missing lines	Coverage Δ
src/encoding_binding.cc	`52.73% <ø> (ø)`
src/string_bytes.cc	`69.01% <50.00%> (-0.73%)`	⬇️

... and 45 files with indirect coverage changes

src/string_bytes.cc

Co-authored-by: Gürgün Dayıoğlu <[email protected]>

src/encoding_binding.cc

gurgunday

lgtm

nodejs-github-bot · 2026-01-18T11:10:53Z

CI: https://ci.nodejs.org/job/node-test-pull-request/70868/

nodejs-github-bot · 2026-01-18T20:03:28Z

CI: https://ci.nodejs.org/job/node-test-pull-request/70873/

nodejs-github-bot · 2026-01-19T07:19:49Z

CI: https://ci.nodejs.org/job/node-test-pull-request/70880/

mcollina

lgtm

mertcanaltin

LGTM

nodejs-github-bot added the buffer, c++, and needs-ci labels Dec 20, 2025

ChALkeR force-pushed the chalker/non-ascii/0 branch 2 times, most recently from 5b2b040 to aee5408 Compare December 20, 2025 05:49

RafaelGSS added the performance label Dec 29, 2025

RafaelGSS self-requested a review December 29, 2025 20:49

ChALkeR force-pushed the chalker/non-ascii/0 branch from aee5408 to 118db5f Compare January 17, 2026 11:01

ChALkeR marked this pull request as ready for review January 17, 2026 11:01

ChALkeR force-pushed the chalker/non-ascii/0 branch from 118db5f to f1d3a0e Compare January 17, 2026 11:06

gurgunday reviewed Jan 17, 2026

src/string_bytes.cc (resolved)

gurgunday reviewed Jan 17, 2026

src/string_bytes.cc (resolved)

RafaelGSS requested a review from lemire January 17, 2026 18:48

ChALkeR force-pushed the chalker/non-ascii/0 branch from 05e960e to 7a1aaa5 Compare January 18, 2026 02:56

src: improve StringBytes::Encode perf on UTF8

d4f0460

Co-authored-by: Gürgün Dayıoğlu <[email protected]>

ChALkeR force-pushed the chalker/non-ascii/0 branch from 7a1aaa5 to d4f0460 Compare January 18, 2026 03:27

ChALkeR requested a review from gurgunday January 18, 2026 03:28

ChALkeR commented Jan 18, 2026

src/encoding_binding.cc (resolved)

gurgunday approved these changes Jan 18, 2026

gurgunday added the request-ci label Jan 18, 2026

github-actions bot removed the request-ci label Jan 18, 2026

gurgunday added the author ready label Jan 18, 2026

Qard approved these changes Jan 19, 2026

mcollina approved these changes Jan 19, 2026

mertcanaltin approved these changes Jan 19, 2026
