@hansl commented Dec 19, 2025

Slice still has `is_latin1`, which will be fixed in the next PR.

hansl and others added 13 commits December 14, 2025 20:48
This is the first in a series of PRs to improve the performance of strings in Boa.

The first step was to introduce a new type of string, `SliceString`, which contains a strong pointer to another string and start/end indices. This allows for very fast slicing of strings. It initially came at a performance cost, because it required an enumeration of string kinds. An intermediate experiment stored the kind as a tag on the internal `JsString` pointer. This still came at a cost, as it required bit operations to figure out which function to call.
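
To make the slicing idea concrete, here is a minimal sketch of a slice that holds a strong reference to its parent plus start/end offsets. It is an illustration only, not Boa's `SliceString`: the names `SliceSketch`, `parent`, `start`, and `end` are hypothetical, and it assumes `Rc<str>` with UTF-8 byte offsets for simplicity rather than Boa's internal representation.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a slice string: a strong reference to the
/// parent buffer plus start/end byte offsets into it.
#[derive(Clone, Debug)]
struct SliceSketch {
    parent: Rc<str>,
    start: usize,
    end: usize,
}

impl SliceSketch {
    /// Copies the text once into a shared, reference-counted buffer.
    fn new(s: &str) -> Self {
        let parent: Rc<str> = Rc::from(s);
        let end = parent.len();
        Self { parent, start: 0, end }
    }

    /// Slicing copies no character data: it clones the `Rc` and narrows
    /// the range. `start` and `end` are relative to this slice.
    fn slice(&self, start: usize, end: usize) -> Option<Self> {
        let abs_start = self.start.checked_add(start)?;
        let abs_end = self.start.checked_add(end)?;
        if abs_start > abs_end || abs_end > self.end {
            return None;
        }
        // Reject offsets that would split a UTF-8 sequence.
        if !self.parent.is_char_boundary(abs_start) || !self.parent.is_char_boundary(abs_end) {
            return None;
        }
        Some(Self {
            parent: Rc::clone(&self.parent),
            start: abs_start,
            end: abs_end,
        })
    }

    /// Borrows the visible part of the parent buffer.
    fn as_str(&self) -> &str {
        &self.parent[self.start..self.end]
    }
}
```

In this sketch, taking a slice of a slice still points at the original buffer, so the cost of slicing does not depend on the length of the text.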

Finally, I moved to using a `vtable`. This helps in several ways:
1. It is as fast as before. Before this PR, there was still a deref of a pointer when accessing internal fields.
2. We can now introduce many other string types (which will come in their own PRs).
3. It makes the code for clone/drop/as_str (and even construction) more streamlined, as each string type has its own implementation of each function.
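
As a rough illustration of the vtable approach, the sketch below stores a pointer to a static table of function pointers in a common header and dispatches `as_str` and `drop` through it. All names (`Header`, `VTable`, `OwnedRepr`, `StaticRepr`, `StrHandle`) are hypothetical and the layout is an assumption, not the one used in this PR; cloning and reference counting are left out to keep it short.

```rust
use std::ptr::NonNull;

/// Hypothetical table of operations; one static instance per string kind.
struct VTable {
    as_str: unsafe fn(NonNull<Header>) -> *const str,
    drop: unsafe fn(NonNull<Header>),
}

/// Common prefix of every representation. With `repr(C)` it sits at
/// offset 0, so a pointer to a representation also points to its header.
#[repr(C)]
struct Header {
    vtable: &'static VTable,
}

#[repr(C)]
struct OwnedRepr {
    header: Header,
    data: Box<str>,
}

#[repr(C)]
struct StaticRepr {
    header: Header,
    data: &'static str,
}

// Each function requires that `p` points to a live value of its own kind.
unsafe fn owned_as_str(p: NonNull<Header>) -> *const str {
    let this: &OwnedRepr = unsafe { p.cast::<OwnedRepr>().as_ref() };
    &*this.data
}

unsafe fn owned_drop(p: NonNull<Header>) {
    drop(unsafe { Box::from_raw(p.cast::<OwnedRepr>().as_ptr()) });
}

unsafe fn static_as_str(p: NonNull<Header>) -> *const str {
    let this: &StaticRepr = unsafe { p.cast::<StaticRepr>().as_ref() };
    this.data
}

unsafe fn static_drop(p: NonNull<Header>) {
    drop(unsafe { Box::from_raw(p.cast::<StaticRepr>().as_ptr()) });
}

static OWNED_VTABLE: VTable = VTable { as_str: owned_as_str, drop: owned_drop };
static STATIC_VTABLE: VTable = VTable { as_str: static_as_str, drop: static_drop };

/// Hypothetical handle: a single pointer, whatever the string kind.
struct StrHandle {
    ptr: NonNull<Header>,
}

impl StrHandle {
    fn from_owned(s: &str) -> Self {
        let repr = Box::new(OwnedRepr {
            header: Header { vtable: &OWNED_VTABLE },
            data: s.into(),
        });
        Self { ptr: NonNull::from(Box::leak(repr)).cast() }
    }

    fn from_static(s: &'static str) -> Self {
        let repr = Box::new(StaticRepr {
            header: Header { vtable: &STATIC_VTABLE },
            data: s,
        });
        Self { ptr: NonNull::from(Box::leak(repr)).cast() }
    }

    fn as_str(&self) -> &str {
        // SAFETY: `ptr` stays live until `Drop`, and the vtable in the
        // header always matches the representation it was built with.
        unsafe { &*(self.ptr.as_ref().vtable.as_str)(self.ptr) }
    }
}

impl Drop for StrHandle {
    fn drop(&mut self) {
        // SAFETY: same invariants as `as_str`; runs exactly once.
        unsafe { (self.ptr.as_ref().vtable.drop)(self.ptr) }
    }
}
```

Adding another string kind in this sketch means adding one representation struct, its functions, and one static table; `StrHandle` itself does not change.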
Strings should never change length anyway; they are immutable in our current design.
@github-actions

Test262 conformance changes

| Test result | main count | PR count | difference |
| --- | --- | --- | --- |
| Total | 52,598 | 52,598 | 0 |
| Passed | 49,385 | 49,385 | 0 |
| Ignored | 2,134 | 2,134 | 0 |
| Failed | 1,079 | 1,079 | 0 |
| Panics | 0 | 0 | 0 |
| Conformance | 93.89% | 93.89% | 0.00% |

codecov bot commented Dec 19, 2025

Codecov Report

❌ Patch coverage is 93.75000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.31%. Comparing base (6ddc2b4) to head (81a52d8).
⚠️ Report is 620 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| core/string/src/builder.rs | 89.28% | 3 Missing ⚠️ |
| core/string/src/vtable/sequence.rs | 92.50% | 3 Missing ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #4571       +/-   ##
===========================================
+ Coverage   47.24%   57.31%   +10.06%     
===========================================
  Files         476      509       +33     
  Lines       46892    58028    +11136     
===========================================
+ Hits        22154    33256    +11102     
- Misses      24738    24772       +34     

@nekevss requested a review from a team December 26, 2025 18:04
@nekevss added the Internal label (Category for changelog) Dec 26, 2025

@nekevss left a comment

Overall, this looks pretty good.

Had a couple nits / things I noticed.

/// Internal trait for crate-specific usage. Contains implementation details
/// that should not leak through the API.
#[allow(private_interfaces)]
pub trait InternalStringType {

thought: I don't think this needs to be in the private module either.

It can just be a `sealed::Sealed` supertrait.

A lot of this tends to come from my experience with icu4x / temporal_rs, so I could be a bit biased on the approach towards sealing.

mod private {
    pub trait Sealed {}
}

trait TraitOne: private::Sealed {}

trait TraitTwo: private::Sealed {}

I think this approach is nice because it's very clear that the trait is private and sealed.
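
For reference, a fuller sketch of the sealing pattern described above might look like the following. It reuses `Latin1`, `StringType`, and `type Char = u8` from the diff under review; `Utf16`, `NAME`, and `unit_size` are hypothetical additions for illustration and are not part of this PR.

```rust
mod private {
    /// Public trait in a private module: code outside this crate cannot
    /// name it, so it cannot implement any trait that requires it.
    pub trait Sealed {}
}

pub struct Latin1;
/// Hypothetical second encoding, added only to show a second implementor.
pub struct Utf16;

impl private::Sealed for Latin1 {}
impl private::Sealed for Utf16 {}

/// Usable as a bound anywhere, but only implementable inside this crate.
pub trait StringType: private::Sealed {
    /// Element type of the underlying buffer.
    type Char: Copy;
    /// Hypothetical constant, used here only to tell the encodings apart.
    const NAME: &'static str;
}

impl StringType for Latin1 {
    type Char = u8;
    const NAME: &'static str = "latin1";
}

impl StringType for Utf16 {
    type Char = u16;
    const NAME: &'static str = "utf16";
}

/// Generic code can rely on the closed set of implementors.
pub fn unit_size<T: StringType>() -> usize {
    std::mem::size_of::<T::Char>()
}
```

With this layout, the supertrait bound on `StringType` is visible in its declaration, which is the clarity benefit mentioned above.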


impl Sealed for Latin1 {}
impl StringType for Latin1 {
    type Char = u8;

nit: Shouldn't this be `CodePoint` rather than `Char`, since it's the numeric value instead of the actual char?

Contributor Author

This is kind of a misnomer. This should be what you have when you transform a pointer into a slice. So it's the same as Byte. I need to fix that, let me see if I can do better.
