-
-
Notifications
You must be signed in to change notification settings - Fork 480
Split SequenceString into two types for Latin1 and Utf16 #4571
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This is the first in a series of PR to improve the performance of strings in Boa. The first step was to introduce a new type of strings, `SliceString`, which contains a strong pointer to another string and start/end indices. This allows for very fast slicing of strings. This initially came at a performance cost by having an enumeration of kinds of strings. An intermediate experiment was introduced to have the kind be a tag on the internal JsString pointer. This still came as a cost as it required bit operations to figure out which function to call. Finally, I moved to using a `vtable`. This helped with many points: 1. as fast as before. Before this PR, there was still a deref of a pointer when accessing internal fields. 2. we can now introduce many other types (which will come in their separate PRs). 3. this makes the code to clone/drop/as_str (and even construction) more streamline as each function is their own implementation.
String should never change length anyway, they are immutable in our current design.
Test262 conformance changes
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4571 +/- ##
===========================================
+ Coverage 47.24% 57.31% +10.06%
===========================================
Files 476 509 +33
Lines 46892 58028 +11136
===========================================
+ Hits 22154 33256 +11102
- Misses 24738 24772 +34 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
nekevss
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks pretty good.
Had a couple nits / things I noticed.
| /// Internal trait for crate-specific usage. Contains implementation details | ||
| /// that should not leak through the API. | ||
| #[allow(private_interfaces)] | ||
| pub trait InternalStringType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thought: I don't think this needs to be in the private module either.
It can just be a sealed::Sealed super trait.
A lot of this tends to come from my experience with icu4x / temporal_rs, so I could be a bit biased on the approach towards sealing.
mod private {
pub trait Sealed {}
}
trait TraitOne : private::Sealed { }
trait TraitTwo: private::Sealed { }I think this approach is nice because it's very clear that the trait is private and sealed.
|
|
||
| impl Sealed for Latin1 {} | ||
| impl StringType for Latin1 { | ||
| type Char = u8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Shouldn't this be CodePoint rather than char since it's the numeric value vs. instead of the actual char
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of a misnomer. This should be what you have when you transform a pointer into a slice. So it's the same as Byte. I need to fix that, let me see if I can do better.
Slice still has
is_latin1which will be fixed in the next PR.