
Conversation

@garretrieger
Contributor

No description provided.

@skef
Contributor

skef commented Feb 15, 2025

In my opinion it's much too early to merge this.

@garretrieger
Contributor Author

In my opinion it's much too early to merge this.

This is not intended to be a final take on the subject, but rather just capture how the current prototype code works so I can share and gather feedback. With that in mind I added a status section that explains this is all still in early stages of development.

garretrieger force-pushed the glyph_segmentation_doc branch from cea8821 to adcb236 on February 15, 2025 at 01:38
@skef
Contributor

skef commented Feb 15, 2025

It's premature to be gathering input on this outside of the working group, and the working group can access this document in other ways.

This draft goes into great detail about one particular approach to forming glyph-keyed patches while entirely omitting other relevant subjects. It's far too early to merge this.

garretrieger force-pushed the glyph_segmentation_doc branch from adcb236 to c672b5c on February 18, 2025 at 20:00
@garretrieger
Contributor Author

It's premature to be gathering input on this outside of the working group, and the working group can access this document in other ways.

This proposal is not related to the specification. It's describing one possible implementation of part of a spec-compliant encoder. The encoder implementation is not part of the working group's deliverables, so I don't see any reason to keep it within the confines of working group discussions. In this case I'm specifically looking to gather feedback from outside the working group.

This draft goes into great detail about one particular approach to forming glyph-keyed patches while entirely omitting other relevant subjects. It's far too early to merge this.

That's right, this document is meant to describe how the particular approach I'm currently investigating in glyph_segmentation.cc functions. It's not intended to be presented as the only possible or a complete solution. The status section I added describes this, mentions some of the missing pieces, and also talks about other possible approaches. I added some additional tweaks to the wording to make this clearer.

@skef
Contributor

skef commented Feb 18, 2025

I don't think something like this is necessary for the upcoming review and in its present form I worry it could raise more questions than it answers.

This is not intended to be a final take on the subject, but rather just capture how the current prototype code works so I can share and gather feedback.

My understanding of the spec development process is that we share and gather feedback with the working group first, and then when that process has been worked through we share with the wider W3C audience. @vlevantovsky is that your understanding as well?

@garretrieger
Contributor Author

Just to reiterate, this document is not part of the specification and is not proposing any changes to the specification. This repo and the proposal here are for an IFT encoder implementation. An encoder implementation is not a deliverable of the web fonts working group (see section 3 in the charter here: https://www.w3.org/2023/02/webfonts-2023.html), so I don't believe any of this is in the scope of the working group.

@skef
Contributor

skef commented Feb 18, 2025

Oh, sorry, I thought this was somehow related to the upcoming release of the specification.

In that case, can you give an idea of who the intended audience is?

@skef
Contributor

skef commented Feb 18, 2025

(This is still a W3C repository closely related to the spec, and I don't want there to be confusion on the part of spec reviewers about this approach.)

@garretrieger
Contributor Author

garretrieger commented Feb 18, 2025

Oh, sorry, I thought this was somehow related to the upcoming release of the specification.

In that case, can you give an idea of who the intended audience is?

Two main groups:

  1. Anyone who happens to be reading/developing the code in glyph_segmentation.cc and would like context on the problem being solved, the basis for why the implementation does what it does, and a summary of the overall segmentation process.
  2. Engineering peers in the font tooling space who I'd like to gather feedback from on the procedure. At this early stage I'm mostly looking to confirm that the assumptions all of this rests on are valid.

(This is still a W3C repository closely related to the spec, and I don't want there to be confusion on the part of spec reviewers about this approach.)

Yeah, agreed. The current setup is somewhat just an artifact of the history of this repo. It was originally closely tied to the analysis suite, which was very much relevant to the working group, but it's since been renamed a few times and the scope has been significantly narrowed to just an encoder implementation. It's probably worth a discussion whether it makes sense to move this repo outside of the w3c org; for example, it might make sense to move it under the fontations group of projects since the client already lives over there.

@skef
Contributor

skef commented Feb 18, 2025

It sounds like Google considers this code to be their own project now. That hasn't been the basis on which Adobe was participating, but I suppose that was my confusion about its status given where the repository is stored.

So, yes, if this is a Google project I think it would be best to move it outside of the W3C umbrella and then Google can add whatever documents to it that they would like.

@garretrieger
Contributor Author

garretrieger commented Feb 19, 2025

I wasn't implying this is a Google-owned project. I've also been operating under the assumption that it is a collaborative effort. All I was implying above is that we may want to consider discussing whether this is still the best place for the repo to live. If a decision is made to move it, that should be made as a group. Likewise, if we all agree this is still the best place for it, then I don't see any issues leaving it here either.

Back to the original comment, I'm not worried about spec reviewers seeing this and getting confused. This repo and document aren't linked anywhere from the specification. Additionally, this repo and this document are both clearly labelled as being an implementation of the specification.

@skef
Contributor

skef commented Feb 19, 2025

I wasn't implying this is a Google-owned project. I've also been operating under the assumption that it is a collaborative effort. All I was implying above is that we may want to consider discussing whether this is still the best place for the repo to live. If a decision is made to move it, that should be made as a group.

As what group? You said yourself that this project isn't a deliverable of the working group. And all of this is clearly in the context of whether my views about this document are relevant to merging it.

@garretrieger
Contributor Author

I wasn't implying this is a Google-owned project. I've also been operating under the assumption that it is a collaborative effort. All I was implying above is that we may want to consider discussing whether this is still the best place for the repo to live. If a decision is made to move it, that should be made as a group.

As what group? You said yourself that this project isn't a deliverable of the working group. And all of this is clearly in the context of whether my views about this document are relevant to merging it.

At present it's pretty much just you and me who are actively involved in the encoder implementation, so that's likely a conversation between us. If you're happy with the repo living here, then that works for me too.

With the clarification that this document isn't tied to the specification, do you still have objections to merging this?

@skef
Contributor

skef commented Feb 19, 2025

With the clarification that this document isn't tied to the specification, do you still have objections to merging this?

Yes, because the repo being under w3c creates ambiguity about the status of such documents.

@svgeesus
Contributor

This draft goes into great detail about one particular approach to forming glyph-keyed patches while entirely omitting other relevant subjects. It's far too early to merge this.

I don't understand. It is documentation of what this particular implementation does, not a constraint on what all implementations must do (which would belong in the specification).

@skef
Contributor

skef commented Feb 19, 2025

@svgeesus If this repo contains a Google project, then I don't disagree, but I think it would be better to move it out from under the w3c hierarchy.

If this is a project related to the working group, or a subset of the working group, then Garret and I have already discussed various issues with this approach, and at the time he agreed that what we actually do wouldn't take this form because of those issues. Now he wants "feedback" on this approach, with some vague language about it not being "final", that doesn't take our concerns into account.

This is part of a trend of this project moving in the direction of "this is a shared project when that works for Google and a Google project when it doesn't". Until there is more clarity about that I would prefer this document isn't merged under the w3c heading.

@skef
Contributor

skef commented Feb 21, 2025

I have trouble reconciling the algorithm described in the document with the explanation of its role.

The introduction says:

The remainder of this document describes a potential procedure for analyzing a font to determine how to split glyphs across a set of patches and ...

This seems like an exaggeration at best relative to the first goal:

Given a desired segmentation of the font expressed in terms of Unicode code points find a corresponding grouping of glyphs to be made into a set of patches.

Surely almost all of the work concerning "how to split glyphs across a set of patches" is in the "desired segmentation", given how little the algorithm appears to change that grouping in light of what it finds.

From the intro again:

The current prototype implementation in glyph_segmentation.cc can produce segmentations that satisfy the closure requirement, but does not yet necessarily produce ones that are performant.

This makes it sound as if this algorithm could eventually be extended to "produce" performant segmentations, even though it seems to have little to do with that more general problem.

If the idea is that we want to gather suggestions on how to make this algorithm produce performant segmentations, I would think it should contain some guidance about that, given that how it operates currently is so independent of that.

If, on the other hand, the goal of this algorithm is to transform a "rough draft" of a segmentation (whether performant or not) into a "final draft" that meets the closure requirement, the document should characterize it that way.

(There are of course many possible intermediate points between an algorithm that solves the whole problem and an algorithm that does a "tune up" on the output of some earlier algorithm. The IFTB prototype encoder took the equivalent of unicode "streams" as input -- groups of codepoints, some ordered and some not, that the encoder tried to keep more or less adjacent but would move around as needed to better optimize for dependencies. Maybe something like this algorithm could be extended in that direction, but it's far from obvious (to me) how, given the inputs it takes and what it does.)

@skef
Contributor

skef commented Feb 21, 2025

I'm not sure how this case is addressed:

  • Glyph $x$ has unicode mapping $X$ which is in segment 1
  • Glyph $y$ has unicode mapping $Y$ which is in segment 4
  • Glyph $z$ has unicode mapping $Z$ which is in segment 6
  • There is a ligature substitution from $y$ $z$ to $x$

From what I can tell:

  • $x \in I_1$ and $x \notin D_1$
  • $x \notin I_4$ and $x \notin D_4$
  • $x \notin I_6$ and $x \notin D_6$

$x \in (I_1 - D_1)$ is interpreted as "the activation condition for [$x$] is ($s_i \land \ldots$) where ... is one or more additional segments". But:

  1. That seems false, as $x$ is needed with segment 1 regardless, and
  2. There doesn't seem to be a good means of learning about where else $x$ is needed

If this is indeed a gap then the algorithm doesn't meet the stated goal of ensuring the segmentation meets the closure requirement.

Given:

* An input font, $F$.
* A list of segments $s_0, …, s_n$. Where:
Contributor

In "Goals" above it says the inputs are desired segmentations of Unicode codepoints. I don't see why a desired segmentation would put the same codepoint in two segments, so I don't see the purpose of $s_i \neq s_0$ Why not simply specify that the segments must be disjoint?

Or, if the idea is that the previous stage has done some optimization work, such that it is addressing certain desired duplication cases, why are the input segments in terms of codepoints rather than glyphs (or both)? What if the earlier stage already has a good idea of where to put unmapped glyphs?

Contributor Author

s_0 is a special segment. It's the set of codepoints and associated glyphs placed in the initial font. The s_i != s_0 condition is just saying this procedure doesn't apply to the initial segment (s_0), since it's considered to be already available and doesn't need patches or conditions generated for it.

Contributor

OK, I understand that but it raises additional questions:

Is s_0 a set of codepoints (as the text seems to indicate) or a set of codepoints and associated glyphs? If the latter, is it a requirement that the earlier stage determine the dependencies for s_0 and include them? It doesn't seem that there is any s_0-specific analysis in the algorithm, and therefore it's unclear how it's ensured that s_0 contains its dependencies.

Or is the idea to treat any such dependencies as exceptions in the algorithm?

@svgeesus
Contributor

@skef thanks for refining your position from "do not merge" to specific, actionable questions and criticisms. That is helpful.

@garretrieger could you address these points before merging? Either by correcting the algorithm or by adding clarifying prose about the capabilities and the specific problem being solved?

I also suggest making it clearer that this sample code represents one implementation approach, not the only one.

@garretrieger
Contributor Author

If this is a project related to the working group, or a subset of the working group, then Garret and I have already discussed various issues with this approach and at the time he agreed what we actually do wouldn't take this form, because of those issues. Now he wants "feedback" on this approach, with some vague language about it not being "final", that doesn't take our concerns into account.

This document and the accompanying code are purely exploratory. I'm just looking to try out this approach and see if it's workable in practice, and at the same time gather some feedback from experts in the font space on it. The document doesn't do a good enough job communicating that; I'll fix that. It's not in any way meant to be a complete, final, or only solution.

@garretrieger
Contributor Author

If, on the other hand, the goal of this algorithm is transform a "rough draft" of a segmentation (whether performant or not) into a "final draft" that meets the closure requirement, the document should characterize it that way.

Yeah, this is what the intent of this specific algorithm is. The idea is that something before it produces a high quality unicode based segmentation (ideally taking into account interactions between codepoints) and this procedure turns that into the actual list of patches and activation conditions for those patches such that the closure requirement is satisfied.

I've rewritten parts of the introduction to make it more clear what the scope of this procedure is.
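To make that scope concrete, here's a small sketch of the kind of output being described, using the y/z → x ligature example raised earlier in the thread: glyph-keyed patches, each with an activation condition over the input codepoint segments. All names and types here are hypothetical illustrations, not the actual glyph_segmentation.cc API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

@dataclass
class Patch:
    glyphs: FrozenSet[str]
    # Activation condition in disjunctive normal form over input segment
    # indices: [(4,)] means "segment 4 loaded", [(4, 6)] means "4 AND 6
    # loaded", [(1,), (4, 6)] means "1 OR (4 AND 6) loaded".
    condition: List[Tuple[int, ...]]

# One way the output could look for the ligature example (y z -> x, with
# x mapped in segment 1, y in 4, z in 6), avoiding glyph duplication by
# giving x its own patch with the full condition:
patches = [
    Patch(frozenset({"y"}), [(4,)]),          # exclusive patch for s_4
    Patch(frozenset({"z"}), [(6,)]),          # exclusive patch for s_6
    Patch(frozenset({"x"}), [(1,), (4, 6)]),  # s_1 OR (s_4 AND s_6)
]
```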

@garretrieger
Contributor Author

I'm not sure how this case is addressed:

  • Glyph $x$ has unicode mapping $X$ which is in segment 1
  • Glyph $y$ has unicode mapping $Y$ which is in segment 4
  • Glyph $z$ has unicode mapping $Z$ which is in segment 6
  • There is a ligature substitution from $y$ $z$ to $x$

From what I can tell:

  • $x \in I_1$ and $x \notin D_1$
  • $x \notin I_4$ and $x \notin D_4$
  • $x \notin I_6$ and $x \notin D_6$

$x \in (I_1 - D_1)$ is interpreted as "the activation condition for [$x$] is ($s_i \land \ldots$) where ... is one or more additional segments". But:

  1. That seems false, as $x$ is needed with segment 1 regardless, and
  2. There doesn't seem to be a good means of learning about where else $x$ is needed

If this is indeed a gap then the algorithm doesn't meet the stated goal of ensuring the segmentation meets the closure requirement.

This case is handled by the "additional conditions" check. Basically, once we construct a disjunctive group, it's possible that there are remaining, not-yet-discovered conditions involving two or more segments simultaneously.

Here's the relevant text:

"Otherwise if the set is disjunctive, then the glyph is needed when at least one segment in the set is present or possibly due to some additional undiscovered conditions. These potential undiscovered conditions are important, and later on the process will attempt to rule them out."

"Compute the Segment Closure Analysis for a new composite segment $\bigcup s_i \notin C$. Remove any glyphs from the group that are found in $I - D$. These glyphs may appear as a result of additional conditions and so need to be considered unmapped. If the group has no remaining glyphs don’t make a patch for it.."

In this particular case, {x} will be found in I - D after testing s_4 U s_6 according to that procedure. If multi-segment analysis is not performed, then glyph x will be placed in the fallback patch, whose load is triggered by the presence of any input segment (so s_1 OR ... OR s_n in this case).

If multi-segment analysis is performed, then the correct condition will be discovered once the combination (s_4 U s_6) is tested. This is the sort of case where a good-quality input unicode segmentation can really help the creation of the glyph segments, by placing x, y, and z in a single segment together.
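To spell the example out end to end, here's a toy reconstruction. The closure() function is a stand-in for a real glyph closure, the I definition matches the closure(s0 U si) - closure(s0) formulation mentioned later in this thread, and the D definition is my assumption, chosen to be consistent with the memberships listed above. Treat it as an illustration rather than the reference implementation:

```python
def closure(glyphs):
    out = set(glyphs)
    if {"y", "z"} <= out:      # ligature substitution: y z -> x
        out.add("x")
    return frozenset(out)

s0 = frozenset()               # initial font segment (empty here)
segs = {1: frozenset("x"), 4: frozenset("y"), 6: frozenset("z")}
everything = frozenset("xyz")

def I(s):   # glyphs needed whenever s is loaded alongside the initial font
    return closure(s0 | s) - closure(s0)

def D(s):   # glyphs needed only if s is loaded (assumed definition)
    return closure(s0 | everything) - closure(s0 | (everything - s))

for i in (1, 4, 6):
    print(i, sorted(I(segs[i])), sorted(D(segs[i])))
# 1 ['x'] []      -> x in I_1 - D_1: disjunctive, full condition unknown yet
# 4 ['y'] ['y']   -> x absent from both sets
# 6 ['z'] ['z']   -> x absent from both sets

comp = segs[4] | segs[6]           # multi-segment analysis tests s_4 U s_6
print(sorted(I(comp) - D(comp)))   # ['x'] -> x surfaces when the pair is
# tested, letting the s_4 AND s_6 condition be attached; without this
# pass x would land in the fallback patch.
```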

@skef
Contributor

skef commented Feb 25, 2025

I see better how that case is handled now.

@skef
Contributor

skef commented Feb 25, 2025

I'm stopping my review of this tonight before the merging section.

My general impressions of the segmentation analysis:

  • The identification of the conjunctive sets provides a modest improvement over the IFTB prototype, in that some cases where a ligature was lumped in with a single source glyph will now be converted to a requirement on two or more glyphs. I believe that the prototype added it to the lower-sized mini-bin but the better heuristic would be to add it to the lower-frequency mini-bin. This basically trades off a slightly larger patch map for a slightly lower probability of loading the mapped patch. So: modest improvement.
  • The whole approach to handling disjunctive sets continues to strike me as backward. Yes, you can avoid duplication by putting something in its own patch and expressing the full set of dependencies on that patch. But then you absorb the increased cost of expressing those dependencies in the patch map plus the overhead of the glyph-keyed patch headers. And if at least one of the "source" segments is high-frequency you've gained basically nothing over keeping it there. (Merging can presumably mitigate this somewhat, although I find the first section on merging so confusing that I'll have to look at the code to figure out what it's trying to say.)
  • I continue to fail to see a larger system this algorithm would fit into. As stated, its performance relies on clever arrangement of the source segments. So, for example, if fi is a substitute for "f i" then it's better to have f and i in the same segment. However, any analysis that would tell you that f and i belong in the same segment would do so in virtue of the dependency through fi. So what feeds into this algorithm would already know a lot of the dependency information. Why, then, would we throw out much of what it does know and revert solely to codepoints in the next stage? Doing so will, most importantly, greatly increase the number of glyphs that wind up in the fallback patch due to loss of information about how to deal with those glyphs. At a minimum I would think such a system needs to provide the layer before it a means of encoding what dependencies it thinks it wants to consider handled versus those it wants this layer to deal with, so that this layer makes the changes it sees fit in the latter but only verifies the work on the former.

are commonly used together. This document and the implementation make no attempt to solve this
problem yet.

* Patch merging: a very basic patch merging process has been included in the implementation but
Contributor

This bullet sort of comes out of the blue and needs more explanation. What is patch merging? Why might it be useful? You get into more detail below but at this point it's not clear what the topic is.

Contributor Author

Added a bit of colour here on what merging is solving.


group that are found in $I - D$. These glyphs may appear as a result of additional conditions and so
need to be considered unmapped. If the group has no remaining glyphs don’t make a patch for it.

4. Lastly, collect the set of glyphs that are not part of any patch formed in steps 1 through 3. These form a fallback patch
Contributor

  1. This is one of the things that doesn't seem to follow if there is no analysis for $s_0$. Couldn't there also be a dependency from a codepoint in $s_0$?

  2. What is the advantage of having a patch that is loaded if any other patch is loaded versus just putting these glyphs in $s_0$? Seems like a false economy at best.

Contributor Author

  1. Dependencies on s0 are handled in the computation of the I set, which is defined as closure(s0 U si) - closure(s0). So any glyphs reachable via the combination of the initial font and si will be included in the si patch (see the sketch after this list).

  2. Yes, you could alternatively place these glyphs into the initial font. I'll update the text to mention that as the preferred option. The case where you'd want them in a fallback patch is if you're trying to form a desiccated initial font that contains no glyph data.
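As a minimal sketch of point 1 (hypothetical closure(), not the real implementation): a glyph reachable only through the combination of an s0 codepoint and an si codepoint still shows up in the I set for si, and therefore lands in the si patch.

```python
def closure(glyphs):
    out = set(glyphs)
    if {"a", "b"} <= out:          # GSUB: a b -> a_b ligature
        out.add("a_b")
    return frozenset(out)

s0 = frozenset("a")                # initial font segment
s1 = frozenset("b")

I_1 = closure(s0 | s1) - closure(s0)
print(sorted(I_1))                 # ['a_b', 'b'] -> a_b ships in the s_1 patch
```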

Contributor Author

Added a small section discussing the decision to place the fallback patch in the initial font or keep it separate.


1. If the associated exclusive patch does not exist or it is larger than the minimum size skip this segment.

2. Locate one or more segments to merge:
Contributor

The title of this section is "Merging Exclusive Patches" but the language in b seems to indicate that it's more like "Merging Into Exclusive Patches". Otherwise it's confusing why we're looking at patches that mention multiple segments at all.

If we are looking at those, what is the "segment frequency" of a conditional patch? Is it maybe equivalent to the lowest frequency individual segment for conjunctive patches but the highest frequency individual segment for disjunctive patches? Is this outlined anywhere?

In any case I'm confused by these heuristics. Why sort first on the number of segments? Why are we apparently disregarding whether a patch is conjunctive or disjunctive, isn't that quite relevant to merging?

Contributor Author

This is merging exclusive patches in the sense that we're combining input segments together to form new input segments, but agreed it's not quite the right name for it since the result of merging affects both exclusive and conditional patches.

This heuristic is an extremely simplistic first pass which is certainly not optimal. The logic behind the current version is this:

  • When finding things to merge we want to merge as few segments as possible, so examine possibilities from smallest to largest number of segments.
  • Then follow up by using frequency, since we want to aim to merge higher-frequency things together.

This of course captures none of the nuance of how frequency, patch conditions, and patch size interact. So I'm working on a more advanced version of this that will use a cost function and will numerically incorporate patch size, frequency, and the conjunctive/disjunctive nature of the conditions. For a first pass, though, this current approach still produces reasonable results based on testing on real fonts.
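For illustration, the current ordering could be expressed roughly like the following sketch (field names are hypothetical, not from glyph_segmentation.cc; the direction of the frequency tie-break is my reading of "merge higher-frequency things together"):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MergeCandidate:
    segments: Tuple[int, ...]   # input segments referenced by the patch
    frequency: float            # estimated usage frequency

def merge_order(candidates: List[MergeCandidate]) -> List[MergeCandidate]:
    # Fewest segments first, then highest frequency first.
    return sorted(candidates, key=lambda c: (len(c.segments), -c.frequency))

cands = [MergeCandidate((4, 6), 0.02),
         MergeCandidate((2,), 0.40),
         MergeCandidate((3,), 0.05)]
print([c.segments for c in merge_order(cands)])   # [(2,), (3,), (4, 6)]
```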

Contributor Author

Renamed the section.

@garretrieger
Contributor Author

Is s_0 a set of codepoints (as the text seems to indicate) or a set of codepoints and associated glyphs? If the latter is it a requirement that the earlier stage determine the dependencies for s_0 and include them? It doesn't seem that there is any s_0-specific analysis in the algorithm and therefore it's unclear how it's ensured that s_0 contains its dependencies.

Or is the idea to treat any such dependencies as exceptions in the algorithm?

Set s0 will be just codepoints like the other segments. The computation of set I for each si incorporates s0, and so it captures dependency information involving s0.

@garretrieger
Contributor Author

I'm stopping my review of this tonight before the merging section.

My general impressions of the segmentation analysis:

  • The identification of the conjunctive sets provides a modest improvement over the IFTB prototype, in that some cases where a ligature was lumped in with a single source glyph will now be converted to a requirement on two or more glyphs. I believe that the prototype added it to the lower-sized mini-bin but the better heuristic would be to add it to the lower-frequency mini-bin. This basically trades off a slightly larger patch map for a slightly lower probability of loading the mapped patch. So: modest improvement.
  • The whole approach to handling disjunctive sets continues to strike me as backward. Yes, you can avoid duplication by putting something in its own patch and expressing the full set of dependencies on that patch. But then you absorb the increased cost of expressing those dependencies in the patch map plus the overhead of the glyph-keyed patch headers. And if at least one of the "source" segments is high-frequency you've gained basically nothing over keeping it there. (Merging can presumably mitigate this somewhat, although I find the first section on merging so confusing that I'll have to look at the code to figure out what it's trying to say.)

I've been testing the implementation on real fonts and the overall overhead introduced by this approach actually ends up being quite low. For example, segmenting Noto Nastaliq Urdu (which is on the higher end of GSUB complexity we'd expect to encounter and includes quite a few conditional patches) on a basic code point segmentation introduces only 5% extra overhead versus an "ideal" segmentation (one that has exactly one patch per input segment). That's with zero glyph duplication.

Looking forward though, the plan is to have the conditional patch merging process address this. This is an area that isn't fleshed out much yet, but my plan is to develop a heuristic based on a cost function which can decide on a patch-by-patch basis whether it's more efficient to keep glyphs in separate patches, merge them, or duplicate them. This would take into account patch sizes, frequency, and activation conditions.

  • I continue to fail to see a larger system this algorithm would fit into. As stated, its performance relies on clever arrangement of the source segments. So, for example, if fi is a substitute for "f i" then it's better to have f and i in the same segment. However, any analysis that would tell you that f and i belong in the same segment would do so in virtue of the dependency through fi. So what feeds into this algorithm would already know a lot of the dependency information. Why, then, would we throw out much of what it does know and revert solely to codepoints in the next stage? Doing so will, most importantly, greatly increase the number of glyphs that wind up in the fallback patch due to loss of information about how to deal with those glyphs. At a minimum I would think such a system needs to provide the layer before it a means of encoding what dependencies it thinks it wants to consider handled versus those it wants this layer to deal with, so that this layer makes the changes it sees fit in the latter but only verifies the work on the former.

The value in having this layer as part of the process is that the piece making decisions about code point grouping can intentionally opt to ignore dependencies and then trust that this finalizing layer will sort it out. For example, if you have a dependency between something that's really high frequency and something low frequency, you may want to intentionally keep those in separate code point segments. In the same vein, for fonts with extremely complex GSUB interactions, the codepoint grouping analysis can give up on trying to sort out particularly complex interactions and just let the finalizing layer handle it.

This isn't discussed much yet, but it would definitely be valuable for the input to this to include dependency information discovered during code point segmentation. Dependency information can be utilized in a few places to help guide decisions, particularly in the optional multi-segment analysis phase where it could be used to significantly prune the number of combinations to test. For now I've updated the status section to note this is an area for future development.

garretrieger force-pushed the glyph_segmentation_doc branch 2 times, most recently from f75a56d to 7b7518e on March 4, 2025 at 22:06
@garretrieger
Contributor Author

@skef as discussed I've moved this document to an "experimental" sub folder and added a README to that folder which clarifies the contained docs are incomplete and not fully reviewed.

@skef
Contributor

skef commented Mar 4, 2025

@garretrieger I tuned the language of the README a bit

@skef
Contributor

skef commented Mar 4, 2025

Anyway, with those README changes I am fine with this document being merged prior to my having the resources to properly review it on Adobe's behalf.

@garretrieger
Contributor Author

Anyway, with those README changes I am fine with this document being merged prior to my having the resources to properly review it on Adobe's behalf.

README changes lgtm, I'll get this rebased and then merged.

garretrieger force-pushed the glyph_segmentation_doc branch from 511a9d0 to 21a810e on March 5, 2025 at 19:49
garretrieger merged commit 769a8c8 into w3c:main on March 5, 2025
3 checks passed
garretrieger deleted the glyph_segmentation_doc branch on March 27, 2025