@jmdyck
Contributor

@jmdyck jmdyck commented Jun 21, 2025

(I'm creating this in draft mode, because at this stage it's more of a discussion-starter than a solid proposal.)

In issue #10483, @domenic raises the possibility of wrapping the HTML spec's algorithms in <div algorithm> (as in Bikeshed specs). This PR does something approximating that.

The first commit is a minor markup change that was originally in PR #11379, but fell out when I withdrew the <hN>-related commits.

The second commit just adds some linebreaks, so that the main commit has a cleaner diff.

In the main commit, I add 2269 <div>...</div> pairs.

Each <div> start tag has one or both attributes:

  • var-scope
  • algo="..."

(11 have only var-scope, 1081 have only algo, 1177 have both.)
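
A tally like the one above (11 + 1081 + 1177 = 2269) can be reproduced mechanically. What follows is only a minimal sketch, assuming the draft attribute names `var-scope` and `algo` from this PR; the class name, function name, and sample markup are hypothetical, and this is not the script actually used to produce the numbers.

```python
from collections import Counter
from html.parser import HTMLParser


class DivAttrCounter(HTMLParser):
    """Tally <div> start tags by which of var-scope / algo they carry."""

    def __init__(self):
        super().__init__()
        self.tally = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        names = {name for name, _ in attrs}
        has_scope = "var-scope" in names
        has_algo = "algo" in names
        if has_scope and has_algo:
            self.tally["both"] += 1
        elif has_scope:
            self.tally["var-scope only"] += 1
        elif has_algo:
            self.tally["algo only"] += 1
        # Divs with neither attribute are not counted.


def count_divs(source: str) -> Counter:
    parser = DivAttrCounter()
    parser.feed(source)
    return parser.tally


# Hypothetical sample markup, not taken from the spec source.
sample = """
<div var-scope><p>For <var>element</var> ...</p></div>
<div algo="define-procedure"><p>To do a thing ...</p></div>
<div algo="idl:op:regular" var-scope><p>The method steps ...</p></div>
<div class="note"><p>Not counted.</p></div>
"""
tally = count_divs(sample)
print(tally["var-scope only"], tally["algo only"], tally["both"])
```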

var-scope means: this div is a scope for <var> elements. (The main commit also inserts the var-scope attribute into 4 <p> tags and 1 <dd> tag, rather than introduce a <div> to hold the var-scope.)

algo means: this div contains some kind of algorithm, where the value of the attribute suggests the kind. (Occasionally, the value will show 2 kinds.)

Behavior for attributes and operations declared in Web IDL fragments:

    340  "idl:attr:regular/get"
     76  "idl:attr:regular/set"
     57  "idl:attr:regular/get,idl:attr:regular/set"
    184  "idl:attr:regular/reflect"
      1  "idl:attr:regular/act-like"
     10  "idl:attr:reflected/get"
      9  "idl:attr:reflected/set"
      1  "idl:attr:event-handler/get"
      1  "idl:attr:event-handler/set"
    238  "idl:op:regular"
     14  "idl:op:constructor"
      2  "idl:op:special:deleter:for-named-property"
     11  "idl:op:special:getter:for-indexed-property"
     10  "idl:op:special:getter:for-named-property"
      2  "idl:op:special:setter:for-indexed-property"
      2  "idl:op:special:setter:for-named-property"
      2  "idl:op:static"

The classic: an algorithm is defined with a unique name + ID:

    642  "define-procedure"

A named state of affairs exists if a given condition holds:

     37  "condition-holds-if"

Integration with JavaScript:

     13  "abstract op"
     21  "js:internal-method"
     14  "js:host-defined-op"

Some other spec declares the algorithm, this spec gives steps for it:

     15  "for-external-term"

(You could say that "js:host-defined-op" belongs here too.)

There's one 'abstract' declaration of the algorithm, but different kinds of object can have different definitions for the algorithm:

    154  "multiple-defns"
      3  "multiple-defns,condition-holds-if"

(You could say that "js:internal-method" belongs here too.)

When something occurs in the data model, the UA must perform some steps:

    125  "reaction"
      4  "reaction,define-procedure"

Parsing-related algorithms:

     80  "tokenization-state-behavior"
     24  "insertion-mode-rules"

See below:

      5  "main+subs"

Algorithms that appear in examples, which might or might not be 'serious':

      7  "appear in examples"

Abstract declarations of algorithms:

      4  "declare"

Things that are not actually algorithms, but I felt like identifying:

     33  "define-format"
     46  "define-struct"

I don't know how to categorize it:

     71  "?"

"main+subs" isn't a particular kind of algorithm, but rather a collection of algorithms, where there's one main algorithm that invokes others, but those others use variables defined in the main algorithm without taking them as parameters. (These need to be treated specially when it comes to checking for <var>s being defined.) One way to think of these is as macros that get inlined at the invocation-points. However, I prefer to think of them as algorithms that are conceptually nested at some point within the main algorithm, and so can 'see' the variables defined at that point. (Presumably they appear after the main algorithm to make the main algorithm easier to read.)

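The "conceptually nested" reading of main+subs corresponds to an ordinary closure. This is a rough sketch of the idea only; the function and variable names are hypothetical and do not correspond to any algorithm in the spec.

```python
def main_algorithm(items):
    # Variables defined by the main algorithm.
    result = []
    limit = 2

    # Conceptually nested sub-algorithm: it takes only `item` as a
    # parameter, yet reads `limit` and appends to `result`, both of which
    # belong to the enclosing main algorithm.
    def sub_algorithm(item):
        if len(result) < limit:
            result.append(item)

    for item in items:
        sub_algorithm(item)
    return result


print(main_algorithm(["a", "b", "c"]))  # ['a', 'b']
```

A variable checker that treated `sub_algorithm` as a standalone scope would report `result` and `limit` as undefined, which is why these groups need special handling.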


@tabatkins
Contributor

Out of curiosity, why use a different attribute marker to indicate an algorithm than what Bikeshed uses? There's certainly no, like, interop between the documents, but using identical patterns when possible reduces mental load for spec authors that touch multiple specs.

It looks like a major reason might be that about half the "algorithms" don't scope their variables, per your numbers. Can you expand on that?

@jmdyck
Contributor Author

jmdyck commented Jun 24, 2025

Out of curiosity, why use a different attribute marker to indicate an algorithm than what Bikeshed uses?

Using the same marker might have suggested it has the same semantics, which I don't think it does (though there's a lot of overlap). One difference is that the (optional) value of Bikeshed's algorithm attribute is the name of the algorithm.

But certainly, if the HTML editors would like the spec to be more like Bikeshed in this regard, that can be done.

It looks like a major reason might be that about half the "algorithms" don't scope their variables, per your numbers. Can you expand on that?

It's not that they don't scope their variables, it's that they don't have variables to scope. E.g.

Although "a parallel queue" and "the parallel queue" are basically the declaration and use of a parameter, it doesn't use any <var> elements, so there's no need to establish a var-scope. (It might be nice to add a <var> to make that link explicit, in which case scanning for <div without a var-scope attribute will find candidates for such treatment.)

Other examples are where an algorithm's only use of (what you might think of as) a parameter is to say this element or <span>this</span>.

And then there are cases like in 2.6.1 Reflecting content attributes in IDL attributes, where "For a reflected target that is an element element, these are defined as follows:" introduces a <dl> of 4 algorithms (which refer to element). I put a <div var-scope> around the <p> + <dl> in order to capture the declaration of element, so the individual <div algo>s don't need var-scope. Mind you, some of those algorithms declare additional variables, which are presumably scoped to those algorithms, so you could argue that those algorithms should have var-scope. It depends on whether the editors want to support nested var-scopes.

One caveat is that I mostly ignore <var> elements within notes and examples, so there are probably spots where adding var-scope would make sense there. Or, e.g., if a note refers to variables of an algorithm, maybe it should be included in the var-scope of that algorithm (if it's adjacent).

@domenic
Member

domenic commented Jun 24, 2025

My reaction is the same as @tabatkins. This is more than I anticipated in #10483 (comment). And I worry that trying to guarantee this level of fine-grained metadata going forward will be a burden on contributors.

Can we just use class="algorithm", without the fine-grained algorithm typing, or the two separate attributes? Related to your last message, I don't think there's extra value in distinguishing algorithms with no variables, versus algorithms with variables. The former is just a special case of the latter, where the number of variables equals zero.

Additional note: dff98e6 does not match our style guide. https://github.com/whatwg/html/blob/main/CONTRIBUTING.md#element-hierarchy

@jmdyck
Contributor Author

jmdyck commented Jun 24, 2025

This is more than I anticipated in #10483 (comment).

It's certainly more than you described, and I knew that when I created the PR, but like I said, it's more of a discussion-starter than a solid proposal.

And I worry that trying to guarantee this level of fine-grained metadata going forward will be a burden on contributors.

Yup.

Can we just use class="algorithm", without the fine-grained algorithm typing, or the two separate attributes?

Certainly.

I'll explain more about why this draft has algo="..." attributes.

(1)
One of the problems I had was: how can I be fairly confident that I've found + tagged all the algorithms in the spec? E.g., based on the Infra spec, you'd think that to find all algorithms, all you have to do is search for sentences beginning with "To <dfn". That does find a lot, but only ~15% of the total.
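
To illustrate why the "To <dfn" search falls short, here is a small sketch that splits paragraphs into those that open with that phrasing and those that don't. It assumes paragraphs are written with explicit `<p>...</p>` tags; the regexes, function name, and sample terms are hypothetical, not the scraping code used for this PR.

```python
import re

# A paragraph, captured without its <p> wrapper.
PARAGRAPH = re.compile(r"<p\b[^>]*>(.*?)</p>", re.DOTALL)
# The Infra-style opener: the paragraph text begins with "To <dfn".
INFRA_OPENER = re.compile(r"\s*To\s+<dfn\b")


def split_by_opener(source):
    """Return (found, missed): paragraphs that do / don't start with 'To <dfn'."""
    found, missed = [], []
    for body in PARAGRAPH.findall(source):
        (found if INFRA_OPENER.match(body) else missed).append(body)
    return found, missed


# Hypothetical sample; the defined terms are made up.
sample = (
    "<p>To <dfn>parse a widget</dfn>, given <var>input</var>: ...</p>\n"
    "<p>The <dfn>widget steps</dfn> are: ...</p>\n"
    "<p>A response is <dfn>frobnicated</dfn> if ...</p>\n"
)
found, missed = split_by_opener(sample)
print(len(found), "found;", len(missed), "missed")
```

The second and third sample paragraphs are algorithm-like but use other phrasings, so a search keyed on the opener alone skips them.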

Similarly, the Web IDL spec gives particular wording for defining behavior, but looking for just that wording will miss lots.

So one thing I did was read the spec's Web IDL fragments, to figure out what behavior-algorithms would need to appear elsewhere in the spec. So if the algorithms I had tagged so far didn't cover that, I would need to look further (i.e., look for other phrasings). The granular algo="idl:..." markup is a reflection of that checking.

(2)
I'm interested in parsing the spec's algorithms, and I thought I might need the value of the algo attribute to tell me how I needed to parse the content of the div. I've mostly convinced myself that this isn't the case, but not entirely.

(3)
There's still the question of what exactly should be marked up as an algorithm. So the algo="..." attributes will help stake out the boundary. If you decide that algo="foo" shouldn't be marked as an algorithm, then having the algo values (for now) make that a lot easier for you to say and for me to do.


Related to your last message, I don't think there's extra value in distinguishing algorithms with no variables, versus algorithms with variables. The former is just a special case of the latter, where the number of variables equals zero.

To be clear, I didn't introduce var-scope to distinguish no-var algorithms from some-var algorithms. (I agree that isn't a hugely useful distinction, and it's one you can pretty easily reconstruct anyway.) I introduced var-scope to be able to identify variable scopes that don't coincide with an algorithm. I described one of those (in 2.6.1) in my previous message. There are others.

So sure, we can say that every algorithm is implicitly a variable scope, and so drop var-scope from any <div algo> that has it. But that will still leave at least 16 var-scopes (more if something tagged <div algo="..." var-scope> is deemed to not be an algorithm).


Additional note: dff98e6 does not match our style guide. https://github.com/whatwg/html/blob/main/CONTRIBUTING.md#element-hierarchy

Sure, but like I said, that commit's only there to make the main commit cleaner. (That is, in the main commit, every new div tag is on its own line, without disrupting any existing line. That made it easier for me to check that the main commit didn't have any unwanted changes in it.) I assume the commits will get squashed before merging, so if one commit introduces bad style and then another commit fixes it, is that a problem?

Or are you saying that the later commit doesn't fix it? Currently, the PR says

<li>
 <div ...>
 <p>The setter steps are to ...
 with the given value.</p>
 </div>
</li>

According to the style guide, I guess this should be

<li><div ...><p>The setter steps ...
with the given value.</p></div></li>

because both the li and the div contain a single "block" element, but the status quo spec has zero cases where a div start-tag is preceded (on its line) by anything other than spaces (even when it's the only child of its parent), so I hadn't considered that as a possibility.

Or maybe you'd prefer

<li>
 <div ...><p>The setter steps ...
 with the given value.</p></div>
</li>

but then I have to ask if you'd like a similar treatment for other cases where the new div wraps a single element.

Of course, rather than introducing a div to wrap this algorithm, we could just mark the p as an algorithm. That's fine, it was just easier for me to use a new div uniformly.

@domenic
Member

domenic commented Jun 25, 2025

Thanks for your detailed reply! I think I'm aligned on most points. To summarize:

  • We can use the fine-grained attributes during the review phase, even if we eventually end up with just class="algorithm".

  • There are at least some cases that are var-scopes but not algorithms. I'll try to look at those more closely to understand what we should do with them.

  • I was mistaken on dff98e6; in context it makes perfect sense.

Regarding the eventual end state, do you think class="algorithm", plus maybe class="var-scope", is sufficient? Or is there a case to be made for something more detailed? Again my main concern here is making it easy for future maintainers. However, if there are compelling tooling benefits to adding more detailed information, I'm happy to discuss.

Anyway, I'll try to do a more detailed review pass now!

@domenic
Member

domenic commented Jun 25, 2025

So, the diff is huge, and not reviewable on GitHub. But reviewing it locally, the following things stood out to me from, like, the first 10%, plus some skipping around.

  • I am sad about how non-uniform our algorithm declaration style is.

  • For single-paragraph "algorithms", the most natural thing for people to do while authoring will be <p class="algorithm"> instead of <div class="algorithm">\n<p>. However, I kind of like the idea of uniformly using <div> everywhere, as it makes refactoring easier, and is easy to spot.

  • In Bikeshed, because the main use of the algorithm markup is variable highlighting, many single-sentence definitions would not necessarily be marked up. For example, "matches about:blank", or "valid URL potentially surrounded by spaces".

    • I kind of like categorizing these as algorithms, since they could be equivalently written as algorithms. If we named the parameter, and expand into a conditional step 1 which returns true, followed by a fallback step 2 which returns false, it's clearly algorithm-ey. And, that's what you would do in the implementation.

    • I do worry a bit that it will be hard for people to remember to mark up "definition-ish" algorithms in the future. Maybe clear Contributing.md documentation can help with this...

    • I found "CORS-same-origin", "CORS-cross-origin", and "internal response" which seem like they fit the criteria of definition-ish algorithm, and aren't marked up. Did you have specific criteria you used to exclude these?

    • One particular messy instance is the difference between "week number of the last day" (marked up as not an algorithm, but has a <dfn>) and "has 53 weeks" (marked up as an algorithm/var-scope, but doesn't have a <dfn>). These are really just a reflection of the non-uniform messiness of the current spec, so I am not sure if there's anything to be done, besides note them as future possibilities for the infinite cleanup backlog.

  • Going through all the <div var-scope> (with no algo=""):

    • Most of them are used for reflection, where there's a set of distinct algorithms that "close over" one variable. The user activation cases are similar, and the "In the following CSS block" case is at least kind of similar. These seem like reasonable use cases to me, and the need for this is rare enough that I am not worried about contributor burden, as long as we document it in Contributing.md.

    • The use for "a serialization of the bitmap as a file" could probably just be thought of as an algorithm. I think it's basically an algorithm with one step and lots of requirements on how that step behaves.

  • We should not mark up define-struct, unless you think there is a compelling tooling-related reason. (Maybe in the future we should standardize struct definition markup, but I don't even know which of the possibilities I like best.)

  • define-format seems pretty algorithmey. Similar to other definition-algorithms, it could be refactored into an algorithm over an explicit variable.

  • Are there any <var>s outside of var-scope? If not, that could be a nice check to introduce into the build process eventually.

  • It does seem likely that we'll want to keep main+subs in some format, so that any JavaScript variable-highlighting or any sort of build process variable checking can treat them specially. (Are there others like this?)

  • Overall this has me itching for an Ecmarkup-like process for generating algorithm headers. Perhaps one day.

  • (Not a request for more work, just something I noticed that is related.) We've had some complaints before about how it's hard to link to algorithms without definitions.

    • One case I found is "Whenever an element including HTMLOrSVGElement becomes browsing-context connected, the user agent must execute the following steps on element". We've done piecemeal fixes to these sorts of things when requested. In this case it would look like changing to "the user agent must execute the following HTMLOrSVGElement browsing-context connection steps on element:".

    • However, there are some cases where this doesn't make sense, like the adjacent "The [cloning steps] for elements that include HTMLOrSVGElement". (Your multiple-defns cases.)

    • Maybe this whole problem doesn't matter as much these days since scroll-to-text-fragment exists?

    • The introduction of <div class="algorithm"> wrappers may, in the future, allow us to generate synthetic link markers here... although it would probably require adding id=""s as well.

@jmdyck
Contributor Author

jmdyck commented Jun 25, 2025

There are at least some cases that are var-scopes but not algorithms. I'll try to look at those more closely to understand what we should do with them.

You looked at cases of <div var-scope>, but note there are also 4 <p var-scope> and 1 <dd var-scope>.

(Looking at the 3rd <p var-scope>, the one under :focus, I think I should instead have created a <div algo> around the p + ul for "element has the focus".)

Regarding the eventual end state, do you think class="algorithm", plus maybe class="var-scope", is sufficient? Or is there a case to be made for something more detailed? Again my main concern here is making it easy for future maintainers. However, if there are compelling tooling benefits to adding more detailed information, I'm happy to discuss.

I'm pretty sure that just class="algorithm" and class="var-scope" will suffice to allow some useful analysis/tooling. Maybe once we have experience with that, we'll be in a better position to decide about something more detailed. I don't think I could make a strong case at this point for more detailed markup.

@tabatkins
Contributor

I introduced var-scope to be able to identify variable scopes that don't coincide with an algorithm.

This makes sense, and is something I wouldn't mind introducing to Bikeshed as well. (And that name, var-scope, works for me.) Right now anything not in an algo is put in the global scope, which limits the ability of the typo-detection to work (copy-pasting a typo into two spots would cause Bikeshed to assume it's intended).

But in Bikeshed, at least, it would be in addition to the var scoping that happens automatically for algorithms. For ease of comparison, I think it would be a good idea to assume that algorithms automatically scope their variables. (We can worry about to what extent we need to care about marking up a shared variable "inheriting into" nested scopes later, if at all. This is mostly for human convenience, and lightly for linting and styling, so it doesn't necessarily need to be too precise.)

@jmdyck
Contributor Author

jmdyck commented Jun 25, 2025

(Responding to just a couple points for now...)

I am sad about how non-uniform our algorithm declaration style is.

Indeed, the variety of syntax is discouraging. I imagine I'll suggest some consistification to make analysis easier, but it probably won't make much of a dent in the non-uniformity.

[...] single-sentence definitions [...]. For example, "matches about:blank", or "valid URL potentially surrounded by spaces".

I kind of like categorizing these as algorithms, since they could be equivalently written as algorithms. [...]

Yeah, that's what I thought too. More generally, they are rich in statically checkable phrases, so it would be a shame not to check them.
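
The rewriting described in the quoted comment (name the parameter, step 1 returns true when the condition holds, step 2 returns false) might look like this for "matches about:blank". This is a sketch under assumptions: `UrlRecord` is a hypothetical, simplified stand-in for a URL record, and the condition is my paraphrase of the definition rather than normative text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UrlRecord:
    # Hypothetical, simplified stand-in for a URL record.
    scheme: str
    path: List[str] = field(default_factory=list)
    username: str = ""
    password: str = ""
    host: Optional[str] = None


def matches_about_blank(url: UrlRecord) -> bool:
    # Step 1: if the condition holds for the named parameter, return true.
    # (The condition is a paraphrase, assumed for illustration.)
    if (
        url.scheme == "about"
        and url.path == ["blank"]
        and url.username == ""
        and url.password == ""
        and url.host is None
    ):
        return True
    # Step 2: otherwise, return false.
    return False


print(matches_about_blank(UrlRecord("about", ["blank"])))  # True
print(matches_about_blank(UrlRecord("https", ["blank"], host="example.com")))  # False
```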

I do worry a bit that it will be hard for people to remember to mark up "definition-ish" algorithms in the future. Maybe clear Contributing.md documentation can help with this...

It may be difficult to draw the line between algorithm-y and non-algorithm-y definitions.

I found "CORS-same-origin", "CORS-cross-origin", and "internal response" which seem like they fit the criteria of definition-ish algorithm, and aren't marked up. Did you have specific criteria you used to exclude these?

I think the specific reason was just that I didn't look for dfn-paragraphs with that form. E.g., I probably would have included CORS-same-origin if it had been phrased as:

"A response is CORS-same-origin if ..."

or

"To determine whether a response is CORS-same-origin, ..."

Background: I didn't want to step through the whole HTML spec, asking for each paragraph in turn whether it should be marked as an algorithm. So instead [among other things] I 'scraped' paragraphs from the spec and tried to identify patterns of phrasing that looked like algorithms, and then (check and) mark up the paragraphs using those patterns. Which has the failure mode of leaving things out.

Anyway, I'll take another look.

One particular messy instance is the difference between "week number of the last day" (marked up as not an algorithm, but has a <dfn>) and "has 53 weeks" (marked up as an algorithm/var-scope, but doesn't have a <dfn>). These are really just a reflection of the non-uniform messiness of the current spec, so I am not sure if there's anything to be done, besides note them as future possibilities for the infinite cleanup backlog.

The "has 53 weeks" paragraph got my attention because of the <var>s. I'm pretty sure I looked at the "week number of the last day" paragraph and just bailed because it was so weird.


Are there any <var>s outside of var-scope?

Yes, I'm currently counting 84, ignoring examples, notes, and domintros. I think I only bothered creating a var-scope if there were at least two occurrences of the same variable.

Overall this has me itching for an Ecmarkup-like process for generating algorithm headers. Perhaps one day.

Right there with you, and I'd happily work on it, like I did for ECMAScript. That could conceivably be an alternative to this PR, since if you have a distinct enough syntax for algorithm headers, you might not need class="algorithm" in the source. But I'm guessing that's a much bigger project, and shouldn't forestall the benefits of class="algorithm" in the short term, even if it obsoletes it in the long term.

@jmdyck
Contributor Author

jmdyck commented Jun 27, 2025

(responding to the rest of Domenic's comment...)

Going through all the <div var-scope> (with no algo=""):
Most of them are used for reflection, where there's a set of distinct algorithms that "close over" one variable. The user activation cases are similar, and the "In the following CSS block" case is at least kind of similar. These seem like reasonable use cases to me, and the need for this is rare enough that I am not worried about contributor burden, as long as we document it in Contributing.md.

Okay, cool.

The use for "a serialization of the bitmap as a file" could probably just be thought of as an algorithm. I think it's basically an algorithm with one step and lots of requirements on how that step behaves.

I did have it marked as an algorithm for a while, but it was just too weird.


We should not mark up define-struct, unless you think there is a compelling tooling-related reason.

There's definitely a tooling-related reason to have struct info handy. E.g., if you want to validate (type-check) set X's Y to Z, you might determine that X is a struct, in which case you need to confirm that that kind of struct has a Y member, and that its type is compatible with Z's.
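
As a rough illustration of that kind of check, the sketch below validates a "set X's Y to Z" step against a table of struct definitions. Everything here is hypothetical: the `STRUCTS` table, the struct and member names, and the use of Python types as stand-ins for spec-level types.

```python
STRUCTS = {
    # Hypothetical struct table: struct name -> {member name: member type}.
    "example struct": {"label": str, "count": int},
}


def check_set(struct_name, member, value):
    """Return a list of problems with "set <struct>'s <member> to <value>"."""
    members = STRUCTS.get(struct_name)
    if members is None:
        return [f"unknown struct: {struct_name}"]
    # X is a struct, so confirm it has a Y member ...
    if member not in members:
        return [f"{struct_name} has no member named {member}"]
    # ... and that the member's type is compatible with Z's.
    expected = members[member]
    if not isinstance(value, expected):
        return [f"{member} expects {expected.__name__}, got {type(value).__name__}"]
    return []


print(check_set("example struct", "count", 3))   # []
print(check_set("example struct", "size", 3))
print(check_set("example struct", "label", 3))
```

The hard part, as noted, is populating something like `STRUCTS` from the spec source in the first place.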

However, I'm not sure that <div algo="define-struct"> (or similar) is a feasible way to collect the necessary info, so I'm okay with dropping that for this PR.

(Maybe in the future we should standardize struct definition markup, but I don't even know which of the possibilities I like best.)

Standardizing either the markup or the introductory wording would be good.


It does seem likely that we'll want to keep main+subs in some format, so that any JavaScript variable-highlighting or any sort of build process variable checking can treat them specially.

It occurs to me that <div algo="main+subs" var-scope> could maybe become just <div var-scope>, depending on how the <var>-handling code is written.

(Are there others like this?)

There might be other cases where main+subs could have applied. I don't think I was very thorough about how I found them.


We've had some complaints before about how it's hard to link to algorithms without definitions.

I understand what you mean, but I don't think I have much to add to your points.

@jmdyck
Contributor Author

jmdyck commented Aug 19, 2025

I have renamed the previous commits of this PR (modified after rebasing) as "round1". I'm now adding 6 commits of "round2"...


Commit 1 of round2 corrects some round1 markup.


In #11392 (comment), @domenic said:

Are there any <var>s outside of var-scope? If not, that could be a nice check to introduce into the build process eventually.

At the time, there were lots, but commits 2+3+4 of round2 clean that up:

2: In 13 cases, extend the scope of a <div algo> to include a subsequent note/paragraph that refers to a <var> in the algo.
3: Add the var-scope attribute to ~30 more elements.
4: Add the unscoped attribute to ~40 <var> elements that don't really belong to a scope.

The result of these changes is that every <var> element either:

  • has the unscoped attribute, or
  • is within an element with the var-scope attribute, or
  • is within a <dl class="domintro">.

(Or, eventually, "is within a <div algorithm>")

So, in future, the build process can complain if it finds a <var> for which none of those is true.
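
A build-time check along those lines could be sketched as follows. This is an illustration only, not the html-build implementation; it assumes the attribute names used at this point in the PR (`unscoped`, `var-scope`) and the class name is hypothetical.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class VarScopeChecker(HTMLParser):
    """Collect positions of <var> elements that satisfy none of the three rules."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, establishes_scope) for each open element
        self.problems = []   # (line, column) of each offending <var>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "var" and "unscoped" not in attrs:
            # Offending if no open ancestor establishes a scope.
            if not any(scoping for _, scoping in self.stack):
                self.problems.append(self.getpos())
        if tag in VOID:
            return
        scoping = "var-scope" in attrs or (
            tag == "dl" and "domintro" in (attrs.get("class") or "").split()
        )
        self.stack.append((tag, scoping))

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, tolerating omitted end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def unscoped_vars(source):
    checker = VarScopeChecker()
    checker.feed(source)
    return checker.problems


# Hypothetical sample markup.
sample = """\
<div var-scope><p>Let <var>x</var> be 1. Return <var>x</var>.</p></div>
<dl class="domintro"><dt><var>element</var>.hidden</dt></dl>
<p>Some <var unscoped>n</var> things.</p>
<p>A stray <var>y</var>.</p>
"""
print(len(unscoped_vars(sample)), "problem(s)")
```

Only the last `<var>` in the sample is reported; the others fall under one of the three rules listed above.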


I found "CORS-same-origin", "CORS-cross-origin", and "internal response" which seem like they fit the criteria of definition-ish algorithm, and aren't marked up. Did you have specific criteria you used to exclude these?

[Rather than "internal response", which is defined in Fetch, I think you mean "unsafe response".]

I went through the spec with a medium-tooth comb, and found lots more algorithms, including the three you listed.

Commit 5 of round2 inserts some linebreaks to make the next commit cleaner, and commit 6 adds 470 more <div algo>.

Many of these are questionable. I'm not sure where to draw the line.

@domenic
Member

domenic commented Aug 28, 2025

Alright, so, how do we move toward landing this? I imagine floating this giant patch is not that fun.

Here is a proposal, but feel free to give your thoughts:

@jmdyck
Contributor Author

jmdyck commented Aug 31, 2025

Dealing with some easier things first...

Create a patch for https://github.com/whatwg/html-build which does minimal rewriting of algorithm="" to data-algorithm="" and var-scope="" to data-var-scope="", to preserve validity of the output document. (I can do this if you'd prefer not to touch the Rust tooling.)

I don't see where in the code to do this. Does it need a whole new Processor? If so, yeah, you'll probably need to do it.
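
For reference, the rewrite being asked for is small in principle. The sketch below shows the intended transformation in Python with regexes; it is not how html-build (which is written in Rust) does or should do it, and all names here are hypothetical. A known limitation: it would also rename the word if it appeared after whitespace inside a quoted attribute value.

```python
import re

RENAMES = {"algorithm": "data-algorithm", "var-scope": "data-var-scope"}

# A start tag: "<", a letter, then anything up to the next ">".
START_TAG = re.compile(r"<[a-zA-Z][^<>]*>")
# An attribute name preceded by whitespace and followed by whitespace, "=", ">" or "/".
ATTR = re.compile(r"(\s)(algorithm|var-scope)(?=[\s=>/])")


def rewrite_attrs(source):
    """Rename algorithm / var-scope attributes to their data-* forms."""
    def fix_tag(match):
        return ATTR.sub(lambda m: m.group(1) + RENAMES[m.group(2)], match.group(0))
    return START_TAG.sub(fix_tag, source)


print(rewrite_attrs('<div algorithm><p var-scope="">x</p></div>'))
```

Text content and end tags are left alone, and `class="algorithm"` is untouched because the word is not preceded by whitespace there.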

Send a PR to copy https://github.com/speced/bikeshed/blob/main/bikeshed/stylescript/var-click-highlighting.css into https://github.com/whatwg/whatwg.org/blob/main/resources.whatwg.org/standard.css .

Done: whatwg/whatwg.org#470

Add a JS file to this patch that is a copy of https://github.com/speced/bikeshed/blob/main/bikeshed/stylescript/var-click-highlighting.js .

I've pushed a commit that adds that. It also adds a copy of the CSS file, just until the above PR is merged+published. For now, it keys off the attributes used in the current PR ("algo" and "var-scope"), though those will change. Basically, I just wanted to be able to see var-highlighting in the output of my local html-build. Not sure what will happen in the CI build. [Later: It doesn't work in the CI-built pages.]

More to say tomorrow (I hope).

domenic pushed a commit to whatwg/whatwg.org that referenced this pull request Sep 2, 2025
@domenic
Member

domenic commented Sep 10, 2025

I've almost got the preprocessor working; will try to post it soon. Some things to note:

  • We should change unscoped to ignore, to match Bikeshed. (I like unscoped better but consistency seems important.)

  • I'm implementing the check that all vars are scoped or marked up. It works nicely.

  • I haven't yet implemented the check (which Bikeshed has) that all non-ignored variables appear at least twice in their algorithm/var-scope. This can be quite handy for catching typos. Do you have an impression of how far off we might be from that?

@domenic
Member

domenic commented Sep 10, 2025

Ah, also:

  • We should not use both algorithm="" and var-scope="" on the same element. I'm having the preprocessor error if you do that.

domenic added a commit to whatwg/html-build that referenced this pull request Sep 10, 2025
This supports the work in whatwg/html#11392 with a new processor found in variables.rs, documented there.

This required updating the parser to store line numbers for each element, which changed a lot of test call sites.
@jmdyck
Contributor Author

jmdyck commented Sep 11, 2025

We should change unscoped to ignore, to match Bikeshed. (I like unscoped better but consistency seems important.)

When I looked at the bikeshed doc, I got the impression that they had different semantics: I intended unscoped to mean "this <var> is not in a variable scope", whereas ignore seems to mean "this <var> is in a variable scope, but should be ignored when complaining about <var>s that only appear once in that scope".

> I haven't yet implemented the check (which Bikeshed has) that all non-ignored variables appear at least twice in their algorithm/var-scope. This can be quite handy for catching typos. Do you have an impression of how far off we might be from that?

It would currently raise about 150 warnings. Some are false positives (e.g., unused parameters), and some appear to be true positives. Non-algorithmic var-scopes should maybe be exempt from the check.

> We should not use both algorithm="" and var-scope="" on the same element. I'm having the preprocessor error if you do that.

Yeah, I think we agreed on that a while ago. Anyway, I've been assuming for a while that (eventually) algorithm would imply var-scope. I just haven't done the changes yet.

@domenic
Member

domenic commented Sep 11, 2025

> When I looked at the bikeshed doc, I got the impression that they had different semantics: I intended unscoped to mean "this <var> is not in a variable scope", whereas ignore seems to mean "this <var> is in a variable scope, but should be ignored when complaining about <var>s that only appear once in that scope".

Although I agree theoretically this could be a distinction, it turns out Bikeshed has a global scope, so there isn't really one:

LINE 187:6: The var 'bar' (in global scope) is only used once.
If this is not a typo, please add an ignore='' attribute to the <var>.

> It would currently raise about 150 warnings. Some are false positives (e.g., unused parameters), and some appear to be true positives. Non-algorithmic var-scopes should maybe be exempt from the check.

That's not bad! Some good follow-up work then.

@jmdyck
Contributor Author

jmdyck commented Sep 23, 2025

More commits...

  • Drop <div algo="define-struct">

And then 2 commits re nested variable-scopes:

  • Change all 5 <div algo="main+subs"> to <div var-scope>

  • In the remaining cases where a <div algo> contains another <div algo>, dissolve the inner one, as this will probably give a better var-highlighting experience to the reader.

After these, the only case of scope-nesting is where a <div var-scope> contains 2 or more <div algo>.
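One way to confirm that invariant is to list every scope that starts inside another scope. The sketch below assumes the `algo` and `var-scope` attribute names used in this comment and relies on Python's standard `html.parser`; the class and function names are hypothetical, and it is not the tooling actually used for these commits.

```python
from html.parser import HTMLParser


class ScopeNestingFinder(HTMLParser):
    """Records each scope element that starts inside another scope."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # (tag, kind); kind is "algo", "var-scope" or None
        self.nestings = []       # (outer kind, inner kind, line of inner start tag)

    def handle_starttag(self, tag, attrs):
        names = {name for name, _ in attrs}
        if "algo" in names:
            kind = "algo"
        elif "var-scope" in names:
            kind = "var-scope"
        else:
            kind = None
        if kind is not None:
            # The nearest enclosing scope, if there is one.
            outer = next((k for _, k in reversed(self.open_elements) if k), None)
            if outer is not None:
                self.nestings.append((outer, kind, self.getpos()[0]))
        self.open_elements.append((tag, kind))

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; stray end tags are ignored.
        if any(t == tag for t, _ in self.open_elements):
            while self.open_elements.pop()[0] != tag:
                pass


def scope_nestings(html):
    finder = ScopeNestingFinder()
    finder.feed(html)
    finder.close()
    return finder.nestings
```

If the commits did what they describe, every reported pair would have `"var-scope"` as the outer kind and `"algo"` as the inner kind.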

@jmdyck
Contributor Author

jmdyck commented Sep 23, 2025

  • Eliminate the 7 <div algo="declare">.

For one, the "declare" was just a mistake, so I changed it to "?".

For the others, they aren't defining an algorithm, just declaring it, to be defined (possibly multiple times) elsewhere. This is certainly algorithm-adjacent, and might be deserving of some markup eventually, but I'm guessing not yet. So I changed them to <div var-scope><!-- algo-declaration -->

@jmdyck jmdyck marked this pull request as ready for review September 25, 2025 00:41
@jmdyck
Contributor Author

jmdyck commented Sep 25, 2025

Okay, I think that's everything.

* Various editorial updates to CONTRIBUTING
* Fix var-click-highlighting to use the new attribute names
* Use defer on var-click-highlighting script
Member

@domenic domenic left a comment

Amazing work. I pushed a few minor tweaks, mostly just to the CONTRIBUTING guidelines. Please holler if you think any of them were incorrect.

I've tested this with whatwg/html-build#306 and I get variable highlighting!!

I'll work on the gymnastics for landing this ASAP, maybe today.

domenic added a commit to whatwg/html-build that referenced this pull request Sep 25, 2025
This is easier to maintain, as per work on whatwg/html#11392 we'll be adding a new JS file.
domenic added a commit to whatwg/html-build that referenced this pull request Sep 25, 2025
This supports the work in whatwg/html#11392 with a new processor found in variables.rs, documented there.

This also updates the parser to store the line numbers for each element, for use in error messages.

For now this contains commented-out code to error on `<var>`s outside of `algorithm` and `var-scope` scopes, since CI will prevent us from merging a html-build change that cannot build the current main branch of whatwg/html. After the HTML PR is merged, we can uncomment that code.
@domenic domenic merged commit aacb0df into whatwg:main Sep 25, 2025
2 of 3 checks passed
domenic added a commit to whatwg/html-build that referenced this pull request Sep 25, 2025
A follow-up to adb2e6f, now that whatwg/html#11392 is merged.
@domenic
Member

domenic commented Sep 25, 2025

I'm so glad we could land this! Thanks for all your hard work here; this will definitely improve the user experience of the spec going forward. Your patience for these sorts of large-scale refactors always impresses me.

@jmdyck
Contributor Author

jmdyck commented Sep 25, 2025

> I pushed a few minor tweaks

They look okay to me.

I was so relieved to be done that I forgot some stuff, like taking care of the "INTERIM" comment in the JS, and doing one last rebase-to-main.

> Thanks for all your hard work here;

And thanks to you for your ongoing interest.

> Your patience for these sorts of large-scale refactors always impresses me.

Patience is my superpower, I guess.


3 participants