Skip to content

Tighten symbol types throughout the codebase#505

Merged
LPTK merged 58 commits into
hkust-taco:hkmc2from
LPTK:tighten-symbol-types
May 31, 2026
Merged

Tighten symbol types throughout the codebase#505
LPTK merged 58 commits into
hkust-taco:hkmc2from
LPTK:tighten-symbol-types

Conversation

@LPTK
Copy link
Copy Markdown
Contributor

@LPTK LPTK commented May 29, 2026

Well... this took me way longer than I would have thought.

We've had a bad habit of just using Symbol (or its infamous alias Local) everywhere, which led to nobody really knowing whta kind of symbol was really valid where, which in turn led to a proliferation of hacks, workarounds, outright violations, and needlessly defensive programming.

This PR is the first step towards pulling us out of this quagmire. I managed to tighten the symbol types of most things, although some "looser" types are still floating around in some parts, such as ValueSymbol. Ideally, we'd remove most uses of this type.

The lifter in particular caused me immense amounts of pain, as I was trying to figure out just what kind of symbol was allowed where and how everything somehow manages to magically fall back into place after traversing all these crazy mappings. I think there is a large potential for simplification here; I tried to achieve it for a while, but failed miserably. So, for now it uses quite loose type and a hack which is real code smell:

      // Locals introduced by this object
      /* // -- This is the old definition: --
      val fromThisObj = node.localsWithoutBms
        .map: s =>
          s -> s.asLocalPath
      */ // -- The new definition now needs a hack to work, which we should remove: --
      val fromThisObj: Map[ScopedOrInnerSymbol, LocalPath] = node.localsWithoutBms
        .flatMap: s =>
          s match
            case s: BlockMemberSymbol =>
              // * This use of `s.asPrincipal` is incorrect – we can't just assume this is the correct disambiguation!
              S(s -> LocalPath.BmsRef(s, s.asPrincipal.getOrElse:
                lastWords(s"Cannot resolve overloaded member symbol ${s.nme}: no principal disambiguation found")
              ))
              // N  // * can't simply do this, as it makes `Debugging.mls` and `Token.mls` fail to compile
            case s: LocalPathSymbol => S(s -> s.asLocalPath)
            case s: InnerSymbol => S(s -> LocalPath.ThisPath(s))
        .toMap

There is still something funny going on, which causes everything to comes crashing down if I try to improve this part. Hopefully @CAG2Mark can help with this (in a follow-up PR).

Oh, and I probably wasted much more time than I saved by trying to use Codex and Copilot to help with the refactoring. I think this is a good example of a task at which they are completely useless with: redesigning some complicated mechanisms that sit outside of their usual training data. Their "fixes" were always worthless workarounds that only piled more technical debt on the existing mess!

LPTK added 30 commits May 19, 2026 18:26
# Conflicts:
#	hkmc2/shared/src/main/scala/hkmc2/codegen/BlockSimplifier.scala
#	hkmc2/shared/src/main/scala/hkmc2/codegen/Lowering.scala
#	hkmc2/shared/src/main/scala/hkmc2/codegen/ReflectionInstrumenter.scala
#	hkmc2/shared/src/main/scala/hkmc2/codegen/StackSafeTransform.scala
#	hkmc2/shared/src/main/scala/hkmc2/codegen/UsedVarAnalyzer.scala
#	hkmc2/shared/src/main/scala/hkmc2/codegen/js/JSBuilder.scala
#	hkmc2/shared/src/test/mlscript/block-staging/Nested.mls
#	hkmc2/shared/src/test/mlscript/codegen/BlockPrinter.mls
#	hkmc2/shared/src/test/mlscript/codegen/FirstClassFunctionTransform.mls
#	hkmc2/shared/src/test/mlscript/codegen/Hygiene.mls
#	hkmc2/shared/src/test/mlscript/codegen/PlainClasses.mls
#	hkmc2/shared/src/test/mlscript/deforest/relaxedPrograms.mls
#	hkmc2/shared/src/test/mlscript/lifter/PrivateMutableFields.mls
# Conflicts:
#	hkmc2/shared/src/main/scala/hkmc2/codegen/HandlerLowering.scala
#	hkmc2/shared/src/main/scala/hkmc2/codegen/Lifter.scala
#	hkmc2/shared/src/main/scala/hkmc2/codegen/Lowering.scala
#	hkmc2/shared/src/main/scala/hkmc2/codegen/ReflectionInstrumenter.scala
#	hkmc2/shared/src/main/scala/hkmc2/codegen/StackSafeTransform.scala
LPTK added 9 commits May 29, 2026 15:03
# Conflicts:
#	hkmc2/shared/src/main/scala/hkmc2/codegen/Block.scala
#	hkmc2/shared/src/main/scala/hkmc2/codegen/cpp/CodeGen.scala
#	hkmc2/shared/src/main/scala/hkmc2/codegen/llir/Analysis.scala
#	hkmc2/shared/src/main/scala/hkmc2/codegen/llir/Builder.scala
#	hkmc2/shared/src/main/scala/hkmc2/codegen/llir/Interp.scala
#	hkmc2/shared/src/main/scala/hkmc2/codegen/llir/Llir.scala
@LPTK LPTK requested review from CAG2Mark, Derppening and Copilot May 29, 2026 11:03
@LPTK LPTK marked this pull request as ready for review May 29, 2026 11:03
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@Derppening
Copy link
Copy Markdown
Contributor

Derppening commented May 30, 2026

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

If Copilot won't do it then I'll ask Claude to do it then! (I'll separately go through the changes manually - Just treat this as what Copilot was supposed to do). I also explicitly told Claude to ignore the LLIR changes - It's just a simple removal so I don't think there's much to review there.

Code review

Reviewed locally on LPTK/tighten-symbol-types, focusing on the symbol-type tightening (I skipped the LLIR/C++ backend removal). Overall this is a solid, well-motivated change — replacing the type Local = Symbol catch-all with precise unions so each IR position only admits the symbols that can legitimately appear there. A few things below; one I'd treat as a blocker.

Correctness

1. freeVars is non-exhaustive for an Assign with a TermSymbol LHS — latent MatchError. (Block.scala)

Assign.lhs: AssignableSymbol = LocalVarSymbol | TermSymbol | NoSymbol. definedVars handles all three:

case Assign(_: NoSymbol, r, rst)       => rst.definedVars
case Assign(l: TermSymbol, r, rst)     => rst.definedVars + l   // ← treated as reachable
case Assign(l: LocalVarSymbol, r, rst) => rst.definedVars + l

but freeVars only covers two:

case Assign(_: NoSymbol, rhs, rest)     => rhs.freeVars ++ rest.freeVars
case Assign(lhs: FreeSymbol, rhs, rest) => Set.single(lhs) ++ ...

FreeSymbol = SimpleSymbol | BlockMemberSymbol | LabelSymbol does not include TermSymbol, and there's no catch-all, so an Assign(termSym, …) falls through to a MatchError. defineSymbol does construct exactly that shape (case sym: TermSymbol => sym.owner match … case N => Assign(sym, rhs, rest)). The pre-PR code added every LHS to freeVars unconditionally, and ValDefn.freeVars still adds its ownerless sym, so dropping the ownerless-TermSymbol case here is a semantic change too, not just a partiality.

Please either add an explicit Assign(l: TermSymbol, …) case to freeVars (mirroring definedVars), or — if Assign(termSym, …) genuinely can't reach freeVars — drop the TermSymbol case from definedVars and document why. As written the two are inconsistent.

2. Acknowledged Lifter hack. In pathsFromThisObj, the new BlockMemberSymbol branch uses s.asPrincipal with a self-flagged comment that the disambiguation is "incorrect", kept only because N breaks Debugging.mls/Token.mls. Fine to land as flagged debt, but it'd be better tracked by an issue than an inline comment, since it's a known-incorrect disambiguation that happens to pass.

Other correctness-sensitive bits look right:

  • Lifter capturePath changing from captureSym.asSimpleRef to Select(Value.This(sym), captureSym.id)(S(captureSym)) is a genuine fix — the capture is a field, so it must be selected on this.
  • Splitting addExtraSyms(…, define: Bool) into addExtraSyms (Scoped local) + initCaptureField (AssignField on owner) is correct and removes a boolean-blind parameter.
  • quoteSplit threading splitTmps: Map[SplitSymbol, TempSymbol] correctly works around SplitSymbol no longer being assignable.
  • applyAssignLhs / applySimpleSymbol / applyImportSymbol and the SymbolRefresher matches are exhaustive over their unions, so the removed lastWords catch-alls are justified.

Code quality / conventions

3. Duplicated type aliases across packages. SimpleSymbol is defined twice, identically — in semantics.Symbol.scala and codegen.Block.scala. The Assign-LHS union is defined twice under different names: semantics.Assignable vs codegen.AssignableSymbol (both LocalVarSymbol | TermSymbol | NoSymbol), and both names are actively used across Block/Lowering/BlockTransformer. Please consolidate to one SimpleSymbol and one name for the assignable union, defined once (in semantics) and imported — the duplicates invite drift.

4. Snake_case leak. SymbolRefresher.scala renames newSnew_s, the only snake_case identifier in the touched code. Suggest keeping it newS.

5. Minor. assignSymbol (Lowering.scala) duplicates the MutVal/LetBind→AssignField logic across the TermSymbol and BlockMemberSymboltsym branches; could be factored, though it reads fine as-is.

Behavioral changes worth a second look

  • Instances no longer members + Ins lowering removed: using/instance defs now get owner = N and are no longer lowered via ValDefn. Covered by NestedUsingInstances.mls and the softAssert(defn.owner.isEmpty) in Resolver — looks intentional, just flagging that a semantics change rides along here.
  • ref to an owned TermSymbol now always selects on this (old privateFieldSelfSelection removed), which is the source of the golden churn in codegen/*.mls, Primes.mls, etc. — worth a careful pass to confirm none regressed.
  • mut val x = 1 at top level is now an error ("Mutable value definitions must have an owner"). Tested; presumably intended.

Tests

Good coverage — BadAssignments.mls exercises the fun/val/mut-val assignment errors with the new messages, and NestedUsingInstances.mls covers the instance reshaping including a :lift :effectHandlers case.

Verdict

Strong, coherent tightening. I'd block on #1 (the freeVars/definedVars TermSymbol asymmetry — a real partiality contradicting the definedVars case) and fold the duplicate aliases (#3) since they're trivially unifiable. The Lifter asPrincipal hack (#2) is fine to ship as flagged debt with a tracking issue.

@LPTK
Copy link
Copy Markdown
Contributor Author

LPTK commented May 30, 2026

Impressive review by the slop-machine!

Though, of course, it missed the forest for the tree, it made me realize TermSymbols have nothing to do in Assign's lhs.

It also got something plain wrong: Ins definitions are still lowered to ValDefn, though through a different path.

@LPTK
Copy link
Copy Markdown
Contributor Author

LPTK commented May 30, 2026

PS: Feels funny to be complimented on "strong, coherent tightening" by the weak, incoherent code generator.

@CAG2Mark
Copy link
Copy Markdown
Contributor

Oh boy...
image

@LPTK
Copy link
Copy Markdown
Contributor Author

LPTK commented May 30, 2026

@CAG2Mark Nothing too serious actually. I had a branch with much more extensive changes, but couldn't go the last mile, so I just backtracked.

Comment thread hkmc2/shared/src/main/scala/hkmc2/codegen/Block.scala Outdated
Comment thread hkmc2/shared/src/main/scala/hkmc2/codegen/Lifter.scala Outdated
Comment thread hkmc2/shared/src/main/scala/hkmc2/codegen/Lifter.scala Outdated
Comment thread hkmc2/shared/src/main/scala/hkmc2/codegen/Lifter.scala
Comment thread hkmc2/shared/src/main/scala/hkmc2/codegen/StackSafeTransform.scala Outdated
@ychenfo ychenfo mentioned this pull request May 30, 2026
5 tasks
@LPTK LPTK merged commit d775edc into hkust-taco:hkmc2 May 31, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants