Skip to content

Add pre-computed headwords with morph tokens to Entry model#2202

Draft
myieye wants to merge 44 commits intodevelopfrom
claude/add-lexeme-headwords-TowRX
Draft

Add pre-computed headwords with morph tokens to Entry model#2202
myieye wants to merge 44 commits intodevelopfrom
claude/add-lexeme-headwords-TowRX

Conversation

@myieye
Copy link
Collaborator

@myieye myieye commented Mar 10, 2026

Summary

This PR adds a pre-computed Headword property to the Entry model that includes morphological tokens (leading/trailing affixes) applied to lexeme forms. The headword is computed during entry loading and made available across all writing systems, enabling better search and display functionality.

Key Changes

  • Entry Model Enhancement: Added Headword property to Entry as a MultiString containing pre-computed headwords for all writing systems with morph tokens applied

    • CitationForm takes priority when present
    • Otherwise: LeadingToken + LexemeForm + TrailingToken
    • Computed for all writing systems present in CitationForm or LexemeForm
  • Headword Computation:

    • Added ComputeHeadwords() method in EntryQueryHelpers for in-memory computation
    • Added HeadwordWithTokens() and HeadwordSearchValue() expression methods for SQL translation
    • Integrated computation into entry finalization pipeline via QueryHelpers.Finalize()
  • Search Service Updates:

    • Modified EntrySearchService.Filter() to accept WritingSystemId parameter
    • Updated ToEntrySearchRecord() to include headwords for all vernacular writing systems (space-separated)
    • Improved ranking to use per-WS headword instead of generic search record headword
  • Data Bridge Integration:

    • Updated FwDataMiniLcmApi.FromLexEntry() to compute headwords using LCM morph type data
    • Added ComputeHeadword() helper that mirrors the CRDT computation logic
  • API Updates:

    • Renamed Entry.Headword() method to HeadwordText() for clarity (returns first non-empty headword for logging/display)
    • Updated all call sites throughout codebase
    • Added Headword to CRDT model configuration as ignored property (not persisted)
  • Frontend Integration:

    • Updated TypeScript types to include headword property in IEntry
    • Modified writing-system-service to prefer pre-computed headword with fallback to raw forms
    • Updated demo data and test fixtures
  • Testing:

    • Added comprehensive HeadwordSearchValueTests covering token matching, citation form priority, and multi-WS scenarios

Notable Implementation Details

  • Headword computation respects all writing systems present in the data, not just current vernacular WSs, ensuring no data loss for non-current or future writing systems
  • The Headword property is computed in-memory during entry loading and not persisted to the database
  • TODO comments indicate future work to integrate actual MorphTypeData from CRDT once it's implemented as a CRDT entity
  • Search filtering now uses per-writing-system headwords for more accurate results

https://claude.ai/code/session_01GFNCNDE5wHE2hGC7pQQp2f

rmunn added 21 commits March 5, 2026 14:38
CRDT CRUD operations for MorphTypeData are not yet implemented, so this
will fail to sync and throw a NotImplementedException if you actually
try to sync.
In liblcm, the .Prefix and .Postfix properties can be (and often are)
null, so the corresponding LeadingToken and TrailingToken properties in
MorphTypeData need to be nullable as well. The UI will have to do ?? ""
in appropriate places as it displays the leading/trailing tokens.
Created the change but forgot to include it in what the method returns.
Test was failing because mock didn't have new API registered so was
returning null.
This one used yet another snapshot file that I hadn't updated yet
This time instead of an empty list of morph types, we'll use the actual
morph types found in the .fwdata file, which is what the test will
actually see.
Needed both the Snapshot *and* the sqlite3 database to be updated to
expect the new MorphTypeData table.
Check in two places: the API, which is the most likely place to catch
duplicates, but also in the Change object itself, in case two people
somehow end up submitting the same change twice (e.g., one person was
offline for a period of time).
Now that the API prevents creating duplicate MorphTypes, we can add a
database constraint to enforce that at the DB level.
The backslash character is not a path separator on Linux, so this test
was failing to find the snapshot file(s).
@github-actions github-actions bot added the 💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related label Mar 10, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 10, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 16c4b3be-bee8-42cd-804d-7fdd9a7f1b77

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch claude/add-lexeme-headwords-TowRX
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@myieye myieye force-pushed the claude/add-lexeme-headwords-TowRX branch from 13a4e0e to 7723717 Compare March 10, 2026 14:37
@github-actions
Copy link
Contributor

github-actions bot commented Mar 10, 2026

UI unit Tests

  1 files  ±0   54 suites  ±0   25s ⏱️ ±0s
140 tests ±0  140 ✅ ±0  0 💤 ±0  0 ❌ ±0 
207 runs  ±0  207 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit a5ea68d. ± Comparison against base commit 3f62d45.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 10, 2026

C# Unit Tests

165 tests  +3   165 ✅ +3   18s ⏱️ ±0s
 23 suites ±0     0 💤 ±0 
  1 files   ±0     0 ❌ ±0 

Results for commit a5ea68d. ± Comparison against base commit 3f62d45.

♻️ This comment has been updated with latest results.

@argos-ci
Copy link

argos-ci bot commented Mar 10, 2026

The latest updates on your projects. Learn more about Argos notifications ↗︎

Awaiting the start of a new Argos build…

rmunn added 3 commits March 11, 2026 08:42
Since EF Core doesn't support the ON CONFLICT IGNORE syntax from Sqlite,
but we need that syntax for the constraint here, we must hand-write the
SQL rather than let EF Core generate it for us. Thankfully we can let EF
Core generate the migration, look at the resulting table in Sqlite, and
copy that CREATE TABLE syntax with just minor changes, so the actual
hand-writing is minimal since most of it is written for us already.
UpdateMorphTypeData(before, after) could have potentially applied some
changes that UpdateMorphTypeData(id, updateDocument) would have rejected.
This keeps their behaviors consistent by rejecting any change that would
have tried to change the MorphType enum.
rmunn and others added 20 commits March 11, 2026 09:18
Possible race condition might exist between two processes creating
MorphTypeData entries with the same MorphType value; vanishingly rare
but theoretically possible. Because the Sqlite table has ON CONFLICT
IGNORE, the second Create call will "lose" and there will not be a
database entry with that Id. But there *will* be a database entry with
the correct MorphType value, so that one is guaranteed to succeed and
return an actual record.
Now the MorphTypeData created will be double-checked to ensure it's all
there as expected, and it will also create a more realistic 19
MorphTypeData entries instead of just one.
With ten MorphTypeData objects being synced as well, the odds of failure
go up just a little bit more, so that with the deterministic seed we've
chosen, 20 attempts are not quite enough to succeed — but it will
succeed on attempt number 23.
This rejects not just the "replace" operation, but also "add" and
"remove" — because the MorphType field should never, ever be changed.
There's no benefit to having the 2025-09-31 snapshot have MorphTypeData
and the 2025-09-30 snapshot not have any.
CrdtMiniLcmApi throws an InvalidOperationException if the update object
touches the MorphType field. FwDataMiniLcmApi should do the same; it's
already possible for the MorphTypeDataSync.Sync code to throw that
exception, but there's a potential code path that, although it would not
be hit in practice, could still throw that exception in theory. Let's
plug that hole even though it'll likely never be hit.
ResumableImportApi wasn't resuming morph type imports, so the resumable
test was having to start over from the beginning on morph types if the
random-chance exception triggered during the process. Fixed that, and
then discovered that it took 21 attempts to complete the import (so the
test stopping after 20 attempts was just one attempt short of success).
Bumped the number of attempts up to 30 to provide some headroom.
Instead of modifying historical snapshots to add the AllMorphTypeData
property, we need to create a new verified snapshot file with the new
data model, leaving the old ones in place so we can continue to verify
that older data models don't cause regressions.
Instead of passing all the object's properties as separate params to
CreateMorphTypeDataChange, it's much simpler to just pass in the entire
object, similarly to how it's done for Entry and other large objects.
Turns out we don't need ON CONFLICT IGNORE since the wrappers we have
built around EF Core will handle it for us: any object that gets created
in an already-deleted state (as we do for duplicate morph types) will
never be sent to the DB in the first place. So it's safe to use EF
Core's API for the migration after all, keeping it consistent with how
the other migrations are written.
Editing typo that caused far too much head-scratching
Now we get to iterate over the actual enum values, rather than their int
equivalents which have to be converted. Which means we can also use the
enum names in test strings and so on. Handy.
If an icon is not an immediate child, then its own parent is resonsible for padding. This was making the theme picker ugly.
Co-authored-by: Tim Haasdyk <tim_haasdyk@sil.org>
* Update to latest shadcn themes

* Fix broken theme values
The contrast of the current values is unacceptable and the themes demo page is using these new values

* Improve theme-picker trigger contrast on home page

* Adapt ListItem to new theme colors

* Fix: dark outline button border color

* Adapt ghost and shell inputs to new shadcn theme

* Use button styling for sort badge-button

* Redesign ListItem states
Now it uses "focus-visible:ring-[3px] focus-visible:ring-ring/50" like everything else

* Fix broken outlines in dark mode
This was only affecting ListItem, which isn't using an outline anymore. So, the issue doesn't currently affect us, but it was VERY confusing.

* Improve writing system contrast

* Fix Add sense button outline cropped

* Improve primary color contrast

* Add story for demoing theme colors
@myieye myieye force-pushed the claude/add-lexeme-headwords-TowRX branch from a5ea68d to 4e2d25d Compare March 13, 2026 15:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related 📙 Platform.Bible

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants