Add pre-computed headwords with morph tokens to Entry model by myieye · Pull Request #2202 · sillsdev/languageforge-lexbox

myieye · 2026-03-10T14:33:31Z

Summary

This PR adds a pre-computed Headword property to the Entry model that includes morphological tokens (leading/trailing affixes) applied to lexeme forms. The headword is computed during entry loading and made available across all writing systems, enabling better search and display functionality.

Key Changes

Entry Model Enhancement: Added Headword property to Entry as a MultiString containing pre-computed headwords for all writing systems with morph tokens applied
- CitationForm takes priority when present
- Otherwise: LeadingToken + LexemeForm + TrailingToken
- Computed for all writing systems present in CitationForm or LexemeForm
Headword Computation:
- Added ComputeHeadwords() method in EntryQueryHelpers for in-memory computation
- Added HeadwordWithTokens() and HeadwordSearchValue() expression methods for SQL translation
- Integrated computation into entry finalization pipeline via QueryHelpers.Finalize()
Search Service Updates:
- Modified EntrySearchService.Filter() to accept WritingSystemId parameter
- Updated ToEntrySearchRecord() to include headwords for all vernacular writing systems (space-separated)
- Improved ranking to use per-WS headword instead of generic search record headword
Data Bridge Integration:
- Updated FwDataMiniLcmApi.FromLexEntry() to compute headwords using LCM morph type data
- Added ComputeHeadword() helper that mirrors the CRDT computation logic
API Updates:
- Renamed Entry.Headword() method to HeadwordText() for clarity (returns first non-empty headword for logging/display)
- Updated all call sites throughout codebase
- Added Headword to CRDT model configuration as ignored property (not persisted)
Frontend Integration:
- Updated TypeScript types to include headword property in IEntry
- Modified writing-system-service to prefer pre-computed headword with fallback to raw forms
- Updated demo data and test fixtures
Testing:
- Added comprehensive HeadwordSearchValueTests covering token matching, citation form priority, and multi-WS scenarios

Notable Implementation Details

Headword computation respects all writing systems present in the data, not just current vernacular WSs, ensuring no data loss for non-current or future writing systems
The Headword property is computed in-memory during entry loading and not persisted to the database
TODO comments indicate future work to integrate actual MorphTypeData from CRDT once it's implemented as a CRDT entity
Search filtering now uses per-writing-system headwords for more accurate results

https://claude.ai/code/session_01GFNCNDE5wHE2hGC7pQQp2f

CRDT CRUD operations for MorphTypeData are not yet implemented, so this will fail to sync and throw a NotImplementedException if you actually try to sync.

In liblcm, the .Prefix and .Postfix properties can be (and often are) null, so the corresponding LeadingToken and TrailingToken properties in MorphTypeData need to be nullable as well. The UI will have to do ?? "" in appropriate places as it displays the leading/trailing tokens.

Created the change but forgot to include it in what the method returns.

Test was failing because mock didn't have new API registered so was returning null.

This one used yet another snapshot file that I hadn't updated yet

This time instead of an empty list of morph types, we'll use the actual morph types found in the .fwdata file, which is what the test will actually see.

Needed both the Snapshot *and* the sqlite3 database to be updated to expect the new MorphTypeData table.

Check in two places: the API, which is the most likely place to catch duplicates, but also in the Change object itself, in case two people somehow end up submitting the same change twice (e.g., one person was offline for a period of time).

Now that the API prevents creating duplicate MorphTypes, we can add a database constraint to enforce that at the DB level.

The backslash character is not a path separator on Linux, so this test was failing to find the snapshot file(s).

coderabbitai · 2026-03-10T14:33:52Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 16c4b3be-bee8-42cd-804d-7fdd9a7f1b77

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch claude/add-lexeme-headwords-TowRX

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-03-10T14:38:05Z

UI unit Tests

1 files ±0 54 suites ±0 25s ⏱️ ±0s
140 tests ±0 140 ✅ ±0 0 💤 ±0 0 ❌ ±0
207 runs ±0 207 ✅ ±0 0 💤 ±0 0 ❌ ±0

Results for commit a5ea68d. ± Comparison against base commit 3f62d45.

♻️ This comment has been updated with latest results.

github-actions · 2026-03-10T14:40:24Z

C# Unit Tests

165 tests +3 165 ✅ +3 18s ⏱️ ±0s
23 suites ±0 0 💤 ±0
1 files ±0 0 ❌ ±0

Results for commit a5ea68d. ± Comparison against base commit 3f62d45.

♻️ This comment has been updated with latest results.

argos-ci · 2026-03-10T14:40:38Z

The latest updates on your projects. Learn more about Argos notifications ↗︎

Awaiting the start of a new Argos build…

Since EF Core doesn't support the ON CONFLICT IGNORE syntax from Sqlite, but we need that syntax for the constraint here, we must hand-write the SQL rather than let EF Core generate it for us. Thankfully we can let EF Core generate the migration, look at the resulting table in Sqlite, and copy that CREATE TABLE syntax with just minor changes, so the actual hand-writing is minimal since most of it is written for us already.

UpdateMorphTypeData(before, after) could have potentially applied some changes that UpdateMorphTypeData(id, updateDocument) would have rejected. This keeps their behaviors consistent by rejecting any change that would have tried to change the MorphType enum.

Possible race condition might exist between two processes creating MorphTypeData entries with the same MorphType value; vanishingly rare but theoretically possible. Because the Sqlite table has ON CONFLICT IGNORE, the second Create call will "lose" and there will not be a database entry with that Id. But there *will* be a database entry with the correct MorphType value, so that one is guaranteed to succeed and return an actual record.

Now the MorphTypeData created will be double-checked to ensure it's all there as expected, and it will also create a more realistic 19 MorphTypeData entries instead of just one.

With ten MorphTypeData objects being synced as well, the odds of failure go up just a little bit more, so that with the deterministic seed we've chosen, 20 attempts are not quite enough to succeed — but it will succeed on attempt number 23.

This rejects not just the "replace" operation, but also "add" and "remove" — because the MorphType field should never, ever be changed.

There's no benefit to having the 2025-09-31 snapshot have MorphTypeData and the 2025-09-30 snapshot not have any.

CrdtMiniLcmApi throws an InvalidOperationException if the update object touches the MorphType field. FwDataMiniLcmApi should do the same; it's already possible for the MorphTypeDataSync.Sync code to throw that exception, but there's a potential code path that, although it would not be hit in practice, could still throw that exception in theory. Let's plug that hole even though it'll likely never be hit.

ResumableImportApi wasn't resuming morph type imports, so the resumable test was having to start over from the beginning on morph types if the random-chance exception triggered during the process. Fixed that, and then discovered that it took 21 attempts to complete the import (so the test stopping after 20 attempts was just one attempt short of success). Bumped the number of attempts up to 30 to provide some headroom.

Instead of modifying historical snapshots to add the AllMorphTypeData property, we need to create a new verified snapshot file with the new data model, leaving the old ones in place so we can continue to verify that older data models don't cause regressions.

Instead of passing all the object's properties as separate params to CreateMorphTypeDataChange, it's much simpler to just pass in the entire object, similarly to how it's done for Entry and other large objects.

Turns out we don't need ON CONFLICT IGNORE since the wrappers we have built around EF Core will handle it for us: any object that gets created in an already-deleted state (as we do for duplicate morph types) will never be sent to the DB in the first place. So it's safe to use EF Core's API for the migration after all, keeping it consistent with how the other migrations are written.

Editing typo that caused far too much head-scratching

Now we get to iterate over the actual enum values, rather than their int equivalents which have to be converted. Which means we can also use the enum names in test strings and so on. Handy.

If an icon is not an immediate child, then its own parent is resonsible for padding. This was making the theme picker ugly.

…ing (#2182)

Co-authored-by: Tim Haasdyk <tim_haasdyk@sil.org>

* Update to latest shadcn themes * Fix broken theme values The contrast of the current values is unacceptable and the themes demo page is using these new values * Improve theme-picker trigger contrast on home page * Adapt ListItem to new theme colors * Fix: dark outline button border color * Adapt ghost and shell inputs to new shadcn theme * Use button styling for sort badge-button * Redesign ListItem states Now it uses "focus-visible:ring-[3px] focus-visible:ring-ring/50" like everything else * Fix broken outlines in dark mode This was only affecting ListItem, which isn't using an outline anymore. So, the issue doesn't currently affect us, but it was VERY confusing. * Improve writing system contrast * Fix Add sense button outline cropped * Improve primary color contrast * Add story for demoing theme colors

rmunn added 21 commits March 5, 2026 14:38

WIP (not working yet): sync morph type data

fdd8a30

CRDT CRUD operations for MorphTypeData are not yet implemented, so this will fail to sync and throw a NotImplementedException if you actually try to sync.

Implement CRUD for MorphTypeData, add DB migration

e3276d3

Update one test to new API shape

04e685d

Register JsonPatchChange for morphTypeData

5f2d632

Update expected data for various snapshot tests

078e9b8

One more data change to a snapshot test

5fbfbe8

Rename CreateMorphTypeData change for consistency

e2a4803

Actually use the change in UseChangesTests

fa075ef

Created the change but forgot to include it in what the method returns.

Teach Resumable test how to create mock morph type

92fbb3a

Test was failing because mock didn't have new API registered so was returning null.

Update expected value after change type rename

4f99bd7

Update regression test data for change type rename

7563c10

Fix another bit of test data

447b91c

Fix live sena-3 sync test

2689909

This one used yet another snapshot file that I hadn't updated yet

Update verified Sena-3 snapshot with morph types

0921309

This time instead of an empty list of morph types, we'll use the actual morph types found in the .fwdata file, which is what the test will actually see.

Finally fix LiveSena3Sync test

62d9e43

Needed both the Snapshot *and* the sqlite3 database to be updated to expect the new MorphTypeData table.

Add missing comma in verified snapshot file

89f63f4

Add unique index on MorphType column

48d4696

Now that the API prevents creating duplicate MorphTypes, we can add a database constraint to enforce that at the DB level.

Update expected VerifyDbModel snapshot

4ffc8e3

Fix AssertSena3Snapshots test on Linux

66c97f6

The backslash character is not a path separator on Linux, so this test was failing to find the snapshot file(s).

github-actions bot added the 💻 FW Lite issues related to the fw lite application, not miniLcm or crdt related label Mar 10, 2026

myieye force-pushed the claude/add-lexeme-headwords-TowRX branch from 13a4e0e to 7723717 Compare March 10, 2026 14:37

rmunn added 3 commits March 11, 2026 08:42

Reject MorphTypeData updates that change MorphType

d0c5348

rmunn and others added 20 commits March 11, 2026 09:18

ResumableTests should also test MorphTypeData

cfa2ecb

Update ResumableTests to check MorphTypeData

df55b0a

Now the MorphTypeData created will be double-checked to ensure it's all there as expected, and it will also create a more realistic 19 MorphTypeData entries instead of just one.

Do not allow ANY JsonPatch changes to MorphType

264f6d6

This rejects not just the "replace" operation, but also "add" and "remove" — because the MorphType field should never, ever be changed.

Address review comments from CodeRabbit

e6a5bb1

Make both historical sena-3 snapshots consistent

b4c082f

There's no benefit to having the 2025-09-31 snapshot have MorphTypeData and the 2025-09-30 snapshot not have any.

CreateMorphTypeDataChange takes object, not params

4979c4f

Instead of passing all the object's properties as separate params to CreateMorphTypeDataChange, it's much simpler to just pass in the entire object, similarly to how it's done for Entry and other large objects.

Fix CreateMorphTypeDataChange missing Abbreviation

ad2a9fd

Editing typo that caused far too much head-scratching

Improve MorphType enumeration in ResumableTests

01da8aa

Now we get to iterate over the actual enum values, rather than their int equivalents which have to be converted. Which means we can also use the enum names in test strings and so on. Handy.

Fix buttons classes to only look for immediate icon children

7283629

If an icon is not an immediate child, then its own parent is resonsible for padding. This was making the theme picker ugly.

Reset install state on dialog toggle and simplify UpdateDialog render…

613cc05

…ing (#2182)

Remove gloss/definition validation when creating entries (#2194)

ab6fdaf

Disable file-logging in Platform.Bible extension (#2196)

83c3317

Co-authored-by: Tim Haasdyk <tim_haasdyk@sil.org>

Attempt to add morph-tokens to headwords

4e2d25d

myieye force-pushed the claude/add-lexeme-headwords-TowRX branch from a5ea68d to 4e2d25d Compare March 13, 2026 15:58

github-actions bot added the 📙 Platform.Bible label Mar 13, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add pre-computed headwords with morph tokens to Entry model#2202

Add pre-computed headwords with morph tokens to Entry model#2202
myieye wants to merge 44 commits intodevelopfrom
claude/add-lexeme-headwords-TowRX

myieye commented Mar 10, 2026

Uh oh!

coderabbitai bot commented Mar 10, 2026 •

edited

Loading

Review skipped

Uh oh!

github-actions bot commented Mar 10, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Mar 10, 2026 •

edited

Loading

Uh oh!

argos-ci bot commented Mar 10, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

myieye commented Mar 10, 2026

Summary

Key Changes

Notable Implementation Details

Uh oh!

coderabbitai bot commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

github-actions bot commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

UI unit Tests

Uh oh!

github-actions bot commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

C# Unit Tests

Uh oh!

argos-ci bot commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

coderabbitai bot commented Mar 10, 2026 •

edited

Loading

github-actions bot commented Mar 10, 2026 •

edited

Loading

github-actions bot commented Mar 10, 2026 •

edited

Loading

argos-ci bot commented Mar 10, 2026 •

edited

Loading