Add pre-computed headwords with morph tokens to Entry model#2202
Add pre-computed headwords with morph tokens to Entry model#2202
Conversation
CRDT CRUD operations for MorphTypeData are not yet implemented, so this will fail to sync and throw a NotImplementedException if you actually try to sync.
In liblcm, the .Prefix and .Postfix properties can be (and often are) null, so the corresponding LeadingToken and TrailingToken properties in MorphTypeData need to be nullable as well. The UI will have to do ?? "" in appropriate places as it displays the leading/trailing tokens.
Created the change but forgot to include it in what the method returns.
Test was failing because mock didn't have new API registered so was returning null.
This one used yet another snapshot file that I hadn't updated yet
This time instead of an empty list of morph types, we'll use the actual morph types found in the .fwdata file, which is what the test will actually see.
Needed both the Snapshot *and* the sqlite3 database to be updated to expect the new MorphTypeData table.
Check in two places: the API, which is the most likely place to catch duplicates, but also in the Change object itself, in case two people somehow end up submitting the same change twice (e.g., one person was offline for a period of time).
Now that the API prevents creating duplicate MorphTypes, we can add a database constraint to enforce that at the DB level.
The backslash character is not a path separator on Linux, so this test was failing to find the snapshot file(s).
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
13a4e0e to
7723717
Compare
|
The latest updates on your projects. Learn more about Argos notifications ↗︎ Awaiting the start of a new Argos build… |
Since EF Core doesn't support the ON CONFLICT IGNORE syntax from Sqlite, but we need that syntax for the constraint here, we must hand-write the SQL rather than let EF Core generate it for us. Thankfully we can let EF Core generate the migration, look at the resulting table in Sqlite, and copy that CREATE TABLE syntax with just minor changes, so the actual hand-writing is minimal since most of it is written for us already.
UpdateMorphTypeData(before, after) could have potentially applied some changes that UpdateMorphTypeData(id, updateDocument) would have rejected. This keeps their behaviors consistent by rejecting any change that would have tried to change the MorphType enum.
Possible race condition might exist between two processes creating MorphTypeData entries with the same MorphType value; vanishingly rare but theoretically possible. Because the Sqlite table has ON CONFLICT IGNORE, the second Create call will "lose" and there will not be a database entry with that Id. But there *will* be a database entry with the correct MorphType value, so that one is guaranteed to succeed and return an actual record.
Now the MorphTypeData created will be double-checked to ensure it's all there as expected, and it will also create a more realistic 19 MorphTypeData entries instead of just one.
With ten MorphTypeData objects being synced as well, the odds of failure go up just a little bit more, so that with the deterministic seed we've chosen, 20 attempts are not quite enough to succeed — but it will succeed on attempt number 23.
This rejects not just the "replace" operation, but also "add" and "remove" — because the MorphType field should never, ever be changed.
There's no benefit to having the 2025-09-31 snapshot have MorphTypeData and the 2025-09-30 snapshot not have any.
CrdtMiniLcmApi throws an InvalidOperationException if the update object touches the MorphType field. FwDataMiniLcmApi should do the same; it's already possible for the MorphTypeDataSync.Sync code to throw that exception, but there's a potential code path that, although it would not be hit in practice, could still throw that exception in theory. Let's plug that hole even though it'll likely never be hit.
ResumableImportApi wasn't resuming morph type imports, so the resumable test was having to start over from the beginning on morph types if the random-chance exception triggered during the process. Fixed that, and then discovered that it took 21 attempts to complete the import (so the test stopping after 20 attempts was just one attempt short of success). Bumped the number of attempts up to 30 to provide some headroom.
Instead of modifying historical snapshots to add the AllMorphTypeData property, we need to create a new verified snapshot file with the new data model, leaving the old ones in place so we can continue to verify that older data models don't cause regressions.
Instead of passing all the object's properties as separate params to CreateMorphTypeDataChange, it's much simpler to just pass in the entire object, similarly to how it's done for Entry and other large objects.
Turns out we don't need ON CONFLICT IGNORE since the wrappers we have built around EF Core will handle it for us: any object that gets created in an already-deleted state (as we do for duplicate morph types) will never be sent to the DB in the first place. So it's safe to use EF Core's API for the migration after all, keeping it consistent with how the other migrations are written.
Editing typo that caused far too much head-scratching
Now we get to iterate over the actual enum values, rather than their int equivalents which have to be converted. Which means we can also use the enum names in test strings and so on. Handy.
If an icon is not an immediate child, then its own parent is resonsible for padding. This was making the theme picker ugly.
Co-authored-by: Tim Haasdyk <tim_haasdyk@sil.org>
* Update to latest shadcn themes * Fix broken theme values The contrast of the current values is unacceptable and the themes demo page is using these new values * Improve theme-picker trigger contrast on home page * Adapt ListItem to new theme colors * Fix: dark outline button border color * Adapt ghost and shell inputs to new shadcn theme * Use button styling for sort badge-button * Redesign ListItem states Now it uses "focus-visible:ring-[3px] focus-visible:ring-ring/50" like everything else * Fix broken outlines in dark mode This was only affecting ListItem, which isn't using an outline anymore. So, the issue doesn't currently affect us, but it was VERY confusing. * Improve writing system contrast * Fix Add sense button outline cropped * Improve primary color contrast * Add story for demoing theme colors
a5ea68d to
4e2d25d
Compare
Summary
This PR adds a pre-computed
Headwordproperty to theEntrymodel that includes morphological tokens (leading/trailing affixes) applied to lexeme forms. The headword is computed during entry loading and made available across all writing systems, enabling better search and display functionality.Key Changes
Entry Model Enhancement: Added
Headwordproperty toEntryas aMultiStringcontaining pre-computed headwords for all writing systems with morph tokens appliedLeadingToken + LexemeForm + TrailingTokenHeadword Computation:
ComputeHeadwords()method inEntryQueryHelpersfor in-memory computationHeadwordWithTokens()andHeadwordSearchValue()expression methods for SQL translationQueryHelpers.Finalize()Search Service Updates:
EntrySearchService.Filter()to acceptWritingSystemIdparameterToEntrySearchRecord()to include headwords for all vernacular writing systems (space-separated)Data Bridge Integration:
FwDataMiniLcmApi.FromLexEntry()to compute headwords using LCM morph type dataComputeHeadword()helper that mirrors the CRDT computation logicAPI Updates:
Entry.Headword()method toHeadwordText()for clarity (returns first non-empty headword for logging/display)Headwordto CRDT model configuration as ignored property (not persisted)Frontend Integration:
headwordproperty inIEntrywriting-system-serviceto prefer pre-computed headword with fallback to raw formsTesting:
HeadwordSearchValueTestscovering token matching, citation form priority, and multi-WS scenariosNotable Implementation Details
Headwordproperty is computed in-memory during entry loading and not persisted to the databaseMorphTypeDatafrom CRDT once it's implemented as a CRDT entityhttps://claude.ai/code/session_01GFNCNDE5wHE2hGC7pQQp2f