@orangejulius
Member

This is a simple change to prefer country records over localities when we are deduplicating two matching records.

Note that this doesn't change when deduplication occurs, only which record we keep once two records have been identified as duplicates.

It mostly affects city-states and small countries like Singapore, Hong Kong, etc. Some not-quite-city-states like Luxembourg are also affected.

It's a bit of a stylistic change: in these cases either the country or the locality is technically correct.

Some things this fixes:

  • Previously, Mexico City and Mexico were both deduplicated to Mexico City, so there was nothing you could type to get Mexico the country. Now you'll get the country until you type 'Mexico C'. This is still not ideal, since we really should not deduplicate those two records, but for now it is at least better.
  • We've found there are some junk locality records out there that match country names, and this conveniently removes all of them.
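
To make the distinction between when we dedupe and which record we keep concrete, here is a minimal sketch in Python. It is not the code from this PR: `Record`, `LAYER_PREFERENCE`, `is_duplicate`, `preferred`, and `dedupe` are all hypothetical names, and the name-equality check is only a stand-in for the real duplicate detection, which this change does not touch.

```python
from dataclasses import dataclass

# Hypothetical ranking for illustration only: a lower number wins.
LAYER_PREFERENCE = {"country": 0, "locality": 1}


@dataclass(frozen=True)
class Record:
    name: str
    layer: str


def is_duplicate(a: Record, b: Record) -> bool:
    """Stand-in for the existing duplicate check (unchanged by this PR)."""
    return a.name.casefold() == b.name.casefold()


def preferred(a: Record, b: Record) -> Record:
    """Choose which of two records already judged duplicates to keep."""
    unknown = len(LAYER_PREFERENCE)  # layers not in the table rank last
    rank_a = LAYER_PREFERENCE.get(a.layer, unknown)
    rank_b = LAYER_PREFERENCE.get(b.layer, unknown)
    # On a tie, keep the record that was seen first.
    return a if rank_a <= rank_b else b


def dedupe(records):
    """Collapse duplicates, keeping the preferred record in the original slot."""
    kept = []
    for record in records:
        for i, existing in enumerate(kept):
            if is_duplicate(existing, record):
                kept[i] = preferred(existing, record)
                break
        else:
            kept.append(record)
    return kept


results = dedupe([
    Record("Singapore", "locality"),
    Record("Singapore", "country"),
    Record("Luxembourg", "country"),
    Record("Luxembourg", "locality"),
])
print([(r.name, r.layer) for r in results])
# [('Singapore', 'country'), ('Luxembourg', 'country')]
```

In this sketch the country record wins regardless of the order in which the two duplicates arrive, which is the behavior described above. Only `preferred` encodes the new preference; `is_duplicate` is left alone.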

@orangejulius orangejulius changed the title from "feat(dedupe): Prefer country over locality when deduping" to "Prefer country over locality when deduping" Dec 3, 2025
@orangejulius orangejulius merged commit 62cd3fe into master Dec 12, 2025
6 checks passed
@orangejulius orangejulius deleted the prefer-country-dedupe branch December 12, 2025 18:33