A high-performance TypeScript library for string similarity and distance algorithms—complete with robust utilities for string cleaning, normalization, and format transformation.
- Fully tree-shakeable
- Zero dependencies
# pnpm
pnpm add text-toolbox
# npm
npm install text-toolbox
# yarn
yarn add text-toolbox

- levenshtein() - Calculates the Levenshtein distance between two strings.
- damerauLevenshtein() - Calculates the Damerau-Levenshtein distance between two strings.
- diceCoefficient() - Calculates the Dice coefficient between two strings.
- cosineDistance() - Calculates the cosine distance between two strings.
- cosineSimilarity() - Calculates the cosine similarity between two strings.
- jaroWinkler() - Calculates the Jaro-Winkler similarity between two strings.

- metaphoneThree() - Implements the Metaphone 3 algorithm for phonetic encoding.
- doubleMetaphone() - Implements the Double Metaphone algorithm for phonetic encoding.
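To make the distance functions above more concrete, here is a minimal, standalone sketch of the classic two-row dynamic-programming approach to Levenshtein distance. It is not the library's implementation, and `levenshteinSketch` is a hypothetical name used only for this example. It compares UTF-16 code units, so characters outside the Basic Multilingual Plane count as two units.

```typescript
// Illustrative sketch only; not the library's implementation.
// `levenshteinSketch` is a hypothetical name for this example.
function levenshteinSketch(a: string, b: string): number {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // prev[j] = distance between the first (i - 1) units of `a` and the first j units of `b`.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  let curr: number[] = new Array<number>(b.length + 1).fill(0);

  for (let i = 1; i <= a.length; i++) {
    curr[0] = i; // deleting the first i units of `a` yields the empty string
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (or match when cost is 0)
      );
    }
    // Reuse the two rows instead of allocating a full matrix.
    [prev, curr] = [curr, prev];
  }
  return prev[b.length];
}

console.log(levenshteinSketch("kitten", "sitting")); // 3
```

Keeping only two rows brings memory use down to O(min-row length) while the running time stays O(a.length × b.length).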
A set of helper functions to normalize and sanitize strings before comparison.
Clean up inconsistent or unwanted whitespace:
- ensureSpaceAfterPunctuation() - Ensures there is exactly one space after punctuation marks like . , : ; ! ? (if followed by a word character).
- normalizePunctuationSpacing() - Normalizes spaces around punctuation by removing spaces before, ensuring one space after, optionally removing extra spaces, and allowing custom tight spacing for specific characters (e.g., - and ').
- normalizeWhitespace() - Trims leading/trailing whitespace and collapses all internal whitespace (spaces, tabs, newlines) to a single space.
- removeAllWhitespace() - Removes all whitespace characters (spaces, tabs, newlines) from the string.
- removeExtraSpaces() - Replaces multiple consecutive spaces with a single space (ignores tabs/newlines).
- removeLeadingWhitespace() - Removes whitespace at the beginning of the string.
- removeTrailingWhitespace() - Removes whitespace at the end of the string.
- removeWhitespaceBeforePunctuation() - Removes any whitespace directly before punctuation like . , : ; ! ?
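The difference between collapsing all whitespace and collapsing only spaces is easy to miss, so here is a small standalone sketch of both behaviors as described above. The function names are hypothetical and the regular expressions are an assumption about one reasonable way to get this behavior, not the library's actual code.

```typescript
// Illustrative sketches only; names and regexes are assumptions, not the library's code.

// Trim both ends, then collapse any run of whitespace (spaces, tabs, newlines) to one space.
function normalizeWhitespaceSketch(input: string): string {
  return input.trim().replace(/\s+/g, " ");
}

// Collapse runs of two or more space characters only; tabs and newlines are left alone.
function removeExtraSpacesSketch(input: string): string {
  return input.replace(/ {2,}/g, " ");
}

console.log(JSON.stringify(normalizeWhitespaceSketch("  a \t\n b  "))); // "a b"
console.log(JSON.stringify(removeExtraSpacesSketch("a   b\t\tc"))); // "a b\t\tc"
```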
Remove or replace problematic characters and formatting:
- removeCombiningMarks() - Removes all types of combining marks from a string.
- removeModifiers() - Removes modifier letters and symbols from a string.
- removeDiacritics() - Removes diacritic marks (accents) from characters in a string.
- removeHtmlTags() - Strips HTML tags from the input string.
- removeIllegalCharacters() - Removes illegal or non-printable characters from the string.
- removeNewLineCharacters() - Removes newline (\n) and carriage return (\r) characters from the string.
- removeNonASCII() - Removes all non-ASCII characters from the string.
- removeControlCharacters() - Strips control characters from the string.
- removePunctuation() - Removes punctuation characters from the string.
- replaceCompatibilityCharacters() - Normalizes text by converting non-ASCII characters (like æ, ø, ß) to their closest ASCII representation (ae, oe, ss).
- replaceSmartTypography() - Converts smart typography characters (like curly quotes and em-dashes) to their standard ASCII equivalents.
- stripEmoji() - Removes all emoji characters from the string.
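As a rough illustration of diacritic removal, a common technique is to decompose the string with Unicode NFD normalization and then drop the combining marks. The sketch below is standalone, uses the hypothetical name `removeDiacriticsSketch`, and only strips the Combining Diacritical Marks block (U+0300 to U+036F); the library may well handle more cases.

```typescript
// Illustrative sketch only; not the library's implementation.
// Assumption: NFD decomposition followed by removal of U+0300..U+036F.
function removeDiacriticsSketch(input: string): string {
  return input
    .normalize("NFD") // split "é" into "e" + U+0301 (combining acute accent)
    .replace(/[\u0300-\u036f]/g, ""); // drop the combining marks
}

console.log(removeDiacriticsSketch("Crème Brûlée")); // Creme Brulee
```

Letters such as ø or ß have no canonical decomposition, so this approach leaves them untouched; mapping them to ASCII needs an explicit replacement table.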
- removeDuplicateWords() - Removes duplicate words that appear consecutively in a string.
- removeTitlePrefix() - Removes common prefixes from titles (e.g., "The", "A", "An").
- removeTitleSuffix() - Removes common suffixes from titles.
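Consecutive duplicate words can be collapsed with a single backreference regex. The following is a standalone sketch under the assumptions that matching is case-insensitive and that the first occurrence is the one kept; `removeDuplicateWordsSketch` is a hypothetical name, and the library's rules may differ.

```typescript
// Illustrative sketch only; not the library's implementation.
// Assumptions: case-insensitive matching, first occurrence is kept.
function removeDuplicateWordsSketch(input: string): string {
  // (\w+) captures a word; (\s+\1\b)+ matches one or more immediate repeats of it.
  return input.replace(/\b(\w+)(\s+\1\b)+/gi, "$1");
}

console.log(removeDuplicateWordsSketch("this is is a test test test")); // this is a test
```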
- camelCase() - Converts a string to camel case, where the first word is lowercase and each subsequent word starts with an uppercase letter, with no spaces or punctuation.
- constantCase() - Converts a string to constant case, where each word is capitalized and separated by underscores.
- dotCase() - Converts a string to dot case, where each word is lowercase and separated by periods.
- kebabCase() - Converts a string to kebab case, where each word is lowercase and separated by hyphens.
- pascalCase() - Converts a string to Pascal case, where each word starts with an uppercase letter and there are no spaces or punctuation.
- pathCase() - Converts a string to path case, where each word is lowercase and separated by slashes.
- sentanceCase() - Converts a string to sentence case, where only the first letter of the first word is capitalized.
- snakeCase() - Converts a string to snake case, where each word is lowercase and separated by underscores.
- titleCase() - Converts a string to title case, where the first letter of each word is capitalized.
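All of the case conversions above share the same shape: split the input into words, transform each word, then join with a separator. The standalone sketch below shows that pattern for kebab, camel, and constant case. The helper names are hypothetical, the word splitter is ASCII-only, and the library's own tokenization rules are not shown here.

```typescript
// Illustrative sketches only; not the library's implementation.
// Assumption: words are runs of ASCII letters/digits, with camelCase humps treated as boundaries.
function splitWordsSketch(input: string): string[] {
  return input
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // "fooBar" -> "foo Bar"
    .split(/[^A-Za-z0-9]+/)
    .filter((word) => word.length > 0);
}

function kebabCaseSketch(input: string): string {
  return splitWordsSketch(input)
    .map((word) => word.toLowerCase())
    .join("-");
}

function camelCaseSketch(input: string): string {
  return splitWordsSketch(input)
    .map((word, index) =>
      index === 0
        ? word.toLowerCase()
        : word.charAt(0).toUpperCase() + word.slice(1).toLowerCase(),
    )
    .join("");
}

function constantCaseSketch(input: string): string {
  return splitWordsSketch(input)
    .map((word) => word.toUpperCase())
    .join("_");
}

console.log(kebabCaseSketch("fooBar baz")); // foo-bar-baz
console.log(camelCaseSketch("hello world-example")); // helloWorldExample
console.log(constantCaseSketch("hello world")); // HELLO_WORLD
```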
- fingerprint() - Normalizes text by removing special characters and reducing it to a sorted list of unique words.
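The fingerprint idea can be sketched in a few lines: lowercase the text, strip special characters, split into words, deduplicate, and sort. The example below is standalone and uses the hypothetical name `fingerprintSketch`. Returning a single space-joined string and keeping only ASCII letters and digits are assumptions made for this example, not documented behavior of the library.

```typescript
// Illustrative sketch only; not the library's implementation.
// Assumptions: ASCII-only word characters, result returned as a space-joined string.
function fingerprintSketch(input: string): string {
  const words = input
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, "") // drop punctuation and other special characters
    .split(/\s+/)
    .filter((word) => word.length > 0);

  // A Set removes duplicates; sort() gives an order-independent key.
  return Array.from(new Set(words)).sort().join(" ");
}

console.log(fingerprintSketch("The quick, quick brown fox!")); // brown fox quick the
```

Two inputs that differ only in word order, casing, punctuation, or repeated words produce the same key, which is what makes a fingerprint useful for grouping near-duplicate strings before running a more expensive similarity measure.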
- normalizeName() - Normalizes names by applying a series of transformations to standardize formatting.
This package is safe :)
MIT