Bump chardet from 5.2.0 to 7.4.0.post2 by dependabot[bot] · Pull Request #1662 · cms-dev/cms

dependabot · 2026-03-30T15:00:07Z

Bumps chardet from 5.2.0 to 7.4.0.post2.

Release notes

chardet 7.4.0 brings accuracy up to 99.3% (from 98.6% in 7.3.0) and a significantly faster cold start thanks to a new dense model format.

What's New

Performance:

New dense zlib-compressed model format (v2) drops cold start (import + first detect) from ~75ms to ~13ms with mypyc

Accuracy (98.6% → 99.3%):

Eliminated train/test data overlap via content fingerprinting

Added MADLAD-400 and Wikipedia as supplemental training sources

Improved non-ASCII bigram scoring: high-byte bigrams are now preserved during training and weighted by per-bigram IDF

Encoding-aware substitution filtering (substitutions only apply for characters the target encoding can't represent)

Increased training samples from 15K to 25K per language/encoding pair
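Here is a minimal Python sketch of the content-fingerprinting step. It assumes a fingerprint is a SHA-256 hash of whitespace-normalized, case-folded text. The helper names `fingerprint` and `exclude_test_overlap` are hypothetical, and chardet's actual training pipeline may normalize or hash documents differently.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash of whitespace-normalized, case-folded text (hypothetical helper)."""
    normalized = re.sub(r"\s+", " ", text).strip().casefold()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def exclude_test_overlap(training_docs, test_docs):
    """Drop training documents whose fingerprint matches any test document."""
    test_prints = {fingerprint(doc) for doc in test_docs}
    return [doc for doc in training_docs if fingerprint(doc) not in test_prints]


train = ["Hello  world", "Bonjour le monde", "Hola mundo"]
test = ["hello world"]
print(exclude_test_overlap(train, test))  # ['Bonjour le monde', 'Hola mundo']
```

Normalizing before hashing lets trivially different copies of the same article (extra spaces, different case) collide, so they are caught as overlap.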

Bug fixes:

Added dedicated structural analyzers for CP932, CP949, and Big5-HKSCS (these were previously sharing their base encoding's byte-range analyzer, missing extended ranges)

Metrics

| Metric | chardet 7.4.0 (mypyc) | chardet 6.0.0 | charset-normalizer 3.4.6 |
|---|---|---|---|
| Accuracy (2,517 files) | 99.3% | 88.2% | 85.4% |
| Speed | 551 files/s | 12 files/s | 376 files/s |
| Language detection | 95.7% | 40.0% | 59.2% |

Full changelog: https://chardet.readthedocs.io/en/latest/changelog.html

7.3.0

License

0BSD license — the project license has been changed from MIT to 0BSD, a maximally permissive license with no attribution requirement. All prior 7.x releases should also be considered 0BSD licensed as of this release.

Features

Added mime_type field to detection results — identifies file types for both binary (via magic number matching) and text content. Returned in all detect(), detect_all(), and UniversalDetector results. (#350)

New pipeline/magic.py module detects 40+ binary file formats including images, audio/video, archives, documents, executables, and fonts. ZIP-based formats (XLSX, DOCX, JAR, APK, EPUB, wheel, OpenDocument) are distinguished by entry filenames. (#350)
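Here is a rough Python sketch of magic-number matching with ZIP entry-name disambiguation. `sniff_mime_type`, the signature table, and the entry-name hints are illustrative assumptions that cover only a handful of formats. This is not the code in chardet's `pipeline/magic.py`.

```python
import io
import zipfile
from typing import Optional

# Illustrative subset of leading-byte signatures ("magic numbers").
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]

# ZIP-based formats share the PK\x03\x04 signature, so they are told apart
# by a characteristic entry name. Order matters: an APK also contains
# META-INF/MANIFEST.MF, so AndroidManifest.xml is checked before the JAR rule.
ZIP_ENTRY_HINTS = [
    ("word/document.xml",
     "application/vnd.openxmlformats-officedocument.wordprocessingml.document"),
    ("xl/workbook.xml",
     "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"),
    ("AndroidManifest.xml", "application/vnd.android.package-archive"),
    ("META-INF/MANIFEST.MF", "application/java-archive"),
]


def sniff_mime_type(data: bytes) -> Optional[str]:
    """Return a MIME type for recognized binary data, else None (hypothetical)."""
    for magic, mime in MAGIC_SIGNATURES:
        if data.startswith(magic):
            return mime
    if data.startswith(b"PK\x03\x04"):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                names = set(archive.namelist())
        except zipfile.BadZipFile:
            return "application/zip"  # signature matched, archive unreadable
        for entry, mime in ZIP_ENTRY_HINTS:
            if entry in names:
                return mime
        return "application/zip"
    return None  # no known binary signature; a caller could try text detection


# Usage: build a minimal DOCX-like archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")
print(sniff_mime_type(buffer.getvalue()))
# application/vnd.openxmlformats-officedocument.wordprocessingml.document
```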

Bug Fixes

Fixed incorrect equivalence between UTF-16-LE and UTF-16-BE in accuracy testing — these are distinct encodings with different byte order, not interchangeable
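The following example shows why UTF-16-LE and UTF-16-BE cannot be treated as equivalent. The same code unit is serialized with its bytes swapped, so reading one as the other silently produces a different character. The explicit `-le`/`-be` codecs in Python's standard library write no BOM, so nothing in the bytes marks the order.

```python
# Same text, two byte orders.
text = "A"  # U+0041
le = text.encode("utf-16-le")  # low byte first
be = text.encode("utf-16-be")  # high byte first
print(le, be)  # b'A\x00' b'\x00A'

# Decoding little-endian bytes as big-endian does not fail; it yields U+4100.
misread = le.decode("utf-16-be")
print(misread == "\u4100")  # True
```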

Performance

Added 4 new modules to mypyc compilation (orchestrator, confusion, magic, ascii), bringing the total to 11 compiled modules

Capped statistical scoring at 16 KB — bigram models converge quickly, so large files no longer score the full 200 KB. Worst-case detection time dropped from 62ms to 26ms with no accuracy loss.

Replaced dataclasses.replace() with direct DetectionResult construction on hot paths, eliminating ~354k function calls per full test suite run
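The 16 KB cap can be sketched as scoring only a prefix of the input. `bigram_profile` and `SCORING_CAP` are hypothetical names, and relative bigram frequency stands in for chardet's real scoring. The example only shows why a prefix is enough once bigram statistics have converged.

```python
from collections import Counter
from typing import Dict, Tuple

SCORING_CAP = 16 * 1024  # 16 KB, the cap quoted in the 7.3.0 notes


def bigram_profile(data: bytes, cap: int = SCORING_CAP) -> Dict[Tuple[int, int], float]:
    """Relative frequency of each byte bigram in the first `cap` bytes."""
    sample = data[:cap]
    counts = Counter(zip(sample, sample[1:]))  # iterating bytes yields ints
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


# A ~220 KB input made of repeated text: compare the 16 KB profile
# with the profile of the whole input.
text = ("the quick brown fox jumps over the lazy dog. " * 5000).encode("ascii")
capped = bigram_profile(text)
full = bigram_profile(text, cap=len(text))
drift = max(abs(capped.get(pair, 0.0) - freq) for pair, freq in full.items())
print(f"max per-bigram drift: {drift:.6f}")
```

On this repetitive toy input the drift stays well under 0.001. Real text is less regular. The notes report no accuracy loss at 16 KB on chardet's own test suite.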

Build

... (truncated)

Changelog

Sourced from chardet's changelog.

Changelog

Note: Entries marked "via Claude" were developed with [Claude Code](https://claude.ai/code). Dan directed the design, reviewed all output, and takes responsibility for the result. Unmarked entries by Dan were written without AI assistance.

7.4.0 (2026-03-26)

Performance:

Switched to dense zlib-compressed model format (v2): models are now stored as contiguous memoryview slices of a single decompressed blob, eliminating per-model struct.unpack overhead. Cold start (import + first detect) dropped from ~75ms to ~13ms with mypyc. ([Dan Blanchard](https://github.com/dan-blanchard) via Claude, [#354](https://github.com/chardet/chardet/pull/354))
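Here is a minimal sketch of the idea behind the v2 format. It assumes models are raw byte tables addressed by an (offset, length) index. Everything is decompressed once, and each model becomes a zero-copy `memoryview` slice instead of being unpacked separately. `pack_models`, `load_models`, the index layout, and the model names are hypothetical, because the real on-disk format is not described here.

```python
import zlib
from typing import Dict, Tuple

Index = Dict[str, Tuple[int, int]]  # model name -> (offset, length)


def pack_models(models: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate model tables into one zlib blob plus an (offset, length) index."""
    index: Index = {}
    offset = 0
    for name, table in models.items():
        index[name] = (offset, len(table))
        offset += len(table)
    return zlib.compress(b"".join(models.values())), index


def load_models(blob: bytes, index: Index) -> Dict[str, memoryview]:
    """Decompress once, then hand out zero-copy slices of the shared buffer."""
    view = memoryview(zlib.decompress(blob))
    return {name: view[start:start + length] for name, (start, length) in index.items()}


# Hypothetical model names and contents, for illustration only.
models = {"utf-8/en": bytes(range(16)), "cp1252/fr": bytes([7]) * 8}
blob, index = pack_models(models)
loaded = load_models(blob, index)
print(loaded["cp1252/fr"][0], len(loaded["utf-8/en"]))  # 7 16
```

Slicing a `memoryview` does not copy, so all models share one buffer and lookups index bytes directly instead of unpacking each model with `struct`.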

Accuracy:

Accuracy improved from 98.6% to 99.3% (2499/2517 files) through a combination of training and scoring improvements:

Eliminated train/test data overlap by content-fingerprinting test suite articles and excluding them from training data ([#351](https://github.com/chardet/chardet/pull/351))

Added MADLAD-400 and Wikipedia as supplemental training sources to fill gaps left by exclusion filtering ([#351](https://github.com/chardet/chardet/pull/351))

Improved non-ASCII bigram scoring: high-byte bigrams are now preserved during training (instead of being crushed by global normalization), and weighted by per-bigram IDF so encoding-specific byte patterns contribute proportionally to how discriminative they are ([#352](https://github.com/chardet/chardet/pull/352))

Added encoding-aware substitution filtering: character substitutions during training now only apply for characters the target encoding cannot represent

Increased training samples from 15K to 25K per language/encoding pair ([Dan Blanchard](https://github.com/dan-blanchard) via Claude)
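Per-bigram IDF weighting might look roughly like the sketch below. It assumes the textbook IDF form `log(N / df)` over N encoding models, where df counts the models that contain a bigram. The model names and frequencies are toy values, and `compute_idf`/`score` are hypothetical helpers. chardet's exact weighting may differ, for example in smoothing.

```python
import math
from collections import Counter
from typing import Dict, Tuple

Bigram = Tuple[int, int]
Profile = Dict[Bigram, float]


def compute_idf(models: Dict[str, Profile]) -> Dict[Bigram, float]:
    """IDF per bigram: log(N / df), where df = number of models containing it."""
    n_models = len(models)
    doc_freq = Counter(pair for profile in models.values() for pair in profile)
    return {pair: math.log(n_models / df) for pair, df in doc_freq.items()}


def score(data: bytes, profile: Profile, idf: Dict[Bigram, float]) -> float:
    """Sum of model frequency x IDF over the input's byte bigrams."""
    return sum(profile.get(pair, 0.0) * idf.get(pair, 0.0)
               for pair in zip(data, data[1:]))


# Toy models (hypothetical values): both share "e ", each has one high-byte bigram.
models = {
    "iso-8859-1/fr": {(0x65, 0x20): 0.5, (0xE9, 0x20): 0.5},
    "windows-1251/ru": {(0x65, 0x20): 0.5, (0xE0, 0xE2): 0.5},
}
idf = compute_idf(models)
sample = b"caf\xe9 e "
scores = {name: round(score(sample, prof, idf), 4) for name, prof in models.items()}
print(scores)  # {'iso-8859-1/fr': 0.3466, 'windows-1251/ru': 0.0}
```

The shared bigram `(0x65, 0x20)` ("e" followed by a space) appears in every model, so its IDF is 0 and it contributes nothing. Only the discriminative high-byte bigram decides the outcome.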

Bug Fixes:

Added dedicated structural analyzers for CP932, CP949, and Big5-HKSCS: these superset encodings previously shared their base encoding's byte-range analyzer, missing extended ranges unique to each superset
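A simplified byte-range analyzer shows what a superset encoding loses when it reuses its base encoding's ranges. The lead-byte and trail-byte ranges below are simplified assumptions, not chardet's tables, and `valid_ratio` is a hypothetical helper. In the sample, `0x82 0xA0` encodes あ in Shift_JIS. `0xFA 0x40` uses a lead byte in CP932's IBM-extension range (0xFA to 0xFC), which falls outside the simplified Shift_JIS ranges used here.

```python
from typing import Sequence, Tuple

Range = Tuple[int, int]

# Simplified, illustrative ranges (assumptions, not chardet's real tables).
SJIS_LEAD = [(0x81, 0x9F), (0xE0, 0xEF)]
CP932_LEAD = [(0x81, 0x9F), (0xE0, 0xFC)]  # superset: extended lead bytes
TRAIL = [(0x40, 0x7E), (0x80, 0xFC)]


def _in_ranges(byte: int, ranges: Sequence[Range]) -> bool:
    return any(lo <= byte <= hi for lo, hi in ranges)


def valid_ratio(data: bytes, lead: Sequence[Range], trail: Sequence[Range] = TRAIL) -> float:
    """Fraction of characters that fit the given double-byte structure."""
    valid = total = 0
    i = 0
    while i < len(data):
        byte = data[i]
        total += 1
        if byte < 0x80 or 0xA1 <= byte <= 0xDF:  # ASCII or half-width katakana
            valid += 1
            i += 1
        elif _in_ranges(byte, lead) and i + 1 < len(data) and _in_ranges(data[i + 1], trail):
            valid += 1  # well-formed lead/trail pair
            i += 2
        else:
            i += 1  # byte outside this analyzer's known structure
    return valid / total if total else 0.0


sample = b"\x82\xa0\xfa\x40"
print(round(valid_ratio(sample, SJIS_LEAD), 3), valid_ratio(sample, CP932_LEAD))  # 0.667 1.0
```

With the base ranges the extended pair is split into an invalid byte plus a stray ASCII byte, which penalizes CP932 text that a dedicated analyzer accepts in full.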

... (truncated)

Commits

e37cf3c fix: prevent dirty-tree version in Windows mypyc wheel builds
f9f5af2 Fix a couple errors in the changelog
53755de chore: add .superpowers/ to .gitignore
3a20df6 docs: update README examples with correct outputs
17ec933 docs: note train/test separation in performance.rst
7296d93 docs: add footnote explaining 0 B import memory (lazy loading)
4221536 docs: add chardet 7.0.1-7.3.0 to historical performance table
9dfb65b docs: remove historical performance table spec/plan (preserved in git history)
72413b8 docs: add historical performance table to performance.rst
cbcb80d fix: handle None detect() results and missing venvs in benchmarks
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR
@dependabot recreate will recreate this PR, overwriting any edits that have been made to it
@dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [chardet](https://github.com/chardet/chardet) from 5.2.0 to 7.4.0.post2.
- [Release notes](https://github.com/chardet/chardet/releases)
- [Changelog](https://github.com/chardet/chardet/blob/main/docs/changelog.rst)
- [Commits](chardet/chardet@5.2.0...7.4.0.post2)

---
updated-dependencies:
- dependency-name: chardet
  dependency-version: 7.4.0.post2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

codecov · 2026-03-30T15:06:18Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 54.73%. Comparing base (520ab66) to head (35fce62).
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1662   +/-   ##
=======================================
  Coverage   54.73%   54.73%           
=======================================
  Files         335      335           
  Lines       27401    27401           
=======================================
+ Hits        14997    14998    +1     
+ Misses      12404    12403    -1

| Flag | Coverage Δ |
|---|---|
| functionaltests | `0.00% <ø> (ø)` |
| unittests | `54.73% <ø> (+<0.01%)` ⬆️ |

Flags with carried forward coverage won't be shown.


dependabot[bot] added the `dependencies` (Pull requests that update a dependency file) and `python` (Pull requests that update python code) labels on Mar 30, 2026.

dependabot[bot] mentioned this pull request on Mar 30, 2026 in #1659, "Bump chardet from 5.2.0 to 7.4.0.post1" (closed).

dependabot[bot] wants to merge 1 commit into main from dependabot/pip/chardet-7.4.0.post2
