gh-152847: Reject non-digit day-of-year in pure-Python zoneinfo POSIX TZ rules #152848
Open
tonghuaroot wants to merge 2 commits into
… POSIX TZ rules

The `J` and `n` day-of-year branches of `_parse_dst_start_end()` fell through to a bare `int(date)`, accepting input the C accelerator rejects (for example `J1_0`, which `int()` reads as day 10, silently building a different zone). Guard the branch with an `re.ASCII` digit match mirroring the C parser's `parse_digits(1, 3)`, so both implementations agree.

The pure-Python `zoneinfo` parser validated the `Mm.w.d` transition rule strictly, but the `Jn` (Julian) and `n` (0-based) day-of-year branches of `_parse_dst_start_end()` fell through to a bare `int(date)` with no format check. `int()` accepts input the C accelerator rejects, so the two implementations disagreed on the same POSIX TZ string.
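
To make the gap concrete, here is a small sketch that feeds a few day-of-year fields to a bare `int()` and to a 1-to-3 ASCII-digit check. The pattern `\d{1,3}` with `re.ASCII` is an assumed stand-in for the C parser's `parse_digits(1, 3)`, and `int_reading` / `c_like_accepts` are hypothetical names used only for illustration, not code from the PR.

```python
import re

# Assumed stand-in for C's parse_digits(&ptr, 1, 3, &day): one to three
# ASCII digits and nothing else (re.ASCII stops \d matching other scripts).
C_LIKE_DOY = re.compile(r"\d{1,3}", re.ASCII)


def int_reading(field):
    """What a bare int() makes of the field, or None if int() raises."""
    try:
        return int(field)
    except ValueError:
        return None


def c_like_accepts(field):
    """True if the field is exactly 1-3 ASCII digits."""
    return C_LIKE_DOY.fullmatch(field) is not None


# Day-of-year fields as the old code saw them after dropping a leading "J".
# "\u0661" is ARABIC-INDIC DIGIT ONE, a non-ASCII decimal digit.
samples = ["10", "001", "1_0", "+1", " 1", "0001", "\u0661"]

for field in samples:
    # !a keeps the printed repr ASCII-only, whatever the console encoding.
    print(f"{field!a:>10}  int() -> {int_reading(field)!r:>4}  C-like: {c_like_accepts(field)}")
```

Every sample except `"10"` and `"001"` is turned into a number by `int()` yet fails the digit check; `"1_0"` becoming `10` is how `J1_0` turned into day 10.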

The C accelerator reads the day-of-year field with `parse_digits(&ptr, 1, 3, &day)` in `Modules/_zoneinfo.c`, consuming 1 to 3 ASCII digits (`Py_ISDIGIT`) and nothing else. This PR adds the matching guard before `int()`, so the pure parser now matches the C accelerator exactly for this field: not stricter, not looser.
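
A minimal sketch of a guarded day-of-year branch, written as a standalone function rather than the real `_parse_dst_start_end()`: the name `parse_day_of_year`, its `(day, is_julian)` return value, and the error messages are hypothetical, the `Mm.w.d` branch and the `/time` suffix are left out, and the range check only mirrors the `[julian, 365]` bound that the description attributes to `_DayOffset`.

```python
import re

# One to three ASCII digits, mirroring parse_digits(&ptr, 1, 3, &day) in C.
# re.ASCII keeps \d from matching non-ASCII decimal digits such as "\u0661".
_DOY_RE = re.compile(r"\d{1,3}", re.ASCII)


def parse_day_of_year(date):
    """Parse the 'Jn' or 'n' date part of a POSIX TZ start/end rule.

    Hypothetical helper: returns (day, is_julian) or raises ValueError.
    """
    if date.startswith("J"):
        is_julian = True
        digits = date[1:]
    else:
        is_julian = False
        digits = date

    # Format check first, so int() only ever sees 1-3 ASCII digits.
    if _DOY_RE.fullmatch(digits) is None:
        raise ValueError(f"Invalid dst start/end date: {date!r}")
    day = int(digits)

    # Range check as described for _DayOffset: Jn is 1-based, n is 0-based.
    min_day = 1 if is_julian else 0
    if not min_day <= day <= 365:
        raise ValueError(f"day must be in [{min_day}, 365], not: {day}")
    return day, is_julian


print(parse_day_of_year("J60"), parse_day_of_year("J001"), parse_day_of_year("0"))
```

Using `fullmatch` with `{1,3}` rather than `\d+` is what rejects `J0001`, in line with the three-digit cap in C, while leading zeros inside that width (`J001`) still parse. Keeping the format check separate from the range check means the numeric bounds stay exactly where they were.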

The most notable case is a silent miscompile rather than a crash: `int('1_0')` is `10`, so `AAA4BBB,J1_0,J300/2` previously built a valid but different zone (DST starting on day 10) in pure Python, while the C accelerator raised `ValueError`.

The fix also aligns the `J+1`, leading-space, 4+-digit-width (`J0001`), and non-ASCII-digit cases. The 1-to-3-digit bound is deliberate: the C parser consumes at most three digits, so `\d+` would make the pure parser accept `J0001`, which C rejects. Leading zeros within three digits (`J01`, `J001`) are still accepted by both. The existing `_DayOffset` range check (`[julian, 365]`) is untouched, so no numeric-range behaviour changes.

Verified with a C-vs-pure differential (10 divergent inputs before, 0 after), a zero-regression pass over all 499 bundled IANA zones (byte-identical through both implementations), and the full `test_zoneinfo` suite. Added invalid-TZ cases (`J1_0`, `1_0`, `J+1`, `J 1`, ` 1` with a leading space, `J0001`, `0001`) and leading-zero valid controls (`J001`, `001`); these run against both `TZStrTest` and `CTZStrTest`.

This covers a distinct field in the same POSIX TZ parity audit as gh-152212 (std offset), gh-152246 (`Mm.w.d` separator), and gh-152248 (abbreviation regex).

`Jn`/`n` day-of-year field accepts non-digit input via `int()` (C rejects) #152847