gh-152847: Reject non-digit day-of-year in pure-Python zoneinfo POSIX TZ rules #152848

Open
tonghuaroot wants to merge 2 commits into python:main from tonghuaroot:fix-gh-152847-zoneinfo-doy

tonghuaroot (Contributor) commented Jul 2, 2026

The pure-Python zoneinfo parser validated the Mm.w.d transition rule
strictly, but the Jn (1-based Julian day, leap day never counted) and n
(0-based day, leap day counted) day-of-year branches of
_parse_dst_start_end() fell through to a bare int(date) with no format
check. int() accepts input the C accelerator rejects (digit separators,
signs, leading whitespace, non-ASCII digits), so the two implementations
disagreed on the same POSIX TZ string.
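To make the divergence concrete, the sketch below feeds the built-in int() a few day-of-year fields of the kinds this PR describes, with any J prefix already stripped. The field values and their labels are illustrative assumptions, not the test suite's actual inputs.

```python
# Day-of-year fields (J prefix already stripped). None of them is a plain
# run of 1 to 3 ASCII digits, yet int() accepts every one.
lenient_fields = {
    "1_0": "underscore digit separator",
    "+1": "explicit sign",
    " 1": "leading whitespace",
    "0001": "four digits",
    "\u0661": "ARABIC-INDIC DIGIT ONE (non-ASCII)",
}

# What a bare int(date) would have produced for each field.
parsed = {field: int(field) for field in lenient_fields}

for field, why in lenient_fields.items():
    print(repr(field), "->", parsed[field], f"({why})")
```

Every one of these parses to a valid-looking day number, which is why a format check has to run before int() rather than relying on int() to fail.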

The C accelerator reads the day-of-year field with
parse_digits(&ptr, 1, 3, &day) in Modules/_zoneinfo.c, consuming 1 to 3
ASCII digits (Py_ISDIGIT) and nothing else. This PR adds the matching guard:

# Same contract as the C parser: 1 to 3 ASCII digits, nothing else.
if re.fullmatch(r"\d{1,3}", date, re.ASCII) is None:
    raise ValueError(f"Invalid dst start/end date: {dststr}")

before int(), so the pure parser now matches the C accelerator exactly for
this field: neither stricter nor looser.
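A minimal standalone sketch of the guarded branch, assuming the J prefix is stripped before the digit check as in _parse_dst_start_end(). The function name parse_day_of_year and its (day, is_julian) return value are hypothetical; the real code goes on to build a _DayOffset and reports the whole dststr in its error message.

```python
import re


def parse_day_of_year(date):
    """Hypothetical standalone version of the Jn / n branch.

    Returns (day, is_julian) and raises ValueError for any field that is
    not 1 to 3 ASCII digits after the optional J prefix.
    """
    is_julian = date.startswith("J")
    if is_julian:
        date = date[1:]
    # re.ASCII restricts \d to [0-9], matching Py_ISDIGIT in the C parser;
    # without it, \d would also match digits such as U+0661.
    if re.fullmatch(r"\d{1,3}", date, re.ASCII) is None:
        raise ValueError(f"Invalid dst start/end date: {date!r}")
    return int(date), is_julian


print(parse_day_of_year("J001"))  # (1, True)
try:
    parse_day_of_year("J1_0")
except ValueError as exc:
    print(exc)
```

Running the format check before int() means int() only ever sees strings it interprets the same way the C digit loop does.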

The most notable case is a silent misparse rather than a crash: int('1_0')
is 10, so AAA4BBB,J1_0,J300/2 previously built a valid but different zone
(DST starting on day 10) in pure Python, while the C accelerator raised
ValueError. The fix also aligns the J+1 (explicit sign), leading-space,
four-or-more-digit (J0001), and non-ASCII-digit cases. The 1-to-3-digit
bound is deliberate: the C parser consumes at most three digits, so \d+
would make the pure parser accept J0001, which C rejects. Leading zeros
within three digits (J01, J001) are still accepted by both. The existing
_DayOffset range check ([julian, 365]) is untouched, so no numeric-range
behaviour changes.
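The sketch below contrasts an unbounded \d+ pattern with the chosen \d{1,3} bound, and shows that the numeric range check is a separate layer: 999 passes the format guard but still falls outside [julian, 365]. The helpers accepts and in_day_offset_range are hypothetical illustrations, not zoneinfo internals.

```python
import re

UNBOUNDED = r"\d+"     # the rejected alternative
BOUNDED = r"\d{1,3}"   # the bound this PR uses


def accepts(pattern, field):
    """True if the whole field matches pattern, with digits limited to ASCII."""
    return re.fullmatch(pattern, field, re.ASCII) is not None


def in_day_offset_range(day, julian):
    """Hypothetical mirror of the [julian, 365] bounds: 1..365 for Jn, 0..365 for n."""
    return int(julian) <= day <= 365


# The two patterns only disagree once the field has four or more digits.
for field in ["1", "01", "001", "0001"]:
    print(field, "unbounded:", accepts(UNBOUNDED, field),
          "bounded:", accepts(BOUNDED, field))

# The format guard lets "999" through; the unchanged range check rejects it.
print("999 format ok:", accepts(BOUNDED, "999"),
      "in range:", in_day_offset_range(999, julian=True))
```

Keeping format and range as separate checks is what lets the PR change the accepted spellings without touching which day numbers are valid.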

Verified with a C-vs-pure differential test (10 divergent inputs before the
fix, 0 after), a zero-regression pass over all 499 bundled IANA zones
(byte-identical through both implementations), and the full test_zoneinfo
suite. Added invalid-TZ cases (J1_0, 1_0, J+1, 'J 1', ' 1', J0001, 0001)
and leading-zero valid controls (J001, 001); these run against both
TZStrTest and CTZStrTest.
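The differential idea can be sketched without touching zoneinfo's private modules: compare the old bare-int() path and the new guard against a Python model of the C behaviour described in this PR (read at most three ASCII digits, then require the field to end). The c_model function is an assumed approximation of parse_digits(&ptr, 1, 3, &day), not a port of Modules/_zoneinfo.c, and the inputs are illustrative rather than the PR's ten divergent cases.

```python
import re


def old_pure(field):
    """Pre-fix pure-Python path: a bare int() on the field (None = rejected)."""
    try:
        return int(field)
    except ValueError:
        return None


def new_pure(field):
    """Post-fix pure-Python path: the ASCII 1-to-3-digit guard, then int()."""
    if re.fullmatch(r"\d{1,3}", field, re.ASCII) is None:
        return None
    return int(field)


def c_model(field):
    """Assumed model of the C parser: consume up to three ASCII digits,
    require at least one, and require that nothing follows them."""
    value, i = 0, 0
    while i < 3 and i < len(field) and field[i] in "0123456789":
        value = value * 10 + (ord(field[i]) - ord("0"))
        i += 1
    if i == 0 or i != len(field):
        return None
    return value


# Day-of-year fields with any J prefix already stripped (illustrative only).
cases = ["1_0", "+1", " 1", "0001", "\u0661", "1", "01", "001", "300"]

before = [f for f in cases if old_pure(f) != c_model(f)]
after = [f for f in cases if new_pure(f) != c_model(f)]
print("divergent before:", before)
print("divergent after:", after)
```

With these inputs the old path diverges from the model on five fields and the guarded path on none.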

This covers a distinct field in the same POSIX TZ parity audit as gh-152212
(std offset), gh-152246 (Mm.w.d separator), and gh-152248 (abbreviation
regex).

… POSIX TZ rules

The J and n day-of-year branches of _parse_dst_start_end() fell through to a
bare int(date), accepting input the C accelerator rejects (for example J1_0,
which int() reads as day 10, silently building a different zone). Guard the
branch with an re.ASCII digit match mirroring the C parser's
parse_digits(1, 3), so both implementations agree.