@vaidas-shopify
Owner

Thanks for taking the time to contribute to Git! Please be advised that the
Git community does not use github.com for their contributions. Instead, we use
a mailing list (git@vger.kernel.org) for code submissions, code reviews, and
bug reports. Nevertheless, you can use GitGitGadget (https://gitgitgadget.github.io/)
to conveniently send your pull request's commits to our mailing list.

For a single-commit pull request, please leave the pull request description
empty: your commit message itself should describe your changes.

Please read the "guidelines for contributing" linked above!

Add retry logic for HTTP 429 (Too Many Requests) responses to handle
server-side rate limiting gracefully. When Git's HTTP client receives
a 429 response, it can now automatically retry the request after an
appropriate delay, respecting the server's rate limits.

The implementation supports the RFC-compliant Retry-After header in
both delay-seconds (integer) and HTTP-date (RFC 2822) formats. If a
past date is provided, Git retries immediately without waiting.
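
The two header formats described above can be illustrated with a small
sketch. It is written in Python for brevity and is not the code from
this patch; the function name parse_retry_after and its "now" parameter
are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: str, now: datetime) -> Optional[int]:
    """Return the requested delay in seconds, or None if the value is invalid.

    Hypothetical helper: models the behavior described in the commit
    message, not Git's actual C implementation.
    """
    value = value.strip()
    # delay-seconds form: a non-negative decimal integer.
    if value.isascii() and value.isdigit():
        return int(value)
    # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Assumption: treat a date without a zone as UTC.
        when = when.replace(tzinfo=timezone.utc)
    # A date in the past means "retry immediately", i.e. a delay of 0.
    return max(0, int((when - now).total_seconds()))
```

Passing the current time in as a parameter keeps the sketch
deterministic, which makes the past-date case easy to exercise.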

Retry behavior is controlled by three new configuration options:

  * http.maxRetries: Maximum number of retry attempts (default: 0,
    meaning retries are disabled by default). Users must explicitly
    opt in to retry behavior.

  * http.retryAfter: Default delay in seconds when the server doesn't
    provide a Retry-After header (default: -1, meaning fail if no
    header is provided). This serves as a fallback mechanism.

  * http.maxRetryTime: Maximum delay in seconds for a single retry
    (default: 300). If the server requests a delay exceeding this
    limit, Git fails immediately rather than waiting. This prevents
    indefinite blocking on unreasonable server requests.

All three options can be overridden via environment variables:
GIT_HTTP_MAX_RETRIES, GIT_HTTP_RETRY_AFTER, and
GIT_HTTP_MAX_RETRY_TIME.
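
As a rough illustration of the precedence described above (environment
variable first, then configuration, then the built-in default), here is
a Python sketch. The option names, variable names, and defaults come
from this description; the helper effective_setting and its dictionary
arguments are hypothetical and do not reflect how Git reads its
configuration.

```python
import os
from typing import Mapping, Optional

# Names and defaults as listed in the commit message.
ENV_NAMES = {
    "http.maxRetries": "GIT_HTTP_MAX_RETRIES",
    "http.retryAfter": "GIT_HTTP_RETRY_AFTER",
    "http.maxRetryTime": "GIT_HTTP_MAX_RETRY_TIME",
}
DEFAULTS = {
    "http.maxRetries": 0,      # retries disabled
    "http.retryAfter": -1,     # fail if the server sends no Retry-After
    "http.maxRetryTime": 300,  # seconds
}


def effective_setting(
    key: str,
    config: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Resolve one option: environment, then config, then default.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    raw = env.get(ENV_NAMES[key])
    if raw is not None:
        return int(raw)
    if key in config:
        return int(config[key])
    return DEFAULTS[key]
```

For example, with http.maxRetries set to 3 in the configuration and
GIT_HTTP_MAX_RETRIES=5 in the environment, the sketch resolves the
value to 5.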

The retry logic implements a fail-fast approach: if any delay
(whether from server header or configuration) exceeds maxRetryTime,
Git fails immediately with a clear error message rather than capping
the delay. This provides better visibility into rate limiting issues.
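
The fail-fast rule can be sketched as follows, again in Python and only
as a model of the behavior described here. The function choose_delay
and the RetryError class are hypothetical; the reason strings mirror
the error labels named in the trace2 commit message of this series.
Exhaustion of http.maxRetries is a separate check and is not modeled.

```python
from typing import Optional


class RetryError(Exception):
    """Raised when a 429 response cannot be retried (hypothetical type)."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


def choose_delay(
    header_delay: Optional[int],  # parsed Retry-After, or None if absent
    retry_after: int,             # http.retryAfter (-1 means "not set")
    max_retry_time: int,          # http.maxRetryTime
) -> int:
    """Return the number of seconds to wait, or fail fast."""
    if header_delay is not None:
        # The server asked for a specific delay: never cap it, just refuse.
        if header_delay > max_retry_time:
            raise RetryError("exceeds-max-retry-time")
        return header_delay
    if retry_after < 0:
        # No header and no configured fallback.
        raise RetryError("no-retry-after-config")
    if retry_after > max_retry_time:
        raise RetryError("config-exceeds-max-retry-time")
    return retry_after
```

Raising instead of clamping to max_retry_time is the point of the
design: a capped delay would quietly retry too early, while a failure
surfaces the rate limit to the user.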

The implementation includes extensive test coverage for basic retry
behavior, Retry-After header formats (integer and HTTP-date),
configuration combinations, maxRetryTime limits, invalid header
handling, environment variable overrides, and edge cases.

Signed-off-by: Vaidas Pilkauskas <[email protected]>

Fix a memory leak in show_http_message() that was triggered when
displaying HTTP error messages before die(). The function would call
strbuf_reencode() which modifies the caller's strbuf in place,
allocating new memory for the re-encoded string. Since this function
is only called immediately before die(), the allocated memory was
never explicitly freed, causing leak detectors to report it.

The leak became visible when HTTP 429 rate limit retry support was
added, which introduced the HTTP_RATE_LIMITED error case. However,
the issue existed in pre-existing error paths as well
(HTTP_MISSING_TARGET, HTTP_NOAUTH, HTTP_NOMATCHPUBLICKEY) - the new
retry logic just made it more visible in tests because retries
exercise the error paths more frequently.

The leak was detected by LeakSanitizer in the t5584 tests that enable
retries (maxRetries > 0). Tests with retries disabled passed, either
because they took a different code path or because of timing.

Fix this by making show_http_message() work on a local copy of the
message buffer instead of modifying the caller's buffer in place:

1. Create a local strbuf and copy the message into it
2. Perform re-encoding on the local copy if needed
3. Display the message from the local copy
4. Properly release the local copy before returning

This ensures that all memory allocated by strbuf_reencode() is freed
before the function returns, which eliminates the leak even though
die() is called immediately afterwards.

Signed-off-by: Vaidas Pilkauskas <[email protected]>

Add trace2 instrumentation to HTTP 429 retry operations to enable
monitoring and debugging of rate limit scenarios in production
environments.

The trace2 logging captures:

  * Retry attempt numbers (http/429-retry-attempt) to track retry
    progression and identify how many attempts were needed

  * Retry-After header values (http/429-retry-after) from server
    responses to understand server-requested delays

  * Actual sleep durations (http/retry-sleep-seconds) within trace2
    regions (http/retry-sleep) to measure time spent waiting

  * Error conditions (http/429-error) such as "retries-exhausted",
    "exceeds-max-retry-time", "no-retry-after-config", and
    "config-exceeds-max-retry-time" for diagnosing failures

  * Retry source (http/429-retry-source) indicating whether delay
    came from server header or config default

This instrumentation provides complete visibility into retry behavior,
enabling operators to monitor rate limiting patterns, diagnose retry
failures, and optimize retry configuration based on real-world data.

Signed-off-by: Vaidas Pilkauskas <[email protected]>

vaidas-shopify force-pushed the retry-after-review-1-draft branch from 7803c38 to ff0dd0a (December 15, 2025 11:46)
vaidas-shopify force-pushed the retry-after-review-1-draft branch from ff0dd0a to 869d97c (December 15, 2025 11:51)
vaidas-shopify force-pushed the retry-after-review-1-draft branch from 02046d7 to df8e052 (December 16, 2025 11:46)
vaidas-shopify force-pushed the retry-after-review-1-draft branch from df8e052 to 36bad7a (December 16, 2025 13:33)

vaidas-shopify pushed a commit that referenced this pull request on Dec 18, 2025:

When pushing to a set of remotes using a nickname for the group, the
client initializes the connection to each remote, talks to the
remote, reads and parses the capabilities line, and holds the
capabilities in a file-scope static variable server_capabilities_v1.

There are a few other such file-scope static variables, and these
connections cannot be parallelized until they are refactored to a
structure that keeps track of active connections.

Which is *not* the theme of this patch ;-)

For a single connection, the server_capabilities_v1 variable is
initialized to NULL (at the program initialization), populated when
we talk to the other side, used to look up capabilities of the other
side possibly multiple times, and the memory is held by the variable
until program exit, without leaking.  When talking to multiple remotes,
however, the server capabilities from the second connection overwrite
those from the first connection without freeing them, which leaks.

    ==1080970==ERROR: LeakSanitizer: detected memory leaks

    Direct leak of 421 byte(s) in 2 object(s) allocated from:
	#0 0x5615305f849e in strdup (/home/gitster/g/git-jch/bin/bin/git+0x2b349e) (BuildId: 54d149994c9e85374831958f694bd0aa3b8b1e26)
	#1 0x561530e76cc4 in xstrdup /home/gitster/w/build/wrapper.c:43:14
	#2 0x5615309cd7fa in process_capabilities /home/gitster/w/build/connect.c:243:27
	#3 0x5615309cd502 in get_remote_heads /home/gitster/w/build/connect.c:366:4
	#4 0x561530e2cb0b in handshake /home/gitster/w/build/transport.c:372:3
	#5 0x561530e29ed7 in get_refs_via_connect /home/gitster/w/build/transport.c:398:9
	#6 0x561530e26464 in transport_push /home/gitster/w/build/transport.c:1421:16
	#7 0x561530800bec in push_with_options /home/gitster/w/build/builtin/push.c:387:8
	#8 0x5615307ffb99 in do_push /home/gitster/w/build/builtin/push.c:442:7
	#9 0x5615307fe926 in cmd_push /home/gitster/w/build/builtin/push.c:664:7
	#10 0x56153065673f in run_builtin /home/gitster/w/build/git.c:506:11
	#11 0x56153065342f in handle_builtin /home/gitster/w/build/git.c:779:9
	#12 0x561530655b89 in run_argv /home/gitster/w/build/git.c:862:4
	#13 0x561530652cba in cmd_main /home/gitster/w/build/git.c:984:19
	#14 0x5615308dda0a in main /home/gitster/w/build/common-main.c:9:11
	#15 0x7f051651bca7 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16

    SUMMARY: AddressSanitizer: 421 byte(s) leaked in 2 allocation(s).

Free the capabilities data for the previous server before overwriting
it with the data for the next server to plug this leak.

The added test fails without the freeing with SANITIZE=leak; I
somehow couldn't get it to fail reliably with SANITIZE=leak,address
though.

Signed-off-by: Junio C Hamano <[email protected]>