-
Notifications
You must be signed in to change notification settings - Fork 9
feat: handle rate limiting #71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Reviewer's GuideImplements rate-limit (HTTP 429) handling across the Fetcher and SendVisitor paths by introducing configurable default Retry-After behavior, centralizing header parsing, wiring CLI options, and adding tests to validate backoff semantics. Sequence diagram for SendVisitor upload with rate limiting handlingsequenceDiagram
actor User
participant Caller
participant SendVisitor
participant HttpSender
participant ReqwestClient as Reqwest_Client
participant HttpUtils as http_module
participant Backoff as Backoff_Retry
User->>Caller: start_upload(name, body, url)
Caller->>SendVisitor: send(name, body, method, url)
SendVisitor->>Backoff: create_backoff_with_retries(retries)
loop each_attempt
SendVisitor->>SendVisitor: send_once(name, body, method, url)
SendVisitor->>HttpSender: build_request(method, url, body)
HttpSender->>ReqwestClient: execute_request(request)
ReqwestClient-->>SendVisitor: Response
SendVisitor->>HttpUtils: get_retry_after_from_response_header(response, default_retry_after)
alt status_is_429
HttpUtils-->>SendVisitor: Some(retry_after)
SendVisitor-->>Backoff: return Temporary(SendError_RateLimited(retry_after))
else status_not_429
HttpUtils-->>SendVisitor: None
alt success_status
SendVisitor-->>Caller: Ok(())
Caller-->>User: upload_completed
note over SendVisitor: break
else client_or_server_error
SendVisitor-->>Backoff: return Temporary_or_Permanent_error
end
end
Backoff-->>SendVisitor: schedule_next_attempt(dur)
opt notify_hook
Backoff->>SendVisitor: notify(err = Temporary_RateLimited, dur)
alt dur < retry_after
SendVisitor->>SendVisitor: additional = retry_after - dur
SendVisitor->>SendVisitor: std_thread_sleep(additional)
else dur >= retry_after
SendVisitor->>SendVisitor: wait_dur_only
end
end
end
Backoff-->>Caller: propagate_final_error
Caller-->>User: upload_failed
Class diagram for updated HTTP rate limiting componentsclassDiagram
class Fetcher {
+Client client
+usize retries
+Duration default_retry_after
+new(client: Client, options: FetcherOptions) Fetcher
+get(url: Url, processor: FetcherResponseProcessor) Result~T, Error~
+request(method: Method, url: Url, body: Option~Body~, processor: FetcherResponseProcessor) Result~T, Error~
}
class FetcherOptions {
+Duration timeout
+usize retries
+Duration default_retry_after
+new(timeout: Duration) FetcherOptions
+retries(retries: usize) FetcherOptions
+default_retry_after(duration: Duration) FetcherOptions
+Default() FetcherOptions
}
class ClientArguments {
+humantime_Duration timeout
+usize retries
+humantime_Duration default_retry_after
+From_ClientArguments_for_FetcherOptions()
}
class FetcherError {
<<enumeration>>
+Request(reqwest_Error)
+RateLimited(Duration)
}
class SendVisitor {
+HttpSender sender
+usize retries
+Option~Duration~ min_delay
+Option~Duration~ max_delay
+Option~Duration~ default_retry_after
+new(sender: HttpSender) SendVisitor
+retries(retries: usize) SendVisitor
+min_delay(retry_delay: Duration) SendVisitor
+max_delay(retry_delay: Duration) SendVisitor
+default_retry_after(duration: Duration) SendVisitor
+send_once(name: String, body: Bytes, method: Method, url: Url) Result~(), SendOnceError~
+send(name: String, body: Bytes, method: Method, url: Url) Result~(), SendError~
}
class SendError {
<<enumeration>>
+Client(StatusCode)
+Server(StatusCode)
+UnexpectedStatus(StatusCode)
+RateLimited(Duration)
}
class SendOnceError {
<<enumeration>>
+Permanent(SendError)
+Temporary(SendError)
}
class http_module {
<<module>>
+get_retry_after_from_response_header(response: Response, default_duration: Duration) Option~Duration~
-parse_retry_after(value: str) Option~Duration~
}
class BackoffPolicy {
<<external>>
+notify(callback: fn(err, dur))
+retry(policy: BackoffPolicy)
}
Fetcher --> FetcherOptions : constructs_with
Fetcher --> FetcherError : returns
Fetcher --> http_module : uses
FetcherOptions <.. ClientArguments : From_impl
SendVisitor --> SendError : returns
SendVisitor --> SendOnceError : returns
SendVisitor --> http_module : uses
SendOnceError --> SendError : wraps
Fetcher --> BackoffPolicy : uses
SendVisitor --> BackoffPolicy : uses
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Both
FetcherandSendVisitorusestd::thread::sleepinside thenotifycallback, which will block the async runtime; consider replacing this with a non-blockingtokio::time::sleep-based mechanism (e.g., adjusting the backoff delay itself or using an async-aware hook) so retries do not block the entire executor thread. SendVisitor::send_oncecallsget_retry_after_from_response_headerwithself.default_retry_after.unwrap_or_default(), which meansdefault_retry_after: Noneresults in a 0-second delay for 429s; ifNoneis intended to mean "no special rate-limit delay" or "use exponential backoff only", consider handling theOptionexplicitly instead of defaulting toDuration::ZERO.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Both `Fetcher` and `SendVisitor` use `std::thread::sleep` inside the `notify` callback, which will block the async runtime; consider replacing this with a non-blocking `tokio::time::sleep`-based mechanism (e.g., adjusting the backoff delay itself or using an async-aware hook) so retries do not block the entire executor thread.
- `SendVisitor::send_once` calls `get_retry_after_from_response_header` with `self.default_retry_after.unwrap_or_default()`, which means `default_retry_after: None` results in a 0-second delay for 429s; if `None` is intended to mean "no special rate-limit delay" or "use exponential backoff only", consider handling the `Option` explicitly instead of defaulting to `Duration::ZERO`.
## Individual Comments
### Comment 1
<location> `extras/src/visitors/send/mod.rs:190-199` </location>
<code_context>
}
})
.retry(&backoff.with_max_times(retries))
+ .notify(|err, dur| {
+ // If rate limited, ensure we wait at least the Retry-After duration
+ if let Error::RateLimited(retry_after) = err {
+ if dur < *retry_after {
+ log::info!(
+ "Rate limited, extending wait from {:?} to {:?}",
+ dur,
+ retry_after
+ );
+ let additional = *retry_after - dur;
+ std::thread::sleep(additional);
+ }
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** The notify closure pattern-matches on a reference and uses blocking sleep in an async retry loop.
Here `err` is almost certainly a reference (e.g. `&SendOnceError`), so `if let Error::RateLimited(retry_after) = err` will never match as written. You likely need to match on `*err` or use a `&Error::RateLimited(...)` pattern so the rate-limit branch can actually execute.
Also, `std::thread::sleep` inside this async retry callback will block the executor thread and undermines the `tokio::time::sleep`-based backoff. To enforce a minimum delay based on `retry_after`, either incorporate it into the backoff configuration or add an extra `tokio::time::sleep(additional)` instead of a blocking sleep.
</issue_to_address>
### Comment 2
<location> `common/src/fetcher/mod.rs:139-148` </location>
<code_context>
}
})
.retry(&backoff.with_max_times(retries))
+ .notify(|err, dur| {
+ // If rate limited, ensure we wait at least the Retry-After duration
+ if let Error::RateLimited(retry_after) = err {
+ if dur < *retry_after {
+ log::info!(
+ "Rate limited, extending wait from {:?} to {:?}",
+ dur,
+ retry_after
+ );
+ let additional = *retry_after - dur;
+ std::thread::sleep(additional);
+ }
+ }
</code_context>
<issue_to_address>
**issue (bug_risk):** Same notify closure issues: matching on a reference and blocking the async runtime with std::thread::sleep.
Here `err` is almost certainly `&Error`, so `if let Error::RateLimited(retry_after) = err` will never match. You likely want to destructure the inner value instead, e.g. `if let Error::RateLimited(retry_after) = *err` (or `&*err`).
Also, `std::thread::sleep` will block a worker thread in this async path. Since the retry logic already uses async backoff, please enforce `retry_after` using `tokio::time::sleep(additional)` (or by adjusting the backoff) so it cooperates with the async runtime instead of blocking it.
</issue_to_address>
### Comment 3
<location> `extras/src/visitors/send/mod.rs:138` </location>
<code_context>
.await
.map_err(|err| SendOnceError::Temporary(err.into()))?;
+ if let Some(retry_after) = get_retry_after_from_response_header(
+ &response,
+ self.default_retry_after.unwrap_or_default(),
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Using unwrap_or_default() for default_retry_after makes None mean 0 seconds, which may not match the documented semantics.
Because `default_retry_after` is an `Option<Duration>`, passing `self.default_retry_after.unwrap_or_default()` into `get_retry_after_from_response_header` makes `None` behave as `Duration::default()` (0s). That changes the meaning of `None` from "no default" to "retry immediately". In particular, `SendArguments` sets `default_retry_after: None`, so the CLI path effectively disables the 10s default and uses 0s instead.
If `None` is meant to represent "no default retry-after" (or to keep the constructor’s 10s default), consider either:
- not overriding the 10s default in `SendArguments` when the user doesn’t provide a value, or
- changing `get_retry_after_from_response_header` to take `Option<Duration>` and only using a fallback when `Some(default)`.
Otherwise, document clearly that `None` is intended to mean "retry immediately" on 429 without a header.
Suggested implementation:
```rust
if let Some(retry_after) = get_retry_after_from_response_header(
&response,
self.default_retry_after,
) {
```
To fully implement the semantics you described, you will also need to:
1. Update the signature of `get_retry_after_from_response_header` (likely in the same module or a nearby one) from something like:
```rust
fn get_retry_after_from_response_header(
response: &Response<Body>,
default_retry_after: Duration,
) -> Option<Duration> { ... }
```
to:
```rust
fn get_retry_after_from_response_header(
response: &Response<Body>,
default_retry_after: Option<Duration>,
) -> Option<Duration> { ... }
```
2. Inside `get_retry_after_from_response_header`, implement the intended behavior for `None`, for example:
- If the `Retry-After` header is present and valid, use it.
- Else, if `default_retry_after` is `Some(d)`, use `d`.
- Else (when `None`), either:
- Use the constructor’s built-in default (e.g. 10s), or
- Return `None` to mean “no retry-after default”.
3. Adjust any other call sites of `get_retry_after_from_response_header` to pass an `Option<Duration>` instead of a bare `Duration`, or wrap existing `Duration` arguments in `Some(...)`.
These changes will ensure that `None` for `default_retry_after` does not silently become "retry immediately", and instead preserves the documented semantics (“no default” or “use constructor default”).
</issue_to_address>
### Comment 4
<location> `common/tests/fetcher.rs:46-55` </location>
<code_context>
+#[tokio::test]
</code_context>
<issue_to_address>
**suggestion (testing):** Tests depend on real-time sleeps (including a 10s minimum), which will make the test suite very slow and brittle.
In `test_rate_limit_without_retry_after` (and the other timing-based tests), asserting an elapsed time of at least 10 seconds means the test always sleeps that long because it uses real `std::thread::sleep` in the retry hook. Please either reduce the configured wait durations to a few milliseconds or refactor the retry/backoff logic to be injectable so you can mock time instead of relying on wall-clock delays.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
c5bee4d to
b1a2612
Compare
ctron
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for this addition. There are some issues to be sorted out though, and room for improvement (which I don't consider blocking). But we have to sort out the sleep, max duration and http date header value.
common/src/fetcher/mod.rs
Outdated
| pub struct FetcherOptions { | ||
| pub timeout: Duration, | ||
| pub retries: usize, | ||
| pub default_retry_after: Duration, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, in addition to the default value, we should also have a "max" value. Otherwise it might happen, malicious or not, that someone asks for "wait 1 year", and we would do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we wouldn't expose the "max" value to the user. What could be a reasonable value for "max"? Or better a maximum value for this limit? 5 minutes? Would be long enough as I don't think any API should have a much longer rate limiting. But if it has there are other problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should. You never know what a user wants. But I also think we should offer a reasonable default, like you suggested, 5 mins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we expose MAX to the user, we have to have a limit for that and document it. Otherwise we will have the same problem if the user sets default and max to 1year.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I'd say we leave this unlimited. If the user puts in 1 year, that's the user's choice. However, the user is not able to influence the source. So this enables the user to take control of limits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's leave it open with 5min as default. I'll add that.
|
@ctron Do you have any ideas how to handle the sleep issue? Another option would be to leave out the backon library and do it manually. I think I found a way with a custom backoff strategy (wrapper). |
|
What speaks against using https://docs.rs/backon/latest/backon/struct.Retry.html#method.adjust ? |
Apparently I am blind, looks like this is what we are looking for, I'll try it. Thanks |
5ea2cdc to
2d78552
Compare
2d78552 to
900c683
Compare
|
This looks great. Thanks! |
|
This may be a breaking change, and the CI might complain about this. Just uptick the version number, it's worth it. |
Happy to help :) |
resolves #70