fix(cli): retry mechanism when multiple systems query the MAPI #358
Conversation
Pull Request Overview
This PR improves the MAPI client's ability to handle concurrent requests from multiple systems using the same access token by adjusting retry and batching configurations. The changes reduce the likelihood of overwhelming the API when multiple systems query simultaneously.
Key Changes:
- Increased retry attempts from 3 to 12 with linear backoff strategy
- Reduced batch sizes from 100 to 6 in CLI migration commands
- Updated documentation and tests to reflect new retry configuration
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/mapi-client/src/client/client.ts | Increased maxRetries to 12 and implemented linear backoff for 429 responses |
| packages/mapi-client/src/tests/integration.test.ts | Updated test expectations to match new retry count (13 total calls) |
| packages/mapi-client/README.md | Updated documentation to reflect new maxRetries value of 12 |
| packages/cli/src/commands/migrations/run/index.ts | Reduced batchSize from 100 to 6 for fetching and updating stories |
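For orientation, a 429 retry loop of the kind described above could look roughly like the following sketch. The function and option names are illustrative assumptions, not the actual client.ts code, and the backoff shown is the simple linear variant this review refers to.

```typescript
// Illustrative sketch only: names and signatures are assumptions, not the actual client code.
interface RetryOptions {
  maxRetries: number; // e.g. 12, as configured in this PR
  baseDelayMs: number; // delay unit for the linear backoff
}

async function fetchWithRetry(
  url: string,
  init: RequestInit,
  { maxRetries, baseDelayMs }: RetryOptions,
): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, init);
    // Only rate-limit responses are retried; everything else is returned as-is.
    if (response.status !== 429 || attempt === maxRetries) {
      return response;
    }
    // Linear backoff: wait (attempt + 1) delay units before trying again.
    await new Promise(resolve => setTimeout(resolve, baseDelayMs * (attempt + 1)));
  }
  throw new Error('unreachable');
}
```

With maxRetries set to 12, a persistently rate-limited request would be attempted 13 times in total, which matches the updated integration test expectation.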
```diff
    starts_with: startsWith,
  },
- batchSize: 100,
+ batchSize: 6,
```
Copilot AI · Oct 21, 2025
[nitpick] Reducing batch size from 100 to 6 is a significant decrease that may substantially increase the total number of API requests and overall execution time. Consider using a more moderate value (e.g., 20-30) to balance between avoiding rate limits and maintaining reasonable performance.
```diff
- batchSize: 6,
+ batchSize: 20,
```
@maoberlehner what do you think about this?
So I did go down the rabbit hole of batch sizes, exponential backoff, and jitter. I arrived at a slightly more sophisticated delay mechanism and a batchSize of 12. The higher the batchSize, the higher maxRetries must be, and I think 12 retries is already quite a lot. With a batchSize >= 20, I quickly run into the maxRetries limit when I start two or more migrations in parallel. With the new settings, 4 parallel migrations are still stable.
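To make the trade-off concrete: a batchSize in this sense roughly bounds how many requests are in flight at once, along the lines of the sketch below (made-up names, not the actual migrations command code).

```typescript
// Sketch only: illustrates how a batchSize bounds concurrency; names are
// assumptions, not the actual CLI code.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number, // e.g. 12, as settled on in this PR
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    // At most `batchSize` requests are in flight at once; running several
    // migrations in parallel multiplies that number against the same token.
    const batch = items.slice(i, i + batchSize);
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results;
}
```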
This brings the MAPI client and CLI reliability to the next level, awesome!
Force-pushed from eb28875 to d737058
alexjoverm left a comment
LGTM! Seems that the CI/CD is failing, though; other than that it looks great.
Also pinged you re the batch size, but not a blocker at all.
Force-pushed from d737058 to b6e543c
Pull Request Overview
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Force-pushed from b6e543c to 7b25716
```ts
 * @param maxDelay The maximum possible delay in milliseconds (e.g., 20000).
 * @returns The calculated delay in milliseconds to wait before the next attempt.
 */
export function calculateRetryDelay(
```
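Given the visible JSDoc and the full-jitter approach referenced in the commit message (via the AWS architecture blog post), the body could plausibly look like the sketch below; the parameter names other than maxDelay and the exact formula are assumptions, not the actual source.

```typescript
// Plausible full-jitter implementation; everything beyond the visible JSDoc
// (parameter names, exact formula) is an assumption, not the actual source.
export function calculateRetryDelay(
  attempt: number, // how many retries have already happened
  baseDelay: number, // smallest delay unit in milliseconds
  maxDelay: number, // the maximum possible delay in milliseconds (e.g., 20000)
): number {
  // Exponential backoff capped at maxDelay ...
  const cappedDelay = Math.min(maxDelay, baseDelay * 2 ** attempt);
  // ... with full jitter: a random delay between 0 and the cap, so parallel
  // clients that hit a 429 at the same time do not retry in lockstep.
  return Math.floor(Math.random() * cappedDelay);
}
```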
question: As this logic already lives in the MAPI client, and the CLI is leveraging it, is it necessary to repeat it at this level? (same with delay.ts)
Yeah, it's not using the client for the login: https://github.com/storyblok/monoblok/blob/bugfix/cli-migrations-retry/packages/cli/src/commands/login/actions.ts#L38
I don't like the repetition either, but I wanted to use the same algorithm in both cases, and I don't think it's warranted to introduce a new package for shared code just for that. But maybe we should think about it in the future.
Not quite sure why we don't use the client for login. Probably it would be a better fix to do so.
Ah, I see. Would you say it's crucial for the login? Since it's not a bulk operation, I'm wondering whether we can leave it as it is, without the retry algorithm (to avoid the duplication), and review the login/logout calls in a separate ticket, keeping that out of the scope of this one.
What do you think? I'm also fine with leaving the duplication temporarily until we review them, if you think that's the better solution.
Agree, changed back.
When multiple systems hit the MAPI with the same access token, the retry mechanisms quickly gave up. By decreasing the semaphore batch size and increasing the max retries, we make such situations less likely. However, it is still possible to run into limits when many systems query the MAPI with the same access token in parallel. See the following article for the rationale behind the full-jitter retry delay: https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
Fixes WDX-184
Force-pushed from 7b25716 to d59c972
alexjoverm left a comment
Good to go, thank you very much!