[SDCI-2079] Document automatic job retries (Preview) - hybrid approach#36147
Adds customer-facing documentation for the automatic job retries feature on GitHub Actions and GitLab. Uses a dedicated automatic_retries.md page as the source of truth, surfaced through the compatibility tables on each provider page and the supported features matrix on the pipelines index.
- Replace "hiccups" colloquialism with "failures".
- Split long em-dashed sentence in Overview into two sentences.
- Replace passive "the retry outcome is reflected" with active.
- Replace "configurable maximum" with "maximum number of attempts" since the limit is not customer-tunable today.
- Replace "when the failure is determined retriable" with "when the failure is identified as retriable".
- Replace quoted "rerun failed jobs" with plain prose (GitHub API call).
- Replace awkward "compute minutes consumed by your pipelines" with "GitHub Actions compute usage".
- Add missing "as" in "aren't classified retriable" (GitLab tab).
- Make GitLab provider list items structurally consistent (all full sentences).
- Rename "Provider support" heading to "Provider-specific behavior" for stronger AI retrieval.
- Add GitHub Actions and GitLab setup pages to further_reading.
- Replace em dash with period in access-gating sentence.
Editorial review (post-fix round)
No build-breakers. One must-fix plus a few polish suggestions.
Higher-level
Must fix
Suggestions
Link verification
Verdict
Comment: one must-fix (Requirements list), rest are polish. The previous round of fixes addressed the bigger issues well.
- Collapse Requirements bullet 3 into a single fragment; redirect readers to the banner instead of repeating access-request instructions.
- Replace "Genuine code defects are left alone" with "not retried" to avoid the idiom.
- Replace ambiguous "This reduces the number of pipelines developers manually re-run" with "Automatic retries reduce the number of pipelines that developers re-run by hand".
- GitLab tab: replace awkward "as soon as the job finishes failing" with "as soon as the job fails", and drop the redundant second "with" in the Smart retries bullet.
Automatic retries use the same AI error classifier as CI jobs failure analysis, which reads indexed CI job logs to decide whether a failure is transient. Adds the log collection dependency to the Requirements list with provider-specific setup links, plus a cross-reference to the failure analysis guide.
## Limitations

- Each logical job is retried at most one time.
It's a bit confusing to me what "logical job" stands for here. Did you mean to change the first line below to the second?
- Each logical job is retried at most one time.
- Each failed job is retried at most once.
This means the same job ID. Maybe we should mention that, or remove this line altogether to avoid confusion. WDYT?
### Protected branches

The Datadog GitHub App's default permissions do not allow retries on protected branches. To enable automatic retries on a protected branch (for example, your default branch), grant the app Maintainer-level access. Review your organization's policies before expanding permissions.
A question on a detail here: is the Maintainer-level access an org-wide or a repo-wide setting?
This is per org, AFAIK.
1. A CI job fails in your pipeline.
2. Datadog's AI error classifier inspects the job's logs and error context to determine whether the failure is transient.
3. If the failure is classified as retriable, Datadog requests a retry through the provider's API.
4. Datadog retries each job up to a maximum number of attempts to prevent infinite retry loops.
Why do we say "up to a maximum number of attempts" here, but then later specify "at most once"?
At most once for the same job ID, but the same job fingerprint is retried up to N times. We don't have the settings yet, but once we have them we should update this doc.
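To make the distinction in this thread concrete, here is a minimal sketch of how the two limits could interact. It is an illustration only, not Datadog's implementation: the `RetryBudget` class, its field names, and the assumption that each retry produces a new job ID sharing the original job's fingerprint are all hypothetical, and the value of N is not stated anywhere in this PR.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class RetryBudget:
    """Hypothetical bookkeeping for the two retry limits discussed above."""

    # The "N" from the thread; the real value is not stated in this PR.
    max_per_fingerprint: int
    retried_job_ids: set = field(default_factory=set)
    fingerprint_counts: dict = field(default_factory=lambda: defaultdict(int))

    def should_retry(self, job_id: str, fingerprint: str, is_transient: bool) -> bool:
        # Only failures classified as transient are candidates for a retry.
        if not is_transient:
            return False
        # A given job ID is retried at most once.
        if job_id in self.retried_job_ids:
            return False
        # The same fingerprint is retried at most N times across job IDs.
        if self.fingerprint_counts[fingerprint] >= self.max_per_fingerprint:
            return False
        self.retried_job_ids.add(job_id)
        self.fingerprint_counts[fingerprint] += 1
        return True


budget = RetryBudget(max_per_fingerprint=2)
first = budget.should_retry("job-1", "build/unit-tests", True)     # first retry of job-1
same_id = budget.should_retry("job-1", "build/unit-tests", True)   # job-1 was already retried
second = budget.should_retry("job-2", "build/unit-tests", True)    # new job ID, same fingerprint
exhausted = budget.should_retry("job-3", "build/unit-tests", True) # fingerprint budget used up
defect = budget.should_retry("job-4", "build/lint", False)         # not classified as transient
```

Under these assumptions, "at most once" describes the per-job-ID check and "up to a maximum number of attempts" describes the per-fingerprint check, so both statements in the doc can hold at the same time.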
What does this PR do? What is the motivation?
Fixes SDCI-2079
Adds customer-facing documentation for the automatic job retries feature on GitHub Actions and GitLab. The feature is in Preview.
Approach A — Hybrid (dedicated page + compat-table rows on each provider). This is one of three variant PRs opened to compare documentation structures.
Changes:
- `content/en/continuous_integration/pipelines/automatic_retries.md`: single source of truth with provider tabs.
- `github.md` and `gitlab.md`: new row in the Compatibility table linking to the dedicated page.
- `pipelines/_index.md`: new row in the Supported features matrix.

Other approaches in review:
Merge instructions
Merge readiness:
Additional notes
Tone is Preview / private beta; access is gated via Datadog account team. Internal implementation details (enrichment tags, SCI internals, WFA, reducers) are intentionally excluded.