
Increase Kind apiserver event TTL from 1h to 4h #2351

Open
delthas wants to merge 1 commit into development/2.14 from improvement/ZENKO-5217/event-ttl-ci



delthas (Contributor) commented Mar 13, 2026

Summary

  • Bumps kube-apiserver --event-ttl from the default 1h to 4h in the Kind CI cluster configuration
  • Adds a ClusterConfiguration kubeadm patch in bootstrap-kind.sh
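For reviewers unfamiliar with Kind's kubeadm patches, the change has roughly the following shape. This is a minimal sketch, not the actual contents of bootstrap-kind.sh: the kind_config helper and the kind_config.py file name are hypothetical, and it assumes the node image's kubeadm accepts the map form of apiServer.extraArgs (the v1beta3 ClusterConfiguration API; v1beta4 expects a list of name/value pairs instead).

```python
"""Hypothetical generator for a Kind cluster config that raises the
kube-apiserver --event-ttl. Illustrative only; not bootstrap-kind.sh."""

import sys

# The kubeadmConfigPatches entry is merged by Kind into the kubeadm
# ClusterConfiguration it generates; apiServer.extraArgs entries become
# kube-apiserver command-line flags. Assumes the v1beta3 map form.
KIND_CONFIG_TEMPLATE = """\
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
- |
  kind: ClusterConfiguration
  apiServer:
    extraArgs:
      event-ttl: "{ttl}"
"""


def kind_config(event_ttl: str = "4h") -> str:
    """Return the Kind config YAML with the given --event-ttl value."""
    return KIND_CONFIG_TEMPLATE.format(ttl=event_ttl)


if __name__ == "__main__":
    # Usage: python kind_config.py [TTL] | kind create cluster --config=-
    ttl = sys.argv[1] if len(sys.argv) > 1 else "4h"
    sys.stdout.write(kind_config(ttl))
```

Piping the output into kind create cluster --config=- creates the cluster. Once it is up, the flag should appear on the kube-apiserver static pod, for example via kubectl -n kube-system get pods -l component=kube-apiserver -o yaml | grep event-ttl.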

Context

During the investigation of Azure archive test flakiness (PR #2340, documented in ZENKO-5216), we found that Kubernetes events from the failure window were gone by the time artifacts were collected.

The failing archive tests ran around 06:36, but kind export logs didn't execute until ~08:22 — nearly 2 hours later. With the default 1-hour TTL, all events from the test window (pod scheduling, container restarts, liveness probe failures) had been garbage-collected by the apiserver. This left us unable to determine whether sorbet-fwd or sorbet-azure experienced restarts or probe failures during the critical window.
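The timing argument can be checked with a few lines of Python. This uses a simplified model in which an event stays readable until the TTL has elapsed since it was last written; the times are the ones quoted above, the calendar date is ignored (same day assumed), and the event_survives helper is hypothetical.

```python
from datetime import datetime, timedelta

# Times quoted in the PR; strptime fills in a placeholder date for both.
TEST_WINDOW = datetime.strptime("06:36", "%H:%M")
LOG_EXPORT = datetime.strptime("08:22", "%H:%M")


def event_survives(recorded: datetime, collected: datetime, ttl: timedelta) -> bool:
    """Simplified model: an event is still readable at `collected` only if
    less than `ttl` has passed since it was last written at `recorded`."""
    return collected - recorded < ttl


gap = LOG_EXPORT - TEST_WINDOW
print(f"gap between test window and log export: {gap}")  # 1:46:00
for label, ttl in (("1h", timedelta(hours=1)), ("4h", timedelta(hours=4))):
    # Prints survives=False for 1h and survives=True for 4h.
    print(f"--event-ttl={label}: survives={event_survives(TEST_WINDOW, LOG_EXPORT, ttl)}")
```

With a 1h46m gap, the default 1h TTL drops the events before export, while 4h leaves a margin of more than two hours.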

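Increasing the TTL to 4 hours keeps events readable for 4 hours after they are recorded, which covers the gap between the test window and kind export logs for any CI job that finishes within that time. The events then show up in the kind export logs output alongside the Fluent Bit persistent logs added in PR #2350.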

Issue: ZENKO-5217

Kubernetes events expire after 1 hour by default. In CI, kind export
logs often runs 1-2 hours after the test window, by which time the
relevant events have already been garbage-collected.

This was a concrete gap during the Azure archive flakiness
investigation (PR #2340): the failing tests ran around 06:36 but kind
export logs didn't execute until ~08:22 — nearly 2 hours later. All
Kubernetes events from the failure window (pod scheduling, restarts,
liveness probe failures, OOM kills) had expired and were unavailable
for diagnosis.

Bump --event-ttl to 4h via a kubeadm ClusterConfiguration patch in
the Kind cluster config, so that events survive for the entire
duration of CI jobs that complete within 4 hours.

Issue: ZENKO-5217

bert-e (Contributor) commented Mar 13, 2026

Hello delthas,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

bert-e (Contributor) commented Mar 13, 2026

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • 2 peers
