Fix zombie process race condition in sidecar container #1806
Open
mdzhigarov wants to merge 2 commits into carvel-dev:develop from
Replace custom reapZombies implementation with tini as PID 1 to eliminate
race condition causing "waitid: no child processes" errors during template
operations.
Problem:
- reapZombies() used syscall.Wait4(-1, ...) which reaps ANY child process
- This interfered with normal parent-child process waiting in CmdExec.Run
- Race condition: reapZombies could reap ytt/vendir/imgpkg processes before
their actual parent (sidecar process) could wait for them
- Result: cmd.Wait() failed with ECHILD ("waitid: no child processes")
Solution:
- Install and use tini as proper PID 1 init system in Dockerfile
- Remove problematic reapZombies function entirely
- tini correctly handles only orphaned processes, not normal children
- Eliminates race condition while maintaining proper zombie cleanup
Changes:
- Dockerfile: Install tini package and set as entrypoint
- sidecarexec.go: Remove reapZombies function and unused imports
- deployment.yml: Add documentation comment about tini configuration
This fixes intermittent failures during PackageRepository reconciliation
and other template-heavy operations under concurrent load.
Fixes: Race condition between zombie reaper and command execution
Made-with: Cursor
Signed-off-by: Marin Dzhigarov <m.dzhigarov@gmail.com>
Use ytt-specific comment syntax (#!) instead of regular comments (#) to avoid
template compilation errors.
Signed-off-by: Marin Dzhigarov <m.dzhigarov@gmail.com>
Made-with: Cursor
Summary
Fixes a race condition in the sidecar container that causes intermittent "waitid: no child processes" errors during template operations (ytt, vendir, imgpkg).
Problem
The custom reapZombies() function was incorrectly implemented:
- It called syscall.Wait4(-1, ...), which reaps any child process, not just orphans
- This interfered with normal parent-child process waiting in CmdExec.Run
- When ytt/vendir/imgpkg completed, reapZombies could reap them before their actual parent could wait for them
- As a result, cmd.Wait() failed with ECHILD ("waitid: no child processes")
Root Cause
The reapZombies function violated Unix process management conventions by reaping processes that had living parents, instead of only reaping orphaned processes (those whose parent has died).
Solution
- Remove the reapZombies() function entirely: this eliminates the source of the race condition
Changes
- Dockerfile: Install tini package and set as entrypoint
- sidecarexec.go: Remove reapZombies function and unused imports
Impact
Test Plan
- ps -ef should show tini -- /kapp-controller
Related Issues
This addresses the race condition described in the error logs where template operations fail with zombie process errors during high-frequency reconciliation.
Made with Cursor