
Commit 00db383

docs: Clarify Rootless Runtime Requirements
Containers are a wonderful Rube Goldberg machine of Linux internals and configuration [1]. This is especially true for container runtimes that support a "rootless" mode, where spawned processes are subject to constraints and limits that are not present when the runtime executes as root.

This change clarifies the existing documentation for launching KinD with a rootless runtime. The guidance is split into logical sections, providing context and justification for each recommended host change. Callouts are made for changes that impact networking components, such as Ingress and Gateway controllers. These changes generally raise the default performance guardrails for user containers/processes, or require access to privileged components of a Linux system.

Additional host requirements were added based on community review. Some of these are met by running more recent versions of popular Linux distributions, with recommended minimum versions for Ubuntu, Fedora, and Arch Linux. For those running older versions or other distributions, specific instructions were added to enable cgroup v2 and systemd CPU delegation.

[1] https://en.wikipedia.org/wiki/Rube_Goldberg_machine

Assisted-by: Cursor
Signed-off-by: Adam Kaplan <[email protected]>
Co-authored-by: Akihiro Suda <[email protected]>
1 parent f20102c commit 00db383

3 files changed

+196
-33
lines changed

site/content/docs/user/ingress.md

Lines changed: 5 additions & 0 deletions
@@ -21,6 +21,10 @@ Ingress exposes HTTP and HTTPS routes from outside the cluster to services withi
2121
> **NOTE**: You may also want to consider using [Gateway API](https://gateway-api.sigs.k8s.io/) instead of Ingress.
2222
> Gateway API has an [Ingress migration guide](https://gateway-api.sigs.k8s.io/guides/migrating-from-ingress/).
2323
24+
> **WARNING**: If you are using a [rootless container runtime], ensure your host is
25+
> properly configured before creating the KIND cluster. Most Ingress and Gateway controllers will
26+
> not work if these steps are skipped.
27+
2428
### Create Cluster
2529

2630
#### Option 1: LoadBalancer
@@ -139,3 +143,4 @@ curl localhost/bar
139143

140144
[LoadBalancer]: /docs/user/loadbalancer/
141145
[Cloud Provider KIND]: /docs/user/loadbalancer/
146+
[rootless container runtime]: /docs/user/rootless/

site/content/docs/user/quick-start.md

Lines changed: 4 additions & 1 deletion
@@ -160,6 +160,9 @@ More usage can be discovered with `kind create cluster --help`.
160160
kind can auto-detect the [docker], [podman], or [nerdctl] installed and choose the available one. If you want to turn off the auto-detect, use the environment variable `KIND_EXPERIMENTAL_PROVIDER=docker`, `KIND_EXPERIMENTAL_PROVIDER=podman` or `KIND_EXPERIMENTAL_PROVIDER=nerdctl` to
161161
select the runtime.
162162

163+
> **NOTE**: podman and nerdctl operate in [rootless mode](/docs/user/rootless) by default. Extra
164+
> setup is needed for KIND clusters to be fully functional.
165+
163166
## Interacting With Your Cluster
164167

165168
After [creating a cluster](#creating-a-cluster), you can use [kubectl][kubectl]
@@ -501,4 +504,4 @@ kind, the Kubernetes cluster itself, etc.
501504
[Private Registries]: /docs/user/private-registries
502505
[customize control plane with kubeadm]: https://kubernetes.io/docs/setup/independent/control-plane-flags/
503506
[access multiple clusters]: https://kubernetes.io/docs/tasks/access-application-cluster/configure-access-multiple-clusters/
504-
[release notes]: https://github.com/kubernetes-sigs/kind/releases
507+
[release notes]: https://github.com/kubernetes-sigs/kind/releases

site/content/docs/user/rootless.md

Lines changed: 187 additions & 32 deletions
@@ -9,57 +9,212 @@ menu:
99
Starting with kind 0.11.0, [Rootless Docker](https://docs.docker.com/go/rootless/), [Rootless Podman](https://github.com/containers/podman/blob/master/docs/tutorials/rootless_tutorial.md) and [Rootless nerdctl](https://github.com/containerd/nerdctl/blob/main/docs/rootless.md) can be used as the node provider of kind.
1010

1111
## Provider requirements
12+
1213
- Docker: 20.10 or later
1314
- Podman: 3.0 or later
1415
- nerdctl: 1.7 or later
1516

1617
## Host requirements
17-
The host needs to be running with cgroup v2.
18-
Make sure that the result of the `docker info` command contains `Cgroup Version: 2`.
19-
If it prints `Cgroup Version: 1`, try adding `GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=1"` to `/etc/default/grub` and
20-
running `sudo update-grub` to enable cgroup v2.
2118

22-
Also, depending on the host configuration, the following steps might be needed:
19+
### cgroup v2
20+
21+
The host needs to be running with cgroup v2, which is the default for many Linux distributions:
22+
23+
- Ubuntu: 21.10 and later.
24+
- Fedora: 31 and later.
25+
- Arch: April 2021 release and later.
26+
27+
You can verify the cgroup version used by your container runtime as follows:
28+
29+
- `docker`: Run `docker info` and look for `Cgroup Version: 2` in the output.
30+
- `podman`: Run `podman info` and look for `cgroupVersion: v2` in the output.
31+
- `nerdctl`: Run `nerdctl info` and look for `Cgroup Version: 2` in the output.
32+
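As a runtime-independent cross-check, the filesystem type mounted at `/sys/fs/cgroup` shows whether the host uses the unified (v2) hierarchy: it reports `cgroup2fs` on cgroup v2. This is a minimal sketch assuming GNU coreutils `stat`; `is_cgroup2_fstype` and `check_cgroup_version` are hypothetical helper names, not part of kind.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether the host uses cgroup v2.
# Assumes GNU coreutils `stat`, where `-f` queries the filesystem and `-c %T` prints its type name.

# is_cgroup2_fstype FSTYPE -> exit status 0 if FSTYPE is the cgroup v2 filesystem type.
is_cgroup2_fstype() {
  [ "$1" = "cgroup2fs" ]
}

# check_cgroup_version [MOUNTPOINT] -> prints the detected cgroup layout.
check_cgroup_version() {
  local mountpoint="${1:-/sys/fs/cgroup}"
  local fstype
  fstype="$(stat -fc %T "$mountpoint" 2>/dev/null)" || fstype="unknown"
  if is_cgroup2_fstype "$fstype"; then
    echo "cgroup v2 (unified hierarchy)"
  else
    # A tmpfs mount here usually indicates cgroup v1 or a hybrid layout.
    echo "not cgroup v2 (filesystem type: $fstype)"
  fi
}

check_cgroup_version
```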
33+
If the `info` output prints `Cgroup Version: 1` or equivalent, try the following to enable cgroup v2:
34+
35+
1. In `/etc/default/grub`, add the line `GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=1"`
36+
2. Run `sudo update-grub`, then reboot the host to enable cgroup v2.
37+
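After rebooting, the kernel command line can confirm that the GRUB parameter from the steps above was applied. A minimal sketch; `has_kernel_param` is a hypothetical helper. On distributions where cgroup v2 is already the default, the parameter is usually absent, so a negative result there does not by itself mean the host runs cgroup v1.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm the kernel booted with the parameter added to GRUB_CMDLINE_LINUX.

# has_kernel_param CMDLINE PARAM -> exit 0 if PARAM appears as a whole word in CMDLINE.
has_kernel_param() {
  local -a words
  local word
  read -r -a words <<< "$1"
  for word in "${words[@]}"; do
    [ "$word" = "$2" ] && return 0
  done
  return 1
}

cmdline="$(cat /proc/cmdline 2>/dev/null)"
if has_kernel_param "$cmdline" "systemd.unified_cgroup_hierarchy=1"; then
  echo "kernel command line requests the unified (v2) cgroup hierarchy"
else
  echo "systemd.unified_cgroup_hierarchy=1 not found on the kernel command line"
fi
```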
38+
Your host will also need to enable [cgroup delegation](https://systemd.io/CGROUP_DELEGATION/) of the `cpu` controller for
39+
user services. This is enabled by default for distributions running `systemd` version 252 and higher.
40+
41+
To enable cgroup delegation for all the controllers, do the following:
42+
43+
1. Check your version of `systemd` by running `systemctl --version`. If the output prints
44+
`systemd 252` or higher, no further action is needed. Example output below from a Fedora host:
45+
46+
```sh
47+
$ systemctl --version
48+
systemd 257 (257.9-2.fc42)
49+
```
50+
51+
2. For systems with older versions of `systemd`, first create the directory
52+
`/etc/systemd/system/user@.service.d/` if it is not present.
53+
54+
```sh
55+
sudo mkdir -p /etc/systemd/system/user@.service.d/
56+
```
57+
58+
3. Next, create the file `/etc/systemd/system/user@.service.d/delegate.conf` with the following content:
59+
60+
```ini
61+
[Service]
62+
Delegate=yes
63+
```
64+
65+
4. Reload systemd for these changes to take effect:
66+
67+
```sh
68+
sudo systemctl daemon-reload
69+
```
70+
71+
5. If using Docker, restart the rootless Docker daemon:
72+
73+
```sh
74+
systemctl --user restart docker
75+
```
76+
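The systemd version check and the effect of the delegation drop-in can be verified together. This is a minimal sketch assuming the standard systemd layout, where the user manager runs in `user.slice/user-<UID>.slice/user@<UID>.service`; `systemd_major` and `has_controller` are hypothetical helper names. Once delegation is active, the controller list for that cgroup should include `cpu`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers to verify cgroup delegation for the current user.

# systemd_major VERSION_LINE -> prints the major version,
# e.g. "systemd 257 (257.9-2.fc42)" -> "257".
systemd_major() {
  printf '%s\n' "$1" | awk '{ print $2 }'
}

# has_controller "CONTROLLER_LIST" NAME -> exit 0 if NAME is in the space-separated list.
has_controller() {
  local -a controllers
  local c
  read -r -a controllers <<< "$1"
  for c in "${controllers[@]}"; do
    [ "$c" = "$2" ] && return 0
  done
  return 1
}

main() {
  local version_line major uid controllers_file controllers
  version_line="$(systemctl --version 2>/dev/null | head -n 1)"
  major="$(systemd_major "$version_line")"
  echo "systemd major version: ${major:-unknown}"
  if [ -n "$major" ] && [ "$major" -ge 252 ] 2>/dev/null; then
    echo "systemd >= 252: cpu delegation should already be enabled by default"
  fi

  uid="$(id -u)"
  # Assumes the standard cgroup path of the systemd user manager.
  controllers_file="/sys/fs/cgroup/user.slice/user-${uid}.slice/user@${uid}.service/cgroup.controllers"
  if [ -r "$controllers_file" ]; then
    controllers="$(cat "$controllers_file")"
    echo "delegated controllers: $controllers"
    if has_controller "$controllers" cpu; then
      echo "cpu controller is delegated"
    else
      echo "cpu controller is NOT delegated"
    fi
  else
    echo "cannot read $controllers_file"
  fi
}

main
```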
77+
### Networking
78+
79+
Rootless containers cannot load kernel modules themselves, so the host-level iptables modules they rely on may not be loaded.
80+
This breaks the behavior of most networking components, such as Ingress and Gateway controllers.
81+
82+
To load the iptables modules, do the following:
83+
84+
1. First, use `lsmod` to list the kernel modules currently loaded on your host, and filter the
85+
output with `grep` to find the iptables modules:
86+
87+
```sh
88+
lsmod | grep "ip.*table"
89+
```
90+
91+
2. Check the output for the following kernel modules:
92+
- `ip6_tables`
93+
- `ip6table_nat`
94+
- `ip_tables`
95+
- `iptable_nat`
2396

24-
- Create `/etc/systemd/system/user@.service.d/delegate.conf` with the following content, and then run `sudo systemctl daemon-reload`:
97+
3. If one or more of the kernel modules above are not present, configure your system to load them
98+
at boot. First, run the following command to list these modules in a `modules-load.d` configuration file:
99+
100+
```sh
101+
sudo tee /etc/modules-load.d/iptables.conf > /dev/null <<'EOF'
102+
ip6_tables
103+
ip6table_nat
104+
ip_tables
105+
iptable_nat
106+
EOF
107+
```
25108
26-
```ini
27-
[Service]
28-
Delegate=yes
29-
```
109+
4. Check that the new module loading configuration is correct. You should see the following output:
30110
31-
(This is not enabled by default because ["the runtime impact of
32-
[delegating the "cpu" controller] is still too
33-
high"](https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/ZMKLS7SHMRJLJ57NZCYPBAQ3UOYULV65/).
34-
Beware that changing this configuration may affect system
35-
performance.)
111+
```sh
112+
$ cat /etc/modules-load.d/iptables.conf
113+
ip6_tables
114+
ip6table_nat
115+
ip_tables
116+
iptable_nat
117+
```
36118
37-
Please note that:
119+
5. Next, restart the `systemd-modules-load` service to make these changes effective immediately:
38120
39-
- `/etc/systemd/system/user@.service.d/` directory needs to be created if not already present on your host
40-
- If using Docker and it was already running when this step was done, a restart is needed for the changes to take
41-
effect
42-
{{< codeFromInline lang="bash" >}}
43-
systemctl --user restart docker
44-
{{< /codeFromInline >}}
121+
```sh
122+
sudo systemctl restart systemd-modules-load.service
123+
```
45124
46-
- Create `/etc/modules-load.d/iptables.conf` with the following content:
125+
Alternatively, restart your system to ensure these changes take effect.
47126
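The module check from steps 1 and 2 can be scripted. This is a minimal sketch that reads `/proc/modules`, the same source `lsmod` uses; `missing_modules` and `REQUIRED_MODULES` are hypothetical names. Modules compiled directly into the kernel do not appear in `/proc/modules`, so on such kernels a module may be reported as missing even though it is available.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list which of the iptables modules from this guide are not loaded.
# Reads /proc/modules (what lsmod reads). Built-in modules are not listed there,
# so a "missing" result can be a false positive on kernels that compile them in.

REQUIRED_MODULES=(ip6_tables ip6table_nat ip_tables iptable_nat)

# missing_modules [MODULES_FILE] -> prints each required module not listed in MODULES_FILE.
missing_modules() {
  local modules_file="${1:-/proc/modules}"
  local mod
  for mod in "${REQUIRED_MODULES[@]}"; do
    # The first field of each /proc/modules line is the module name.
    if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file" 2>/dev/null; then
      echo "$mod"
    fi
  done
}

missing="$(missing_modules)"
if [ -z "$missing" ]; then
  echo "all required iptables modules are loaded"
else
  echo "missing modules:"
  echo "$missing"
fi
```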
48-
```
49-
ip6_tables
50-
ip6table_nat
51-
ip_tables
52-
iptable_nat
53-
```
127+
### Increase PID Limits
54128
55-
- If using podman, be aware that by default there is a [limit](https://docs.podman.io/en/v4.3/markdown/options/pids-limit.html#pids-limit-limit) to the number of pids that can be created. This can cause problems like nginx workers inside a container not spawning correctly.
56-
- If you want to disable this limit, edit your `containers.conf` file (generally located in `/etc/containers/containers.conf`). Note that this could cause things like pid exhaustion to happen on the host machine. Alternatively, change `0` to your desired new limit:
129+
Each KIND node runs as an individual container on its host. Runtimes such as podman set
130+
default [process ID limits](https://docs.podman.io/en/v4.3/markdown/options/pids-limit.html#pids-limit-limit)
131+
that may be too low for the node or for a pod running on the node. The Ingress NGINX Controller is
132+
[particularly susceptible](https://github.com/kubernetes-sigs/kind/issues/3451) to this issue.
133+
134+
To increase the PID limit, do the following:
135+
136+
1. If using podman, edit your `containers.conf` file (generally located in
137+
`/etc/containers/containers.conf` or `~/.config/containers/containers.conf`) to increase the PIDs
138+
limit to a desired value (podman's built-in default is 2048):
57139
58140
```ini
59141
[containers]
60-
pids_limit = 0
142+
pids_limit = 65536
61143
```
62144
145+
2. Recreate the KIND cluster for these changes to take effect:
146+
147+
```sh
148+
kind delete cluster && kind create cluster
149+
```
150+
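After recreating the cluster, you can check the limit that podman applied to a node. This is a sketch under stated assumptions: the default cluster name `kind` (so the control-plane container is `kind-control-plane`), and that `podman inspect` exposes the Docker-compatible `HostConfig.PidsLimit` field. `pids_limit_ok` is a hypothetical helper, and `MIN_PIDS` mirrors the `containers.conf` example value.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report the PID limit podman applied to a kind node container.
# Assumes the default node name "kind-control-plane" and the HostConfig.PidsLimit inspect field.

NODE="${1:-kind-control-plane}"
MIN_PIDS=65536   # matches the pids_limit value in the containers.conf example

# pids_limit_ok LIMIT MINIMUM -> exit 0 if LIMIT means "unlimited" (0 or -1) or is >= MINIMUM.
pids_limit_ok() {
  case "$1" in
    0|-1) return 0 ;;
    ''|*[!0-9]*) return 1 ;;   # empty or non-numeric output counts as a failure
  esac
  [ "$1" -ge "$2" ]
}

limit="$(podman inspect --format '{{.HostConfig.PidsLimit}}' "$NODE" 2>/dev/null)"
if pids_limit_ok "$limit" "$MIN_PIDS"; then
  echo "$NODE: PID limit ${limit} is sufficient"
else
  echo "$NODE: PID limit '${limit:-unknown}' is below ${MIN_PIDS} (or the node was not found)"
fi
```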
151+
### Increase inotify Limits
152+
153+
As documented in [known issues](/docs/user/known-issues/#pod-errors-due-to-too-many-open-files), pods may
154+
fail by reaching inotify watch and instance limits. Ingress controllers such as NGINX and Contour
155+
are particularly susceptible to this issue.
156+
157+
To increase the inotify limits, do the following:
158+
159+
1. As root, create a `.conf` file in `/etc/sysctl.d` that increases the `fs.inotify` max user settings:
160+
161+
```
162+
fs.inotify.max_user_watches = 524288
163+
fs.inotify.max_user_instances = 512
164+
```
165+
166+
2. Reload `sysctl` for these changes to take effect:
167+
168+
```sh
169+
sudo sysctl --system
170+
```
171+
172+
Alternatively, restart your system for these changes to take effect.
173+
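To confirm that the reloaded settings are live, the current values can be compared with the recommended ones. A minimal sketch that reads `/proc/sys/fs/inotify/*`, equivalent to `sysctl -n fs.inotify.<name>`; `at_least` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare the live inotify limits with the recommended values.

# at_least NAME CURRENT RECOMMENDED -> prints a status line; exit 0 if CURRENT >= RECOMMENDED.
at_least() {
  if [ -n "$2" ] && [ "$2" -ge "$3" ] 2>/dev/null; then
    echo "ok:  $1 = $2"
    return 0
  fi
  echo "low: $1 = ${2:-unknown} (recommended: $3)"
  return 1
}

watches="$(cat /proc/sys/fs/inotify/max_user_watches 2>/dev/null)"
instances="$(cat /proc/sys/fs/inotify/max_user_instances 2>/dev/null)"

status=0
at_least fs.inotify.max_user_watches "$watches" 524288 || status=1
at_least fs.inotify.max_user_instances "$instances" 512 || status=1
if [ "$status" -eq 0 ]; then
  echo "inotify limits meet the recommendations"
else
  echo "raise the inotify limits before creating the cluster"
fi
```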
174+
### Allow Binding to Privileged Ports
175+
176+
If you use the `extraPortMappings` method to provide ingress to your KIND cluster, note that a
177+
rootless KIND node container cannot bind to ports 80 and 443 on the host by default. Ports below 1024
178+
are considered privileged, and unprivileged processes cannot bind to them.
179+
180+
You can avoid this issue by mapping the node's ports to non-privileged host ports, such as 8080 and 8443:
181+
182+
```yaml
183+
# kind config.yaml
184+
kind: Cluster
185+
apiVersion: kind.x-k8s.io/v1alpha4
186+
nodes:
187+
- role: control-plane
188+
  extraPortMappings:
189+
  - containerPort: 80
190+
    hostPort: 8080
191+
    protocol: TCP
192+
  - containerPort: 443
193+
    hostPort: 8443
194+
    protocol: TCP
195+
```
196+
197+
Note that with this configuration, requests to your cluster ingress will need to add the
198+
appropriate port number. In the example above, HTTP requests must use `localhost:8080` in the URL.
199+
200+
To allow a KIND node to bind to ports 80 and/or 443 on the host, do the following:
201+
202+
1. As root, create a `.conf` file in `/etc/sysctl.d` that lowers the privileged port start number:
203+
204+
```
205+
# Allow unprivileged binding to HTTP port 80
206+
# Use 443 if you only need binding to the default HTTPS port
207+
net.ipv4.ip_unprivileged_port_start=80
208+
```
209+
210+
2. Reload `sysctl` for these changes to take effect:
211+
212+
```sh
213+
sudo sysctl --system
214+
```
215+
216+
Alternatively, restart your system for these changes to take effect.
217+
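A quick way to see which of the ingress ports an unprivileged process may currently bind is to compare them with `net.ipv4.ip_unprivileged_port_start`; a port is unprivileged when it is greater than or equal to that value. A minimal sketch: `can_bind_unprivileged` is a hypothetical helper, and it falls back to the kernel default of 1024 if the value cannot be read.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check whether unprivileged processes (such as a rootless
# kind node container) may bind the HTTP and HTTPS host ports.

# can_bind_unprivileged PORT START -> exit 0 if PORT >= START.
can_bind_unprivileged() {
  [ "$1" -ge "$2" ]
}

start="$(cat /proc/sys/net/ipv4/ip_unprivileged_port_start 2>/dev/null)"
start="${start:-1024}"   # kernel default when the value cannot be read
for port in 80 443; do
  if can_bind_unprivileged "$port" "$start"; then
    echo "port $port: bindable (ip_unprivileged_port_start=$start)"
  else
    echo "port $port: privileged (ip_unprivileged_port_start=$start)"
  fi
done
```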
63218
## Restrictions
64219
65220
The restrictions of Rootless Docker apply to kind clusters as well.
