Description
Is there an existing issue for this?
- I have searched the existing issues
Describe the bug
We implemented a worker-cycling extension on one of our projects to prevent OOM issues. The mechanism is pretty simple: we launch a thread after the server starts that loops through transient worker processes and restarts them one by one.
Occasionally, the restart call raises an exception: the worker process is terminated, but it never comes back up. This is the traceback:
Traceback (most recent call last):
  File "/Users/jgarrone/.pyenv/versions/3.11.11/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/Users/jgarrone/.pyenv/versions/3.11.11/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/jgarrone/PycharmProjects/python-playground/sanic_app/main_restart.py", line 14, in cycle_workers
    app.manager.restart([p_name])
  File "/Users/jgarrone/.virtualenvs/python-playground-311/lib/python3.11/site-packages/sanic/worker/manager.py", line 244, in restart
    self.restarter.restart(
  File "/Users/jgarrone/.virtualenvs/python-playground-311/lib/python3.11/site-packages/sanic/worker/restarter.py", line 25, in restart
    restarted = self._restart_transient(
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jgarrone/.virtualenvs/python-playground-311/lib/python3.11/site-packages/sanic/worker/restarter.py", line 56, in _restart_transient
    self._restart_process(process, restart_order, **kwargs)
  File "/Users/jgarrone/.virtualenvs/python-playground-311/lib/python3.11/site-packages/sanic/worker/restarter.py", line 90, in _restart_process
    process.restart(restart_order=restart_order, **kwargs)
  File "/Users/jgarrone/.virtualenvs/python-playground-311/lib/python3.11/site-packages/sanic/worker/process.py", line 127, in restart
    self.spawn()
  File "/Users/jgarrone/.virtualenvs/python-playground-311/lib/python3.11/site-packages/sanic/worker/process.py", line 150, in spawn
    raise Exception("Cannot spawn a worker process until it is idle.")
Exception: Cannot spawn a worker process until it is idle.
We think the cause is a race with the WorkerManager._sync_states method: it runs after the process has been terminated but before it is re-spawned, sees that the process is not alive, and sets its state to "COMPLETED". When WorkerProcess.restart then calls WorkerProcess.spawn, the process is no longer in an idle state, so spawn raises the exception above.
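To make the suspected interleaving concrete, here is a minimal toy model of it. This is only a sketch: `ToyWorker`, `State`, `sync_states`, and the `between` hook are hypothetical names invented for illustration, and the assumption that terminate leaves the worker in an idle state is ours. It does not reproduce Sanic's actual classes, only the ordering described above.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    STARTED = auto()
    COMPLETED = auto()


class ToyWorker:
    """Hypothetical stand-in for a worker process; not Sanic's WorkerProcess."""

    def __init__(self):
        self.state = State.IDLE
        self.alive = False

    def spawn(self):
        # Mirrors the guard seen in the traceback: only an idle worker may spawn.
        if self.state is not State.IDLE:
            raise RuntimeError("Cannot spawn a worker process until it is idle.")
        self.alive = True
        self.state = State.STARTED

    def terminate(self):
        self.alive = False
        self.state = State.IDLE  # assumption: restart expects to spawn from here


def sync_states(workers):
    """Mark every worker that is not alive as COMPLETED (the suspected culprit)."""
    for w in workers:
        if not w.alive:
            w.state = State.COMPLETED


def restart(worker, between=lambda: None):
    worker.terminate()
    between()  # stands for whatever happens to run between terminate and spawn
    worker.spawn()


worker = ToyWorker()
worker.spawn()

# Nothing runs inside the window: the restart succeeds.
restart(worker)
ok_state = worker.state

# sync_states runs inside the window: spawn raises.
try:
    restart(worker, between=lambda: sync_states([worker]))
    error = None
except RuntimeError as exc:
    error = str(exc)

print(ok_state, error)
```

In this model the second restart leaves the worker terminated and marked COMPLETED, which matches the symptom we see: the process goes away and never comes back.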
Code snippet
Minimal app code to reproduce the behavior:
import threading
import time

from sanic import Sanic, response

app = Sanic("MyApp")


def cycle_workers():
    # Every 30 seconds, restart each restartable worker, one at a time.
    while True:
        time.sleep(30)
        processes_names = [p.name for p in app.manager.processes if p.restartable]
        for p_name in processes_names:
            app.manager.restart([p_name])
            time.sleep(10)


@app.main_process_ready
def start(*args):
    del args
    # Run the cycling loop in a background thread of the main process.
    threading.Thread(target=cycle_workers, daemon=True).start()


async def hello_world(request, *args, **kwargs):
    return response.text('Hello World!')


app.add_route(hello_world, '/', methods=['GET', 'POST'])

if __name__ == '__main__':
    app.run(port=8080, workers=3)

Running that code will eventually hit the race condition. To force it, add a time.sleep(0.2) to the WorkerProcess.restart method, after the terminate but before the spawn.
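Why a short sleep makes the failure reliable can be illustrated with a small threaded sketch. Everything here is hypothetical and independent of Sanic: a background monitor thread polls a shared `state` dict every 10 ms and marks a dead worker as COMPLETED, while `restart` pauses for `delay` seconds between its terminate and spawn steps. The polling interval and the dict-based state are assumptions chosen purely for illustration.

```python
import threading
import time

# Hypothetical shared state for a single toy worker.
state = {"alive": True, "status": "STARTED"}
stop = threading.Event()


def sync_loop(interval=0.01):
    # Hypothetical monitor: periodically marks a dead worker as COMPLETED.
    while not stop.is_set():
        if not state["alive"]:
            state["status"] = "COMPLETED"
        time.sleep(interval)


def restart(delay):
    # Terminate step.
    state["alive"] = False
    state["status"] = "IDLE"
    time.sleep(delay)  # the injected pause that widens the race window
    # Spawn step, guarded the same way as in the traceback.
    if state["status"] != "IDLE":
        raise RuntimeError("Cannot spawn a worker process until it is idle.")
    state["alive"] = True
    state["status"] = "STARTED"


monitor = threading.Thread(target=sync_loop, daemon=True)
monitor.start()
try:
    restart(delay=0.2)
    outcome = "restarted"
except RuntimeError as exc:
    outcome = f"failed: {exc}"
finally:
    stop.set()
    monitor.join()

print(outcome)
```

With a 0.2 s pause the monitor gets many chances to run inside the window, so the spawn guard trips essentially every time. With no pause the window is tiny and the failure only shows up occasionally, which is consistent with what we observe in production.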
Expected Behavior
The restart method should terminate the worker and bring it back up every time, without raising an exception.
How do you run Sanic?
Sanic CLI
Operating System
Linux
Sanic Version
23.12.2
Additional context
No response