
Restarting worker processes can cause a race condition #3055

@javierdialpad

Is there an existing issue for this?

  • I have searched the existing issues

Describe the bug

We implemented a worker-cycling extension in one of our projects to prevent out-of-memory (OOM) issues. The mechanism is pretty simple: after the server starts, we launch a thread that loops through the transient worker processes and restarts them one by one.

Occasionally, the restart call raises an error: the worker process is terminated, but it never comes back up. This is the exception:

Traceback (most recent call last):
  File "/Users/jgarrone/.pyenv/versions/3.11.11/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "/Users/jgarrone/.pyenv/versions/3.11.11/lib/python3.11/threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/jgarrone/PycharmProjects/python-playground/sanic_app/main_restart.py", line 14, in cycle_workers
    app.manager.restart([p_name])
  File "/Users/jgarrone/.virtualenvs/python-playground-311/lib/python3.11/site-packages/sanic/worker/manager.py", line 244, in restart
    self.restarter.restart(
  File "/Users/jgarrone/.virtualenvs/python-playground-311/lib/python3.11/site-packages/sanic/worker/restarter.py", line 25, in restart
    restarted = self._restart_transient(
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jgarrone/.virtualenvs/python-playground-311/lib/python3.11/site-packages/sanic/worker/restarter.py", line 56, in _restart_transient
    self._restart_process(process, restart_order, **kwargs)
  File "/Users/jgarrone/.virtualenvs/python-playground-311/lib/python3.11/site-packages/sanic/worker/restarter.py", line 90, in _restart_process
    process.restart(restart_order=restart_order, **kwargs)
  File "/Users/jgarrone/.virtualenvs/python-playground-311/lib/python3.11/site-packages/sanic/worker/process.py", line 127, in restart
    self.spawn()
  File "/Users/jgarrone/.virtualenvs/python-playground-311/lib/python3.11/site-packages/sanic/worker/process.py", line 150, in spawn
    raise Exception("Cannot spawn a worker process until it is idle.")
Exception: Cannot spawn a worker process until it is idle.

We think the problem is that WorkerManager._sync_states can run after the process has been terminated but before it is re-spawned. Seeing that the process is not alive, it sets the state to "COMPLETED", so when WorkerProcess.restart then calls WorkerProcess.spawn, spawn raises the exception above.
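The suspected interleaving can be illustrated with a toy model. Everything below (the ProcessState subset, ModelProcess, sync_states, restart) is a hypothetical simplification, not Sanic's implementation; in particular, spawn() is assumed to accept only IDLE or RESTARTING, since the restart path has to be allowed through while a dead process marked COMPLETED is rejected.

```python
import enum


class ProcessState(enum.Enum):
    # Hypothetical, simplified subset of worker states.
    IDLE = enum.auto()
    RUNNING = enum.auto()
    RESTARTING = enum.auto()
    COMPLETED = enum.auto()


class ModelProcess:
    """Toy stand-in for a worker process (not Sanic's WorkerProcess)."""

    def __init__(self):
        self.state = ProcessState.RUNNING
        self.alive = True

    def terminate(self):
        # Restart marks the process as restarting, then kills it.
        self.state = ProcessState.RESTARTING
        self.alive = False

    def spawn(self):
        # Assumed check, modeled on the error message in the traceback.
        if self.state not in (ProcessState.IDLE, ProcessState.RESTARTING):
            raise Exception("Cannot spawn a worker process until it is idle.")
        self.alive = True
        self.state = ProcessState.RUNNING


def sync_states(processes):
    # Like the manager's periodic sync: any dead process is marked COMPLETED.
    for p in processes:
        if not p.alive:
            p.state = ProcessState.COMPLETED


def restart(process, sync_in_window):
    process.terminate()
    if sync_in_window:
        # The sync pass lands between terminate() and spawn().
        sync_states([process])
    process.spawn()
```

Calling `restart(ModelProcess(), sync_in_window=True)` raises the same "until it is idle" exception and leaves the model process dead and COMPLETED, which matches the observed symptom of a worker that never comes back up.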

Code snippet

Minimal app code to reproduce the behavior:

import threading
import time

from sanic import Sanic, response

app = Sanic("MyApp")


def cycle_workers():
    # Periodically restart every restartable worker, one at a time.
    while True:
        time.sleep(30)
        processes_names = [p.name for p in app.manager.processes if p.restartable]
        for p_name in processes_names:
            app.manager.restart([p_name])
            time.sleep(10)


@app.main_process_ready
def start(*args):
    del args  # listener arguments are not needed here
    # Run the cycling loop in a daemon thread inside the main process.
    threading.Thread(target=cycle_workers, daemon=True).start()


async def hello_world(request, *args, **kwargs):
    return response.text('Hello World!')


app.add_route(hello_world, '/', methods=['GET', 'POST'])

if __name__ == '__main__':
    app.run(port=8080, workers=3)

Running that code will eventually hit the race condition. To force it, add a time.sleep(0.2) to the WorkerProcess.restart method, after the process is terminated but before spawn is called.
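To widen the window without editing site-packages, a small monkeypatch helper could be used instead. This is a sketch; add_delay is a hypothetical helper, not part of Sanic. It wraps a method so that every call sleeps first and then delegates to the original.

```python
import functools
import time


def add_delay(cls, method_name, seconds):
    """Wrap cls.method_name so each call sleeps `seconds` before running."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def delayed(self, *args, **kwargs):
        time.sleep(seconds)
        return original(self, *args, **kwargs)

    setattr(cls, method_name, delayed)
    # Return the original so the patch can be undone with setattr().
    return original
```

For example, near the top of main_restart.py: `from sanic.worker.process import WorkerProcess` followed by `add_delay(WorkerProcess, "spawn", 0.2)`. Because WorkerProcess.restart calls spawn right after terminating the process (as the traceback shows), the delay lands before spawn's state check. It also delays each worker's initial spawn by 0.2 s, which should be harmless for a reproduction.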

Expected Behavior

Restarting a worker should always bring it back up, without raising an exception.
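As a sketch of one possible fix (hypothetical, and not necessarily how Sanic would address it), the state sync could skip processes that are in the middle of a restart. In the toy model below, ModelProcess, sync_states and restart are illustrative stand-ins, not Sanic's classes; spawn() is assumed to accept only IDLE or RESTARTING.

```python
import enum


class ProcessState(enum.Enum):
    # Hypothetical, simplified subset of worker states.
    IDLE = enum.auto()
    RUNNING = enum.auto()
    RESTARTING = enum.auto()
    COMPLETED = enum.auto()


class ModelProcess:
    """Toy stand-in for a worker process (not Sanic's WorkerProcess)."""

    def __init__(self):
        self.state = ProcessState.RUNNING
        self.alive = True

    def terminate(self):
        self.state = ProcessState.RESTARTING
        self.alive = False

    def spawn(self):
        if self.state not in (ProcessState.IDLE, ProcessState.RESTARTING):
            raise Exception("Cannot spawn a worker process until it is idle.")
        self.alive = True
        self.state = ProcessState.RUNNING


def sync_states(processes):
    for p in processes:
        # Hypothetical guard: a restarting process is expected to be dead
        # briefly, so its state is left alone until spawn() finishes.
        if p.state is ProcessState.RESTARTING:
            continue
        if not p.alive:
            p.state = ProcessState.COMPLETED


def restart(process, sync_in_window):
    process.terminate()
    if sync_in_window:
        # Even if the sync pass lands here, the state stays RESTARTING.
        sync_states([process])
    process.spawn()
```

With the guard, a sync pass inside the window no longer flips the state, so spawn() succeeds, while a process that dies outside a restart is still marked COMPLETED. The trade-off is that a process whose spawn fails stays in RESTARTING in this model, so a real fix would need to handle that case too.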

How do you run Sanic?

Sanic CLI

Operating System

Linux

Sanic Version

23.12.2

Additional context

No response
