Airflow Scheduler restart results in ObjectDeletedError #30817
Replies: 2 comments
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.
First of all, you should restart Airflow components gracefully whenever you restart them. Shutting down any application abruptly can always lead to inconsistencies and problems, and you should avoid doing it. In the rare event of a hardware failure, things might get broken, and yes, in that case some manual intervention might be required.

Airflow is not a 99.999% availability kind of software. It's developed by volunteers and it cannot really provide that level of availability and self-maintenance. It usually recovers by itself, but not always. You don't get "always" recovery even from multi-thousand-dollar licensed software, where you also pay a lot for support and for making sure it achieves really high uptime and resilience to almost any failure.

I think trimming down the expectations and spending some time on gracefully managing your deployment is a good idea. But I will convert this into a discussion and leave it open here. If you do have an easily reproducible case where gracefully managed Airflow consistently produces an irrecoverable application state - feel free to open a new issue with a reproducible scenario.
Apache Airflow version
2.5.3
What happened
Airflow Scheduler started to have issues when being restarted (either manually or forcefully): some task instances are stuck in the running/queued state after the restart, and the Scheduler somehow loses its reference to them (or fails to re-adopt them), resulting in a critical error about a missing TaskInstance.
The error requires manual intervention in the Airflow database (manually setting the stuck tasks to the failed state).
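For readers wondering what that manual intervention roughly looks like, here is a minimal sketch. It is not the exact statement we run: it uses an in-memory SQLite database with a simplified, hypothetical stand-in for Airflow's `task_instance` table (only the `dag_id`, `task_id`, `run_id` and `state` columns), and the helper names `fail_stuck_task_instances` and `build_demo_db` are made up for illustration. Against a real metadata database you would want a backup first and a narrower filter (for example on `run_id`).

```python
import sqlite3

# States the stuck task instances were left in after the restart.
STUCK_STATES = ("running", "queued")


def fail_stuck_task_instances(conn, dag_id, stuck_states=STUCK_STATES):
    """Set every task instance of dag_id that is in a stuck state to 'failed'.

    Returns the number of rows that were changed.
    """
    placeholders = ", ".join("?" for _ in stuck_states)
    cursor = conn.execute(
        "UPDATE task_instance SET state = 'failed' "
        f"WHERE dag_id = ? AND state IN ({placeholders})",
        (dag_id, *stuck_states),
    )
    conn.commit()
    return cursor.rowcount


def build_demo_db():
    """Create a simplified, hypothetical task_instance table with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance "
        "(dag_id TEXT, task_id TEXT, run_id TEXT, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
        [
            ("example_dag", "extract", "run_1", "running"),
            ("example_dag", "load", "run_1", "queued"),
            ("example_dag", "report", "run_1", "success"),
            ("other_dag", "extract", "run_1", "running"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo_conn = build_demo_db()
    changed = fail_stuck_task_instances(demo_conn, "example_dag")
    print(f"task instances set to failed: {changed}")
```

Only rows of the given DAG that are still running or queued are touched; finished task instances and other DAGs are left alone.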
What you think should happen instead
The Scheduler should shut down gracefully within the given time and restart properly afterward without raising ObjectDeletedError.
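To make "gracefully in given time" concrete, here is a generic sketch of the pattern: ask the process to stop with SIGTERM, wait for a grace period, and only force-kill if it has not exited by then. This is not Airflow code and not what the Helm chart does; `stop_gracefully` and the 30-second default are assumptions for illustration. On Kubernetes the equivalent waiting period is the pod's `terminationGracePeriodSeconds`.

```python
import subprocess
import sys


def stop_gracefully(proc, grace_seconds=30.0):
    """Ask a child process to stop, force-killing it only after the grace period.

    proc.terminate() sends SIGTERM on POSIX, giving the process a chance to
    finish its shutdown work. proc.kill() is the last resort.
    Returns the process exit code.
    """
    proc.terminate()
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # The process ignored the request or needed longer than allowed.
        proc.kill()
        return proc.wait()


if __name__ == "__main__":
    # Hypothetical long-running child standing in for a scheduler process.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print("exit code:", stop_gracefully(child, grace_seconds=10.0))
```

If the grace period is shorter than the time the process needs to finish its shutdown work, the force-kill path is taken and the shutdown is effectively abrupt.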
How to reproduce
Restart airflow-scheduler or redeploy the whole Airflow installation while tasks are running (that is, while they are being processed by the Scheduler/Workers).
We encounter the issue with every restart/redeploy. Not sure if it is reproducible outside our system.
Operating System
Debian GNU/Linux 11 (bullseye)
Versions of Apache Airflow Providers
apache-airflow-providers-celery==3.1.0
apache-airflow-providers-cncf-kubernetes==5.2.2
apache-airflow-providers-common-sql==1.3.4
apache-airflow-providers-docker==3.5.1
apache-airflow-providers-elasticsearch==4.4.0
apache-airflow-providers-ftp==3.3.1
apache-airflow-providers-google==9.0.0
apache-airflow-providers-grpc==3.1.0
apache-airflow-providers-hashicorp==3.3.0
apache-airflow-providers-http==4.2.0
apache-airflow-providers-imap==3.1.1
apache-airflow-providers-mysql==4.0.2
apache-airflow-providers-odbc==3.2.1
apache-airflow-providers-postgres==5.4.0
apache-airflow-providers-redis==3.1.0
apache-airflow-providers-sendgrid==3.1.0
apache-airflow-providers-sftp==4.2.4
apache-airflow-providers-slack==7.2.0
apache-airflow-providers-snowflake==4.0.4
apache-airflow-providers-sqlite==3.3.1
apache-airflow-providers-ssh==3.5.0
Deployment
Other 3rd-party Helm chart
Deployment details
Kubernetes versions:
Deployment via: https://github.com/airflow-helm/charts
Anything else
Scheduler error log:
Custom Helm Values:
Are you willing to submit PR?
Code of Conduct