When the Celery process on the worker is forked, every object in memory at that point is copied. Sessions and Connections in SQLAlchemy are not safe to share across processes. So, even with `NullPool`, initializing a worker with `celery.worker_autoscale = 20,20` (the default for a large worker) immediately makes 20 copies of the same `Session` object. If those objects hold open connections, the connections are copied as well, which creates a race condition: one process releases the connection and Postgres discards it, while another process still holds a reference to it, leading to `sqlalchemy.exc.OperationalError`. I suspect you might be seeing exactly these errors in the logs; kindly verify this.
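To make the race concrete, here is a minimal sketch of the failure mode, assuming a POSIX system and a reachable Postgres instance. The DSN is a placeholder, and this is an illustration of the fork hazard rather than MWAA's actual worker code:

```python
import os

from sqlalchemy import create_engine, text
from sqlalchemy.pool import NullPool

# Placeholder DSN -- replace with a real database URL to run this.
engine = create_engine(
    "postgresql+psycopg2://user:secret@localhost/demo",
    poolclass=NullPool,  # NullPool avoids pooling, but cannot undo a copy made by fork
)

# Connection opened BEFORE the fork: the child inherits the same socket.
conn = engine.connect()

pid = os.fork()
if pid == 0:
    # Child process "releases" the connection: psycopg2 sends a terminate
    # message, so Postgres discards the server-side session.
    conn.close()
    os._exit(0)

os.waitpid(pid, 0)

# Parent process still holds a reference to the discarded connection, so
# the next round trip typically fails with sqlalchemy.exc.OperationalError.
result = conn.execute(text("SELECT 1"))
print(result.scalar())
```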
By setting `celery.worker_autoscale = 20,0`, no processes are created up front, so open connections are not copied (or at least are much less likely to be) because processes are spun up on demand rather than all at once (the 20 is the maximum number of worker processes, and the 0 is the initial minimum).
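Assuming this is an Amazon MWAA environment (the environment-class defaults above suggest so), one way to apply the override is the UpdateEnvironment API, sketched here with boto3; the environment name is a placeholder:

```python
import boto3

mwaa = boto3.client("mwaa")

# "my-airflow-env" is a placeholder. Note that this call may replace the
# environment's full set of Airflow configuration options, so include any
# other options you already rely on.
mwaa.update_environment(
    Name="my-airflow-env",
    AirflowConfigurationOptions={
        "celery.worker_autoscale": "20,0",  # max 20 processes, min 0
    },
)
```

On self-managed Airflow, the equivalent setting is the `worker_autoscale` key in the `[celery]` section of `airflow.cfg`, or the `AIRFLOW__CELERY__WORKER_AUTOSCALE` environment variable.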
So I think we can try setting `celery.worker_autoscale` to `x,0` (with `x` based on the environment class), where the 0 means no `Session` copies are created at startup; this should help tasks execute without running into database connection failures. I would recommend testing this on at least one environment where you are seeing frequent failures. Please let me know your thoughts on this.