Building a Distributed Task Queue in Python
Why not just use Celery/RQ/Huey/TaskTiger?
Unfortunately, WakaTime has been using Celery for almost 10 years now. During that time I’ve experienced many critical bugs, some still open years after being introduced. Celery used to be pretty good, but feature bloat made the project difficult to maintain. Also, in my opinion, splitting the code into three separate GitHub repos made the codebase hard to read.
However, the main reason: Celery delayed tasks don’t scale.
If you use Celery delayed tasks, then as your website grows you’ll eventually start seeing this error message:
QoS: Disabled: prefetch_count exceeds 65535
When that happens the worker stops processing all tasks, not just delayed ones! As WakaTime grew, we started running into this bug more frequently.
I tried RQ, Huey, and TaskTiger, but they were missing features and processed tasks more slowly than Celery. A distributed task queue is indispensable for a website like WakaTime, and I was tired of running into bugs. For that reason, I decided to build the simplest distributed task queue possible while still providing all the features required by WakaTime.
Introducing WakaQ
WakaQ is a new Python distributed task queue. Use it to run code in the background so your website stays fast and snappy, and your users stay happy.
WakaQ is simple
It’s only 1,264 lines of code!
$ find . -name '*.py' -not -path "./migrations*" -not -path "./venv*" | xargs wc -l | grep " total" | awk '{print $1}' | numfmt --grouping
1,264
It only took one week from the first line of code until fully replacing Celery at WakaTime. That says something about its simplicity.
Each queue is implemented using a Redis list. Delayed tasks get their own queues implemented using Redis sorted sets. Broadcast tasks share a single Redis Pub/Sub queue.
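To make that layout concrete, here is a minimal in-memory sketch. It is not WakaQ's actual code and it does not talk to Redis: a deque stands in for each Redis list, a heap of (eta, payload) entries stands in for each sorted set, and Pub/Sub broadcast is left out. The class name InMemoryBroker and its method names are made up for illustration.

```python
import heapq
import time
from collections import deque


class InMemoryBroker:
    """Toy stand-in for the Redis layout described above (hypothetical, not WakaQ code)."""

    def __init__(self, queue_names):
        # One FIFO per queue, listed from highest to lowest priority.
        # Each deque plays the role of one Redis list.
        self.queues = {name: deque() for name in queue_names}
        # One heap per queue holding (eta, seq, payload) entries.
        # Each heap plays the role of a Redis sorted set scored by ETA.
        self.delayed = {name: [] for name in queue_names}
        self._seq = 0  # tie-breaker so equal ETAs keep insertion order

    def enqueue(self, queue, payload, eta=None):
        """Add a task now, or park it until the `eta` timestamp (in seconds)."""
        if eta is None:
            self.queues[queue].append(payload)
        else:
            self._seq += 1
            heapq.heappush(self.delayed[queue], (eta, self._seq, payload))

    def promote_due(self, now=None):
        """Move delayed tasks whose ETA has passed onto their normal queue."""
        now = time.time() if now is None else now
        for name, heap in self.delayed.items():
            while heap and heap[0][0] <= now:
                _, _, payload = heapq.heappop(heap)
                self.queues[name].append(payload)

    def dequeue(self):
        """Return the next task from the highest-priority non-empty queue, or None."""
        for queue in self.queues.values():
            if queue:
                return queue.popleft()
        return None
```

Because delayed tasks wait in their own structure until they are due, a large backlog of them never sits in front of tasks that are ready to run.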
WakaQ has all the necessary features
- Queue priorities
- Delayed tasks (run tasks after a timedelta eta)
- Scheduled cron periodic tasks
- Broadcast tasks (run a task on all workers)
- Task soft and hard timeout limits
- Optionally retry tasks on soft timeouts
- Combats memory leaks by restarting workers when max_mem_percent is reached
- Super minimal and maintainable
Features considered out of scope are rate limiting, exclusive locking, storing task results, and task chaining. Those are easy to add in your application’s task code, and you probably want to implement these specific to your app’s needs anyway.
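As one illustration of adding such a feature in task code, here is a rough sketch of exclusive locking. It assumes a client that behaves like redis-py (set with nx and ex, get, delete); the run_exclusively helper, the "lock:" key prefix, and the DictClient stand-in are all hypothetical and not part of WakaQ. DictClient exists only so the example runs without a Redis server, and it ignores expiry.

```python
import uuid


class DictClient:
    """In-memory stand-in for a Redis client (hypothetical; ignores `ex` expiry)."""

    def __init__(self):
        self.data = {}

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self.data:
            return None  # like redis-py, a failed NX set returns None
        self.data[key] = value
        return True

    def get(self, key):
        return self.data.get(key)

    def delete(self, key):
        return 1 if self.data.pop(key, None) is not None else 0


def run_exclusively(client, name, func, ttl_seconds=60):
    """Run func() only if no other worker holds the lock `name`.

    Returns (ran, result). The TTL keeps a crashed worker from holding
    the lock forever when a real Redis server is used.
    """
    key = f"lock:{name}"  # assumed key naming scheme
    token = uuid.uuid4().hex
    # SET key token NX EX ttl succeeds only when the key does not exist yet.
    if not client.set(key, token, nx=True, ex=ttl_seconds):
        return False, None
    try:
        return True, func()
    finally:
        current = client.get(key)
        if isinstance(current, bytes):  # redis-py returns bytes by default
            current = current.decode()
        # Release only our own lock. This get-then-delete is not atomic;
        # a production version would do the check and delete in one step.
        if current == token:
            client.delete(key)
```

A task body could then wrap its work in run_exclusively(client, "nightly-report", do_work) and simply return when the first element of the result is False.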
WakaQ is ready to use
WakaQ is still a new project, so use at your own risk. WakaQ currently powers all background tasks in prod for the WakaTime website, including but not limited to:
- sending code stats email reports
- renewing our LetsEncrypt SSL certs
- pre-caching dashboards, repo badges, and embeddable charts
- anything else we don’t want holding up the web requests
It’s released under a BSD license, so you can use it in open and closed source projects. If you find any bugs please open an issue, but think twice before requesting new features :-)
Happy coding!
Published on Java Code Geeks with permission by Priyanka Sharma, partner at our JCG program. See the original article here: Building a Distributed Task Queue in Python. Opinions expressed by Java Code Geeks contributors are their own.