
Replace fle_utils.chronograph with a third-party app (btw I hear that celery contains a lot of vitamins) #2891

Closed
benjaoming opened this issue Jan 25, 2015 · 2 comments


@benjaoming
Contributor

So instead of having a buggy and inferior job scheduler, let's use a truly great third-party app for this!!

I'm thinking of Celery, of course :) KA Lite seems to have a lot of background jobs, and this is a fundamental part of the platform that cannot be refactored away.

For motivation on why our own implementation should be removed, have a look at the issues it has caused over time:

https://github.com/learningequality/ka-lite/issues?q=cron

How you run Celery is configurable. For instance, for dev and testing there is the CELERY_ALWAYS_EAGER setting, which executes tasks synchronously in the calling process instead of sending them to a worker.
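As a minimal sketch, assuming Celery 3.1 reads its configuration from the Django settings module (for example via `app.config_from_object('django.conf:settings')`), dev and test settings could switch on eager mode like this. `CELERY_ALWAYS_EAGER` and `CELERY_EAGER_PROPAGATES_EXCEPTIONS` are the Celery 3.1 setting names; tying them to `DEBUG` is only an illustrative choice.

```python
# settings.py (sketch): Celery 3.1-era setting names, read from Django settings.
# Assumption: DEBUG stands in for whatever dev/test flag the project already has.
DEBUG = True

# When True, task.delay()/apply_async() run the task locally and block until it
# returns, so no broker or worker process is needed during development or tests.
CELERY_ALWAYS_EAGER = DEBUG

# Re-raise exceptions from eagerly executed tasks in the caller, so a failing
# task makes the test fail instead of only being recorded on the result object.
CELERY_EAGER_PROPAGATES_EXCEPTIONS = DEBUG
```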

I would recommend starting with a pure-Django broker. The number of jobs we currently have doesn't warrant a super-fast, compiled, OS-specific broker.
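For reference, Celery 3.1 documents the Django database as an (experimental) broker transport: it is enabled by adding `kombu.transport.django` to `INSTALLED_APPS` and setting `BROKER_URL` to `django://`. A minimal settings sketch follows; the other app name is a placeholder, not KA Lite's actual app list.

```python
# settings.py (sketch): use the Django ORM as the Celery broker (Celery 3.1 / kombu).
INSTALLED_APPS = (
    "django.contrib.contenttypes",  # placeholder for the project's existing apps
    # Provides the database tables that hold queued messages.
    "kombu.transport.django",
)

# Use the Django database transport instead of RabbitMQ or Redis.
BROKER_URL = "django://"
```

The message tables would then be created by the project's usual `syncdb`/migration step, so no extra service has to be installed.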

However, the Celery worker process still has to be started in an OS-specific way.

Running on OSX:
https://github.com/celery/celery/tree/3.1/extra/osx/

Running on Windows:
https://www.calazan.com/windows-tip-run-applications-in-the-background-using-task-scheduler/

@jamalex
Member

jamalex commented Jan 25, 2015

Interesting idea!

> KA Lite seems to have a lot of background jobs

The original version did run a number of things as background jobs (e.g. video downloads were queued by the main process and then picked up and run by the cron), but most of this was removed over a year ago and replaced with directly spawned threads. The only thing chronograph still handles is the automatic hourly background data sync.
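If the hourly sync moved to Celery, one way to express it would be a periodic entry for the `celery beat` scheduler, which runs as its own process (or embedded in a worker started with `-B`). This is a sketch using the Celery 3.1 `CELERYBEAT_SCHEDULE` setting; the task path `kalite.distributed.tasks.hourly_sync` is a hypothetical name used only for illustration.

```python
# settings.py (sketch): periodic tasks for the celery beat scheduler (Celery 3.1 naming).
from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    "hourly-data-sync": {
        # Hypothetical dotted path to a Celery task wrapping the existing sync code.
        "task": "kalite.distributed.tasks.hourly_sync",
        # Fire once an hour, mirroring the current chronograph job.
        "schedule": timedelta(hours=1),
    },
}
```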

> have a look at the issues it has caused over time

I didn't recall us having many issues with chronograph, at least not since very early on, so I did a quick survey.

Most of those issues just mention cron in passing (e.g. quoting log output containing "Starting the cron server in the background", or comments mentioning or recommending the use of cron) rather than citing cron as the cause of a problem (#15, #101, #246, #263, #271, #278, #308, #320, #529, #652, #659, #688, #875, #876, #908, #1653, #2025, #2278, #2434).

Most of the others either related to the old way of doing things or concerned small optimizations to what is already a very lightweight library (it's basically just two tables, jobs and logs, with a few helper functions around them for scheduling and status tracking).
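To make the "two tables plus a few helpers" description concrete, here is a stdlib-only sketch of that kind of scheduler using `sqlite3`. The schema, column names, and functions are hypothetical and simplified, not chronograph's actual models: a `job` row holds a command, its interval, and its next run time; each run appends a `log` row; the helper runs whatever is due and reschedules it.

```python
import sqlite3

# Hypothetical, simplified schema: one table of jobs, one table of run logs.
SCHEMA = """
CREATE TABLE job (
    id INTEGER PRIMARY KEY,
    command TEXT NOT NULL,
    interval_seconds INTEGER NOT NULL,
    next_run INTEGER NOT NULL,            -- Unix timestamp of the next due run
    is_running INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE log (
    id INTEGER PRIMARY KEY,
    job_id INTEGER NOT NULL REFERENCES job(id),
    run_at INTEGER NOT NULL,
    success INTEGER NOT NULL
);
"""


def open_db(path=":memory:"):
    """Open a database and create the two tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_job(conn, command, interval_seconds, first_run):
    """Register a recurring job (hypothetical helper)."""
    conn.execute(
        "INSERT INTO job (command, interval_seconds, next_run) VALUES (?, ?, ?)",
        (command, interval_seconds, first_run),
    )
    conn.commit()


def run_due_jobs(conn, now, runner):
    """Run every idle job whose next_run has passed, log the outcome, and reschedule it.

    Returns the number of jobs that were run.
    """
    due = conn.execute(
        "SELECT id, command, interval_seconds FROM job "
        "WHERE is_running = 0 AND next_run <= ?",
        (now,),
    ).fetchall()
    for job_id, command, interval in due:
        # Mark the job as running so an overlapping scheduler pass skips it.
        conn.execute("UPDATE job SET is_running = 1 WHERE id = ?", (job_id,))
        conn.commit()
        try:
            runner(command)
            success = 1
        except Exception:
            success = 0
        # Record the run, then clear the flag and schedule the next run.
        conn.execute(
            "INSERT INTO log (job_id, run_at, success) VALUES (?, ?, ?)",
            (job_id, now, success),
        )
        conn.execute(
            "UPDATE job SET is_running = 0, next_run = ? WHERE id = ?",
            (now + interval, job_id),
        )
        conn.commit()
    return len(due)
```

In a design like this, a background loop (the "cron server") calls `run_due_jobs` periodically; Celery would replace that polling loop and the `is_running` bookkeeping with a broker and worker processes.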

I've wanted to use Celery in projects for a long time now, and our central server could definitely be a great place to do that (since we have full control over the deployment environment, and we will be doing more and more job processing there). For the KA Lite app itself, I hadn't considered Celery an option, because:

  1. It was unclear to me how cross-platform it is. I had been under the impression that it was not pure Python, but looking at the source it actually seems like it might be, right? Maybe I thought that because it's often used with RabbitMQ, which is written in Erlang. But I guess that's your point about being able to use a pure-Django broker instead. For cross-platform support, we would need to confirm that it works well (and fast) on both Android and devices like the RPi/SanDisk.
  2. We don't actually use the cron server very much anymore, and chronograph has been working decently. If we can rework more of the KA Lite app flow around tasks/jobs, then I can see more motivation for integrating something new. One example might be the in-app update process, right @aronasorman? (Delegating the server restart to an independent, cron-launched process might help with some of the zombie processes and occasional hang-ups you were seeing.)

@aronasorman
Collaborator

Great idea re: using Celery for the upgrade process. I've been thinking about creating a separate server to handle the whole software update process and avoid leftover processes, but offloading that to Celery is another good possibility.

4 participants