Hello all,

I posted the comment below on Ray's Slack workspace and was asked to share it on GitHub as an RFC (thanks @ericl!), which I am happy to do. My background is building APIs with Django + Celery/Jobtastic + Dask. Of course, opinions are my own, but I hope this is close enough to others' experiences to be helpful for the Ray team:
I tried building an API with Django + Celery + Dask, and there are a few things Ray solved for me that those technologies did not:
Ray can quickly scale to many nodes and lets me control the resources that Actors and Tasks need. For example, some parts of my workflow need exclusive access to a GPU, and Ray's decorators make this relatively easy. Ray also essentially solved the problem of serving these services through FastAPI, which I had previously implemented with Django + Celery; Django is great but overly complicated for this.
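To make the resource-control point concrete, here is roughly what this looks like for me. This is only a sketch: the function names and bodies are placeholders for my actual pipeline steps.

```python
import ray

ray.init()  # or ray.init(address="auto") to join an existing cluster

@ray.remote(num_cpus=2)           # plain CPU task with its own requirement
def parse_pdf(path: str) -> list:
    # placeholder for CPU-only PDF parsing
    return [b"page-1", b"page-2"]

@ray.remote(num_gpus=1)           # scheduled only where a full GPU is free
def extract_keypoints(image: bytes) -> int:
    # placeholder for the GPU-backed feature extraction step
    return len(image)

pages = ray.get(parse_pdf.remote("report.pdf"))
features = ray.get([extract_keypoints.remote(p) for p in pages])
```

Scaling out is then just a matter of adding nodes; the decorators stay the same.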
I love the Celery project, but I am hesitant to build on top of it, given that the main component I use is Jobtastic (https://policystat.github.io/jobtastic/) and I have not seen much new development in that project, so I'm not sure it will still be around in the future. On the other hand, Ray seems much more active, and judging from my previous experience with Spark, I expect it to continue being well maintained.
This next point could be both a pro and a con. I feel like Ray gives me much more control over executing my tasks. Everything is in Python, and I know exactly what's going on. With Django + Celery, I am never so sure about this.
I love Ray's ability to spawn/launch new jobs/tasks/actors/workflows inside other jobs/tasks/actors/workflows. For example, I have a pipeline that processes the images inside a PDF, and I wanted a task that extracts the images and then processes each image in parallel. Doing this in Celery proved complicated because it is hard to create dynamic execution graphs there. Celery does have beautiful concepts such as chords and chains (https://docs.celeryproject.org/en/stable/userguide/canvas.html), but they seemed somewhat limited. With Ray, building dynamic graphs is trivial.
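A rough sketch of the nested-task pattern I mean (here `extract_images` and `process_image` are stand-ins for the real extraction and feature code):

```python
import ray

ray.init()

def extract_images(pdf_path: str) -> list:
    # stand-in for real extraction (e.g., with PyMuPDF or pdf2image)
    return [b"image-1", b"image-2", b"image-3"]

@ray.remote
def process_image(image: bytes) -> int:
    # placeholder per-image work (e.g., keypoint extraction)
    return len(image)

@ray.remote
def process_pdf(pdf_path: str) -> list:
    images = extract_images(pdf_path)
    # Tasks launched from inside another task: the execution graph is
    # built dynamically, with one branch per image found in this PDF.
    return ray.get([process_image.remote(img) for img in images])

results = ray.get(process_pdf.remote("report.pdf"))
```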
About Ray vs. Spark: I see them as different things, and I do not think direct comparisons are fair. But I like what the Ray team has done with Ray Datasets and the ability to store tensors inside columns and write them out as Parquet files (very efficient). I am sure there is a way of doing this in Spark, but not out of the box. Spark has Vectors for dealing with features, but sometimes you need to store tensors and do operations on them (e.g., take the average tensor across rows). This comes up in the pipeline above, which extracts key points (e.g., SIFT features) from images: I like keeping image metadata in some columns and the thousands of key points (a tensor) in another column.
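A minimal sketch of what I mean (paths, column names, and shapes are illustrative, and the exact behavior may depend on the Ray Datasets version):

```python
import numpy as np
import ray

ray.init()

# One row per image: plain metadata columns plus a tensor column of keypoints.
rows = [
    {
        "doc": "report.pdf",
        "page": i,
        "keypoints": np.random.rand(500, 128).astype(np.float32),
    }
    for i in range(4)
]
ds = ray.data.from_items(rows)

# Tensor columns are stored with Ray's tensor extension type and can be
# written to and read back from Parquet.
ds.write_parquet("/tmp/image_features")
back = ray.data.read_parquet("/tmp/image_features")

# e.g., the average keypoint tensor across rows
mean_keypoints = np.mean([row["keypoints"] for row in back.take_all()], axis=0)
```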
Comments welcome!