[RFC] Advantages of Ray over Django + Celery + Dask experience #21248

Closed
daniel-acuna opened this issue Dec 23, 2021 · 3 comments
Labels
stale: The issue is stale. It will be closed within 7 days unless there is further conversation.

Comments

@daniel-acuna

Hello all,

I posted the comment below on Ray's Slack workspace and was asked to share it on GitHub as an RFC (thanks @ericl!), so I am happy to do so. My experience is in developing APIs with Django + Celery/Jobtastic + Dask. Of course, opinions are my own, but I hope they are close enough to others' experiences to be helpful for the Ray team:

I tried building an API with Django + Celery + Dask, but Ray has solved a few things for me that these technologies did not:

  • Ray scales quickly to many nodes and lets me control the resources that Actors and Tasks need. For example, some parts of my workflow need exclusive access to a GPU, and the Ray decorators make this relatively easy. Ray also essentially solved the problem of serving my services through FastAPI, something I had previously implemented with Django + Celery. Django is great but overly complicated for this.
  • I love the Celery project, but I am hesitant to build on top of it, given that the main component I am using is Jobtastic (https://policystat.github.io/jobtastic/). I have not seen much new development in that project, so I am not sure it will be available in the future. Ray, on the other hand, seems to be much more active, and judging from my previous experience with Spark, I expect it to remain well maintained.
  • This next point could be both a pro and a con. I feel like Ray gives me much more control over how my tasks execute. Everything is in Python, and I know exactly what is going on. With Django + Celery, I am never quite sure about this.
    I love Ray’s ability to launch new jobs/tasks/actors/workflows from inside other jobs/tasks/actors/workflows. For example, I have a pipeline that processes the images inside a PDF. I wanted to build a task that extracts the images and then processes each image in parallel. Doing this in Celery proved complicated because it was hard to create dynamic execution graphs. Celery does have beautiful concepts such as "Chords" and "Chains" (https://docs.celeryproject.org/en/stable/userguide/canvas.html), but they seemed somewhat limited for this. With Ray, building dynamic graphs is trivial.

On Ray vs. Spark: I see them as different things, and I do not think comparisons are fair. But I like what the Ray team has done with Ray Datasets, in particular the ability to store Tensors inside columns and save them as Parquet files (very efficient). I am sure there is a way of doing this in Spark, but not out of the box. Spark has Vectors for dealing with features, but you sometimes need to store tensors and operate on them (e.g., take the average tensor across rows). This need comes up in my pipeline above, which extracts key points (e.g., SIFT features) from images. I like storing image metadata in some columns and the thousands of key points for each image (a tensor) in another column.

Comments welcome!

@ericl
Contributor

ericl commented Dec 23, 2021

Cross-reference: #21161 (proposing first-class task queueing support in Ray)

@stale

stale bot commented Apr 23, 2022

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

stale bot added the stale label Apr 23, 2022
@stale

stale bot commented Aug 13, 2022

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

stale bot closed this as completed Aug 13, 2022

2 participants