Ray fails to serialize self-reference objects #1234

suquark · 2017-11-20T04:12:25Z

System information

Ray installed from (source or binary): pip
Ray version: 0.2.2
Python version: 3.6.2

Describe the problem

Ray fails to serialize self-referencing objects (for example, Graph objects in networkx).

I think this is because Ray always tries pyarrow first and does not catch pyarrow.lib.ArrowNotImplementedError; see

ray/python/ray/worker.py

Lines 285 to 289 in e0360eb

    
            try:
                self.plasma_client.put(value, pyarrow.plasma.ObjectID(
                    object_id.id()), self.serialization_context)
                break
            except pyarrow.SerializationCallbackError as e:

After catching pyarrow.lib.ArrowNotImplementedError, we should not use use_dict=True as a workaround, because it will cause an endless loop. A correct approach may be:

            except (pyarrow.SerializationCallbackError, pyarrow.lib.ArrowNotImplementedError) as e:
                try:
                    if isinstance(e, pyarrow.lib.ArrowNotImplementedError):
                        e.example_object = value
                        raise e  # redirect to use cloudpickle
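The control flow proposed above can be sketched without Ray or pyarrow. This is only an illustration under stated assumptions: `strict_serialize`, `DepthLimitError`, and `put` are hypothetical names, the toy serializer merely imitates a recursive serializer with a depth limit, and a dict that contains itself stands in for the attribute dict of the `Graph` object below.

```python
import pickle


class DepthLimitError(Exception):
    """Hypothetical stand-in for pyarrow.lib.ArrowNotImplementedError."""


def strict_serialize(obj, depth=0, max_depth=50):
    """Toy recursive serializer with no cycle detection (not pyarrow)."""
    if depth > max_depth:
        raise DepthLimitError("object exceeds the maximum recursion depth")
    if isinstance(obj, dict):
        return {k: strict_serialize(v, depth + 1, max_depth) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [strict_serialize(v, depth + 1, max_depth) for v in obj]
    return obj


def put(value):
    """Try the strict serializer once; on a depth error, go straight to pickle."""
    try:
        return "strict", strict_serialize(value)
    except DepthLimitError:
        # Retrying with the same serializer would hit the same limit again,
        # so fall back to pickle instead of looping.
        return "pickle", pickle.dumps(value)


# A dict that contains itself, standing in for an object with self.g = self.
graph_state = {"name": "G"}
graph_state["g"] = graph_state

kind, payload = put(graph_state)
flat_kind, flat_payload = put({"edges": [(1, 2), (1, 3)]})
```

Here the self-referencing value takes the pickle path, while the plain nested data stays on the strict path. The point is only that the depth error is treated as a signal to switch serializers rather than to retry.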

Source code / logs

import ray
ray.init()

class Graph:
    def __init__(self):
        self.g = self  # the object refers to itself

G = Graph()
ray.put(G)  # --> pyarrow.lib.ArrowNotImplementedError: This object exceeds the maximum recursion depth. It may contain itself recursively.

# another example

import networkx as nx
G = nx.Graph()
    
G.add_edges_from([(1, 2), (1, 3)])
G.add_node(1)
G.add_edge(1, 2)
G.add_node("spam")  # adds node "spam"
G.add_nodes_from("spam")  # adds 4 nodes: 's', 'p', 'a', 'm'
G.add_edge(3, 'm')
ray.put(G)  # --> pyarrow.lib.ArrowNotImplementedError: This object exceeds the maximum recursion depth. It may contain itself recursively.

@mitar

robertnishihara · 2017-11-20T05:34:11Z

Can you try ray.register_custom_serializer? The following works for me.

import ray
ray.init()

class Graph:
    def __init__(self):
        self.g = self

ray.register_custom_serializer(Graph, use_pickle=True)

G = Graph()
ray.put(G)

This is closely related to #319 and https://issues.apache.org/jira/browse/ARROW-1382.

A side comment: the original code worked for me in Python 2, I think because Graph is an old-style class there, so we automatically fall back to pickle anyway.
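As a small standalone illustration of why falling back to pickle helps here: the standard pickle module keeps a memo of objects it has already seen, so containers that refer to themselves round-trip with their cycles intact. The sketch below uses only built-in containers and does not involve Ray.

```python
import pickle

# A list that contains itself, and two dicts that point at each other.
cycle = [1, 2]
cycle.append(cycle)

parent = {"name": "parent"}
child = {"name": "child", "parent": parent}
parent["child"] = child

cycle_copy = pickle.loads(pickle.dumps(cycle))
parent_copy = pickle.loads(pickle.dumps(parent))

# Identity, not just equality, is preserved inside each restored object.
print(cycle_copy[2] is cycle_copy)
print(parent_copy["child"]["parent"] is parent_copy)
```

Both checks print True, and the restored objects are new copies rather than the originals.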

mitar · 2017-11-20T06:02:44Z

Hm, so ideally we would like to serialize networkx graphs. Because they can be quite large, I am not sure if pickling is a good approach.

robertnishihara · 2017-11-20T06:32:14Z

Custom serializers/deserializers can be registered with the same approach. Not sure what the right one would be in this case, but just as a simple example, you could do something like

import numpy as np
import ray

ray.init()

class Graph:
    def __init__(self, big_array):
        self.g = self
        self.big_array = big_array

def custom_graph_serializer(obj):
    return obj.big_array

def custom_graph_deserializer(serialized_obj):
    return Graph(serialized_obj)

ray.register_custom_serializer(Graph,
                               serializer=custom_graph_serializer,
                               deserializer=custom_graph_deserializer)

G = Graph(np.ones(100))
ray.put(G)
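The same serializer/deserializer idea might be applied to a graph by flattening it to an edge list, which is plain acyclic data. The sketch below runs without Ray; `TinyGraph`, `graph_serializer`, and `graph_deserializer` are hypothetical names, and it assumes node labels are mutually comparable (for example, all integers). Isolated nodes would be lost by this particular flattening.

```python
class TinyGraph:
    """Hypothetical adjacency-set graph that holds a reference to itself."""

    def __init__(self, edges=()):
        self.adj = {}
        self.owner = self  # self-reference, like `self.g = self` above
        for u, v in edges:
            self.add_edge(u, v)

    def add_edge(self, u, v):
        # Undirected: record the edge in both directions.
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)


def graph_serializer(graph):
    """Flatten to a sorted list of edges, keeping each undirected edge once."""
    return sorted((u, v) for u, nbrs in graph.adj.items() for v in nbrs if u <= v)


def graph_deserializer(edges):
    """Rebuild the graph; the constructor restores the self-reference."""
    return TinyGraph(edges)


original = TinyGraph([(1, 2), (1, 3)])
payload = graph_serializer(original)
rebuilt = graph_deserializer(payload)
```

A pair like this could then be passed as the serializer and deserializer arguments in the registration call shown above. Whether an edge list is compact enough for large graphs would need to be measured.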

edoakes · 2020-03-05T23:11:03Z

Stale - please open new issue if still relevant

## Why are these changes needed? This is part of redis removal project. This PR is going to enable grpc based broadcasting by default. ## Related issue number  #19438 ## Checks

## Why are these changes needed? There's one user who has an issue that one of raylets cannot schedule tasks anymore because `num_worker_not_started_by_job_config_not_exist ` > 0. This PR adds better log messages to figure out if the root cause is the job information is not properly propagated from GCS to raylet through Redis pubsub. ## Related issue number  ## Checks - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :(

## Why are these changes needed? This pin is needed to fix `test_output` on master, which broke when 4.0.0 was released. It may also fix the windows build (unsure). ## Related issue number  ## Checks - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :(

## Why are these changes needed? The change in #20374 was interpreted as a file redirect, not a "greater than" by docker (strangely enough, differently than bash interprets it locally).  ## Related issue number  ## Checks - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Co-authored-by: Alex <[email protected]>

## Why are these changes needed? This PR adds the hiredis dependency for non M1 machines. This removes the `redis < 4.0` pin. Since hiredis doesn't have M1 mac wheels yet, so users there will have extra warning messages in their outputs if they use redis 4.0.  ## Related issue number  ## Checks - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Co-authored-by: Alex Wu <[email protected]>

## Why are these changes needed? The change in #20374 was interpreted as a file redirect, not a "greater than" by docker (strangely enough, differently than bash interprets it locally).  ## Related issue number  ## Checks - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Co-authored-by: Alex <[email protected]>

## Why are these changes needed? There's one user who has an issue that one of raylets cannot schedule tasks anymore because `num_worker_not_started_by_job_config_not_exist ` > 0. This PR adds better log messages to figure out if the root cause is the job information is not properly propagated from GCS to raylet through Redis pubsub. ## Related issue number  ## Checks - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :(

## Why are these changes needed? This pin is needed to fix `test_output` on master, which broke when 4.0.0 was released. It may also fix the windows build (unsure). ## Related issue number  ## Checks - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :(

## Why are these changes needed? The change in #20374 was interpreted as a file redirect, not a "greater than" by docker (strangely enough, differently than bash interprets it locally).  ## Related issue number  ## Checks - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Co-authored-by: Alex <[email protected]>

## Why are these changes needed? There's one user who has an issue that one of raylets cannot schedule tasks anymore because `num_worker_not_started_by_job_config_not_exist ` > 0. This PR adds better log messages to figure out if the root cause is the job information is not properly propagated from GCS to raylet through Redis pubsub. ## Related issue number  ## Checks - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :(

## Why are these changes needed? This pin is needed to fix `test_output` on master, which broke when 4.0.0 was released. It may also fix the windows build (unsure). ## Related issue number  ## Checks - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :(

## Why are these changes needed? The change in #20374 was interpreted as a file redirect, not a "greater than" by docker (strangely enough, differently than bash interprets it locally).  ## Related issue number  ## Checks - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Co-authored-by: Alex <[email protected]>

…" (#20668) This reverts commit e9132ed.   ## Why are these changes needed? Seems to break Windows build. ``` (07:46:25) ERROR: BUILD.bazel:406:11: Compiling src/ray/common/task/task_spec.cc failed: (Exit 2): cl.exe failed: error executing command ``` <img width="487" alt="Screen Shot 2021-11-23 at 3 09 18 AM" src="https://user-images.githubusercontent.com/18510752/143013973-f157724c-4951-49a9-80c6-158d41aa4295.png"> ## Related issue number  ## Checks - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :(

@mattip

This reverts commit 02f220b.   ## Why are these changes needed? Looks like this commit makes `test_ray_shutdown` way more flaky. cc @mattip for further investigation after revert <img width="760" alt="Screen Shot 2022-05-31 at 11 14 48 PM" src="https://user-images.githubusercontent.com/18510752/171339737-f48e6e90-391a-4235-bfac-a0aa0e563eb7.png"> ## Related issue number  ## Checks - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :(

#31454) …28)" This reverts commit a0c894f.   ## Why are these changes needed?  ## Related issue number  ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :(

)" (ray-project#313… (ray-project#31454) …28)" This reverts commit a0c894f.   ## Why are these changes needed?  ## Related issue number  ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Signed-off-by: Andrea Pisoni <[email protected]>

## Why are these changes needed?  ## Related issue number  ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :(

## Why are these changes needed? These flags are no longer useful because the migration has been finished. Delete them.  ## Related issue number  ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :(

## Why are these changes needed?  ## Related issue number  ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :(

#49425)   ## Why are these changes needed? This PR fixes two errors: 1. when the strategy is constant, we try to check if the column is categorical, but if the column does not exists in the dataframe, it fails with a `KeyError` 2. If the preprocessor was unable to compute statistics because all the columns did not have any value, the preprocessor fails with an unintuitive error in the dropna. ## Related issue number  ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Signed-off-by: Martin Bomio <[email protected]>

## Why are these changes needed? Reduces CPU overhead (particularly on the proxy). This is less cryptographically secure but should be OK for our use case. App: ```python from ray import serve @serve.deployment( max_ongoing_requests=100, num_replicas=16, ray_actor_options={"num_cpus": 0}, ) class A: def __call__(self): return b"hi" app = A.bind() ``` Benchmark: ``` ab -n 10000 -c 100 http://127.0.0.1:8000/ ``` Before (~780 qps): ``` Concurrency Level: 100 Time taken for tests: 12.747 seconds Complete requests: 10000 Failed requests: 0 Total transferred: 1910000 bytes HTML transferred: 120000 bytes Requests per second: 784.47 [#/sec] (mean) Time per request: 127.475 [ms] (mean) Time per request: 1.275 [ms] (mean, across all concurrent requests) Transfer rate: 146.32 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.6 0 21 Processing: 5 127 35.7 127 305 Waiting: 3 125 35.8 126 304 Total: 5 127 35.6 128 306 Percentage of the requests served within a certain time (ms) 50% 128 66% 138 75% 147 80% 153 90% 170 95% 188 98% 210 99% 224 100% 306 (longest request) ``` After (~820 qps): ``` Concurrency Level: 100 Time taken for tests: 12.130 seconds Complete requests: 10000 Failed requests: 0 Total transferred: 1910000 bytes HTML transferred: 120000 bytes Requests per second: 824.44 [#/sec] (mean) Time per request: 121.295 [ms] (mean) Time per request: 1.213 [ms] (mean, across all concurrent requests) Transfer rate: 153.78 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.5 0 4 Processing: 6 121 30.1 124 230 Waiting: 4 119 30.2 123 228 Total: 7 121 30.0 124 230 Percentage of the requests served within a certain time (ms) 50% 124 66% 132 75% 138 80% 144 90% 157 95% 167 98% 181 99% 189 100% 230 (longest request) ``` ## Related issue number  ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. 
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Edward Oakes <[email protected]>

ray-project#49425)   ## Why are these changes needed? This PR fixes two errors: 1. when the strategy is constant, we try to check if the column is categorical, but if the column does not exists in the dataframe, it fails with a `KeyError` 2. If the preprocessor was unable to compute statistics because all the columns did not have any value, the preprocessor fails with an unintuitive error in the dropna. ## Related issue number  ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Signed-off-by: Martin Bomio <[email protected]>

## Why are these changes needed? Refactor the code so it can be overwritten in the ImageURIPlugin ## Related issue number  ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :(

## Why are these changes needed? Reduces CPU overhead (particularly on the proxy). This is less cryptographically secure but should be OK for our use case. App: ```python from ray import serve @serve.deployment( max_ongoing_requests=100, num_replicas=16, ray_actor_options={"num_cpus": 0}, ) class A: def __call__(self): return b"hi" app = A.bind() ``` Benchmark: ``` ab -n 10000 -c 100 http://127.0.0.1:8000/ ``` Before (~780 qps): ``` Concurrency Level: 100 Time taken for tests: 12.747 seconds Complete requests: 10000 Failed requests: 0 Total transferred: 1910000 bytes HTML transferred: 120000 bytes Requests per second: 784.47 [#/sec] (mean) Time per request: 127.475 [ms] (mean) Time per request: 1.275 [ms] (mean, across all concurrent requests) Transfer rate: 146.32 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.6 0 21 Processing: 5 127 35.7 127 305 Waiting: 3 125 35.8 126 304 Total: 5 127 35.6 128 306 Percentage of the requests served within a certain time (ms) 50% 128 66% 138 75% 147 80% 153 90% 170 95% 188 98% 210 99% 224 100% 306 (longest request) ``` After (~820 qps): ``` Concurrency Level: 100 Time taken for tests: 12.130 seconds Complete requests: 10000 Failed requests: 0 Total transferred: 1910000 bytes HTML transferred: 120000 bytes Requests per second: 824.44 [#/sec] (mean) Time per request: 121.295 [ms] (mean) Time per request: 1.213 [ms] (mean, across all concurrent requests) Transfer rate: 153.78 [Kbytes/sec] received Connection Times (ms) min mean[+/-sd] median max Connect: 0 0 0.5 0 4 Processing: 6 121 30.1 124 230 Waiting: 4 119 30.2 123 228 Total: 7 121 30.0 124 230 Percentage of the requests served within a certain time (ms) 50% 124 66% 132 75% 138 80% 144 90% 157 95% 167 98% 181 99% 189 100% 230 (longest request) ``` ## Related issue number  ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. 
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Edward Oakes <[email protected]> Signed-off-by: Roshan Kathawate <[email protected]>

ray-project#49425)   ## Why are these changes needed? This PR fixes two errors: 1. when the strategy is constant, we try to check if the column is categorical, but if the column does not exists in the dataframe, it fails with a `KeyError` 2. If the preprocessor was unable to compute statistics because all the columns did not have any value, the preprocessor fails with an unintuitive error in the dropna. ## Related issue number  ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Signed-off-by: Martin Bomio <[email protected]> Signed-off-by: Roshan Kathawate <[email protected]>

…49612)   ## Why are these changes needed? Refactor the code so it can be overwritten in the ImageURIPlugin ## Related issue number  ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Signed-off-by: Roshan Kathawate <[email protected]>

…49612)   ## Why are these changes needed? Refactor the code so it can be overwritten in the ImageURIPlugin ## Related issue number  ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Signed-off-by: lielin.hyl <[email protected]>

## Why are these changes needed? Refactor replica wrapper in replica scheduler such that: - The class that replica scheduler interacts with is `RunningReplica`. - `RunningReplica` holds references to wrapper classes (e.g. `ActorReplicaWrapper`) that handle sending the actual request over the wire to the replicas. ## Related issue number  --------- Signed-off-by: Cindy Zhang <[email protected]>

## Why are these changes needed? In our use case we use Ray Serve with many hundreds/thousands of apps, plus a "router" app that routes traffic to those apps using `DeploymentHandle`s. Right now, that means we have a `LongPollClient` for each `DeploymentHandle` in each router app replica, which could be tens or hundreds of thousands of `LongPollClient`s. This is expensive on both the Serve Controller and on the router app replicas. It can be particularly problematic in resource usage on the Serve Controller - the main thing blocking us from having as many router replicas as we'd like is the stability of the controller. This PR aims to amortize this cost of having so many `LongPollClient`s by going from one-long-poll-client-per-handle to one-long-poll-client-per-process. Each `DeploymentHandle`'s `Router` now registers itself with a shared `LongPollClient` held by a singleton. The actual implementation that I've gone with is a bit clunky because I'm trying to bridge the gap between the current solution and a design that *only* has shared `LongPollClient`s. This could potentially be cleaned up in the future. Right now, each `Router` still gets a dedicated `LongPollClient` that only runs temporarily, until the shared client tells it to stop. Related: #45957 is the same idea but for handle autoscaling metrics pushing.  ## Related issue number  ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [x] I've made sure the tests are passing. 
Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Josh Karpel <[email protected]>

## Why are these changes needed? ray[all] contains too many packages and there is probably no single person / deployment that needs all of them ## Related issue number  ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Philipp Moritz <[email protected]> Co-authored-by: angelinalg <[email protected]>

## Why are these changes needed? num_rows_per_file makes it seem like there is "exact" semantics, whereas the actual underlying implementation is more like "at least". This updates the parameter name to reflect that, but we should double check whether or not this is the best way to expose this. Closes #45393 ## Related issue number  ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Richard Liaw <[email protected]> Co-authored-by: Balaji Veeramani <[email protected]>

## Why are these changes needed? * Wraps all filesystem calls with RetryingPyFileSystem * Avoids sprawling redundant changes on Reader and Datasource objects by capturing the access point with iterate_with_retry ## Related issue number  ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Signed-off-by: Richard Liaw <[email protected]>

## Related issue number

Parallel SQL reads support by using MOD/CAT/Custom hashes. Closes #49206

Signed-off-by: jukejian <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
Co-authored-by: Richard Liaw <[email protected]>

## Why are these changes needed?

`num_rows_per_file` makes it seem like there is "exact" semantics, whereas the actual underlying implementation is more like "at least". This updates the parameter name to reflect that, but we should double check whether or not this is the best way to expose this.

Closes ray-project#45393

Signed-off-by: Richard Liaw <[email protected]>
Co-authored-by: Balaji Veeramani <[email protected]>
Signed-off-by: Anson Qian <[email protected]>

## Why are these changes needed?

Refactor replica wrapper in replica scheduler such that:

- The class that replica scheduler interacts with is `RunningReplica`.
- `RunningReplica` holds references to wrapper classes (e.g. `ActorReplicaWrapper`) that handle sending the actual request over the wire to the replicas.

Signed-off-by: Cindy Zhang <[email protected]>

## Why are these changes needed?

In our use case we use Ray Serve with many hundreds/thousands of apps, plus a "router" app that routes traffic to those apps using `DeploymentHandle`s. Right now, that means we have a `LongPollClient` for each `DeploymentHandle` in each router app replica, which could be tens or hundreds of thousands of `LongPollClient`s. This is expensive on both the Serve Controller and on the router app replicas. It can be particularly problematic in resource usage on the Serve Controller - the main thing blocking us from having as many router replicas as we'd like is the stability of the controller.

This PR aims to amortize this cost of having so many `LongPollClient`s by going from one-long-poll-client-per-handle to one-long-poll-client-per-process. Each `DeploymentHandle`'s `Router` now registers itself with a shared `LongPollClient` held by a singleton.

The actual implementation that I've gone with is a bit clunky because I'm trying to bridge the gap between the current solution and a design that *only* has shared `LongPollClient`s. This could potentially be cleaned up in the future. Right now, each `Router` still gets a dedicated `LongPollClient` that only runs temporarily, until the shared client tells it to stop.

Related: #45957 is the same idea but for handle autoscaling metrics pushing.

## Checks

- [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [x] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: Josh Karpel <[email protected]>
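The move from one client per handle to one client per process can be sketched roughly as below. This is a simplified illustration, not Ray Serve's code: `SharedLongPollClientSketch` and `RouterSketch` are hypothetical stand-ins, the update is triggered by hand rather than by a long-poll response, and the temporary dedicated client mentioned above is left out.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class SharedLongPollClientSketch:
    """One instance per process; fans updates out to every registered router."""

    _instance = None

    def __init__(self) -> None:
        self._callbacks: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)

    @classmethod
    def get(cls) -> "SharedLongPollClientSketch":
        # Lazily create the process-wide singleton.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def register(self, deployment_id: str, callback: Callable[[Any], None]) -> None:
        self._callbacks[deployment_id].append(callback)

    def on_update(self, deployment_id: str, snapshot: Any) -> None:
        # In a real client this would be driven by the long-poll response.
        for callback in self._callbacks[deployment_id]:
            callback(snapshot)


class RouterSketch:
    """Stand-in for a handle's router; registers itself with the shared client."""

    def __init__(self, deployment_id: str) -> None:
        self.deployment_id = deployment_id
        self.replicas: List[str] = []
        SharedLongPollClientSketch.get().register(deployment_id, self.update_replicas)

    def update_replicas(self, replicas: List[str]) -> None:
        self.replicas = list(replicas)


router_a = RouterSketch("app1")
router_b = RouterSketch("app1")
router_c = RouterSketch("app2")
SharedLongPollClientSketch.get().on_update("app1", ["replica-1", "replica-2"])
print(router_a.replicas, router_b.replicas, router_c.replicas)
```

However many routers exist in the process, the controller only has to serve one polling connection for them in this design.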

## Why are these changes needed?

`ray[all]` contains too many packages and there is probably no single person / deployment that needs all of them

Signed-off-by: Philipp Moritz <[email protected]>
Co-authored-by: angelinalg <[email protected]>

…able better UV support (#50160)

## Why are these changes needed?

This implements a very simple runtime environment plugin that allows e.g. using the `uv run` command for dependency handling (but could also be useful for wrapping the worker command e.g. with a profiler or debugger).

**Very simple example:**

```
uv run --with emoji test.py
```

with

```Python
import ray

ray.init(runtime_env={"py_executable": "uv run --with emoji"})


@ray.remote
def f():
    import emoji
    return emoji.emojize("Ray rocks :thumbs_up:")


print(ray.get(f.remote()))
```

**Slightly more complex example** with `pyproject.toml` in the working_dir (see https://docs.astral.sh/uv/guides/scripts):

```
uv run test.py
```

with pyproject.toml:

```toml
[project]
name = "test"
version = "0.1"
dependencies = [
    "rich",
    "emoji",
    "ray",  # had to do "ray @ file:///tmp/ray-2.41.0-cp312-cp312-macosx_11_0_arm64.whl" here since this is not released yet
]
```

test.py

```Python
import ray
import time
from rich.progress import track

ray.init(runtime_env={"working_dir": ".", "py_executable": "uv run --isolated"})


@ray.remote
def f():
    import emoji
    return emoji.emojize("Ray rocks :thumbs_up:")


for i in track(range(20), description="For example:"):
    print(ray.get(f.remote()))
    time.sleep(0.05)
```

And the dependencies can also be locked with `uv lock`.

Signed-off-by: Philipp Moritz <[email protected]>
Co-authored-by: pcmoritz <[email protected]>
Co-authored-by: angelinalg <[email protected]>

## Why are these changes needed?

Refresher for #50022, but on a separate page and a bit more holistic. It's not tightly integrated into the other pages yet but I will do a revision of quickstart/overview/data.rst pages.

Signed-off-by: Richard Liaw <[email protected]>
Co-authored-by: Balaji Veeramani <[email protected]>

## Checks

- [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: Tanmay Chimurkar <[email protected]>
Co-authored-by: Hongpeng Guo <[email protected]>

Verified fixes locally:

<img width="1668" alt="Screenshot 2025-02-04 at 5 35 20 PM" src="https://github.com/user-attachments/assets/4f776122-5ef0-466c-b668-f5bf49c4c68c" />
<img width="898" alt="Screenshot 2025-02-04 at 5 35 10 PM" src="https://github.com/user-attachments/assets/3c214fb4-820a-4c87-9fbb-6dfd1d4aeea9" />
<img width="1670" alt="Screenshot 2025-02-04 at 5 32 22 PM" src="https://github.com/user-attachments/assets/a3706aa9-070b-4ab8-b1eb-a7756ab8e35b" />
<img width="621" alt="Screenshot 2025-02-04 at 5 32 15 PM" src="https://github.com/user-attachments/assets/34d8ae19-d893-497a-a024-3cf24a1bdaff" />
<img width="1665" alt="Screenshot 2025-02-04 at 5 31 56 PM" src="https://github.com/user-attachments/assets/4b56b779-f5c8-46d8-83b0-6084c297c392" />
<img width="402" alt="Screenshot 2025-02-04 at 5 31 49 PM" src="https://github.com/user-attachments/assets/ee0c7234-9c9a-4055-b61f-c3222484189b" />

## Why are these changes needed?

Markdown URLs with "&" incorrectly rendered to "&amp", which is breaking utm tracking.

e.g. Clicking on "Run quickstart" on https://docs.ray.io/en/latest/data/examples/pytorch_resnet_batch_prediction.html renders "https://www.anyscale.com/ray-on-anyscale?utm_source=ray_docs&utm_medium=docs&utm_campaign=image-classification-batch-inference-with-pytorch&redirectTo=/v2/template-preview/ray-data-batch-image-classification"

Signed-off-by: Chris Zhang <[email protected]>
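As an illustration of how a literal `&amp` can end up in a link target, the sketch below shows one possible failure mode: escaping a URL twice. It assumes double-escaping purely for demonstration; the root cause in the Ray docs build is not stated above, and the shortened URL is only an example.

```python
import html

url = "https://www.anyscale.com/ray-on-anyscale?utm_source=ray_docs&utm_medium=docs"

# Escaping once is what belongs inside an HTML attribute value.
escaped = html.escape(url)
# Escaping an already-escaped URL produces "&amp;amp;".
double_escaped = html.escape(escaped)

# A browser unescapes the attribute value once before following the link.
followed_ok = html.unescape(escaped)
followed_broken = html.unescape(double_escaped)

print(followed_ok)
print(followed_broken)
```

In the double-escaped case the followed URL still contains `&amp;`, so a query parameter such as `utm_medium` is no longer parsed under its intended name.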

## Why are these changes needed?

Avoid redirecting to empty docs

## Related issue number

Closes #49335

## Checks

- [X] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: Vincent <[email protected]>
Signed-off-by: V <[email protected]>
Co-authored-by: angelinalg <[email protected]>

…#50192)

## Why are these changes needed?

https://docs.ray.io/en/latest/serve/tutorials/vllm-example.html - Currently doesn't work out of the box for the latest vllm versions.

## Related issue number

N/A

## Checks

- [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [x] This PR is not tested :(

Signed-off-by: Eric Tang <[email protected]>

## Why are these changes needed?

add llm serving skeleton public api to `ray.serve.llm` and links to doc

Signed-off-by: Gene Su <[email protected]>

…ple to example gallery (#49791)

## Why are these changes needed?

[Data-Juicer](https://github.com/modelscope/data-juicer) is a one-stop system to process text and multimodal data for and with foundation models (typically LLMs). With dedicated integration with Ray, it supports distributed large-scale data cleaning, filtering, synthesis, and TB-level deduplication.

We are eager to share the insights gained from our work and foster future collaboration with the Ray team and the Ray user community~

## Related issue number

N/A

## Checks

- [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: lielin.hyl <[email protected]>
Signed-off-by: Yilun Huang <[email protected]>
Co-authored-by: angelinalg <[email protected]>
Co-authored-by: Philipp Moritz <[email protected]>

edoakes closed this as completed Mar 5, 2020
