Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[data] support for read_sql to handle tasks concurrently. #49424

Merged
merged 25 commits into from
Jan 30, 2025

Conversation

Jay-ju
Copy link
Contributor

@Jay-ju Jay-ju commented Dec 24, 2024

Why are these changes needed?

For more details, you can refer to the discussion in the corresponding issue. #49206

Related issue number

Closes #49206

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@Jay-ju Jay-ju requested a review from a team as a code owner December 24, 2024 11:25
@Jay-ju Jay-ju force-pushed the fix_read_sql_parallelism branch 2 times, most recently from 9507221 to 4126c25 Compare December 25, 2024 02:52
@Jay-ju Jay-ju force-pushed the fix_read_sql_parallelism branch from 67d65fd to ef0feee Compare December 25, 2024 04:27
@jcotant1 jcotant1 added the data Ray Data-related issues label Dec 27, 2024
@Jay-ju
Copy link
Contributor Author

Jay-ju commented Jan 2, 2025

@bveeramani @richardliaw can you help me review this PR

@richardliaw
Copy link
Contributor

richardliaw commented Jan 2, 2025 via email

@Jay-ju
Copy link
Contributor Author

Jay-ju commented Jan 3, 2025

hi, i will take this but currently am traveling - will get back to you after 1/10 - thanks!

On Thu, Jan 2, 2025 at 6:06 PM jay @.> wrote: @bveeramani https://github.com/bveeramani @richardliaw https://github.com/richardliaw can you help me review this PR — Reply to this email directly, view it on GitHub <#49424 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABCRZZKP2HJDZ34HW5SU43L2IT6SHAVCNFSM6AAAAABUESLI7KVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDKNRXGQ2TSMRVGU . You are receiving this because you were mentioned.Message ID: @.>

get, tks, Wish you a pleasant holiday.

@richardliaw richardliaw self-assigned this Jan 14, 2025
Signed-off-by: Richard Liaw <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
@richardliaw
Copy link
Contributor

Hi, I created a PR on your fork to discuss some changes - https://github.com/Jay-ju/ray/pull/1/files

richardliaw and others added 2 commits January 22, 2025 16:52
Signed-off-by: Richard Liaw <[email protected]>
Cleanup `fix_read_sql_parallelism`
@richardliaw richardliaw added the go add ONLY when ready to merge, run all tests label Jan 24, 2025
Signed-off-by: Richard Liaw <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
richardliaw and others added 5 commits January 24, 2025 15:05
Signed-off-by: Richard Liaw <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
@Jay-ju Jay-ju force-pushed the fix_read_sql_parallelism branch from e4b140f to 04d742f Compare January 25, 2025 02:16
Comment on lines 403 to 406

if len(read_tasks) < parallelism:
parallelism = len(read_tasks)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we don't add this? I think it might be causing some tests to fail:


[2025-01-28T19:29:11Z] FAILED python/ray/data/tests/test_splitblocks.py::test_small_file_split - ass...
--
  | [2025-01-28T19:29:11Z] FAILED python/ray/data/tests/test_splitblocks.py::test_large_file_additional_split


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, good catch, let me try

Signed-off-by: Richard Liaw <[email protected]>
@richardliaw richardliaw merged commit 668e5b4 into ray-project:master Jan 30, 2025
5 checks passed
xsuler pushed a commit to antgroup/ant-ray that referenced this pull request Mar 4, 2025
…t#49424)

## Related issue number

Parallel SQL reads support by using MOD/CAT/Custom hashes.
Closes  ray-project#49206

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: jukejian <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
Co-authored-by: Richard Liaw <[email protected]>
xsuler pushed a commit to antgroup/ant-ray that referenced this pull request Mar 4, 2025
…t#49424)

## Related issue number

Parallel SQL reads support by using MOD/CAT/Custom hashes.
Closes  ray-project#49206

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: jukejian <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
Co-authored-by: Richard Liaw <[email protected]>
park12sj pushed a commit to park12sj/ray that referenced this pull request Mar 18, 2025
…t#49424)

## Related issue number

Parallel SQL reads support by using MOD/CAT/Custom hashes.
Closes  ray-project#49206

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: jukejian <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
Co-authored-by: Richard Liaw <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
data Ray Data-related issues go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Data] support parallelism for `read_sql
5 participants