Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge Ray Train XGboost tutorials #50331

Merged
merged 16 commits into from
Feb 12, 2025
Merged

Conversation

crypdick
Copy link
Contributor

@crypdick crypdick commented Feb 8, 2025

Why are these changes needed?

This PR merges the two existing Ray Train XGBoost tutorials, which were largely duplicated, into one.

  • created new distributed-xgboost-lightgbm.ipynb which is a superset of the two existing tutorials
  • ported .rst with transcluded code snippets since it was a tangled nightmare
  • updated pointers in train/examples.yml to this tutorial
  • kept orphaned code in doc_code/preprocessors.py since it is part of our CI

NOTE: I am not sure that I updated the pointers to this tutorial correctly. Please double check my work in examples.yml

Out of scope: merging the third Ray Tune XGBoost tutorial.

Related issue number

#50232 (comment)

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

* created new distributed-xgboost-lightgbm.ipynb which is a superset of the two existing tutorials
* ported .rst with transcluded code snippets since it was a tangled nightmare
* updated pointers in train/examples.yml to this tutorial
* kept orphaned code in `doc_code/preprocessors.py` since it is part of our CI
* see #50232 (comment)

Signed-off-by: Ricardo Decal <[email protected]>
@crypdick crypdick force-pushed the docs/merge-xgboost-tutorials branch from d7e0f32 to 344e63c Compare February 8, 2025 01:15
@pcmoritz pcmoritz force-pushed the docs/merge-xgboost-tutorials branch from 030ffd1 to 1dc757b Compare February 9, 2025 00:31
@pcmoritz
Copy link
Contributor

pcmoritz commented Feb 9, 2025

@crypdick Unfortunately this doesn't work since now none of the links to the .rst that were converted to .ipynb are functional any more (look at the test failures in https://buildkite.com/ray-project/microcheck/builds/11000) -- these are set by the anchor .. _train-gbdt-guide: for example. Either we keep the file in .rst, or we need to figure out how to set the anchor in the .ipynb :)

There was also a small error about a non-existant use case that I pushed a fix for :)

@crypdick
Copy link
Contributor Author

@crypdick Unfortunately this doesn't work since now none of the links to the .rst that were converted to .ipynb are functional any more (look at the test failures in https://buildkite.com/ray-project/microcheck/builds/11000) -- these are set by the anchor .. _train-gbdt-guide: for example. Either we keep the file in .rst, or we need to figure out how to set the anchor in the .ipynb :)

@pcmoritz I am not sure why the link didn't work. For example, the Ray Data examples page links to an ipynb tutorial via the examples.yml links using the same pattern. What am I missing?

@pcmoritz
Copy link
Contributor

The problem is that the tests are failing with this:


[2025-02-08T01:34:27Z] /ray/doc/source/cluster/kubernetes/examples/ml-example.md:14: WARNING: undefined label: 'train-gbdt-guide'
--
  | [2025-02-08T01:34:27Z] /ray/doc/source/cluster/kubernetes/examples/ml-example.md:180: WARNING: undefined label: 'train-gbdt-guide'
  | [2025-02-08T01:34:28Z] /ray/doc/source/cluster/vms/examples/ml-example.md:14: WARNING: undefined label: 'train-gbdt-guide'
  | [2025-02-08T01:34:28Z] /ray/doc/source/cluster/vms/examples/ml-example.md:120: WARNING: undefined label: 'train-gbdt-guide'
  | [2025-02-08T01:34:38Z] /ray/doc/source/ray-overview/use-cases.rst:175: WARNING: unknown document: '/train/examples/xgboost/xgboost_example'
  | [2025-02-08T01:34:50Z] /ray/doc/source/train/more-frameworks.rst:43: WARNING: unknown document: 'distributed-xgboost-lightgbm'


(you should be able to see this in the CI run linked from my comment)

@crypdick
Copy link
Contributor Author

@pcmoritz ok, I've fixed 3 of the error categories.

I'm stuck on the undefined label errors from the missing anchors. The Sphinx cross-referencing docs don't mention ipynb support. However, it is possible to link to arbitrary webpages, would that be a satisfactory work-around?

Alternatively, I found an ugly hack to create anchors by converting the ipynb notebooks, which I don't recommend.

@pcmoritz
Copy link
Contributor

Does it work if you replace the labels train-gbdt-guide with /train/examples/xgboost/distributed-xgboost-lightgbm? Unfortunately using link to arbitrary webpages is not a valid workaround (first, it doesn't check the webpage exists, and more importantly, it doesn't handle versioning of the docs correctly)


[2025-02-10T21:53:31Z] /ray/doc/source/cluster/kubernetes/examples/ml-example.md:14: WARNING: undefined label: 'train-gbdt-guide'
--
  | [2025-02-10T21:53:31Z] /ray/doc/source/cluster/kubernetes/examples/ml-example.md:180: WARNING: undefined label: 'train-gbdt-guide'
  | [2025-02-10T21:53:32Z] /ray/doc/source/cluster/vms/examples/ml-example.md:14: WARNING: undefined label: 'train-gbdt-guide'
  | [2025-02-10T21:53:32Z] /ray/doc/source/cluster/vms/examples/ml-example.md:120: WARNING: undefined label: 'train-gbdt-guide'


@crypdick crypdick requested a review from kevin85421 as a code owner February 11, 2025 01:19
@crypdick
Copy link
Contributor Author

crypdick commented Feb 11, 2025

@pcmoritz now getting WARNING: undefined label: '/train/examples/xgboost/distributed-xgboost-lightgbm' 😭 I feel like there has to be a way

Also, I'm not sure why it says that I requested reviews, all I did was push a new commit

@crypdick crypdick force-pushed the docs/merge-xgboost-tutorials branch 2 times, most recently from 0266526 to a407b0b Compare February 11, 2025 21:09
@crypdick crypdick force-pushed the docs/merge-xgboost-tutorials branch from a407b0b to 86cb495 Compare February 11, 2025 21:11
@crypdick crypdick changed the title Merge XGboost tutorials Merge Ray Train XGboost tutorials Feb 11, 2025
@pcmoritz pcmoritz added the go add ONLY when ready to merge, run all tests label Feb 12, 2025
Copy link
Contributor

@pcmoritz pcmoritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! The only small wrinkle is that now this is displayed twice in the example gallery, maybe we can deduplicate it going forward (but not a big deal):

Screenshot 2025-02-11 at 4 47 54 PM

@pcmoritz pcmoritz merged commit 9c4779f into master Feb 12, 2025
6 checks passed
@pcmoritz pcmoritz deleted the docs/merge-xgboost-tutorials branch February 12, 2025 01:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants