Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed URLs and fixed ray download error #744

Merged
merged 2 commits into from
Oct 31, 2024
Merged

Conversation

sujee
Copy link
Contributor

@sujee sujee commented Oct 25, 2024

Why are these changes needed?

  • fixed image and colab urls to point to main repo
  • fixed model download error in pdf2parquet in ray version

Related issue number (if any).

- fixed image and colab urls to point to main repo
- fixed model download error in pdf2parquet in ray version

Signed-off-by: Sujee Maniyam <[email protected]>
@shahrokhDaijavad shahrokhDaijavad self-requested a review October 25, 2024 18:13
@shahrokhDaijavad
Copy link
Member

@sujee I tested this today. The local versions of both Python and Ray ran successfully. The Python version on Colab is fine, but the Ray version on Colab has the following problem:
Coming out of the pdf2parquet in Step 3.2 and inspecting the output in Step 3.3, I can only see mars.pdf and earth.pdf is missing! Does it mean that earth.pdf was not converted to pq? Please look into this.

@shahrokhDaijavad
Copy link
Member

shahrokhDaijavad commented Oct 30, 2024

@sujee Thanks for your help in debugging the problem! If I click on "Open with Colab" from the notebook in the repo, it takes the old version of the the notebook from the dev branch. In order to test the new version, I had to open Sujee's fork directly from the Colab [https://colab.research.google.com/github/sujee/data-prep-kit/blob/intro-example2/examples/notebooks/intro/dpk_intro_1_ray.ipynb] and then I was able to run the Ray version in Colab successfully.

Copy link
Member

@shahrokhDaijavad shahrokhDaijavad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I have tested both notebooks locally and on Google Colab.

Copy link
Collaborator

@touma-I touma-I left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review comment and decide to act on/ or ignore comments . Thanks.

" !wget -O 'input/solar-system/earth.pdf' 'https://raw.githubusercontent.com/sujee/data-prep-kit/intro-example1/examples/notebooks/intro/input/solar-system/earth.pdf'\n",
" !wget -O 'input/solar-system/mars.pdf' 'https://raw.githubusercontent.com/sujee/data-prep-kit/intro-example1/examples/notebooks/intro/input/solar-system/mars.pdf'\n",
" !wget -O 'my_utils.py' 'https://raw.githubusercontent.com/sujee/data-prep-kit/intro-example1/examples/notebooks/intro/my_utils.py'"
" !wget -O 'input/solar-system/earth.pdf' 'https://raw.githubusercontent.com/IBM/data-prep-kit/dev/examples/notebooks/intro/input/solar-system/earth.pdf'\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How were these files added ? Can anyone use this as a repo for storing/sharing documents ?

" !wget -O 'my_utils.py' 'https://raw.githubusercontent.com/sujee/data-prep-kit/intro-example1/examples/notebooks/intro/my_utils.py'"
" !wget -O 'input/solar-system/earth.pdf' 'https://raw.githubusercontent.com/IBM/data-prep-kit/dev/examples/notebooks/intro/input/solar-system/earth.pdf'\n",
" !wget -O 'input/solar-system/mars.pdf' 'https://raw.githubusercontent.com/IBM/data-prep-kit/dev/examples/notebooks/intro/input/solar-system/mars.pdf'\n",
" !wget -O 'my_utils.py' 'https://raw.githubusercontent.com/IBM/data-prep-kit/dev/examples/notebooks/intro/my_utils.py'"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this here? Why isn't that in the git repo? What am I missing ?

Copy link
Member

@shahrokhDaijavad shahrokhDaijavad Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .py file is in the repo, but to run on Google Colab, Sujee puts it in raw.githubusercontent.com. @sujee Can we take this file from the repo itself, as: https://github.com/IBM/data-prep-kit/blob/dev/examples/notebooks/intro/my_utils.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, on colab, we need to download any files besides the notebook.
These are being downloaded from the official repo location

@@ -504,7 +504,8 @@
" \"data_files_to_use\": ast.literal_eval(\"['.pdf']\"),\n",
" # orchestrator\n",
" \"runtime_worker_options\": ParamsUtils.convert_to_ast(worker_options),\n",
" \"runtime_num_workers\": MY_CONFIG.RAY_RUNTIME_WORKERS,\n",
" # \"runtime_num_workers\": MY_CONFIG.RAY_RUNTIME_WORKERS,\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please decide on using 1 or MY_CONFIG.RAY_RUNTIME_WORKERS and remove commented line. Thanks

Signed-off-by: Sujee Maniyam <[email protected]>
@touma-I touma-I merged commit 0c4a0be into IBM:dev Oct 31, 2024
1 check passed
@sujee sujee deleted the intro-example2 branch November 5, 2024 06:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants