
build: cache nltk models into the docker image #4118

Merged
mayankjobanputra merged 6 commits into main from mayank/cache_nltk on Feb 16, 2023


mayankjobanputra (Contributor) commented on Feb 9, 2023

Related Issues

Proposed Changes:

Separated NLTK data caching from model caching, so that the NLTK punkt models are cached into the Docker image.
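The diff is not reproduced here. As a rough sketch of what a build-time caching step could look like, assume the image runs as root (so `~/nltk_data` resolves to `/root/nltk_data`, the path seen in the manual test) and that only `punkt` is needed. The helper name `cache_nltk_resources` is hypothetical, and the `downloader` parameter exists only so the sketch can be exercised without network access; by default it uses `nltk.download`.

```python
"""Hypothetical build-time step: pre-download NLTK data into the image."""
import os
from typing import Callable, Iterable, List, Optional

# Assumption: the container runs as root, so this expands to /root/nltk_data,
# one of the directories NLTK searches by default.
DEFAULT_NLTK_DIR = os.path.expanduser("~/nltk_data")


def cache_nltk_resources(
    packages: Iterable[str] = ("punkt",),
    download_dir: str = DEFAULT_NLTK_DIR,
    downloader: Optional[Callable[..., bool]] = None,
) -> List[str]:
    """Download each package into download_dir and return the names that failed."""
    if downloader is None:
        import nltk  # lazy import: only needed for a real download

        downloader = nltk.download
    os.makedirs(download_dir, exist_ok=True)
    failed = []
    for package in packages:
        # nltk.download reports success or failure as a boolean return value.
        if not downloader(package, download_dir=download_dir, quiet=True):
            failed.append(package)
    return failed
```

In an image build this would run once, for example from a `RUN python -c ...` step after NLTK is installed. Returning the failed package names lets that step exit non-zero instead of silently producing an image without the data.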

How did you test it?

Manually tested.

root@719f7d36f1e4:~/nltk_data/tokenizers# python
Python 3.10.8 (main, Nov  4 2022, 13:48:29) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.data.find("tokenizers/punkt")
FileSystemPathPointer('/root/nltk_data/tokenizers/punkt/PY3')
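The manual check above could also be scripted so that a broken image is caught before it ships. A minimal sketch, assuming it runs inside the built container; `check_nltk_resource` is a hypothetical name, and the `find` parameter defaults to `nltk.data.find`, which raises `LookupError` when a resource is not on NLTK's search path.

```python
"""Hypothetical smoke test mirroring the manual nltk.data.find check."""
from typing import Any, Callable, Optional


def check_nltk_resource(
    resource: str = "tokenizers/punkt",
    find: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Return NLTK's pointer to resource, or exit with an error if it is absent."""
    if find is None:
        import nltk  # lazy import so the check can be exercised without NLTK

        find = nltk.data.find
    try:
        return find(resource)
    except LookupError as exc:
        # Without cached data, NLTK code paths would need network (or proxy) access.
        raise SystemExit(f"NLTK resource {resource!r} is not cached in this image") from exc
```

Raising `SystemExit` keeps the failure visible as a non-zero exit code when the check is run as a one-off command.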

Notes for the reviewer

Some customers who run Haystack in production have to set up proxy servers just to download these models. Caching them in the Docker image removes that requirement.

Checklist

  • I have read the contributor guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added tests that demonstrate the correct behavior of the change
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.
  • I documented my code
  • I ran pre-commit hooks and fixed any issues

mayankjobanputra requested a review from a team as a code owner on February 9, 2023 14:59
mayankjobanputra requested review from sjrl and removed request for a team on February 9, 2023 14:59
mayankjobanputra merged commit d27f372 into main on Feb 16, 2023
mayankjobanputra deleted the mayank/cache_nltk branch on February 16, 2023 11:26
Development

Successfully merging this pull request may close these issues.

cache NLTK punkt models for docker images
2 participants