KerasHub has mandatory dependency on tensorflow-text but it isn't mandatory #2101

Open
jamesmyatt opened this issue Feb 14, 2025 · 11 comments · Fixed by #2103
@jamesmyatt

jamesmyatt commented Feb 14, 2025

Can tensorflow-text be removed as a mandatory dependency, please? For example, it could move to an optional dependency group such as "nlp".

As far as I can tell, this dependency makes it impossible to either:

  • have a tensorflow-free environment
  • install KerasHub at all on Windows

Both of these seem to be contrary to Keras 3's mission, especially if you only want the CV parts of KerasHub.

Furthermore, the NoTensorflow integration tests show that it's acceptable to manually uninstall tensorflow and tensorflow-text after installing keras-hub. But it's not possible to never install them in the first place.

I think this was tolerable when this package was KerasNLP (e.g. #1585, keras-team/keras#19542), but now that it's KerasHub, it's a serious issue.

jamesmyatt changed the title from "KerasHub has mandatory dependency on tensorflow-text" to "KerasHub has mandatory dependency on tensorflow-text but it isn't mandatory" on Feb 14, 2025
@Lundez

Lundez commented Feb 17, 2025

+1 agree

I can't install keras-hub when I have enforced tensorflow<2.18 on macOS, as tensorflow-text doesn't support that.
It's quite annoying, as I'm using keras-hub for its CV use case.

@abheesht17
Collaborator

Thanks for bringing this up!

@jamesmyatt
Author

Thanks

@mattdangerw
Member

mattdangerw commented Feb 27, 2025

Shoot! I think we actually might need to roll this back. tensorflow and tensorflow-text are indeed optional at import time, but really just as a power user feature. No task will work without tensorflow installed because we currently use tf.data for all preprocessing in the library. The no-tensorflow approach only works if you want to do all preprocessing and task setup yourself (very valid to do, but not our biggest usage path).

On a clean system, if you run pip install keras-hub and then the following...

import keras_hub
classifier = keras_hub.models.ImageClassifier.from_preset(
    "resnet_50_imagenet",
)

You would now get an error saying tensorflow is required (when trying to create image preprocessing layers). That's a breaking change I don't think we want! And it's even more confusing that the fix is to append "nlp" to the install line.
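
The behavior described here, where the import succeeds but constructing a preprocessing layer fails, is what a lazy optional-dependency guard produces. A minimal sketch of that pattern follows. The helper names (`is_installed`, `require`) and the layer class are hypothetical and are not KerasHub's actual internals.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require(module_name: str, feature: str) -> None:
    """Raise a descriptive ImportError if an optional dependency is missing."""
    if not is_installed(module_name):
        raise ImportError(
            f"{feature} requires the `{module_name}` module, "
            "which is not installed."
        )


class ImagePreprocessingLayer:
    """Hypothetical layer: importable anywhere, usable only with tensorflow."""

    def __init__(self):
        # The check runs when the layer is constructed, not when the
        # package is imported, so `import` alone never fails.
        require("tensorflow", "Image preprocessing")
```

With a guard like this, a user without tensorflow only sees the error once a task tries to build its preprocessing, which matches the failure mode above.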

@mattdangerw mattdangerw reopened this Feb 27, 2025
@mattdangerw
Member

Instead I think we need the following for now.

  1. pip install keras-hub should pull in tensorflow-text and tensorflow for now. We want a usable experience out of the box.
  2. We can exclude tensorflow-text on the Windows platform since tensorflow dropped support. Of course that means keras-hub is barely usable on native Windows, but our recommended path for Windows development is WSL.
  3. For power users that know what they are doing and do not want the task API, the no tf approach will require pip install keras-hub --no-deps.
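
Item 2 could plausibly be expressed with a PEP 508 environment marker on the requirement. The sketch below is an assumption about how that might look, not the project's actual metadata. The dependency list and the helper `needs_tensorflow_text` are hypothetical.

```python
import platform
from typing import Optional

# Hypothetical dependency list. The package names come from this thread;
# the environment marker on the last entry is an assumption about how
# item 2 could be written, not the project's real configuration.
INSTALL_REQUIRES = [
    "keras",
    'tensorflow-text; platform_system != "Windows"',
]


def needs_tensorflow_text(system: Optional[str] = None) -> bool:
    """Mirror the marker above: True everywhere except native Windows."""
    if system is None:
        # The `platform_system` marker corresponds to platform.system().
        system = platform.system()
    return system != "Windows"
```

Under WSL, `platform.system()` reports "Linux", so the dependency would still be pulled in there.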

The better solution will need to be one of the following.

  1. If https://peps.python.org/pep-0771/ gets approved, then pip install keras-hub can keep giving a usable out of box experience, and pip install keras-hub[base] can give the pared down installation.
  2. If not, we could consider a separate package, so pip install keras-hub is what most people use, and pip install keras-hub-base gives you the pared down list of requirements.
  3. Finally, and ideally, we could ditch our dependency on tensorflow for preprocessing. We would love to do this honestly, and then tensorflow could really be optional for the average user! However this is a larger undertaking.

I know this is less than ideal, but hopefully we can start making progress towards making tf.data actually optional, and until then use --no-deps as a workaround.

@jamesmyatt
Author

jamesmyatt commented Feb 27, 2025

This is disappointing. There really needs to be a version of KerasHub that doesn't have a mandatory dependency on tensorflow, for either of the reasons listed above.

Even on Linux, if KerasHub has a mandatory dependency on tensorflow, then it negates any benefit of the multi-backend capability of Keras 3. For example, do you really want to spend time coordinating which versions of tensorflow and pytorch can be installed in the same venv with the same version of cuda/nocuda and all of the other dependencies?

One interim question is whether it will work with tensorflow "core" as a mandatory (linux) dependency and tensorflow-text as an optional "nlp" dependency, if the main issue is tf.data? At least that avoids the extra complexity of the tensorflow-text project.

KerasHub (as successor to KerasCV and KerasNLP) is advertised as a replacement for the discontinued tf.addons project, and that has to be taken into account too. It's not just a Keras wrapper for third-party models. At least as I understand it.

@Lundez

Lundez commented Feb 27, 2025

Agreed, this to me is a big problem, as I'd like to use another backend for my CV project, which is one of the major "selling points" of Keras 3.

@jamesmyatt
Author

KerasHub doesn't necessarily need to be totally backend agnostic; it just needs to respect the backend selected for Keras itself. So if it currently depends on tf.data (for example), then that should be abstracted and adapted for the other backends. There may even already be something in Keras 3.
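
As a rough illustration of what abstracting the pipeline could mean, here is a framework-free stand-in for a map-then-batch step. The function name `map_and_batch` is hypothetical, and this sketch makes no attempt to reproduce tf.data's parallelism or prefetching.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def map_and_batch(
    fn: Callable[[T], U],
    samples: Iterable[T],
    batch_size: int,
) -> Iterator[List[U]]:
    """Apply `fn` to each sample and yield lists of up to `batch_size` results."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[U] = []
    for sample in samples:
        batch.append(fn(sample))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


# Example: double each value and group the results in pairs.
print(list(map_and_batch(lambda x: x * 2, range(5), 2)))
# prints [[0, 2], [4, 6], [8]]
```

A backend-specific adapter could then convert each yielded batch into whatever tensor type the selected Keras backend expects.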

@mattdangerw
Member

@jamesmyatt @Lundez yeah the issue moreso than tf-text is that all preprocessing is currently run through tf.data. So tf.data will feed your jax or torch program for a pure vision model. Which at a performance level is actually great, tf.data is quite efficient.

So the expected use of the library today is a cpu install of tensorflow with a gpu torch or jax (see here and here), or a gpu install of tf. Cuda version aligning across frameworks is indeed tricky; Colab and Kaggle do it, but I wouldn't recommend it for local development.

I agree though, this dep kinda sucks. Tensorflow is a huge binary. It would be great to relax this constraint, so that things like our ObjectDetection and ImageClassifier tasks would be runnable without a tf install and tf would only be in the loop for text models. But we should start that work at the code level; just ditching the dep today will break people for little gain.

So for today...

  • The common workflow is tf, jax or torch with a cpu only tf install for preprocessing. The API works as advertised on all three backends.
  • The power user workflow is a --no-deps install, use only backbone APIs and do all your own preprocessing, task construction, etc.

@mattdangerw
Member

I opened #2128 as a first step for this.

@Lundez

Lundez commented Mar 7, 2025

> I opened #2128 as a first step for this.

Exciting.

I understand the issue that KerasHub relies on TF Datasets, and the perks of efficient pipelines make sense. But the same could be said about making Keras multi-framework in 3.0: TF isn't a slow framework (if you apply XLA).

I'm happy that the investigation starts now to drop the reliance as it'd be very welcome!
Perhaps if TF Dataset didn't rely on TF it could be a very good compromise, but I guess that's an even bigger project 😂

Once again, thanks for initiating a move in the right direction!

EDIT: I hope I find the time to look into #2128
