Write a light-weight benchmarking script to quickly evaluate our models #634
Comments
@chenmoneygithub this isn't enough information to solicit contributions. What do you want to benchmark? What is the desired output? Should this be a direct GCP integration or just a Python script?
@jbischof we can add strict requirements on outputs/metrics/logging later. I am keeping this flexible so that any runnable IMDb review sentiment analysis script is welcome, and contributors can specify their own metrics. I don't expect contributors to bother with cloud integration, since they don't have access to our GCP project.
@chenmoneygithub that's one reason I'm not sure this is appropriate for contributors. Either way, we need a lot more details!
Sure! I don't want the description to turn into an article (personally I find those discouraging to read), so if a contributor expresses interest, I will provide more details to them directly.
My take is that it is useful to show the usage we want when we can. That will be useful information to give potential contributors, and it helps make sure we get something back that is in line with our expectations.
Hi! I'm interested in trying this out - still reading the details. To clarify, are all models under this directory (https://github.com/keras-team/keras-nlp/tree/master/keras_nlp/models) "classifier models"? Thanks!
@snoringpig only the classes with "classifier" in the name are classifiers; the other main modeling classes we have are backbones.

@chenmoneygithub the issue description looks good. I might add a few outputs. Then the output of these benchmarks can be a nice little report we can copy-paste elsewhere. Wdyt?
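A minimal sketch of what that report step could look like, assuming the script collects one result dict per model run; the field names and the Markdown-table layout here are placeholders, not a decided schema:

```python
# Hypothetical sketch: turn collected benchmark results into a copy-pasteable
# Markdown table. The result fields (model, preset, accuracy, examples_per_sec)
# are illustrative placeholders, not an agreed-upon output format.
def format_report(results):
    """Render a list of result dicts as a Markdown table."""
    header = "| model | preset | accuracy | examples/sec |"
    divider = "|---|---|---|---|"
    rows = [
        f"| {r['model']} | {r['preset']} | {r['accuracy']:.4f} "
        f"| {r['examples_per_sec']:.1f} |"
        for r in results
    ]
    return "\n".join([header, divider] + rows)


# Placeholder values, for illustration only.
print(format_report([
    {"model": "BertClassifier", "preset": "bert_tiny_en_uncased",
     "accuracy": 0.0, "examples_per_sec": 0.0},
]))
```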
We might need a better way to identify the models than
With Keras 3 on the way we definitely need more benchmarking, but I will close this and reopen one with a better description for the multi-backend world.
The code should go into `keras_nlp/benchmarks`. We can use the IMDB sentiment analysis task; guidance for it can be found here.

One challenging point is that we want this script to be able to evaluate all our `Classifier` models without writing custom code. Since every `Classifier` model has a corresponding `Preprocessor`, and they follow the unified name format `{model_name}Classifier` / `{model_name}Preprocessor` (e.g., `BertClassifier` / `BertPreprocessor`), we should be able to make the code reusable by adding a `model_name` flag.

Here are the requirements in more detail:

- The script lives at `keras_nlp/benchmarks/sentiment_analysis.py`.
- `--model` specifies the model name, and `--preset` specifies the preset under test. `--preset` could be None, while `--model` is required. Other flags are common training flags.
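Below is a rough sketch of what `sentiment_analysis.py` could look like under these requirements. It assumes the `{model_name}Classifier` classes are reachable via `keras_nlp.models` and support `from_preset()`, and it pulls IMDB from `tensorflow_datasets`; the `--preset` is None case (e.g. starting from a randomly initialized backbone) and any extra metrics are left out of this sketch.

```python
# keras_nlp/benchmarks/sentiment_analysis.py -- rough sketch, not a final script.
# Assumes keras_nlp.models exposes `{model}Classifier` classes with a
# `from_preset()` constructor, and that tensorflow_datasets provides the
# `imdb_reviews` dataset. Compile settings below are illustrative choices.
import argparse
import time

import keras_nlp
import tensorflow as tf
import tensorflow_datasets as tfds


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", required=True, help="Model name, e.g. Bert")
    parser.add_argument("--preset", default=None, help="Preset under test")
    parser.add_argument("--batch_size", type=int, default=16)
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    args = parser.parse_args()

    # Look up the classifier class via the unified naming convention.
    classifier_cls = getattr(keras_nlp.models, f"{args.model}Classifier")
    classifier = classifier_cls.from_preset(args.preset, num_classes=2)

    # IMDB reviews: raw text in, binary label out. The classifier's attached
    # preprocessor handles tokenization, so we can feed strings directly.
    train_ds, test_ds = tfds.load(
        "imdb_reviews", split=["train", "test"], as_supervised=True
    )
    train_ds = train_ds.batch(args.batch_size)
    test_ds = test_ds.batch(args.batch_size)

    classifier.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=tf.keras.optimizers.Adam(args.learning_rate),
        metrics=["accuracy"],
    )

    start = time.time()
    classifier.fit(train_ds, epochs=args.epochs)
    training_time = time.time() - start
    _, accuracy = classifier.evaluate(test_ds)

    print(f"model={args.model} preset={args.preset} "
          f"accuracy={accuracy:.4f} training_time={training_time:.1f}s")


if __name__ == "__main__":
    main()
```

Usage would then look something like `python sentiment_analysis.py --model Bert --preset bert_tiny_en_uncased`, with any model following the naming convention working unchanged.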