
Write a light-weight benchmarking script to quickly evaluate our models #634

Closed
chenmoneygithub opened this issue Jan 4, 2023 · 9 comments
Labels
stat:contributions welcome

Comments

@chenmoneygithub
Contributor

chenmoneygithub commented Jan 4, 2023

The code should go into keras_nlp/benchmarks.

We can use the IMDB sentiment analysis task; guidance can be found here.

One challenging point is that we want this script to evaluate all of our Classifier models without writing custom code. Since every Classifier has a matching Preprocessor, and they follow the unified naming format {model_name}Classifier/{model_name}Preprocessor (e.g., BertClassifier/BertPreprocessor), we should be able to make the code reusable by adding a model_name flag.

Here are the requirements in more detail:

  • example file name: keras_nlp/benchmarks/sentiment_analysis.py
  • example running command:
    python keras_nlp/benchmarks/sentiment_analysis.py \
        --model="bert" \
        --preset="bert_small_en_uncased" \
        --learning_rate=5e-5 \
        --num_epochs=5 \
        --batch_size=32
    
    The --model flag specifies the model name, and --preset specifies the preset under test. --preset may be None, while --model is required. The other flags are common training flags. A sketch of what such a script could look like follows this list.
  • output: print out a few metrics, including
    • validation accuracy/F1 for each epoch.
    • testing accuracy/F1 after training is done.
    • total elapsed time (in seconds).
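
A minimal sketch of the script described above, assuming absl for flag parsing, tensorflow_datasets for the IMDB data, and the {model_name}Classifier resolution mentioned earlier; flag handling, preset names, and metrics here are illustrative rather than prescriptive:

```python
# keras_nlp/benchmarks/sentiment_analysis.py -- illustrative sketch only.
import time

import keras_nlp
import tensorflow_datasets as tfds
from absl import app, flags
from tensorflow import keras

FLAGS = flags.FLAGS
flags.DEFINE_string("model", None, "Model class prefix, e.g. 'Bert'.")
flags.DEFINE_string("preset", None, "Preset under test, e.g. 'bert_small_en_uncased'.")
flags.DEFINE_float("learning_rate", 5e-5, "Learning rate.")
flags.DEFINE_integer("num_epochs", 5, "Number of training epochs.")
flags.DEFINE_integer("batch_size", 32, "Batch size.")


def main(_):
    # Resolve {model_name}Classifier by name so one script covers every model.
    # This assumes --model is the capitalized class prefix ("Bert"); supporting
    # lowercase names like "bert" would need a small lookup or renaming step.
    classifier_cls = getattr(keras_nlp.models, f"{FLAGS.model}Classifier")
    # For simplicity this sketch assumes a preset is always given.
    classifier = classifier_cls.from_preset(FLAGS.preset, num_classes=2)

    # IMDB reviews: raw review text in, binary sentiment label out.
    train_ds, val_ds, test_ds = tfds.load(
        "imdb_reviews",
        split=["train[:90%]", "train[90%:]", "test"],
        as_supervised=True,
        batch_size=FLAGS.batch_size,
    )

    classifier.compile(
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=keras.optimizers.Adam(FLAGS.learning_rate),
        metrics=["accuracy"],  # an F1 metric could be added here as well
    )

    start = time.time()
    classifier.fit(train_ds, validation_data=val_ds, epochs=FLAGS.num_epochs)
    elapsed = time.time() - start

    _, test_accuracy = classifier.evaluate(test_ds)
    print(f"Test accuracy: {test_accuracy:.4f}")
    print(f"Total elapsed time: {elapsed:.2f}s")


if __name__ == "__main__":
    app.run(main)
```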
@chenmoneygithub added the stat:contributions welcome label Jan 4, 2023
@jbischof
Contributor

jbischof commented Jan 5, 2023

@chenmoneygithub this isn't enough information to solicit contributions. What do you want to benchmark? What is the desired output? Should this be a direct GCP integration or just a Python script?

@chenmoneygithub
Contributor Author

@jbischof we can add strict requirements on outputs/metrics/logging later. I am keeping this flexible so that any runnable IMDb review sentiment analysis script is welcome, and contributors can specify their own metrics.

I don't think contributors will bother with cloud integration; they don't have access to our GCP project.

@jbischof
Contributor

jbischof commented Jan 5, 2023

@chenmoneygithub that's one reason I'm not sure this is appropriate for contributors. Either way, we need a lot more details!

@chenmoneygithub
Contributor Author

Sure! I don't want the description to read like an article (personally I am discouraged from reading those), so if a contributor expresses interest, I will provide more details to them directly.

@mattdangerw
Member

My take is that it is useful to show the usage we want when we can.

  • For a new API, we can basically write the key docstring examples in the issue description.
  • For a tool like this, we could show the command line invocations we would like to support, and give a little detail on what the output should be.

That will give potential contributors useful information and make sure we get something back that is in line with our expectations.

@snoringpig

Hi! I'm interested in trying this out - still reading the details. To clarify, are all models under this directory (https://github.com/keras-team/keras-nlp/tree/master/keras_nlp/models) "classifier models"? Thanks!

@mattdangerw
Member

mattdangerw commented Jan 11, 2023

@snoringpig only the classes with Classifier in the name are classifiers, e.g. BertClassifier and RobertaClassifier.

The other main modeling classes we have are backbones, like BertBackbone. These are not specialized to a task, so they would be more work (with little gain) to add to our benchmarking suite right now.
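
To make the split concrete, a small illustrative snippet (the preset name is just an example):

```python
import keras_nlp

# Task model: raw strings in, classification logits out (preprocessing included).
classifier = keras_nlp.models.BertClassifier.from_preset(
    "bert_tiny_en_uncased", num_classes=2
)

# Backbone: token ids in, embeddings out; no task head, so it cannot be
# benchmarked on IMDB without extra code.
backbone = keras_nlp.models.BertBackbone.from_preset("bert_tiny_en_uncased")
```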

@chenmoneygithub the issue description looks good. I might add a few outputs.

  • Let's print out the train_step time.
  • Let's print out the test hardware via tf.config.list_physical_devices.

Then the output of these benchmarks can be a nice little report we can copy-paste elsewhere. Wdyt?
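
A rough sketch of how those two extra outputs could be collected; the StepTimer callback below is a hypothetical helper, not an existing API:

```python
import time

import tensorflow as tf
from tensorflow import keras


class StepTimer(keras.callbacks.Callback):
    """Hypothetical callback that records the time of each train_step."""

    def on_train_begin(self, logs=None):
        self.times = []

    def on_train_batch_begin(self, batch, logs=None):
        self.start = time.time()

    def on_train_batch_end(self, batch, logs=None):
        self.times.append(time.time() - self.start)


# Report hardware up front, then pass the callback to fit():
#   timer = StepTimer()
#   classifier.fit(train_ds, epochs=..., callbacks=[timer])
#   print(f"Avg train_step time: {sum(timer.times) / len(timer.times):.4f}s")
print("Devices:", tf.config.list_physical_devices())
```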

@jbischof
Contributor

We might need a better way to identify the models than "bert"... how about we pass the class name, like BertClassifier, instead so we don't need to maintain a lookup table?
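
For illustration, what that would look like (class and preset names are just examples):

```python
import keras_nlp

# If --model carries the full class name, e.g. "BertClassifier", resolution is a
# single getattr with no lookup table or capitalization rules.
model_name = "BertClassifier"  # value of --model
classifier_cls = getattr(keras_nlp.models, model_name)
classifier = classifier_cls.from_preset("bert_small_en_uncased", num_classes=2)
```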

@mattdangerw
Member

We definitely need more benchmarking with Keras 3 on the way, but I will close this and reopen a new issue with a better description for the multi-backend world.
