This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

Fix missing column and variable reference before assignment #2

Merged
merged 5 commits into from
Dec 6, 2021

Conversation

annajung
Contributor

@annajung annajung commented Dec 1, 2021

  • Add 'Number' column during unannotated data creation
    • Fixes KeyError when running run.py due to missing 'Number' column in the annotated data
  • Fix referencing variables before assignment issue
  • Add input validation for model argument
  • Update README with the correct path
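The 'Number' column fix can be sketched as follows. This is a minimal illustration with assumed field names and a plain dict instead of the project's actual DataFrame export, not the real code:

```python
# Minimal sketch of the dataset-construction fix (field names assumed):
# collect the pull request number alongside the URL so the exported
# data has the 'Number' column that run.py later looks up.
def build_conversation_rows(pulls):
    string_conversations, pull_urls, pull_numbers = [], [], []
    for row in pulls:
        string_conversations.append(row["Body"])
        pull_urls.append(row["URL"])
        pull_numbers.append(row["Number"])  # previously omitted -> KeyError downstream
    return {
        "Conversation": string_conversations,
        "URL": pull_urls,
        "Number": pull_numbers,
    }
```

Without the `pull_numbers` column, any later lookup of `row["Number"]` on the exported data raises the KeyError the PR description mentions.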

Anna Jung (VMware) added 4 commits December 1, 2021 13:10
@annajung
Contributor Author

annajung commented Dec 2, 2021

@difince @enyinna1234 @pramodrj07 @tzstoyanov PTAL thanks!

@pramodrj07
Contributor

Are the changes in README.md in any way related to the other commits?

    if model_type == 'CNN':
        model = BaseCNN()
-   elif model_type == 'LSTM':
+   if model_type == 'LSTM':
Contributor

Since I am still trying to understand the code base, can I ask about the intention behind this change?

Contributor Author

When a user passes a model type that is not 'CNN' or 'LSTM', it throws a "variable referenced before assignment" error. Therefore, I made CNN the default by removing the elif, to make sure the model is always initialized. I also added input validation to make sure that users can only pass in CNN or LSTM.

Contributor

I think that setting a model default value should be done in the argument definition:
    parser.add_argument('model', default='CNN', help='Model type to use for training')
It is more straightforward and the code stays cleaner.

Contributor Author

Sorry for the confusion, I should not have used the word "default". I don't want to set CNN as a default. The user should always explicitly pass in what algorithm they want to use, with the flags that are appropriate for that training.

With the input validation in place, the else branch will always be CNN.
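The resulting control flow can be sketched with stand-in model classes (BaseCNN/BaseLSTM here are stubs, not the project's real classes):

```python
class BaseCNN:   # stand-in for the project's CNN model class
    kind = "CNN"

class BaseLSTM:  # stand-in for the project's LSTM model class
    kind = "LSTM"

def create_model(model_type):
    # The argument validation in run.py guarantees model_type is 'CNN'
    # or 'LSTM', so a plain if/else always initializes a model and the
    # "referenced before assignment" error can no longer occur.
    if model_type == 'LSTM':
        return BaseLSTM()
    return BaseCNN()
```

The key point is that every path through `create_model` assigns a model, which is what removing the dangling `elif` achieves.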

Contributor

@tzstoyanov tzstoyanov Dec 3, 2021

Ah, OK - I didn't get that logic. In that case, maybe it is better to extend the validation to exit with an error if the model parameter is missing? I see two cases:

  • set the model default value in the argument definition, or
  • if the model is a mandatory parameter, exit with an error if it is missing

Contributor Author

I do have validation logic that throws an exception - see line 56

Contributor

I see, so in that case model must be either CNN or LSTM in run(). Thanks!

@annajung
Contributor Author

annajung commented Dec 2, 2021

Are the changes in README.md in any way related to the other commits?

@pramodrj07 No, it's a separate commit that just updates the docs with the correct path to run the files.

@pramodrj07
Contributor

Okay! LGTM.
Just as a rule of thumb, it would be good to demarcate (or use a separate PR) when a commit deviates from the PR's primary intention.

@@ -51,6 +51,10 @@ def run(annotated_filename, dataset_filename, outcome, encoding_type, model_type
    parser.add_argument('-pad', action='store_true', default=False, help='Pad total length of each pull')

    args = parser.parse_args()

+   if args.model != 'CNN' and args.model != 'LSTM':
+       raise Exception("Model must be either CNN or LSTM")
Contributor

I would also suggest listing the supported models in the argument's help string, something like this:
    parser.add_argument('model', help='Model type to use for training, supported CNN and LSTM')

Contributor Author

Done! Thanks
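Putting the two suggestions together, the argument definition and the validation from the diff look roughly like this sketch (the `parse_args(['CNN'])` call is just an example invocation for illustration):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('model', help='Model type to use for training, supported CNN and LSTM')
args = parser.parse_args(['CNN'])  # example invocation

# Explicit validation, as done in the PR:
if args.model != 'CNN' and args.model != 'LSTM':
    raise Exception("Model must be either CNN or LSTM")
```

As an aside, argparse's built-in `choices=['CNN', 'LSTM']` parameter would reject invalid values with a usage error at parse time, which is a standard alternative to the explicit check.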


Signed-off-by: Anna Jung (VMware) <[email protected]>
Contributor

@difince difince left a comment

Taking Tzvetomir's notes into consideration, LGTM.

@@ -181,9 +182,11 @@ def pullStringConversation(self, export_filename="", export=True):
                comment_row["Body"])
            string_conversations.append(conversation.encode("ascii", "ignore").decode())
            pull_urls.append(row["URL"])
+           pull_numbers.append(row["Number"])

        # Export converation field dataset
Contributor

Typo: conversation

Contributor Author

Good catch - it looks like the same typo appears in multiple places. I will leave that out of this PR and make a separate cleanup PR.

@difince difince merged commit cc3762d into vmware-archive:main Dec 6, 2021
tzstoyanov added a commit to tzstoyanov/ml-conversational-analytic-tool that referenced this pull request Dec 8, 2021
When the model is trained, in order to run an inference service that serves
it, the model should be exported. An optional parameter "-save=name" is
added to export the model with the given name. By default, the model is
not exported. Models are exported in the directory:
 models/<name>-<outcome>/<version>/
and are compressed in the file:
 models/<name>-<outcome>/<name>-<outcome>-<version>.tar.gz
The model's version is hardcoded to "0001"; managing different model
versions is TBD.
The exported models are tested with kserve; the layout of the directories and
the archive file is designed the way the kserve tensorflow predictor expects.

fixes vmware-archive#2

Signed-off-by: Tzvetomir Stoyanov (VMware) <[email protected]>
tzstoyanov added a commit to tzstoyanov/ml-conversational-analytic-tool that referenced this pull request Dec 10, 2021
(Same commit message as above.)
tzstoyanov added a commit to tzstoyanov/ml-conversational-analytic-tool that referenced this pull request Dec 13, 2021
When the model is trained, in order to run an inference service that serves
it, the model should be exported. Two optional parameters are
introduced:
  "-save NAME"
  "-save_version VERSION"
By default, the model is not exported. If "-save NAME" is specified, the
model is saved using the given NAME. If "-save_version VERSION" is
specified together with "-save NAME", the model is saved using the given
NAME and VERSION. The "-save_version" is ignored if "-save" is missing.
By default, version "001" is used. Models are exported in the directory:
 models/<NAME>-<outcome>/<VERSION>/
and are compressed in the file:
 models/<NAME>-<outcome>/<NAME>-<outcome>-<VERSION>.tar.gz
The exported models are tested with kserve; the layout of the directories and
the archive file is designed the way the kserve tensorflow predictor expects.

fixes vmware-archive#2

Signed-off-by: Tzvetomir Stoyanov (VMware) <[email protected]>
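The directory and archive layout described in the commit message can be sketched like this. `export_model` is a hypothetical helper using Python's standard os and tarfile modules; the real export code in the commit may differ:

```python
import os
import tarfile

def export_model(name, outcome, version="001", root="models"):
    # Layout from the commit message:
    #   models/<NAME>-<outcome>/<VERSION>/                      (model directory)
    #   models/<NAME>-<outcome>/<NAME>-<outcome>-<VERSION>.tar.gz
    base = os.path.join(root, f"{name}-{outcome}")
    model_dir = os.path.join(base, version)
    os.makedirs(model_dir, exist_ok=True)

    archive = os.path.join(base, f"{name}-{outcome}-{version}.tar.gz")
    # Archive with the version directory at the top level, matching the
    # layout the kserve tensorflow predictor expects per the commit.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(model_dir, arcname=version)
    return model_dir, archive
```

Saving the trained model into `model_dir` before archiving (e.g. with the framework's own save call) would complete the flow; that step is omitted here.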