This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

Fix missing column and variable reference before assignment #2

Merged
merged 5 commits into from
Dec 6, 2021

Conversation

annajung
Contributor

@annajung annajung commented Dec 1, 2021

  • Add 'Number' column during unannotated data creation
    • Fixes KeyError when running run.py due to missing 'Number' column in the annotated data
  • Fix referencing variables before assignment issue
  • Add input validation for model argument
  • Update README with the correct path
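The 'Number' column fix can be sketched as follows. This is a minimal illustration with assumed field names and a plain dict instead of the project's actual DataFrame export, not the real code:

```python
# Minimal sketch of the dataset-construction fix (field names assumed):
# collect the pull request number alongside the URL so the exported
# data has the 'Number' column that run.py later looks up.
def build_conversation_rows(pulls):
    string_conversations, pull_urls, pull_numbers = [], [], []
    for row in pulls:
        string_conversations.append(row["Body"])
        pull_urls.append(row["URL"])
        pull_numbers.append(row["Number"])  # previously omitted -> KeyError downstream
    return {
        "Conversation": string_conversations,
        "URL": pull_urls,
        "Number": pull_numbers,
    }
```

Without the `pull_numbers` column, any later lookup of `row["Number"]` on the exported data raises the KeyError the PR description mentions.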

Anna Jung (VMware) added 4 commits December 1, 2021 13:10
@annajung
Contributor Author

annajung commented Dec 2, 2021

@difince @enyinna1234 @pramodrj07 @tzstoyanov PTAL thanks!

@pramodrj07
Contributor

Are the changes in README.md in any way related to the other commits?

    if model_type == 'CNN':
        model = BaseCNN()
-   elif model_type == 'LSTM':
+   if model_type == 'LSTM':
Contributor

Since I am still trying to understand the code base, can I ask about the intention behind this change?

Contributor Author

When a user passes a model type that is not 'CNN' or 'LSTM', it throws a "variable referenced before assignment" error. Therefore, I made CNN the default by removing the elif, to make sure the model is always initialized. I also added input validation to make sure that users can only pass in CNN or LSTM.

Contributor

I think that setting a model default value should be done in the argument definition:
    parser.add_argument('model', default='CNN', help='Model type to use for training')
It is more straightforward and the code stays cleaner.

Contributor Author

Sorry for the confusion, I should not have used the word "default". I don't want to set CNN as a default. The user should always explicitly pass in what algorithm they want to use, with the flags that are appropriate for that training.

With the input validation in place, the else branch will always be CNN.
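The resulting control flow can be sketched with stand-in model classes (BaseCNN/BaseLSTM here are stubs, not the project's real classes):

```python
class BaseCNN:   # stand-in for the project's CNN model class
    kind = "CNN"

class BaseLSTM:  # stand-in for the project's LSTM model class
    kind = "LSTM"

def create_model(model_type):
    # The argument validation in run.py guarantees model_type is 'CNN'
    # or 'LSTM', so a plain if/else always initializes a model and the
    # "referenced before assignment" error can no longer occur.
    if model_type == 'LSTM':
        return BaseLSTM()
    return BaseCNN()
```

The key point is that every path through `create_model` assigns a model, which is what removing the dangling `elif` achieves.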

Contributor

@tzstoyanov tzstoyanov Dec 3, 2021

Ah, OK - I didn't get that logic. In that case, maybe it is better to extend the validation to exit with an error if the model parameter is missing? I see two cases:

  • set the model default value in the argument definition, or
  • if the model is a mandatory parameter, exit with an error if it is missing

Contributor Author

I do have validation logic that throws an exception - see line 56

Contributor

I see, so in that case model must be either CNN or LSTM in run(). Thanks!

@annajung
Contributor Author

annajung commented Dec 2, 2021

Are the changes in README.md in any way related to the other commits?

@pramodrj07 No, it's a separate commit that just updates the docs with the correct path to run the files.

@pramodrj07
Contributor

Okay! LGTM.
Just as a rule of thumb, it would be good to demarcate (or use a separate PR) when a commit deviates from the PR's primary intention.

@@ -51,6 +51,10 @@ def run(annotated_filename, dataset_filename, outcome, encoding_type, model_type
    parser.add_argument('-pad', action='store_true', default=False, help='Pad total length of each pull')

    args = parser.parse_args()

+   if args.model != 'CNN' and args.model != 'LSTM':
+       raise Exception("Model must be either CNN or LSTM")
Contributor

I would also suggest listing the supported models in the argument's help string, something like this:
    parser.add_argument('model', help='Model type to use for training, supported CNN and LSTM')

Contributor Author

Done! Thanks
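Putting the two suggestions together, the argument definition and the validation from the diff look roughly like this sketch (the `parse_args(['CNN'])` call is just an example invocation for illustration):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('model', help='Model type to use for training, supported CNN and LSTM')
args = parser.parse_args(['CNN'])  # example invocation

# Explicit validation, as done in the PR:
if args.model != 'CNN' and args.model != 'LSTM':
    raise Exception("Model must be either CNN or LSTM")
```

As an aside, argparse's built-in `choices=['CNN', 'LSTM']` parameter would reject invalid values with a usage error at parse time, which is a standard alternative to the explicit check.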


Signed-off-by: Anna Jung (VMware) <[email protected]>
Contributor

@difince difince left a comment

Taking Tzvetomir's notes into consideration, LGTM.

@@ -181,9 +182,11 @@ def pullStringConversation(self, export_filename="", export=True):
                comment_row["Body"])
            string_conversations.append(conversation.encode("ascii", "ignore").decode())
            pull_urls.append(row["URL"])
+           pull_numbers.append(row["Number"])

        # Export converation field dataset
Contributor

Typo: conversation

Contributor Author

Good catch - it looks like the same typo appears in multiple places. I will leave that out of this PR and make a separate cleanup PR.

@difince difince merged commit cc3762d into vmware-archive:main Dec 6, 2021
tzstoyanov added a commit to tzstoyanov/ml-conversational-analytic-tool that referenced this pull request Dec 8, 2021
When the model is trained, in order to run an inference service that serves
it, the model should be exported. An optional parameter "-save=name" is
added to export the model with the given name. By default, the model is
not exported. Models are exported in the directory:
 models/<name>-<outcome>/<version>/
and are compressed in the file:
 models/<name>-<outcome>/<name>-<outcome>-<version>.tar.gz
The model's version is hardcoded to "0001"; managing different model
versions is TBD.
The exported models are tested with kserve; the layout of the directories and
the archive file is designed the way the kserve tensorflow predictor expects.

fixes vmware-archive#2

Signed-off-by: Tzvetomir Stoyanov (VMware) <[email protected]>
tzstoyanov added a commit to tzstoyanov/ml-conversational-analytic-tool that referenced this pull request Dec 10, 2021
(Same commit message as above.)
tzstoyanov added a commit to tzstoyanov/ml-conversational-analytic-tool that referenced this pull request Dec 13, 2021
When the model is trained, in order to run an inference service that serves
it, the model should be exported. Two optional parameters are
introduced:
  "-save NAME"
  "-save_version VERSION"
By default, the model is not exported. If "-save NAME" is specified, the
model is saved using the given NAME. If "-save_version VERSION" is
specified together with "-save NAME", the model is saved using the given
NAME and VERSION. The "-save_version" is ignored if "-save" is missing.
By default, version "001" is used. Models are exported in the directory:
 models/<NAME>-<outcome>/<VERSION>/
and are compressed in the file:
 models/<NAME>-<outcome>/<NAME>-<outcome>-<VERSION>.tar.gz
The exported models are tested with kserve; the layout of the directories and
the archive file is designed the way the kserve tensorflow predictor expects.

fixes vmware-archive#2

Signed-off-by: Tzvetomir Stoyanov (VMware) <[email protected]>
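The directory and archive layout described in the commit message can be sketched like this. `export_model` is a hypothetical helper using Python's standard os and tarfile modules; the real export code in the commit may differ:

```python
import os
import tarfile

def export_model(name, outcome, version="001", root="models"):
    # Layout from the commit message:
    #   models/<NAME>-<outcome>/<VERSION>/                      (model directory)
    #   models/<NAME>-<outcome>/<NAME>-<outcome>-<VERSION>.tar.gz
    base = os.path.join(root, f"{name}-{outcome}")
    model_dir = os.path.join(base, version)
    os.makedirs(model_dir, exist_ok=True)

    archive = os.path.join(base, f"{name}-{outcome}-{version}.tar.gz")
    # Archive with the version directory at the top level, matching the
    # layout the kserve tensorflow predictor expects per the commit.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(model_dir, arcname=version)
    return model_dir, archive
```

Saving the trained model into `model_dir` before archiving (e.g. with the framework's own save call) would complete the flow; that step is omitted here.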