conversation2foods

Extract what foods were eaten from a transcript of a mealtime conversation.

Requirements

Python 3

The spaCy Python package, which can be installed with: pip install spacy

spaCy English data, which can be downloaded with: python -m spacy download en

PyTorch is required; see the official PyTorch installation instructions. Note that every neural net is set to use CUDA; this could be adapted to run on CPU with little effort.
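As a rough illustration of that adaptation, the sketch below picks a device at runtime instead of assuming CUDA. The helper name pick_device and the DEVICE constant are hypothetical and do not come from this repository; the sketch also assumes the training scripts move models and tensors with hard-coded CUDA calls, which has not been checked against the code.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when a CUDA-capable PyTorch is available, else "cpu"."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, so there is nothing to move to a GPU.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


DEVICE = pick_device()

# Assumed usage inside a training script (hypothetical variable names):
#   model = model.to(DEVICE)
#   batch = batch.to(DEVICE)
print(f"Using device: {DEVICE}")
```

Passing the resulting string to .to() in place of a hard-coded CUDA call is usually enough for small models, though each script would need to be checked individually.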

The transformers package, which can be installed with: pip install transformers

Pipeline

Run clean_hslld.py to clean all the transcripts. Create the folder 'transcripts' first; the cleaned transcripts are saved there.
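Every step in this pipeline expects its destination folder to exist already. The following sketch creates all of the folders named in this README in one go. The function name make_pipeline_dirs is hypothetical, and the sketch assumes the folders live in the directory the scripts are run from.

```python
from pathlib import Path

# Destination folders named in the pipeline steps of this README.
PIPELINE_DIRS = [
    "transcripts",
    "chunks",
    "chunks_augmented",
    "bert_evidence",
    "bert_evidence_augmented",
]


def make_pipeline_dirs(root="."):
    """Create each destination folder under root, skipping ones that exist."""
    created = []
    for name in PIPELINE_DIRS:
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for folder in make_pipeline_dirs():
        print(f"ready: {folder}")
```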

Then run parse_json_labels.py to generate evidence data (create the destination folder 'chunks'). This uses the labels and the clean transcripts to create a set of (text_chunk, food) pairs.
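To make the shape of these pairs concrete, here is a minimal sketch. It is not the logic of parse_json_labels.py: it assumes a transcript is a list of utterance strings, that chunks are fixed-size windows of utterances, and that a food belongs to a chunk when its name appears in the chunk text. All function names and the sample data are hypothetical.

```python
def chunk_utterances(utterances, size=3):
    """Join consecutive utterances into chunks of at most `size` utterances."""
    return [
        " ".join(utterances[i:i + size])
        for i in range(0, len(utterances), size)
    ]


def pair_chunks_with_foods(utterances, foods, size=3):
    """Return (text_chunk, food) for every labeled food mentioned in a chunk."""
    pairs = []
    for chunk in chunk_utterances(utterances, size):
        lowered = chunk.lower()
        for food in foods:
            # Simple substring match; an assumption made for this sketch.
            if food.lower() in lowered:
                pairs.append((chunk, food))
    return pairs


transcript = [
    "do you want more peas",
    "no thank you",
    "can I have some milk",
    "sure here you go",
]
pairs = pair_chunks_with_foods(transcript, ["peas", "milk", "rice"], size=2)
for chunk, food in pairs:
    print(f"{food!r} <- {chunk!r}")
```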

Run augment_data.py to augment the data by swapping in different foods, which increases the number of true positives (create the destination folder 'chunks_augmented').
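The idea of swapping foods can be sketched as follows. This is an illustration only, not the code in augment_data.py: the function names, the copies and seed parameters, and the whole-word replacement rule are all assumptions.

```python
import random
import re


def swap_food(chunk, old_food, new_food):
    """Replace whole-word, case-insensitive mentions of old_food."""
    pattern = re.compile(r"\b" + re.escape(old_food) + r"\b", re.IGNORECASE)
    # A function replacement avoids treating new_food as a regex template.
    return pattern.sub(lambda _match: new_food, chunk)


def augment_pairs(pairs, food_pool, copies=2, seed=0):
    """Create up to `copies` new (chunk, food) pairs per original pair."""
    rng = random.Random(seed)
    augmented = []
    for chunk, food in pairs:
        candidates = [f for f in food_pool if f.lower() != food.lower()]
        for new_food in rng.sample(candidates, min(copies, len(candidates))):
            augmented.append((swap_food(chunk, food, new_food), new_food))
    return augmented


original = [("do you want more peas", "peas")]
for chunk, food in augment_pairs(original, ["peas", "corn", "rice"]):
    print(f"{food!r} <- {chunk!r}")
```

Each augmented chunk keeps the conversational context of the original and differs only in the food mentioned, so it remains a true positive for the swapped-in food.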

Then run generate_evidence_classifier_data.py to convert the evidence data into labeled BERT data (create the folder 'bert_evidence'). A true label indicates that the chunk contains evidence of a food. If using augmented data, this should also be run on the augmented chunks (create the folder 'bert_evidence_augmented'). The input and output paths are declared after the imports so they can be set easily.
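One plausible layout for such labeled data is sketched below. It is a guess at the general shape, not the format written by generate_evidence_classifier_data.py: the row keys, the function name, and the strategy of drawing negative examples from foods not labeled for a chunk are all assumptions.

```python
import random


def build_labeled_rows(pairs, food_pool, negatives_per_chunk=1, seed=0):
    """Turn (chunk, food) pairs into rows with a True/False evidence label."""
    rng = random.Random(seed)

    # Group the foods that were labeled as eaten for each chunk.
    positives = {}
    for chunk, food in pairs:
        positives.setdefault(chunk, set()).add(food)

    rows = []
    for chunk, foods in positives.items():
        for food in sorted(foods):
            rows.append({"chunk": chunk, "food": food, "label": True})
        # Negative examples: foods from the pool not labeled for this chunk.
        absent = [f for f in food_pool if f not in foods]
        for food in rng.sample(absent, min(negatives_per_chunk, len(absent))):
            rows.append({"chunk": chunk, "food": food, "label": False})
    return rows


rows = build_labeled_rows([("more peas please", "peas")], ["peas", "corn"])
for row in rows:
    print(row)
```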

Run train_test_split.py to create training and testing csv files. Some parameters can be tweaked in this file, such as which files end up in the test set and whether or not the training set is balanced and augmented.
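The two ideas mentioned here, holding out whole files for testing and balancing the training set, can be sketched as below. This is not the code in train_test_split.py: the row layout with a 'source' key, the function names, and balancing by downsampling the larger class are assumptions made for illustration.

```python
import random


def split_by_file(rows, test_files):
    """Send every row from a held-out source file to the test set."""
    test_files = set(test_files)
    train = [r for r in rows if r["source"] not in test_files]
    test = [r for r in rows if r["source"] in test_files]
    return train, test


def balance(rows, seed=0):
    """Downsample the larger class so both labels are equally frequent."""
    rng = random.Random(seed)
    pos = [r for r in rows if r["label"]]
    neg = [r for r in rows if not r["label"]]
    n = min(len(pos), len(neg))
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(balanced)
    return balanced


example_rows = [
    {"chunk": "c1", "food": "peas", "label": True, "source": "a.txt"},
    {"chunk": "c2", "food": "corn", "label": False, "source": "a.txt"},
    {"chunk": "c3", "food": "rice", "label": False, "source": "a.txt"},
    {"chunk": "c4", "food": "milk", "label": True, "source": "b.txt"},
]
train_rows, test_rows = split_by_file(example_rows, ["b.txt"])
print(len(balance(train_rows)), "balanced training rows,", len(test_rows), "test rows")
```

Splitting by file rather than by row keeps chunks from the same conversation out of both sets at once, which avoids leaking near-duplicate text into the test set.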

At this point you can run train_classifier.py or nn_evidence.py to train a classifier or neural network.
