# Natural Language Processing - semantic analysis on a tweet dataset regarding the Coronavirus
In the `src/nlp_task` folder, you will find models for semantic analysis on a tweet dataset regarding the Coronavirus.
The analysis is performed with both a pretrained and an untrained version of the BERT model, as well as with a
baseline LSTM model.
## LSTM model
The baseline model can be found in the `base_lstm_model.py` file. Run it with `python base_lstm_model.py`: it prints some data statistics, a model summary, and a classification report, and it generates a confusion matrix in the directory from which it is run.
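For orientation, a baseline of this kind typically looks like the Keras sketch below; the vocabulary cap, sequence length, and layer sizes are illustrative assumptions, not the exact values used in `base_lstm_model.py`.

```python
# Minimal baseline LSTM classifier sketch (assumed hyperparameters).
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20000  # assumed vocabulary cap
MAX_LEN = 64        # assumed maximum tweet length in tokens
NUM_CLASSES = 5     # five semantic labels, matching the dataset described below

model = tf.keras.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),
    layers.LSTM(64),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # printed before training, like the script's model summary
```

After training, `sklearn.metrics.classification_report` and `sklearn.metrics.confusion_matrix` are the usual way to produce the kind of report and matrix the script emits.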
## BERT model
The BERT model can be found in the `BERT_model.py` file. The model is designed using the Google Colab notebook
found here. It starts from the pretrained
`bert-base-uncased` model and is fine-tuned on the semantic analysis task with 5 different semantic labels.
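A minimal sketch of such a setup, assuming the Hugging Face `transformers` library (the file's actual code may differ):

```python
# Load bert-base-uncased with a 5-way classification head (sketch).
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=5
)

batch = tokenizer(
    ["Stores are running out of supplies again"],
    padding=True, truncation=True, max_length=64, return_tensors="pt",
)
outputs = model(**batch)
print(outputs.logits.shape)  # torch.Size([1, 5]), one logit per semantic label
```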
Before training and testing, the tweets are cleaned up using a cleanup function defined by Edgar Jonathan for another
BERT model on the same dataset, found here.
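As an illustration of the kind of steps such a function usually performs (this is an assumed, typical tweet cleanup, not Edgar Jonathan's exact code):

```python
# Illustrative tweet cleanup: lowercase, strip URLs/mentions, drop noise.
import re

def clean_tweet(text: str) -> str:
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # remove URLs
    text = re.sub(r"@\w+", "", text)                   # remove @mentions
    text = re.sub(r"#", "", text)                      # keep hashtag text, drop '#'
    text = re.sub(r"[^a-z\s]", " ", text)              # drop punctuation and digits
    return re.sub(r"\s+", " ", text).strip()           # collapse whitespace

print(clean_tweet("Stay safe! https://t.co/xyz @WHO #COVID19"))  # "stay safe covid"
```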
The BERT model in the file can be run in either the pretrained or the untrained configuration by commenting or
uncommenting the following lines:
```python
if __name__ == "__main__":
    # uncomment these lines to run an untrained BERT model
    print("\n\n----------------- untrained ------------------")
    main(True)

    # uncomment these lines to run a pretrained BERT model using bert-base-uncased
    print("\n\n----------------- pretrained -----------------")
    main(False)
```
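One plausible way the `main(untrained)` switch could be implemented, again assuming Hugging Face `transformers`; this is a sketch of the idea, not the file's actual code:

```python
# Same architecture either way; only the weight initialisation differs.
from transformers import BertConfig, BertForSequenceClassification

def build_model(untrained: bool) -> BertForSequenceClassification:
    if untrained:
        # randomly initialised weights with the bert-base-uncased architecture
        config = BertConfig.from_pretrained("bert-base-uncased", num_labels=5)
        return BertForSequenceClassification(config)
    # pretrained bert-base-uncased weights, ready for fine-tuning
    return BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=5
    )
```

Comparing the two variants isolates the contribution of the pretrained language-model weights from that of the architecture itself.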