Topic: exploratory data analysis about ventilators in Maude
Method: data wrangling in text, NLP
Tools: scispacy, NLP
Data source: https://open.fda.gov/apis/device/event/
Background: In the surge of covid and shortage of ventilators, many companies try to make their own ventilator machines. The purpose of this project is to investigate adverse events caused by ventilators using FDA medical device adverse event database. More specific, this project only focused on Medtronic Puritan Bennett™ 980 and 840 Ventilator and analyzed what are the main problems with these two series.
Objectives:
- What are the main problems with these 2 models?
- Model 980 is an update version of 840 with many new improvement features. By applying LDA, I want to check if the well-known problems in 840 model still exist in the new model 980
- Apply unigram, bigram and trigram LDA and compare their performance on these texts
- collect data using MAUDE API
- using Scispacy, cleaning, normalization, tokenization, stopword removal, POS tagging, stemming and lemmatization
- 1. Frequency-based: count vector, tf-idf, co-occurence
- 2. Prediction-based on probability: word2vec (CBOW, skipgram)
- Topic modeling using LDA, measure performance by coherence score