Crisis-NLP-Progress: Tracking resources related to computational linguistics techniques for emergency response.
- TREC-COVID - scientific articles annotated by relevance (relevant, partially-relevant, and non-relevant) from the TREC COVID track using the CORD-19 dataset.
- Covid-category - tweets for training CT-BERT, annotated with either "category_personal" (33.3%) or "category_news" (66.7%).
- Covid-vaccine-sentiment - covid-related tweets on vaccine sentiment, each annotated with one of three sentiment labels: "1", "-1" or "0".
- Covid-event - tweets categorised into five covid-related events: "Tested Positive", "Tested Negative", "Can not test", "Death", "Cure and Prevention". For each event, tweets are annotated with several different slot questions.
- COVID-QA - a question answering dataset consisting of 2,019 question/answer pairs annotated by volunteer biomedical experts on scientific articles related to COVID-19. The articles are 147 scientific articles selected from the CORD-19 dataset.
- CrisisLexT26 - tweets from 26 crises, annotated with 8 information types, 4 types of informativeness, and 8 types of information sources.
- CrisisLexT6 - tweets from 6 crises, annotated with binary relatedness, either "off-topic" or "on-topic".
- Crisis-eyewitness - around 14,000 tweets categorised into four types of disaster, i.e., hurricanes, earthquakes, floods, and forest fires. These tweets are annotated with a taxonomy of eyewitness types, such as "direct-eyewitness", "indirect-eyewitness", "vulnerable-direct witness", etc.
- TREC-IS - around 50k tweets as of the 2020-a round of the IS track. These tweets are annotated with a taxonomy of 25 information types (multi-label annotations) and 4 levels of priority describing the criticality of the tweets (single-label annotations).
- To be added: unlabeled data. To add a new relevant dataset to this repository, please refer to this guide.
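
The TREC-IS entry above combines two annotation styles: several information types per tweet (multi-label) and exactly one priority level per tweet (single-label). The sketch below shows one way such annotations could be encoded for a classifier. It is only an illustration: the class `AnnotatedTweet`, the function `encode`, and the four information-type names are placeholders chosen for this example, not the official label set of 25 types, and the priority names are assumed.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Placeholder label names (assumption): the track defines 25 information types.
INFO_TYPES: List[str] = [
    "Request-SearchAndRescue",
    "Report-EmergingThreats",
    "Report-Weather",
    "Other-Sentiment",
]
# Assumed names for the 4 priority levels, ordered from least to most critical.
PRIORITIES: List[str] = ["Low", "Medium", "High", "Critical"]


@dataclass(frozen=True)
class AnnotatedTweet:
    tweet_id: str
    info_types: FrozenSet[str]  # multi-label: zero or more information types
    priority: str               # single-label: exactly one priority level


def encode(tweet: AnnotatedTweet) -> Tuple[List[int], int]:
    """Return a multi-hot vector for information types and a priority index."""
    unknown = tweet.info_types - set(INFO_TYPES)
    if unknown:
        raise ValueError(f"unknown information types: {sorted(unknown)}")
    if tweet.priority not in PRIORITIES:
        raise ValueError(f"unknown priority: {tweet.priority}")
    multi_hot = [1 if name in tweet.info_types else 0 for name in INFO_TYPES]
    return multi_hot, PRIORITIES.index(tweet.priority)


example = AnnotatedTweet(
    tweet_id="123",
    info_types=frozenset({"Request-SearchAndRescue", "Report-EmergingThreats"}),
    priority="Critical",
)
print(encode(example))  # ([1, 1, 0, 0], 3)
```

Keeping the two targets separate lets a model use a per-type sigmoid output for information types and a single softmax (or ordinal) output for priority.
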
- TweetsRetrieval - very fast Java-based tweet retrieval via tweet IDs.
- TweepyCalls - various functions for calling the Twitter API using Tweepy.
- Twarc - a command line tool (and Python library) for archiving Twitter JSON.
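
The retrieval tools above work from lists of tweet IDs, since tweet datasets are commonly distributed as IDs rather than full tweets. A minimal sketch of the batching step that usually precedes retrieval, assuming a lookup endpoint that accepts at most 100 IDs per request (the limit documented for Twitter's tweet lookup endpoints at the time of writing). The helper `batch_ids` and the sample IDs are hypothetical and are not part of any of the listed tools.

```python
from typing import Iterable, Iterator, List


def batch_ids(tweet_ids: Iterable[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield de-duplicated tweet IDs in lists of at most `batch_size`."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    seen = set()
    batch: List[str] = []
    for raw in tweet_ids:
        tweet_id = raw.strip()
        # Skip blank entries and IDs that were already queued.
        if not tweet_id or tweet_id in seen:
            continue
        seen.add(tweet_id)
        batch.append(tweet_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


# Hypothetical IDs; a real run would read them from a dataset's ID file.
sample_ids = [str(1000 + i) for i in range(250)] + ["1000", ""]
batches = list(batch_ids(sample_ids))
print([len(b) for b in batches])  # [100, 100, 50]
```

Each batch can then be passed to whichever client is in use; the request itself is left out here because it depends on credentials and the API version.
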
- 2020: COVID-Twitter-BERT: A Natural Language Processing Model to Analyse COVID-19 Content on Twitter - pre-training.
- 2020: CrisisBERT: a Robust Transformer for Crisis Classification and Contextual Crisis Embedding - crisis tweet recognition and classification.
- 2020: On Identifying Hashtags in Disaster Twitter Data - multi-task learning, classification, LSTM.
- 2020: A deep multi-modal neural network for informative Twitter content classification during emergencies - multi-modal classification, VGG-16 and LSTM.
- 2020: Classification for Crisis-Related Tweets Leveraging Word Embeddings and Data Augmentation - data augmentation, word embeddings.
- 2019: Label Embedding using Hierarchical Structure of Labels for Twitter Classification - label embedding, hierarchical multi-label classification.
- 2018: Deep Neural Networks versus Naive Bayes Classifiers for Identifying Informative Tweets during Disasters - RNN, CNN, and traditional ML.
- 2018: Convolutional neural network for earthquake detection and location - CNN.
- 2017: Robust Classification of Crisis-Related Data on Social Networks Using Convolutional Neural Networks - CNN, classification.
- 2017: Fifteen years of social media in emergencies: A retrospective review and future directions for crisis informatics
- 2016: An Overview of Sentiment Analysis in Social Media and Its Applications in Disaster Relief
- 2015: Processing Social Media Messages in Mass Emergency: A Survey
- AAAI: The AAAI Conference on Artificial Intelligence
- SIGIR: The International ACM SIGIR Conference on Research and Development in Information Retrieval
- ACL: The Annual Meeting of the Association for Computational Linguistics
- CIKM: The International Conference on Information and Knowledge Management
- WWW: The International World Wide Web Conference
- WSDM: The ACM International Conference on Web Search and Data Mining
- EMNLP: The Conference on Empirical Methods in Natural Language Processing
- IJCAI: The International Joint Conference on Artificial Intelligence
- ECIR: The annual European Conference on Information Retrieval
- ISCRAM: The International Conference on Information Systems for Crisis Response and Management
- COLING: International Conference on Computational Linguistics
- NLP-IR: International Conference on Natural Language Processing and Information Retrieval
- LREC: The Conference on Language Resources and Evaluation
- ICWSM: International AAAI Conference on Web and Social Media
- ASONAM: The IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
- ACM Computing Surveys
- IEEE Computational Intelligence Magazine
- Journal of Contingencies and Crisis Management
- CrisisNLP - contains a decent number of resources for research on crisis informatics topics.
- CrisisLex - a repository of crisis-related social media data and tools including collections of crisis data and a lexicon of crisis terms.
The items listed here are by no means complete; they serve as a starting point for a community effort to make this list better and more comprehensive. Contributions are therefore welcome. If you find any missing items, open a pull request to add them or email me at [email protected].