Releases: dimsum16/dimsum-data
Training/test data + scripts 1.5
Training/test data + scripts 1.4
- Data: Revised annotations in Twitter portions of training data (see README.md for a description)
- Scripts: Evaluation script now prints a less cryptic error message on malformed input
Training/test data + scripts 1.3
- Data: Adds blind test set (see README.md for a description)
- Scripts: Fixes a couple of bugs in the evaluation script, and updates sst2tags.py to support non-ASCII characters in tokens
Training data + scripts 1.2
- Fixes several inconsistencies in the training data, especially in the treatment of auxiliaries and URLs, and the parent index for non-`I`/`i` tokens (now uniformly an explicit `0`).
- Added scripts for evaluation and conversion to/from a one-sentence-per-line format. The 9-column CoNLLesque format remains the official one for the task.
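The parent-index convention from this release (an explicit `0` for every token whose MWE tag is not `I` or `i`) can be checked with a short script. The following is only a minimal sketch, not part of the released scripts: it assumes tab-separated rows with the token in the second column, the MWE tag in the fifth, and the parent index in the sixth, and the function name `find_parent_index_violations` and the sample rows are hypothetical. Consult README.md for the authoritative column layout.

```python
def find_parent_index_violations(lines, tag_col=4, parent_col=5):
    """Return (line_number, token) pairs where a token that is not tagged
    I or i carries a parent index other than an explicit "0".

    Column positions are assumptions (0-based), not taken from the release.
    """
    violations = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # blank lines are assumed to separate sentences
        fields = line.split("\t")
        if len(fields) <= max(tag_col, parent_col):
            raise ValueError(f"line {line_number}: too few columns")
        tag, parent = fields[tag_col], fields[parent_col]
        if tag not in ("I", "i") and parent != "0":
            violations.append((line_number, fields[1]))
    return violations


# Made-up rows for illustration only; they are not from the dataset.
sample = [
    "1\tThe\tthe\tDET\tO\t0\t\t\tsent-1",
    "2\tweb\tweb\tNOUN\tB\t0\t\t\tsent-1",
    "3\tsite\tsite\tNOUN\tI\t2\t\t\tsent-1",
    "4\tworks\twork\tVERB\tO\t\t\t\tsent-1",  # empty parent index
]
violations = find_parent_index_violations(sample)
print(violations)
```

On the sample rows, only the last token is reported, because its parent field is empty rather than an explicit `0`.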
Training data 1.1
Lemmas in the Twitter part of the training data were not true lemmas but only lowercased versions of the observed tokens. This release brings consistent lemmatization for the whole training set.
Training data v1.0
Training data for the DiMSUM shared task at SemEval 2016. The dataset combines and harmonizes existing corpora annotated for multiword expressions and noun and verb supersenses.