Releases · dimsum16/dimsum-data

28 Dec 21:10

nschneid

1.5

cd92971

Training/test data + scripts 1.5 (Latest)

Updates the counts in TAGSET.md to match the training data in the 1.4 release.


28 Dec 20:52

nschneid

1.4

b86a56a

Training/test data + scripts 1.4

Data: Revised annotations in Twitter portions of training data (see README.md for a description)
Scripts: Evaluation script now prints a less cryptic error message when given malformed input


17 Dec 00:40

nschneid

1.3

29aa4a9

Training/test data + scripts 1.3

Data: Adds blind test set (see README.md for a description)
Scripts: Fixes a couple of bugs in the evaluation script, and updates sst2tags.py to support non-ASCII characters in tokens


09 Nov 21:36

nschneid

1.2

837ffb5

Training data + scripts 1.2

Fixes several inconsistencies in the training data, especially in the treatment of auxiliaries and URLs, and the parent index for non-I/i tokens (now uniformly an explicit 0).
Adds scripts for evaluation and for conversion to/from a one-sentence-per-line format. The 9-column CoNLLesque format remains the official one for the task.
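To illustrate the parent-index convention mentioned above, here is a minimal sketch of a consistency check over the 9-column format. It is not one of the released scripts. The function name `find_parent_violations` is hypothetical, and the column positions (MWE tag in the fifth column, parent index in the sixth, tab-separated, blank lines between sentences) are assumptions that should be checked against README.md before use.

```python
from typing import Iterable, List, Tuple

# ASSUMPTION: 0-based positions of the MWE tag and parent-index columns
# in the 9-column, tab-separated format. Verify against README.md.
MWE_COL = 4
PARENT_COL = 5
NUM_COLS = 9


def find_parent_violations(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for rows that break the
    'non-I/i tokens have an explicit parent index of 0' convention."""
    violations = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # blank line: assumed sentence separator
        cols = line.split("\t")
        if len(cols) != NUM_COLS:
            violations.append(
                (lineno, "expected %d columns, got %d" % (NUM_COLS, len(cols)))
            )
            continue
        mwe_tag, parent = cols[MWE_COL], cols[PARENT_COL]
        # Only tokens tagged I or i continue a multiword expression;
        # every other token should carry an explicit 0 as its parent.
        if mwe_tag not in ("I", "i") and parent != "0":
            violations.append(
                (lineno, "tag %r has parent %r, expected '0'" % (mwe_tag, parent))
            )
    return violations
```

The function accepts any iterable of lines, so an open file handle for a training file can be passed to it directly.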


08 Oct 11:40

andersjo

1.1

16d0629

Training data 1.1

Lemmas in the Twitter part of the training data were not true lemmas but only lowercased versions of the observed tokens. This release brings consistent lemmatization for the whole training set.


25 Sep 13:33

nschneid

1.0

9ca3c63

Training data v1.0

Training data for the DiMSUM shared task at SemEval 2016. The dataset combines and harmonizes existing corpora annotated for multiword expressions and noun and verb supersenses.
