The problem statement, the approaches considered, and further discussion are in Chapter 5 of the thesis.
-
Annotations directory
Contains the manually annotated data for the experiment. These annotations were used to present the results in the documentation. More details about the annotations are in this document.
-
Extra Data (Not Annotated) directory
Contains results that are not annotated but are useful for evaluating the algorithm. End users do not need these files unless they want to extend the algorithm. More details about the files in this category are in the documentation for the Annotations folder, here and here.
-
makefile
Used to run the module on your local system. See here for details.
-
README.md
This file.
-
requirements.txt
Lists the packages needed to run this project. To install them, clone the repository to your system and run the following in a terminal:
pip install -r requirements.txt
Depending on your system, you may need to use pip3 instead of pip.
-
scripts directory
Contains the main script that runs the module. Before it can be used, it must be copied into the correct udapy folder (see the makefile for details).
-
RunTimes directory
Contains the time taken to run the block on the Afrikaans and Arabic data. The data is organised into two lines per run, for a total of 100 runs. Format:
Time when the block starts processing
Time when the block stops processing and the next block takes over
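The two-lines-per-run layout can be turned into per-run durations with a few lines of Python. This is a minimal sketch, assuming each non-empty line holds a single timestamp in seconds (for example, the float returned by Python's `time.time()`); the README does not state the timestamp format, and `read_durations_ms` is a hypothetical helper, not part of this repository.

```python
from typing import Iterable, List


def read_durations_ms(lines: Iterable[str]) -> List[float]:
    """Turn a RunTimes-style log into per-run durations in milliseconds.

    ASSUMPTION: each non-empty line holds one timestamp in seconds, and the
    lines alternate start, stop, start, stop, ... (two lines per run).
    """
    # float() tolerates surrounding whitespace, so trailing newlines are fine.
    stamps = [float(line) for line in lines if line.strip()]
    if len(stamps) % 2 != 0:
        raise ValueError("expected an even number of timestamps (start/stop pairs)")
    # Pair each start line with the stop line that follows it.
    starts, stops = stamps[0::2], stamps[1::2]
    return [(stop - start) * 1000.0 for start, stop in zip(starts, stops)]
```

Because it accepts any iterable of lines, an open file handle can be passed directly, e.g. `read_durations_ms(open(path, encoding="utf-8"))` with the actual file name from the RunTimes directory.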
To get started, clone this repository and run the following commands in order:
make getdata
Installs the required dependencies from the requirements.txt file, downloads the UD v2.4 data using the link here, and then prepares working copies of the treebanks in the current directory.
make stats
Reports all instances of mis-directed dependencies involving the CCONJ UPOS tag and the cc deprel, across all treebanks in UD v2.4, in *.stats files. Requires the UD v2.4 data in the HOME folder; if it is not found, the package is downloaded and unzipped.
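To illustrate the kind of check `make stats` performs, here is a minimal sketch that scans CoNLL-U lines directly. It assumes a hypothetical criterion: a CCONJ token attached as cc counts as mis-directed when its head lies to its left, since the UD v2 guidelines normally attach cc to the conjunct that follows it. The module's actual criterion is described in the thesis and may differ; `misdirected_cc` is a hypothetical helper, not the repository's script.

```python
from typing import Iterable, Iterator, Tuple


def misdirected_cc(lines: Iterable[str]) -> Iterator[Tuple[str, str, int, int]]:
    """Yield (sent_id, form, token_id, head_id) for suspicious cc attachments.

    ASSUMPTION: "mis-directed" here means a CCONJ token with deprel cc
    whose head precedes it in the sentence.
    """
    sent_id = "?"
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("# sent_id"):
            sent_id = line.split("=", 1)[1].strip()
            continue
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
        if not cols[0].isdigit():
            continue
        token_id, form, upos = int(cols[0]), cols[1], cols[3]
        head, deprel = int(cols[6]), cols[7]
        # Compare the base label so subtypes such as cc:preconj are included.
        if upos == "CCONJ" and deprel.split(":")[0] == "cc" and 0 < head < token_id:
            yield sent_id, form, token_id, head
```

A generator keeps memory flat even on large treebanks; counting its output per file gives a simple per-treebank tally.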
make correction
Copies the script into the correct location and runs it. It corrects the instances detected in the *.direction files, creating a *2.conllu file with all the corrections and a *2.direction file with the instances the algorithm does not handle.
make clean
Removes all .conllu files and the other files generated by this makefile.
Runtimes are based on 100 runs on Ubuntu 18.04 (64-bit) with a 4-core Intel Core i5-6300HQ processor.
Language | RunTime (in ms) |
---|---|
af | 81.33 ± 7.094 |
ar | 317.05 ± 23.996 |
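A value in the `mean ± spread` form used in the table can be produced from a list of per-run durations as sketched below. The README does not say which spread measure the ± denotes; this sketch assumes the sample standard deviation (`statistics.stdev`), and `summarize_ms` is a hypothetical helper.

```python
import statistics
from typing import Sequence


def summarize_ms(durations_ms: Sequence[float]) -> str:
    """Format run durations as 'mean ± sd' in milliseconds.

    ASSUMPTION: the ± value is a sample standard deviation; switch to
    statistics.pstdev if a population standard deviation was intended.
    """
    if len(durations_ms) < 2:
        raise ValueError("need at least two runs to estimate a standard deviation")
    mean = statistics.mean(durations_ms)
    sd = statistics.stdev(durations_ms)
    # Two decimals for the mean and three for the spread, as in the table.
    return f"{mean:.2f} ± {sd:.3f}"
```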
For a thorough analysis of the experiment's results, with examples, see the documentation.
-
Chiara Alzetta, Felice Dell’Orletta, Simonetta Montemagni, and Giulia Venturi. Dangerous relations in dependency treebanks. In Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories, pages 201–210, Prague, Czech Republic, 2017. URL https://www.aclweb.org/anthology/W17-7624.
-
Martin Popel, Zdeněk Žabokrtský, and Martin Vojtek. Udapi: Universal API for Universal Dependencies. In Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017), pages 96–101, Gothenburg, Sweden, May 2017. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/W17-0412.
-
Leon Stassen. And-languages and WITH-languages. Linguistic Typology, 4(1):1–54, 2000. doi: https://doi.org/10.1515/lity.2000.4.1.1. URL https://www.degruyter.com/view/journals/lity/4/1/article-p1.xml.
-
Viacheslav Chirikba. Evidential category and evidential strategy in Abkhaz. Typological Studies in Language, 54:243–272, 2003. URL https://benjamins.com/catalog/tsl.54.14chi.
-
Winfried Boeder. The South Caucasian languages. Lingua, 115(1):5–89, 2005. ISSN 0024-3841. doi: https://doi.org/10.1016/j.lingua.2003.06.002. URL http://www.sciencedirect.com/science/article/pii/S0024384103001244.
-
Joakim Nivre, Mitchell Abrams, Željko Agić, et al. Universal Dependencies 2.4, 2019. LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University. URL http://hdl.handle.net/11234/1-2988.