This study offers a comprehensive perspective on syntactic complexity in English and Russian texts written by L1 and L2 speakers. We analyze 20 syntactic complexity measures at the sentential, clausal, and phrasal levels and explore their interrelationships as well as their relationships with proficiency, task type, and genre. We propose a new measure of syntactic complexity based on Levenshtein distance at the clausal level. Our findings reveal strong correlations among length-based measures and highlight the problematic nature of the Coordination Index commonly used in the literature. We also find support for the idea that complexity generally increases with proficiency, although some measures plateau at advanced levels. Syntactic complexity measures can also reliably distinguish between texts of different genres and task types; some values are language-specific, differing between the two languages considered. Despite the challenging nature of our data, some complexity features, namely length-based indices and phrasal complexity measures, are useful for automatic proficiency prediction. As a practical application of our research, we introduce syntaxcomp, a Python library for extracting syntactic complexity measures from CoNLL-U annotations.
Klykova, E. A. (2024). Syntactic Complexity of Written Texts in Russian and English as Foreign Languages [Master's Thesis, Higher School of Economics]. https://www.hse.ru/en/edu/vkr/931188956
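
To make the two kinds of measure named in the abstract more concrete, the following is a minimal, standard-library-only sketch. It does not use syntaxcomp and does not reproduce the thesis's definitions: the function names, the toy CoNLL-U sample, the choice to compare whole sentences rather than clauses, and the normalization by the longer sequence are all assumptions made for illustration. It reads CoNLL-U text, computes one length-based index (mean sentence length in tokens, punctuation excluded), and computes a Levenshtein distance between the dependency-label sequences of two units.

```python
from statistics import mean

# Toy CoNLL-U input (hypothetical, hand-annotated for illustration only).
SAMPLE = """\
# text = The cat sat.
1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_
2\tcat\tcat\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tsat\tsit\tVERB\t_\t_\t0\troot\t_\t_
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_

# text = The dog barked loudly.
1\tThe\tthe\tDET\t_\t_\t2\tdet\t_\t_
2\tdog\tdog\tNOUN\t_\t_\t3\tnsubj\t_\t_
3\tbarked\tbark\tVERB\t_\t_\t0\troot\t_\t_
4\tloudly\tloudly\tADV\t_\t_\t3\tadvmod\t_\t_
5\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Keep only ordinary tokens: skip multiword ranges (1-2) and empty nodes (1.1).
        if len(cols) != 10 or not cols[0].isdigit():
            continue
        current.append({"form": cols[1], "upos": cols[3],
                        "head": int(cols[6]), "deprel": cols[7]})
    if current:
        sentences.append(current)
    return sentences


def mean_sentence_length(sentences):
    """Length-based index: mean number of non-punctuation tokens per sentence."""
    return mean(sum(tok["upos"] != "PUNCT" for tok in sent) for sent in sentences)


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def deprel_distance(unit_a, unit_b):
    """Assumed illustration: edit distance over dependency labels, scaled to [0, 1]."""
    labels_a = [tok["deprel"] for tok in unit_a]
    labels_b = [tok["deprel"] for tok in unit_b]
    longest = max(len(labels_a), len(labels_b))
    return levenshtein(labels_a, labels_b) / longest if longest else 0.0


sentences = parse_conllu(SAMPLE)
print(mean_sentence_length(sentences))
print(deprel_distance(sentences[0], sentences[1]))
```

A clause-level variant would first need to segment each sentence into clauses (for example, by the subtrees of clausal dependents), which this sketch deliberately leaves out.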