EKI error-annotated Estonian L2 learner corpus

EKI eesti keele L2 õppijate veamärgendatud tekstide korpus

The materials for the error-annotated corpus come from the Estonian learner corpus EMMA: an Estonian learner assessment test (7th grade, 254 texts), the basic school final exam (9th grade, 251 texts), and the state exam (12th grade, 998 texts), all provided by the Education and Youth Board. A manually created error annotation layer adds to the value of the corpus, allowing more in-depth study and analysis of the language use of learners of Estonian as a second language. Errors are marked using categories based on the ERRANT-M2 error categories. The corpus contains 1503 texts in total, and the goal is to keep expanding it as new materials come in.
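As an illustration of how ERRANT-style error annotations are commonly read, the following is a minimal Python sketch of a parser for the M2 format. It assumes that the error layer is available as M2 text; this record does not state the distribution format. The sample sentence, the error type label `R:NOUN`, and the names `Edit`, `parse_m2`, and `apply_edits` are all hypothetical and not taken from the corpus.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    start: int        # index of the first token covered by the edit
    end: int          # index one past the last covered token
    error_type: str   # error category label
    correction: str   # replacement text ("" deletes the span)
    annotator: int    # annotator id from the last M2 field


def parse_m2(text):
    """Parse M2 text into a list of (tokens, edits) pairs, one per sentence."""
    sentences = []
    for block in text.strip().split("\n\n"):
        tokens, edits = [], []
        for line in block.splitlines():
            if line.startswith("S "):
                tokens = line[2:].split()
            elif line.startswith("A "):
                fields = line[2:].split("|||")
                start, end = map(int, fields[0].split())
                edits.append(
                    Edit(start, end, fields[1], fields[2], int(fields[-1]))
                )
        sentences.append((tokens, edits))
    return sentences


def apply_edits(tokens, edits):
    """Return the corrected token list; "no error" edits (start < 0) are skipped."""
    corrected = list(tokens)
    # Apply from right to left so earlier token offsets stay valid.
    for edit in sorted(edits, key=lambda e: e.start, reverse=True):
        if edit.start < 0:
            continue
        replacement = [] if edit.correction == "-NONE-" else edit.correction.split()
        corrected[edit.start:edit.end] = replacement
    return corrected


# Invented example: inessive "koolis" used where illative "kooli" is expected.
SAMPLE = """S Ma lähen koolis .
A 2 3|||R:NOUN|||kooli|||REQUIRED|||-NONE-|||0"""

tokens, edits = parse_m2(SAMPLE)[0]
print(" ".join(apply_edits(tokens, edits)))
```

Keeping the edits separate from the tokens, rather than storing only the corrected sentence, makes it possible to count errors by category or by annotator.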

Annotation layers: The corpus includes an error annotation layer added manually by annotators. In addition, the corpus is annotated automatically at three levels: morphological (lemma, part of speech, and grammatical categories for each word), surface-syntactic (syntactic functions), and dependency-syntactic. In the dependency-syntactic annotation, each dependency relation holds between two words, one subordinate and the other its head, and the relation is named after the syntactic function.
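The head and subordinate relationship described above can be sketched as a small Python data structure. This is only an illustration: the sentence is invented, and the relation labels (`nsubj`, `obj`, `root`) follow Universal Dependencies naming as an assumption, since this record does not specify the label set. The names `Token`, `dependents_of`, and `describe` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    index: int      # 1-based position in the sentence
    form: str       # word form as written
    head: int       # index of the head word; 0 marks the root
    relation: str   # name of the relation, i.e. the syntactic function


# "Laps loeb raamatut" ("The child reads a book"), an invented example.
SENTENCE = [
    Token(1, "Laps", 2, "nsubj"),
    Token(2, "loeb", 0, "root"),
    Token(3, "raamatut", 2, "obj"),
]


def dependents_of(tokens, head_index):
    """Return the words directly subordinate to the word at head_index."""
    return [t for t in tokens if t.head == head_index]


def describe(tokens):
    """Render each relation as 'subordinate --relation--> head'."""
    by_index = {t.index: t for t in tokens}
    lines = []
    for t in tokens:
        head_form = by_index[t.head].form if t.head else "ROOT"
        lines.append(f"{t.form} --{t.relation}--> {head_form}")
    return lines


for line in describe(SENTENCE):
    print(line)
```

Each word stores only a pointer to its head, so the whole tree can be recovered from a flat list of tokens.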
