TimeML annotated corpus of Estonian newspaper articles



Estonian TimeML Annotated Corpus (ver 2.0)

The corpus consists of 80 Estonian newspaper articles (approx. 22,000 word tokens). It includes manually corrected morphological and dependency syntactic annotations, as well as manually added temporal semantic annotations. This corpus is a subcorpus of the Estonian Dependency Treebank ( https://github.com/EstSyntax/EDT ).

The temporal semantic annotations are based on an adaptation of the TimeML specification ( http://www.timeml.org/ ) and consist of EVENT, TIMEX and TLINK annotations. The creation process of the corpus, along with an evaluation of annotation consistency, is described by Orasmaa (2014a, 2014b).
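To illustrate how the three annotation types relate, here is a minimal sketch of an in-memory model. EVENT marks an event mention, TIMEX marks a temporal expression, and TLINK connects two of them with a temporal relation. The class names, the attribute names (loosely borrowed from the TimeML specification: eid, tid, lid, relType), the helper links_of and the example sentence are all hypothetical. They do not describe the corpus's actual file format, which is documented in readme.txt.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical model of TimeML-style annotations; NOT the corpus file format.

@dataclass
class Event:
    eid: str    # event identifier, e.g. "e1"
    text: str   # annotated event mention

@dataclass
class Timex:
    tid: str    # temporal expression identifier, e.g. "t1"
    text: str   # surface text of the expression
    type: str   # TimeML type: DATE, TIME, DURATION or SET
    value: str  # normalized value

@dataclass
class TLink:
    lid: str
    source: Union[Event, Timex]
    target: Union[Event, Timex]
    rel_type: str  # temporal relation, e.g. "BEFORE", "IS_INCLUDED"

def links_of(entity, links: List[TLink]) -> List[TLink]:
    """Return all TLINKs in which the given EVENT or TIMEX participates."""
    return [l for l in links if l.source is entity or l.target is entity]

# Invented toy example (English gloss, not corpus data):
# "The meeting took place on Monday."
meeting = Event("e1", "took place")
monday = Timex("t1", "Monday", "DATE", "XXXX-WXX-1")
links = [TLink("l1", meeting, monday, "IS_INCLUDED")]
print([(l.source.eid, l.rel_type, l.target.tid) for l in links_of(meeting, links)])
```

Running it prints [('e1', 'IS_INCLUDED', 't1')]. Identity comparison (is) is used so that two distinct mentions with identical text are not conflated.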

Format of the corpus

See https://github.com/soras/EstTimeMLCorpus/blob/master/readme.txt for details.

Related publications

The creation of this corpus and its first version are described in the following publications:

S. Orasmaa (2014a). Towards an Integration of Syntactic and Temporal Annotations in Estonian. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14).

S. Orasmaa (2014b). How Availability of Explicit Temporal Cues Affects Manual Temporal Relation Annotation. In Human Language Technologies - The Baltic Perspective, pp. 215-218. IOS Press.

