Estonian National Corpus 2019 (.vrt format)
Eesti keele ühendkorpus 2019 (.vrt vormingus)
Cite as: Kallas, J., & Koppel, K. (2020). Eesti keele ühendkorpus 2019 (.vrt vormingus). Center of Estonian Language Resources. https://doi.org/10.15155/3-00-0000-0000-0000-08489L
The corpus is based on the Estonian National Corpus 2013, which was updated by Lexical Computing Ltd. in 2017 and 2019 at the request of the Estonian Language Institute.
The subcorpora are: Estonian Reference Corpus 1990-2008, Estonian Web 2013, Estonian Web 2017, Estonian Web 2019, Estonian Wikipedia 2017, Estonian Wikipedia 2019, Estonian Open Access Journals (DOAJ), blogs, discussion, education, fiction, food, health, journals, news, religion, science, sex, society, sports.
The web corpora contain content downloaded from Estonian websites.
The file format is .vrt (vertical text: one token per line, with XML-like tags marking structures), which is often used by Korp, Sketch Engine and other corpus query systems that use CQP.
The corpus was created with tools described at http://corpus.tools: SpiderLing, jusText, Chared, Onion and wiki2corpus. It was lemmatized, tagged and unified with the EstNLTK 1.6 analyzer.
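For readers who want to process the .vrt files directly rather than through a corpus query system, the following is a minimal Python sketch of reading the vertical format. It is an illustration under stated assumptions, not the official layout of this corpus: the sentence tag name (s), the column order (word, lemma, tag) and the sample values are all hypothetical, so check the actual files before relying on them.

```python
import io
import re
from typing import Iterator, List, TextIO, Tuple

# Structural lines are assumed to look like XML tags, e.g. <doc id="1">, <s>, </s>, <g/>.
STRUCT_TAG = re.compile(r"^</?[A-Za-z_][\w.-]*(\s[^>]*)?/?>$")


def read_sentences(
    stream: TextIO, sentence_tag: str = "s"
) -> Iterator[List[Tuple[str, ...]]]:
    """Yield each sentence as a list of tuples, one tuple of columns per token line.

    The sentence tag name "s" is an assumption; pass another name if the
    files use a different structure.
    """
    sentence: List[Tuple[str, ...]] = []
    for raw in stream:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        if STRUCT_TAG.match(line):
            # A closing sentence tag ends the current sentence.
            if line == f"</{sentence_tag}>" and sentence:
                yield sentence
                sentence = []
            continue  # other structural tags are ignored in this sketch
        # Token line: tab-separated positional attributes.
        sentence.append(tuple(line.split("\t")))
    if sentence:
        yield sentence  # tokens left over after the last closing tag


# Invented sample; the columns (word, lemma, tag) and their values are placeholders.
sample = (
    '<doc id="example">\n'
    "<s>\n"
    "Tere\ttere\tI\n"
    "maailm\tmaailm\tS\n"
    "</s>\n"
    "</doc>\n"
)
sentences = list(read_sentences(io.StringIO(sample)))
```

To read a real file, open it with encoding="utf-8" and pass the file object to read_sentences; because the function is a generator, a large corpus file is streamed sentence by sentence rather than loaded into memory at once.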