Estonian National Corpus 2023 (prevert)

Eesti keele ühendkorpus 2023 (annoteerimata)

Estonian NC 2023

ID:

https://doi.org/10.15155/3-00-0000-0000-0000-08C04M

Estonian corpus of written texts. It consists of the Estonian Reference Corpus (1990–2008), old and contemporary literature, Estonian Web (2013, 2017, 2019, 2021, 2023), Timestamped Estonian corpora (2014–2021, 2020–2023), Estonian Wikipedia (articles: 2023, talk pages: 2017) and Estonian academic writing (2020–2023). The texts are cleaned and deduplicated. Text type annotation: topics, genres.

ENCODING: UTF-8

== Comparison to ENC 2021 corpus
Balanced Corpus 1990–2008 ................. kept without changes
Reference Corpus 1990–2008 ................ kept without changes
Literature Old 1864–1945 .................. updated according to the source
Literature Contemporary 2000–2023 ......... updated according to the source (licensed under CLARIN ACA)
Web 2013 .................................. kept without changes
Web 2017 .................................. kept without changes
Wikipedia Talk 2017 ....................... kept without changes
Academic Texts (formerly DOAJ) up to 2023 . updated with new data
Web 2019 .................................. kept without changes
Web 2021 .................................. kept without changes
Wikipedia 2023 ............................ replacing Wikipedia 2021
Feeds (JSI) 2014–2021 ..................... kept without changes
Feeds (LC) 2020–2023 ...................... updated with new data
Web 2023 .................................. new
