Estonian National Corpus 2023 (prevert) 
Eesti keele ühendkorpus 2023 (annoteerimata)
Estonian NC 2023
ID:
https://doi.org/10.15155/3-00-0000-0000-0000-08C04M
Estonian corpus of written texts. It consists of the Estonian Reference Corpus (1990–2008), contemporary and old literature, Estonian Web (2013, 2017, 2019, 2021, 2023), timestamped Estonian corpora (2014–2021, 2020–2023), Estonian Wikipedia (articles: 2023, talk pages: 2017) and Estonian academic writing (2020–2023). The texts are cleaned and deduplicated. Text type annotation covers topics and genres.
ENCODING: UTF-8
== Comparison to ENC 2021 corpus
Balanced Corpus 1990–2008 ................. kept without changes
Reference Corpus 1990–2008 ................ kept without changes
Literature Old 1864–1945 .................. updated according to the source
Literature Contemporary 2000–2023 ......... updated according to the source (licensed under CLARIN ACA)
Web 2013 .................................. kept without changes
Web 2017 .................................. kept without changes
Wikipedia Talk 2017 ....................... kept without changes
Academic Texts (formerly DOAJ) up to 2023 . updated with new data
Web 2019 .................................. kept without changes
Web 2021 .................................. kept without changes
Wikipedia 2023 ............................ replacing Wikipedia 2021
Feeds (JSI) 2014–2021 ..................... kept without changes
Feeds (LC) 2020–2023 ...................... updated with new data
Web 2023 .................................. new