SynEst (English-to-Estonian) Synthetic Estonian Parallel Corpus
SynEst-en-to-et
ID: https://doi.org/10.15155/5R1E-6R35
A synthetic parallel corpus of original English texts that were machine-translated into Estonian and then filtered.
Original English text sources:
- NewsCrawl (https://data.statmt.org/news-crawl), up to and including 2021
- ParaCrawl v9 (https://paracrawl.eu): the English side of parallel corpora between English and German, Spanish, Finnish, French, Lithuanian, Latvian, Russian, Swedish, Ukrainian and Chinese
- United Nations Parallel Corpus (https://conferences.unite.un.org/uncorpus)
- OpenSubtitles (https://opus.nlpl.eu) monolingual English texts
Additional unfiltered data (not included in the count):
- Reddit data (downloaded via https://github.com/microsoft/DialoGPT) in English