SynEst (English-to-Estonian) Synthetic Estonian Parallel Corpus

SynEst-en-to-et

ID: https://doi.org/10.15155/5R1E-6R35

A synthetic parallel corpus built from original English texts that were machine-translated into Estonian and then filtered.

Original English text sources:
- NewsCrawl (https://data.statmt.org/news-crawl) up to the year 2021
- ParaCrawl v9 (https://paracrawl.eu): the English side of parallel corpora between English and German, Spanish, Finnish, French, Lithuanian, Latvian, Russian, Swedish, Ukrainian and Chinese
- United Nations Parallel Corpus (https://conferences.unite.un.org/uncorpus)
- OpenSubtitles (https://opus.nlpl.eu) monolingual English texts

Additional unfiltered data (not included in the count):
- Reddit data (downloaded via https://github.com/microsoft/DialoGPT) in English
