EstQA Question Answering dataset



Please use DOI in citation:

Dataset for extractive question answering in Estonian. It is based on Wikipedia articles, pre-filtered via PageRank.
The training set includes 776 context-question-answer triplets. A question may have several possible answers, each stored in a separate triplet. The number of distinct questions is 512.
The test set includes 603 samples. Each sample contains one or more gold answers. Altogether there are 892 gold answers.
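
Because the training set stores each alternative answer in its own triplet, it can help to group triplets by question before training or evaluation. The sketch below shows one way to do that. The field names ("context", "question", "answer"), the toy Estonian examples, and the helper group_answers are illustrative assumptions, not the dataset's actual schema or contents.

```python
from collections import defaultdict

# Hypothetical toy triplets. The field names and texts are assumptions
# made for illustration; they are not taken from the dataset files.
triplets = [
    {"context": "Tallinn on Eesti pealinn.",
     "question": "Mis on Eesti pealinn?",
     "answer": "Tallinn"},
    {"context": "Tallinn on Eesti pealinn.",
     "question": "Mis on Eesti pealinn?",
     "answer": "Tallinn on Eesti pealinn"},
    {"context": "Tartu asub Emajõe ääres.",
     "question": "Mis jõe ääres asub Tartu?",
     "answer": "Emajõe"},
]


def group_answers(triplets):
    """Collect all answers that share the same (context, question) pair."""
    grouped = defaultdict(list)
    for t in triplets:
        grouped[(t["context"], t["question"])].append(t["answer"])
    return dict(grouped)


grouped = group_answers(triplets)

# Number of triplets versus number of distinct questions.
print(len(triplets), len(grouped))
```

Keying on the (context, question) pair rather than the question text alone is a design choice in this sketch: it keeps identical question strings asked about different passages apart.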

If you use this dataset for research, please cite the following paper:

author = {Anu Käver},
title = {Extractive Question Answering for Estonian Language},
school = {Tallinn University of Technology (TalTech)},
year = 2021

