Phonetic Corpus of Estonian Spontaneous Speech v.1.0.6

Eesti keele spontaanse kõne foneetiline korpus v.1.0.6

The corpus consists of high-quality audio recordings of spontaneous Estonian speech and their segmentation on different levels. The main body of the corpus consists of dialogues, but it also includes a sub-corpus of lecture monologues and a sub-corpus of spontaneous discussions with three participants. The speakers come from different age groups and various dialect backgrounds.

Most of the recordings were made in a recording studio, some also during fieldwork. The audio signal of each speaker is recorded in a separate channel. The distance between the speakers is about 1.5-2 meters to minimize the effect of overlaps. Recordings are saved in PCM WAV format. Annotations are saved in Praat TextGrid format as UTF-8 text files.

The current version of the corpus contains approximately 127 hours of recordings from 195 speakers. Manual word- and phoneme-level annotation is available for 100 hours of recordings (770 000 words). Video recordings (mp4) are also available for 18 h of dialogues and 15 h of trialogues. The trialogue subset also includes a breathing signal recorded with a belt plethysmograph.

Segmentation and annotation are done with the Praat program (www.praat.org). Recordings are segmented manually on different levels. The following tiers are used:
-Words (in orthographic spelling),
-Phonemes (SAMPA adjusted for Estonian),
-Syllables (short – long, open – closed),
-Prosodic feet (stress pattern, quantity),
-Intonation phrases or inter-pausal units,
-Voice quality (creaky voice),
-Morphological information (automatically annotated using Estmorf/Vabamorf).
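
Because the tiers are stored as Praat TextGrid text files, they can be read without Praat itself. The following is a minimal sketch, assuming the files use Praat's long ("full") text format and that labels fit on one line. The function name `read_interval_tiers`, the tier names `words` and `phonemes`, and the sample labels are hypothetical and chosen only for illustration; the tier names actually used in the corpus files may differ.

```python
import re
from collections import defaultdict

def read_interval_tiers(textgrid_text):
    """Return {tier_name: [(xmin, xmax, label), ...]} for interval tiers.

    Sketch only: handles the long TextGrid text format with single-line
    labels; point tiers and multi-line labels are not collected.
    """
    tiers = defaultdict(list)
    tier_name = None
    in_interval = False
    xmin = xmax = None
    for raw in textgrid_text.splitlines():
        line = raw.strip()
        m = re.match(r'name = "(.*)"$', line)
        if m:  # start of a new tier
            tier_name = m.group(1)
            in_interval = False
            continue
        if re.match(r'intervals \[\d+\]:?$', line):  # start of one interval
            in_interval = True
            xmin = xmax = None
            continue
        if not in_interval:
            continue  # skip file-level and tier-level xmin/xmax
        m = re.match(r'(xmin|xmax) = (\S+)$', line)
        if m:
            if m.group(1) == "xmin":
                xmin = float(m.group(2))
            else:
                xmax = float(m.group(2))
            continue
        m = re.match(r'text = "(.*)"$', line)
        if m:
            # Praat writes a literal double quote inside a label as "".
            label = m.group(1).replace('""', '"')
            tiers[tier_name].append((xmin, xmax, label))
            in_interval = False
    return dict(tiers)

# Made-up sample in the long TextGrid format (not taken from the corpus).
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 1
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.5
            text = "tere"
        intervals [2]:
            xmin = 0.5
            xmax = 1
            text = ""
    item [2]:
        class = "IntervalTier"
        name = "phonemes"
        xmin = 0
        xmax = 1
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 0.25
            text = "t"
'''

tiers = read_interval_tiers(SAMPLE)
```

With files read from disk, open them with `encoding="utf-8"`, as stated for the corpus annotations, and pass the file contents to the function.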

Distribution

DOI

10.15155/1-00-0000-0000-0000-001AFL

Availability

Available - Unrestricted Use

Licence

CLARIN RES

Distribution Access/Medium: Accessible Through Interface, Downloadable

Contact Person

Pärtel Lippus

text
audio

Monolingual text corpus

Languages

Estonian

Linguality

Linguality type: Monolingual

Size

770 000 Words

Monolingual audio corpus

Languages

Estonian

Linguality

Linguality type: Monolingual

Size

100 Hours

Modalities

Spoken Language

Annotation

Alignment

StandOff: True

Segmentation level: Phoneme, Phrase, Syllable, Word

Format: TextGrid

Annotation Mode: Manual

Content

Speech items: Free Speech

Noise Level: Low

Setting

Naturality: Spontaneous

Conversational type: Dialogue

Audio Formats

audio/x-wav

Recording quality: Very High

Quantization: 16

Number of tracks: 1

Sampling rate: 44100

Signal encoding: LinearPCM
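
The audio format fields above (16-bit quantization, 44100 Hz sampling rate, linear PCM) can be checked against a downloaded file using only Python's standard `wave` module. This is a minimal sketch, not part of the corpus tooling; the function names `describe_wav` and `matches_record` are hypothetical, and the channel count is reported rather than checked, since the record lists one track while the description mentions a separate channel per speaker.

```python
import wave

def describe_wav(path):
    """Read basic header fields of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,  # bytes -> bits
            "sampling_rate": rate,
            "duration_s": frames / rate,
        }

def matches_record(info):
    """True if the file has the quantization and sampling rate listed here."""
    return info["bits_per_sample"] == 16 and info["sampling_rate"] == 44100
```

Summing `duration_s` over all downloaded files gives a quick way to compare the total length with the hours stated in the record.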

Recording

Recording environment: Anechoic Chamber, Conference Room, Other

Recording device type: Hard Disk

Resource Creation

Resource Creator

Funding Project

Eesti keele spontaanse kõne foneetilise korpuse arendused 3, Development of the Phonetic Corpus of Spontaneous Estonian Speech III (EKTB3 - EKTB3)

URL: https://www.etis.ee/...

Funding Type: National Funds

Funder: Haridus- ja teadusministeerium (Ministry of Education and Research)

Funding Country: EE

Project duration: 01/01/2018 - 12/31/2022

Metadata

Created: 09/08/2020

Last Updated: 11/23/2020

Metadata Creator

Pärtel Lippus

Version

Version: 1.0.6

Last Updated: 09/08/2020

Relation

Related Resource: http://dx.doi.org/10...

Relation Type: newVersionOf

Related Resource: http://dx.doi.org/10...

Relation Type: newVersionOf

Related Resource: http://dx.doi.org/10...

Relation Type: newVersionOf

Related Resource: http://dx.doi.org/10...

Relation Type: newVersionOf
