NoWaC

NoWaC - Norwegian Web as Corpus

http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html

NoWaC (Norwegian Web as Corpus) is a large web-based corpus of Bokmål Norwegian currently containing about 700 million tokens.
The corpus was built by crawling, downloading and processing web documents in the .no top-level domain between November 2009 and January 2010. The procedure used to collect NoWaC is largely based on the techniques the WaCky initiative used to build its corpora.

NoWaC was built with permission from the Norwegian Ministry of Culture (Kulturdepartementet) and may only be redistributed in a sentence-scrambled version.
Read more about the corpus at: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html.

Distribution

DOI

10.15155/9-00-0000-0000-0000-0017DL

Availability

Available - Restricted Use

Licence

CC BY-NC-SA

Restrictions: Academic/Non-Commercial Use, Attribution, Other, Share Alike

User Nature: Academic

Download location: hidden

Distribution Access/Medium: Accessible Through Interface, Downloadable

Execution location: hidden

text

Monolingual text corpus

Languages

Norwegian Bokmål

Linguality

Linguality type: Monolingual

Size

700 million Tokens

Metadata

Created: 06/13/2012

Last Updated: 06/13/2012

Source: OLAC

Metadata Language: English (en)

Version

Version: 0.1

Revision: This is the first version of NoWaC (Norwegian Web as Corpus), a large web-based corpus of Bokmål Norwegian currently containing about 700 million tokens. Read more at: https://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html

People who looked at this resource also viewed the following: