NoWaC - Norwegian Web as Corpus

NoWaC (Norwegian Web as Corpus) is a large web-based corpus of Norwegian Bokmål that currently contains about 700 million tokens.
The corpus was built by crawling, downloading and processing web documents in the .no top-level domain between November 2009 and January 2010. The procedure used to collect it is largely based on the techniques used to build the corpora published by the WaCky initiative.

NoWaC was built with permission from the Norwegian Ministry of Culture (Kulturdepartementet) and may only be redistributed in a sentence-scrambled version.
Read more about the corpus at: http://www.hf.uio.no/iln/om/organisasjon/tekstlab/prosjekter/nowac/index.html.
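
To illustrate what a sentence-scrambled distribution means in general, here is a minimal Python sketch. It is not the procedure used for NoWaC, which this description does not specify; the function name scramble_sentences, the seed parameter and the sample sentences are all hypothetical. The idea is simply that sentences from all documents are pooled and shuffled, so that the original documents cannot be reassembled.

```python
import random


def scramble_sentences(documents, seed=None):
    """Pool the sentences of all documents and return them in random order.

    `documents` is a list of documents, each a list of sentence strings.
    Illustrative helper only, not the actual NoWaC tooling.
    """
    # Flatten the corpus: document boundaries are discarded on purpose.
    sentences = [sentence for doc in documents for sentence in doc]
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    rng = random.Random(seed)
    rng.shuffle(sentences)
    return sentences


# Hypothetical two-document corpus.
docs = [
    ["Dette er en setning.", "Her er en til."],
    ["Et annet dokument.", "Med to setninger."],
]
scrambled = scramble_sentences(docs, seed=0)
```

Every sentence is kept, but the order and the document boundaries are lost, so the data remains usable for sentence-level work while the original web pages cannot be reconstructed from it.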
