@MISC{Oostdijk99buildinga, author = {Nelleke Oostdijk}, title = {Building a Corpus of Spoken Dutch}, year = {1999} }
Abstract
This paper presents the Spoken Dutch Corpus Project, a joint Flemish-Dutch undertaking aimed at compiling and annotating a 10-million-word corpus of spoken Dutch. Upon completion, the corpus will be a valuable resource for research in computational linguistics and in language and speech technology. The paper first gives an overview of the project. It then describes the data available in the first release of the first part of the corpus, which came out on March 1st, 2000.
1 Introduction
The Spoken Dutch Corpus project started in June 1998. It is a five-year project aimed at compiling and annotating a 10-million-word corpus of contemporary standard Dutch as spoken in the Netherlands and Flanders. The project is funded jointly by the Flemish and Dutch governments and Science Foundations, with a budget of some 4.6 MEuro. The entire corpus will be orthographically transcribed, lemmatized and annotated with part-of-speech informati...