{"id":813187,"projects":[225],"description":"The SoNaR corpus is a large Dutch text corpus developed for linguistic research. It consists of two main parts: SoNaR-500, with over 500 million words from a wide variety of domains and genres, and SoNaR-1, a manually verified 1-million-word subset with extensive semantic annotations. The corpus includes automatic and manual annotations such as tokenization, part-of-speech tagging, lemmatization, named entity recognition, coreference annotation, and annotation of spatial and temporal relations. SoNaR supports research in computational linguistics, language modeling, and natural language processing, and is maintained by the Dutch Language Institute (INT).","image":"N/A","tags":"ssh, dutch, linguistic, corpus","type":"","title":"sonar","url":"https://taalmaterialen.ivdnt.org/download/tstc-sonar-corpus/","authors":[1609],"rubrics":[25]}