sonar
The SoNaR corpus is a large Dutch text corpus developed for linguistic research. It consists of two main parts: SoNaR-500, with over 500 million words from a wide variety of domains and genres, and SoNaR-1, a manually verified 1-million-word subset with extensive semantic annotations. The corpus includes automatic and manual annotations such as tokenization, part-of-speech tagging, lemmatization, named entity recognition, coreference annotation, and annotation of spatial and temporal relations. SoNaR supports research in computational linguistics, language modeling, and natural language processing, and is maintained by the Dutch Language Institute (INT).
Tags: ssh dutch linguistic corpus
URL(s):
Digital Object Assessments (1)
| Rubric | Project | BeCos ineo | Globally unique identifier | Persistent identifier | Machine-readable metadata | Standardized metadata | Resource identifier in metadata | Resource discovery through web search | Open, Free, Standardized Access protocol | Protocol to access restricted content | Persistence of resource and metadata | Resource uses formal language | FAIR vocabulary | Linked | Digital resource license | Metadata license | Provenance scheme | Certificate of compliance to community standard | Date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FAIR metrics by fairmetrics.org | becos eval ineo | yes (1.00) | yes (1.00) | no (0.00) | yes (1.00) | no (0.00) | yes (1.00) | yes (1.00) | yes (1.00) | no (0.00) | no (0.00) | no (0.00) | no (0.00) | no (0.00) | yes (1.00) | no (0.00) | no (0.00) | no (0.00) | Jun 23, 2025 |