KCIS Resources
About the dataset:
The annotation is funded by KCIS, DeiTY, Govt. of India.
Each downloadable zip file contains two folders.
Download Dataset:
To download a dataset, click on the dataset you want and fill in the form provided.
Corpus Statistics:
| Domain (More information - Language) | No. of Sentences | No. of Tokens | Word frequency list |
|---|---|---|---|
| Health (Disease - Hindi) | 1.5K | 37K | File |
| Tourism (Hindi) | 3K | 50K | File |
| Bengali | 12.5K | 155K | File |
| Kannada | 13.1K | 152K | File |
| Malayalam | 14.4K | 168K | File |
| Marathi | 15.1K | 217K | File |
| Coreference Anaphora Annotated Data (Hindi) | 3.6K | 78K | - |
| Coreference Annotated Data (Hindi) | 3.5K | 77.6K | - |
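
For readers who want to reproduce counts like those in the table on their own copy of a corpus, the following is a minimal sketch, not the procedure used to produce the figures above. It assumes the text is available as one sentence per line and that tokens are separated by whitespace; the actual file layout of the datasets is not described here, and the function name `corpus_statistics` is hypothetical.

```python
from collections import Counter


def corpus_statistics(sentences):
    """Return (sentence count, token count, word-frequency Counter).

    Assumption: each item in `sentences` is one sentence, and tokens
    are separated by whitespace. Empty lines are skipped.
    """
    freq = Counter()
    n_sentences = 0
    for sentence in sentences:
        tokens = sentence.split()
        if not tokens:
            continue  # ignore blank lines
        n_sentences += 1
        freq.update(tokens)
    n_tokens = sum(freq.values())
    return n_sentences, n_tokens, freq


# Tiny illustrative sample (not taken from the datasets).
sample = ["यह एक वाक्य है", "यह दूसरा वाक्य है"]
n_sent, n_tok, freq = corpus_statistics(sample)
print(n_sent, n_tok)
for word, count in freq.most_common(3):
    print(word, count)
```

To run this on a file, the lines can be passed directly, for example `corpus_statistics(open(path, encoding="utf-8"))`, where `path` points to a plain-text file in the assumed one-sentence-per-line format.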