KCIS Resources
About the dataset:
The annotation is funded by KCIS, DeitY, Govt. of India.
Each downloadable zip file contains two folders.
Download Dataset:
To download a dataset, click on it and fill in the form provided.
Corpus Statistics:
Dataset (Details - Language) | No. of Sentences | No. of Tokens | Word Frequency List |
---|---|---|---|
Health (Disease - Hindi) | 1.5K | 37K | File |
Tourism (Hindi) | 3K | 50K | File |
Bengali | 12.5K | 155K | File |
Kannada | 13.1K | 152K | File |
Malayalam | 14.4K | 168K | File |
Marathi | 15.1K | 217K | File |
Coreference Anaphora Annotated Data (Hindi) | 3.6K | 78K | - |
Coreference Annotated Data (Hindi) | 3.5K | 77.6K | - |
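Sentence counts, token counts and word frequency lists like those in the table can be computed with a short script. This is a minimal sketch under stated assumptions, not the procedure used to produce the figures above: it assumes, hypothetically, that each sentence sits on its own line and that tokens are separated by whitespace. The actual layout of the files inside the zip archives is not described here, and the function names `corpus_statistics` and `write_frequency_list` are illustrative only.

```python
from collections import Counter
from typing import Iterable, Tuple


def corpus_statistics(lines: Iterable[str]) -> Tuple[int, int, Counter]:
    """Return (sentence count, token count, word frequency counter).

    Hypothetical assumption: one sentence per non-empty line and
    whitespace-separated tokens; the real file format may differ.
    """
    sentences = 0
    freq: Counter = Counter()
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines between sentences or documents
        sentences += 1
        freq.update(tokens)
    # Total tokens is the sum of all per-word counts.
    return sentences, sum(freq.values()), freq


def write_frequency_list(freq: Counter, path: str) -> None:
    """Write one 'word<TAB>count' line per word, most frequent first."""
    with open(path, "w", encoding="utf-8") as out:
        for word, count in freq.most_common():
            out.write(f"{word}\t{count}\n")


if __name__ == "__main__":
    sample = ["यह एक वाक्य है ।", "", "यह दूसरा वाक्य है ।"]
    n_sent, n_tok, freq = corpus_statistics(sample)
    print(n_sent, n_tok)  # 2 10
    print(freq.most_common(3))
```

Counting with `collections.Counter` keeps the script dependency-free; a real pipeline for Hindi or other Indian languages would likely need a proper tokenizer rather than `str.split`.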