KCIS Resources

About the dataset: The annotation is funded by KCIS, DeitY, Govt. of India.
Each downloadable zip file contains two folders:

  1. Documents
  2. Data
The dependency annotation follows the Paninian Grammar Framework (the guidelines are in Documents). The mapping from these dependency labels to the Stanford dependency labels is also in Documents. The annotation is stored in SSF (Shakti Standard Format); for further details on SSF, refer to the SSF_Guide.
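As a rough sketch of how chunk-level dependency relations might be read out of an SSF file, the Python below assumes the layout commonly seen in SSF data: each sentence sits between `<Sentence id="...">` and `</Sentence>`, lines are tab-separated, a chunk opens with a line whose second column is `((` and closes with a `))` line, and the chunk's feature structure carries `name='...'` and `drel='relation:head'` attributes. These are assumptions to check against the SSF_Guide and the guidelines in Documents; the helper names `parse_fs` and `chunk_dependencies` and the romanized sample sentence are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Matches key='value' or key="value" pairs inside an <fs ...> feature structure.
FS_ATTR = re.compile(r"(\w+)=['\"]([^'\"]*)['\"]")


def parse_fs(fs: str) -> Dict[str, str]:
    """Extract attribute/value pairs from an SSF feature-structure string."""
    return dict(FS_ATTR.findall(fs))


def chunk_dependencies(ssf_text: str) -> List[List[Tuple[str, str, str]]]:
    """Return, per sentence, (chunk_name, relation, head_chunk) triples.

    Assumes (hypothetically) that chunk-opening lines have '((' in the second
    tab-separated column and that relations live in a 'drel' attribute.
    """
    sentences = []
    current = None  # triples for the sentence being read, or None outside one
    for line in ssf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("<Sentence"):
            current = []
        elif stripped.startswith("</Sentence"):
            if current is not None:
                sentences.append(current)
            current = None
        elif current is not None:
            cols = line.split("\t")
            # Only chunk-opening lines carry the chunk-level relation.
            if len(cols) >= 4 and cols[1] == "((":
                fs = parse_fs(cols[3])
                if "drel" in fs:
                    rel, _, head = fs["drel"].partition(":")
                    current.append((fs.get("name", cols[0]), rel, head))
    return sentences


# Hypothetical one-sentence sample in the assumed layout (not from the dataset).
SAMPLE = (
    '<Sentence id="1">\n'
    "1\t((\tNP\t<fs name='NP' drel='k1:VGF'>\n"
    "1.1\trAma\tNNP\t<fs af='rAma,n,m,sg,3,d,0,0'>\n"
    "\t))\n"
    "2\t((\tVGF\t<fs name='VGF'>\n"
    "2.1\tgayA\tVM\t<fs af='jA,v,m,sg,3,,yA,yA'>\n"
    "\t))\n"
    "</Sentence>\n"
)

if __name__ == "__main__":
    print(chunk_dependencies(SAMPLE))  # [[('NP', 'k1', 'VGF')]]
```

The root chunk (`VGF` here) has no `drel` in the sample, so it is skipped; real files may mark the root differently, which is another reason to confirm the conventions in the SSF_Guide.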

Download Dataset:
To download a dataset, click on it and fill in the form provided.

Corpus Statistics:

Domain (details - language)                    Sentences   Tokens   Word frequency list
Health (Disease - Hindi)                       1.5K        37K      File
Tourism (Hindi)                                3K          50K      File
Bengali                                        12.5K       155K     File
Kannada                                        13.1K       152K     File
Malayalam                                      14.4K       168K     File
Marathi                                        15.1K       217K     File
Coreference Anaphora Annotated Data (Hindi)    3.6K        78K      -
Coreference Annotated Data (Hindi)             3.5K        77.6K    -
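Dividing tokens by sentences gives a rough average sentence length for each dataset. A minimal Python sketch, using the rounded figures from the Corpus Statistics table (K = thousand), so the results are approximate; the names `CORPUS_STATS` and `avg_tokens_per_sentence` are illustrative only.

```python
# Rounded (sentences, tokens) figures copied from the Corpus Statistics table.
CORPUS_STATS = {
    "Health (Disease - Hindi)": (1_500, 37_000),
    "Tourism (Hindi)": (3_000, 50_000),
    "Bengali": (12_500, 155_000),
    "Kannada": (13_100, 152_000),
    "Malayalam": (14_400, 168_000),
    "Marathi": (15_100, 217_000),
    "Coreference Anaphora Annotated Data (Hindi)": (3_600, 78_000),
    "Coreference Annotated Data (Hindi)": (3_500, 77_600),
}


def avg_tokens_per_sentence(dataset: str) -> float:
    """Approximate mean sentence length; inputs are rounded, so treat as an estimate."""
    sentences, tokens = CORPUS_STATS[dataset]
    return tokens / sentences


if __name__ == "__main__":
    for name in CORPUS_STATS:
        print(f"{name}: {avg_tokens_per_sentence(name):.1f} tokens/sentence")
```

For example, Bengali works out to about 12.4 tokens per sentence and the Health (Hindi) data to about 24.7.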