Projects

Modelling and application of linguistic similarity
Computational modelling and processing of tense, aspect, modality and temporal information in natural languages
Aligning parallel corpora
Language and encoding identification
A computational phonetic model for Indian language scripts, based on the highly phonetic and well-organized nature of Brahmi-based scripts. It is being used to build applications such as a spell checker, a cognate identifier and a transliteration tool for Indian languages.
Building GUI-based interfaces for corpus annotation in Java
Building APIs for language resources like dictionaries and corpora
A multi-purpose editor specialized for NLP and Indian languages
Etc.

Note: Most of the above, and several others such as APIs for n-gram modelling, corpus compilation, find/replace/extract tools for corpora, a file splitter and a tree viewer, have been integrated into Sanchay, a small open-source Java-based platform for NLP that focuses especially on Indian languages. Parts of Sanchay are already being used by many people for working with South Asian languages. Some other people are now also contributing to the development of some Sanchay modules.

The last formal release of Sanchay (version 0.4.1) is available for download here. The latest builds are usually put here. You can also contact me.

Anil Kumar Singh ⇔ अनिल कुमार सिंह
