IJCNLP 2008

Workshop on NLP for Less Privileged Languages

January 11, 2008, IIIT, Hyderabad, India

Announcement

IJCNLP-08 Workshop on NLP for Less Privileged Languages

While computing has become almost ubiquitous in the US and Europe, its spread in Asia is more recent. Although Asia is linguistically very diverse (or perhaps because of that diversity), many Asian languages are poorly supported on computers, and even basic NLP tools are not available for them. This is a major bottleneck in the development of advanced NLP applications and language resources, and it also has a social cost.

NLP/CL-based technologies are becoming increasingly important, and future intelligent systems will rely on more of these techniques. Most NLP/CL tools and technologies are tailored to English or other European languages. Recently, the IT industry has grown rapidly in many Asian countries, and in India in particular. This is therefore the right time to address the problem mentioned above, namely the lack of computing support and basic NLP tools for less privileged languages. Only when a basic infrastructure for supporting regional languages becomes available can we hope for a more equitable distribution of the opportunities made possible by language technology. There have already been attempts in this direction (some of them are mentioned below), and this workshop will try to take them further, especially in the Asian context.

Topics to be covered:

Archiving and creation of interoperable data and metadata for less privileged languages
Support for less privileged languages on computers, including input methods, display, fonts, encoding converters, spell checkers, and more linguistically aware text editors
Basic NLP tools such as sentence markers, tokenizers, morphological analyzers, transliteration tools, and language and encoding identifiers
Advanced NLP tools such as POS taggers, local word groupers, approximate string search, and tools that make the development of language resources easier

Target Audience

The workshop is intended for language resource developers and researchers working in Natural Language Processing and Computational Linguistics, and for those involved in providing computing support for less privileged languages, whether or not those languages have official status (as in the case of some tribal languages). The workshop is open to any less privileged language of the world.

Background

Workshop on "Shallow Parsing in South Asian Languages", IJCAI-07, Hyderabad, India
EMELD and the Digital Tools Summit in Linguistics, Michigan State University, East Lansing, Michigan, June 22-23, 2006
First Steps for Language Documentation of Minority Languages: Computational Linguistic Tools for Morphology, Lexicon and Corpus Compilation, LREC2004, Lisbon, Portugal
Workshop on Language Resources for European Minority Languages, Granada, Spain, May 27, 1998
Strategies for developing machine translation for minority languages, 5th SALTMIL Workshop on Minority Languages, Tuesday, May 23, 2006, organized in conjunction with LREC 2006. Also other workshops at LREC conferences.

Projects supported by ELRA on the Basic Language Resource Kit (BLARK), which aims to specify a minimal kit for each language to support the development of NLP tools (more details at http://www.elda.org/blark/)
There is also a corresponding project at LDC on Less Commonly Taught Languages (http://projects.ldc.upenn.edu/LCTL/)

This workshop will also be relevant in the context of a linguistic survey of India, which is being planned as a major project for the next few years. Once a large amount of data about less privileged languages has been collected, tools will be needed to process it.

Organizer

Anil Kumar Singh
anil@research.iiit.ac.in
Language Technologies Research Centre
International Institute of Information Technology
Hyderabad, India