IJCNLP 2008

Workshop on NER for South and South East Asian Languages

January 12, 2008, IIIT, Hyderabad, India

Shared Task

In the shared task, contestants with their own NER systems will be given annotated test data, and the participating systems will be ranked according to their performance on it. Training data is being provided for five languages (see below). Contestants are free to use any technique for NER, e.g. a purely rule-based technique or a purely statistical technique.

Please note that contestants may build NER systems targeted at a specific language, but they will have to report results for their systems on all the languages for which training data has been provided. This condition is meant to provide fair grounds for comparing systems, since the amount of training data differs from language to language.

Data is being provided for Hindi, Bengali, Oriya, Telugu and Urdu. The data released for the shared task will be made accessible to everyone for non-profit research work, not just to the participants.

The task in this contest will be different in one important way: the NER systems also have to identify nested named entities. For example, in the sentence "The Lal Bahadur Shastri National Academy of Administration is located in Mussoorie", 'Lal Bahadur Shastri' is a Person, but 'Lal Bahadur Shastri National Academy of Administration' is an Organization. In this case, the NER systems will have to identify both 'Person' and 'Organization' in the given sentence.
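As an illustration only, nested entities can be thought of as token spans in which one span lies inside another. The sketch below is not the annotation format of the shared task data; the `Entity` class, the `span_text` and `nested_pairs` helpers, and the whitespace tokenization are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical span representation (not the shared task format)."""
    start: int   # index of the first token, inclusive
    end: int     # index one past the last token, exclusive
    label: str


def span_text(tokens, entity):
    """Return the surface string covered by an entity span."""
    return " ".join(tokens[entity.start:entity.end])


def nested_pairs(entities):
    """Return (outer, inner) pairs where inner lies strictly inside outer."""
    pairs = []
    for outer in entities:
        for inner in entities:
            contained = outer.start <= inner.start and inner.end <= outer.end
            same_span = (outer.start, outer.end) == (inner.start, inner.end)
            if contained and not same_span:
                pairs.append((outer, inner))
    return pairs


# Naive whitespace tokenization of the example sentence (an assumption).
tokens = ("The Lal Bahadur Shastri National Academy of Administration "
          "is located in Mussoorie").split()

person = Entity(start=1, end=4, label="Person")
organization = Entity(start=1, end=8, label="Organization")
entities = [person, organization]

for outer, inner in nested_pairs(entities):
    print(f"{inner.label} '{span_text(tokens, inner)}' is nested in "
          f"{outer.label} '{span_text(tokens, outer)}'")
```

A system that outputs only one label per token cannot express both spans at once, which is why a span-based view such as this one is a convenient way to reason about the nested case.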

If you are new to shared tasks, then this write-up could be useful.
