IJCNLP 2008

In the shared task, contestants with their own NER systems will be given some annotated test data, and the participating systems will be ranked according to their performance on that data. Training data is being provided for five languages (see below). Contestants are free to use any technique for NER, e.g. a purely rule-based technique or a purely statistical technique.

Please note that contestants may build NER systems targeted at a specific language, but they will have to report results for their systems on all the languages for which training data has been provided. This condition is meant to provide fair grounds for comparing systems, since the amount of training data differs from language to language.

Data is being provided for Hindi, Bengali, Oriya, Telugu and Urdu. The data released for the shared task will be made accessible to everyone for non-profit research work, not just to the participants.

The task in this contest will be different in one important way: the NER systems also have to identify nested named entities. For example, in the sentence 'The Lal Bahadur Shastri National Academy of Administration is located in Mussoorie', 'Lal Bahadur Shastri' is a Person, but 'Lal Bahadur Shastri National Academy of Administration' is an Organization. In this case, the NER systems will have to identify both the Person and the Organization in the given sentence.
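As an illustration only, nested entities can be pictured as labelled token spans, where one span lies inside another. The sketch below is not the annotation format of the shared task data; the Entity class, the helper names, the whitespace tokenization and the token offsets are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A labelled token span (hypothetical representation, not the task format)."""
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str


def contains(outer: Entity, inner: Entity) -> bool:
    """True if inner lies within outer and the two spans are not identical."""
    same_span = (outer.start, outer.end) == (inner.start, inner.end)
    return outer.start <= inner.start and inner.end <= outer.end and not same_span


def nested_pairs(entities):
    """Return every (outer, inner) pair in which inner is nested in outer."""
    return [(o, i) for o in entities for i in entities if contains(o, i)]


def surface(tokens, entity: Entity) -> str:
    """Return the text covered by an entity span."""
    return " ".join(tokens[entity.start:entity.end])


# Whitespace tokenization is an assumption made for this sketch.
tokens = ("The Lal Bahadur Shastri National Academy of Administration "
          "is located in Mussoorie").split()

person = Entity(1, 4, "Person")              # Lal Bahadur Shastri
organization = Entity(1, 8, "Organization")  # Lal Bahadur Shastri ... Administration
entities = [person, organization]

for outer, inner in nested_pairs(entities):
    print(f"{inner.label} '{surface(tokens, inner)}' is nested in "
          f"{outer.label} '{surface(tokens, outer)}'")
```

A system that outputs only the longest span would report the Organization and miss the Person; keeping every span, as above, is one way to retain both.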

If you are new to shared tasks, then this write-up could be useful.