IJCNLP 2008

Workshop on NER for South and South East Asian Languages

January 12, 2008, IIIT, Hyderabad, India


Evaluation

Evaluation Measures

Precision, recall and F-measure will have to be calculated for two cases: maximal named entities and nested named entities. Thus, there will be six measures of performance:

  • Maximal Precision: Pm = cm/rm
  • Maximal Recall: Rm = cm/tm
  • Maximal F-Measure: Fm = 2 × Pm × Rm / (Pm + Rm)
  • Nested Precision: Pn = cn/rn
  • Nested Recall: Rn = cn/tn
  • Nested F-Measure: Fn = 2 × Pn × Rn / (Pn + Rn)

where, for each case (subscript m for maximal, n for nested), c is the number of correctly retrieved (identified) named entities, r is the total number of named entities retrieved by your system (correct plus incorrect), and t is the total number of named entities in the test data.

Each of these six measures will in turn be computed for three cases: boundary identification only, labelling only, and boundary identification plus labelling. Participants will therefore have to report at least eighteen performance values.
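As an informal illustration (this is not the official evaluation script), the three measures for any one case can be computed from the counts c, r and t defined above; the function name and example counts below are hypothetical:

```python
def prf(c, r, t):
    """Compute precision, recall and F-measure from entity counts.

    c: correctly retrieved named entities
    r: total entities retrieved by the system (correct plus incorrect)
    t: total entities in the test data
    """
    p = c / r if r else 0.0          # precision: correct / retrieved
    rec = c / t if t else 0.0        # recall: correct / total in test data
    f = 2 * p * rec / (p + rec) if (p + rec) else 0.0  # harmonic mean
    return p, rec, f

# Hypothetical example: 80 correct out of 100 retrieved,
# with 120 entities in the test data
p, rec, f = prf(80, 100, 120)
print(f"P={p:.3f} R={rec:.3f} F={f:.3f}")  # P=0.800 R=0.667 F=0.727
```

The same function would be applied once per case (maximal or nested, crossed with the three boundary/labelling conditions) to obtain all eighteen values.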

Automatic Evaluation

Evaluation will be automatic, against the manually prepared test data given to you. An evaluation script for this purpose is available as a zip file and as a tar file. The script assumes that there is a single test file and a single reference file, that the number and order of sentences is the same in both, and that the tokenization (number and order of words) has not been changed by the NER system.

The format accepted by the evaluation script is the same as given in the tutorial.
