
Machine translation is the automatic translation of sentences from one language into another. Statistical approaches to machine translation have proved effective in recent years, especially for translating between European languages. However, their effectiveness in translating sentences from English to Indian languages needs to be explored further.
This shared task aims to collectively explore a variety of ways of combining statistical techniques with linguistic inputs to improve a baseline statistical machine translation system from English to Hindi.
Contest
In the contest, training data will be provided to the contestants. It will consist of an English-Hindi parallel corpus. The contestants will have to train their systems on this data. A development corpus will also be provided so that they can refine and improve their systems. The final contest will be held in November 2008 with the test data. A workshop will be held as a part of ICON to allow the shortlisted candidates to present their techniques and results.
The contestants are free to use any of the following:
| Linguistic tools | Not trained on the test/devel set. |
| Linguistic resources | Not extracted from the test/devel set. |
| Any annotated data | Not taken from the test/devel set. |
However, any of the above components used by a participant in this contest must be shared with all the other participants.
Resources:
* English-Hindi parallel corpus.
** Resources shared by participants of the contest.