Machine Translation (from English to Hindi & from Hindi to English)
The goal of this project is to develop an English-to-Hindi statistical machine translation system. A medium-sized parallel dataset will be provided to the participants to train their systems, and a test set will be provided to measure the performance of the systems. Participants can pursue the following sub-projects:
 
 
Statistical Phrase-based Machine Translation
The goal here is to tune an existing phrase-based machine translation system to the Indian language setting. Phrase-based systems do not take into account the morphological richness of Indian languages or the word-order variations that exist between English and Indian languages.
 
In this project, a number of experiments will be conducted to take advantage of the rich morphology of Indian languages within the framework of phrase-based machine translation.
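One common way to expose morphology to a phrase-based system is to annotate every token with extra factors such as its lemma and part of speech, as in the factored translation models listed among the readings for this project. The following is a minimal sketch, not the project's prescribed setup: `toy_analyze` and its lookup table are hypothetical stand-ins for the morphological analyzer and POS tagger, and the only assumption about MOSES is that factored corpora separate the factors of a token with `|`.

```python
from typing import Callable, Tuple

Analysis = Tuple[str, str]  # (lemma, part-of-speech tag)


def toy_analyze(token: str) -> Analysis:
    """Hypothetical stand-in for a real morphological analyzer and POS tagger."""
    table = {
        "houses": ("house", "NNS"),
        "are": ("be", "VBP"),
        "big": ("big", "JJ"),
    }
    # Unknown tokens fall back to the lowercased surface form and an UNK tag.
    return table.get(token.lower(), (token.lower(), "UNK"))


def to_factored(sentence: str,
                analyze: Callable[[str], Analysis] = toy_analyze) -> str:
    """Rewrite a whitespace-tokenized sentence as surface|lemma|pos tokens."""
    factored = []
    for token in sentence.split():
        lemma, pos = analyze(token)
        factored.append("|".join((token, lemma, pos)))
    return " ".join(factored)


print(to_factored("Houses are big"))
```

Writing both sides of the parallel corpus in this form lets experiments translate lemmas and morphological tags separately from surface forms, which is one way to reduce data sparsity for a morphologically rich language such as Hindi.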
 
 Guide:  Srinivas Bangalore (AT&T Research Labs, NJ, USA)
 Mentors:  Sriram Venkatapathy (IIIT-H)
 
 
Team 1 :
 * Sachin Anklekar, CDAC-Mumbai, Sys-19
 * Sriram Chaudhary, IIIT-Hyderabad, Sys-Anu-18
 * Niraj Shreshta, Katmandu University, Sys-19

Team 2 :
 * Vimal, CDAC-Noida, Sys-20
 * Sunny Sharma, Delhi University, Sys-20
 * Tarak Ram, IIIT-Hyderabad, Sys-18
 * Bindu Madhavi, University of Hyderabad, Sys-18

Resources :
 * 52,000 English-Hindi Sentence Pairs (Refined Dataset)
 * 400,000 English-Hindi Sentence Pairs (Noisy)
 * MOSES: Open-Source Phrase-based Translation System
 * Morphological Analyzer for English and Hindi
 * POS-taggers for English and Hindi

Reading Assignments :
 * Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical Phrase-Based Translation. In Proceedings of HLT/NAACL 2003.
 * Philipp Koehn and Hieu Hoang. 2007. Factored Translation Models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Prague, Czech Republic, June 2007.
 * Web Tutorial on training MOSES

Additional papers to read :
 * S. Goldwater and D. McClosky. 2005. Improving statistical MT through morphological analysis. In Proceedings of HLT/EMNLP 2005.
 * Franz Josef Och. 2003. Minimum error rate training for statistical machine translation. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics (ACL).
 * Franz Josef Och and Hermann Ney. 2004. The alignment template approach to statistical machine translation. Computational Linguistics, 30(4):417-450, December.
 * R. Zens and H. Ney. 2004. Improvements in phrase-based statistical machine translation. In Proceedings of HLT-NAACL, pages 257-264, Boston, MA.

List of Experiments to be performed :
 * To be finalized

Syntax-based Machine Translation
Syntax-based approaches are well suited to handling large word-order variations between languages, and hence they seem more appropriate for developing systems between English and Indian languages. The goal of this project is to extend these approaches to obtain better translation accuracy.

Several experiments will be conducted to evaluate the effectiveness of syntax for sentence reconstruction.
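To illustrate why syntax helps with word order, the sketch below reorders a toy parse tree so that every verb phrase becomes verb-final, mirroring the difference between English (subject-verb-object) and Hindi (subject-object-verb) order. It is a hand-written illustration under simplifying assumptions, not one of the models in the reading list: the `Node` class, the label set, and the single reordering rule are all hypothetical, and a real system would learn such rules from parsed parallel data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str                                    # category, e.g. "S", "NP", "VP", "V"
    children: List["Node"] = field(default_factory=list)
    word: str = ""                                # set only on leaf nodes


def reorder_verb_final(node: Node) -> Node:
    """Return a copy of the tree where each VP places its verb after its other children."""
    kids = [reorder_verb_final(child) for child in node.children]
    if node.label == "VP":
        verbs = [k for k in kids if k.label == "V"]
        others = [k for k in kids if k.label != "V"]
        kids = others + verbs
    return Node(node.label, kids, node.word)


def leaves(node: Node) -> List[str]:
    """Read the words off the tree from left to right."""
    if not node.children:
        return [node.word]
    return [w for child in node.children for w in leaves(child)]


tree = Node("S", [
    Node("NP", word="Ram"),
    Node("VP", [Node("V", word="eats"), Node("NP", word="mangoes")]),
])
print(" ".join(leaves(reorder_verb_final(tree))))
```

Because the rule is stated over tree nodes rather than word positions, it moves the verb past an object of any length, which is the kind of long-range reordering that a purely phrase-based model struggles to capture.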
 
 Guide:  Srinivas Bangalore (AT&T Research Labs, NJ, USA)
 Mentors:  Sriram Venkatapathy (IIIT-H)
 
 
Team 1 :
 * Alok Dadhekar, CDAC Mumbai, Sys-21
 * Garima Kukreja, Delhi University, Sys-21
 * Avinesh, IIIT-Hyderabad, Sys-23
 * K. Rajyarama, University of Hyd., Sys-16
 * Gour Mohan, CDAC-Noida, Sys-16

Team 2 :
 * Prashanth Mathur, IIIT-Hyderabad, Sys-24
 * Kolte Sopan Govind, Bharati Vidhyapeeth, Pune, Sys-24
 * Kailash Kattalay, Fuji Academy, Sys-23
 * Saurabh Kushwaha, CDAC-Mumbai, Sys-17
 * Anil Kumar, CDAC-Noida, Sys-17

Resources :
 * 52,000 English-Hindi Sentence Pairs (Refined Dataset)
 * 400,000 English-Hindi Sentence Pairs (Noisy)
 * Wide-coverage parser for English
 * Limited coverage parser for Hindi
 * Supertagger for English

Reading Assignments :
 * Yamada, K. and Knight, K. (2001). A syntax-based statistical translation model. In Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (ACL).
 * Menezes, A. and Quirk, C. (2005). Microsoft research treelet translation system: IWSLT evaluation. In Proc. of the International Workshop on Spoken Language Translation.
 * Sriram Venkatapathy and Srinivas Bangalore. 2007. Three models for discriminative machine translation using Global Lexical Selection and Sentence Reconstruction. In Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation, Rochester, USA.

Additional papers to read :
 * Alshawi, H., Bangalore, S., and Douglas, S. (1998). Automatic acquisition of hierarchical transduction models for machine translation. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics (ACL).
 * Wu, D. (1997). Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23(3).

List of Experiments to be performed :
 * To be finalized

Sentence Construction after Global Lexical Selection
Global lexical selection is a recently proposed technique that considers the entire source sentence when predicting each word of the target sentence. The predicted target-language words are then arranged in an appropriate order to obtain a well-formed target sentence. Global lexical selection has been shown to deliver good lexical selection accuracy. The goal of this project is to develop well-performing sentence construction algorithms.
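The sentence construction step can be pictured as choosing, among all orderings of the predicted bag of words, the one that a target language model scores highest. The sketch below is a deliberately naive baseline under stated assumptions, not the method of the papers listed for this project: it uses an add-one smoothed bigram model trained on a made-up three-sentence corpus, and it enumerates every permutation, which is feasible only for very short sentences. All function names are hypothetical.

```python
import itertools
import math
from collections import Counter
from typing import List, Sequence, Tuple

Model = Tuple[Counter, Counter, int]  # (context counts, bigram counts, vocabulary size)


def train_bigram(corpus: Sequence[str]) -> Model:
    """Collect context and bigram counts from whitespace-tokenized sentences."""
    contexts, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(order: Sequence[str], model: Model) -> float:
    """Add-one smoothed bigram log-probability of one candidate word order."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + list(order) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def construct(bag: Sequence[str], model: Model) -> List[str]:
    """Return the permutation of the bag that the language model scores highest."""
    best = max(itertools.permutations(bag), key=lambda order: log_prob(order, model))
    return list(best)


# Toy corpus, invented for illustration only.
model = train_bigram(["the dog barks", "the cat sleeps", "a dog sleeps"])
print(construct(["barks", "dog", "the"], model))
```

Since the number of permutations grows factorially with sentence length, practical sentence construction algorithms restrict the search, for example by limiting how far a word may move or by using a beam; exploring such trade-offs is the kind of work this project targets.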
 
 Guide:  Srinivas Bangalore (AT&T Research Labs, NJ, USA)
 Mentors:  Sriram Venkatapathy (IIIT-H)
 
 
Team :
 * Karthik Gali, IIIT-Hyderabad, Sys-26
 * Riya Singh, NIT - Surathkal, Sys-26
 * Latha Nair, CUSAT, Cochin, Sys-25
 * Vipul Mittal, IIIT-Hyderabad, Sys-25

Resources :
 * 52,000 English-Hindi Sentence Pairs (Refined Dataset)
 * 400,000 English-Hindi Sentence Pairs (Noisy)
 * English-French Europarl Corpus
 * Maximum Entropy Toolkit

Reading Assignments :
 * Srinivas Bangalore, Patrick Haffner and Stephan Kanthak. 2007. Statistical Machine Translation through Global Lexical Selection and Sentence Reconstruction. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague, Czech Republic.
 * Sriram Venkatapathy and Srinivas Bangalore. 2007. Three models for discriminative machine translation using Global Lexical Selection and Sentence Reconstruction. In Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation, Rochester, USA.
 * Tutorials on Maximum Entropy Modeling

List of Experiments to be performed :
 * To be finalized