Morphological Analysis
Morphological analysis is an important step in processing Indian languages. In this project, the goal is to develop and test morphological analyzers for Indian languages. A range of techniques will be tried to develop both rule-based and unsupervised analyzers.
Comparison of an Unsupervised Morph Analyzer with a Rule-based Morph Analyzer
The goal of this project is to build an unsupervised morphological analyzer and compare its output with the analysis produced by a rule-based morph analyzer. The comparison will cover aspects such as accuracy and coverage.
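A minimal sketch of how the accuracy and coverage comparison could be set up. Everything here is an assumption for illustration: the analyzers are represented as plain dictionaries from a word to a (stem, suffix) pair, the romanized words are toy data, and "accuracy" is approximated as agreement with the rule-based output on the words both analyzers handle.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical representation: each analyzer maps a word to (stem, suffix).
Analysis = Tuple[str, str]


def coverage(analyzer: Dict[str, Analysis], words: Iterable[str]) -> float:
    """Fraction of the test words for which the analyzer returns any analysis."""
    words = list(words)
    if not words:
        return 0.0
    return sum(1 for w in words if w in analyzer) / len(words)


def agreement(reference: Dict[str, Analysis],
              candidate: Dict[str, Analysis],
              words: Iterable[str]) -> float:
    """Fraction of words analyzed by both systems on which the analyses match."""
    shared = [w for w in words if w in reference and w in candidate]
    if not shared:
        return 0.0
    return sum(1 for w in shared if reference[w] == candidate[w]) / len(shared)


# Toy, romanized data (assumed for illustration only).
rule_based = {
    "ladake": ("ladak", "e"),
    "ladakiyan": ("ladaki", "yan"),
    "kitaben": ("kitab", "en"),
}
unsupervised = {
    "ladake": ("ladak", "e"),
    "ladakiyan": ("ladak", "iyan"),
    "kitaben": ("kitab", "en"),
    "gharon": ("ghar", "on"),
}
words = ["ladake", "ladakiyan", "kitaben", "gharon"]

print("rule-based coverage:", coverage(rule_based, words))
print("unsupervised coverage:", coverage(unsupervised, words))
print("agreement on shared words:", agreement(rule_based, unsupervised, words))
```

Treating the rule-based output as the reference is only one possible choice; a hand-annotated test set would give a fairer accuracy figure for both systems.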
Guide: Srinivas Bangalore (AT&T Research Labs)
Mentors: Sriram Venkatapathy (IIIT-H)
Team:
- Parminder Singh, Gurunanak Univ., Punjab
- N Kalyani, G Narayanamma Institute, AnuSys-11
- Ankur Garg, CDAC-Noida, AnuSys-12
- K V N Sunita, G Narayanamma Institute, AnuSys-11
Team:
- Balaram Prasain, Tribhuvan University, AnuSys-7
- Rajeev R R, Tamil University
- Asanka Wasala, University of Colombo, AnuSys-8
- Pramod Gupta, CDAC-Noida, AnuSys-7
Resources:
- An open-source rule-based morphological analyzer (follows the paradigm approach)
- 1.2 million words clean CIIL Hindi Corpus
Reading Assignments:
- Utpal Sharma, Jugal Kalita and Rajib Das. 2002. Unsupervised Learning of Morphology for Building Lexicon for a Highly Inflectional Language. In Proceedings of the ACL-02 Workshop on Morphological and Phonological Learning.
- Yu Hu, Irina Matveeva, John Goldsmith and Colin Sprague. 2005. Using Morphology and Syntax Together in Unsupervised Learning. In Proceedings of the Workshop on Psychocomputational Models of Human Language Acquisition.
- Poor Man's Stemming: Unsupervised Recognition of Same-Stem Words.
Additional papers to read:
- Akshar Bharati, Vineet Chaitanya, and Rajeev Sangal. Natural Language Processing: A Paninian Perspective.
List of Experiments to be performed:
- To be finalized
Semi-supervised Morphological Analysis Using a Rule-based System as a Seed
The goal of this project is to improve a Hindi rule-based morphological analyzer using a raw corpus. Rule-based morphological analyzers tend to have fairly low coverage; we will try to improve the analyzer's coverage using a large Hindi corpus.
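One possible way to use a rule-based analyzer as a seed is sketched below. This is not the project's method, only an assumed heuristic: suffixes known to the seed analyzer are stripped from corpus words the analyzer does not cover, and a new stem is proposed only if the corpus shows it with at least `min_suffixes` different known suffixes. The function name, the dictionary representation, and the romanized toy words are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def propose_new_stems(seed_lexicon: Dict[str, Tuple[str, str]],
                      corpus_words: Iterable[str],
                      min_suffixes: int = 2) -> Dict[str, Set[str]]:
    """Propose stems missing from the seed analyzer, with their observed suffixes."""
    # Suffixes and stems the rule-based (seed) analyzer already knows.
    seed_suffixes = {suffix for _, suffix in seed_lexicon.values() if suffix}
    known_stems = {stem for stem, _ in seed_lexicon.values()}

    candidates: Dict[str, Set[str]] = defaultdict(set)
    for word in set(corpus_words):
        if word in seed_lexicon:
            continue  # already covered by the seed analyzer
        for suffix in seed_suffixes:
            if word.endswith(suffix) and len(word) > len(suffix):
                stem = word[: -len(suffix)]
                if stem not in known_stems:
                    candidates[stem].add(suffix)

    # Keep only stems supported by several distinct known suffixes.
    return {stem: sufs for stem, sufs in candidates.items()
            if len(sufs) >= min_suffixes}


# Toy seed analyses and corpus (assumed for illustration only).
seed = {
    "ladake": ("ladak", "e"),
    "ladakon": ("ladak", "on"),
    "kitaben": ("kitab", "en"),
}
corpus = ["ladake", "kamare", "kamaron", "mez", "pen"]

print(propose_new_stems(seed, corpus))
```

In this toy run the stem "kamar" is proposed because it occurs with two known suffixes, while "pen" is rejected: stripping "en" leaves a stem seen with only one suffix. The threshold trades coverage gain against the risk of spurious stems.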
Guide: Srinivas Bangalore (AT&T Research Labs)
Mentors: Sriram Venkatapathy (IIIT-H)
Team 1:
- Viraj Welgama, University of Colombo, AnuSys-22
- Prateek Bhatia, Thapar University, Patiala
- Vasudevan, IIT-Bombay, AnuSys-21
Team 2:
- Vishal Goyal, Punjab Univ., AnuSys-19
- D V Sriram, IIIT-Hyderabad, AnuSys-16
- Krishna Kumar, Tamil University
Resources:
- A rule-based morphological analyzer for Hindi.
- 1.2 million words clean CIIL Hindi Corpus
Reading Assignments:
- Akshar Bharati, Vineet Chaitanya, and Rajeev Sangal. Natural Language Processing: A Paninian Perspective.
- Akshar Bharati, Rajeev Sangal, Sushma Bendre, Pavan Kumar, Aishwarya. Unsupervised Improvement of Morphological Analyzer for Inflectionally Rich Languages.
Additional papers to read:
- Utpal Sharma, Jugal Kalita and Rajib Das. 2002. Unsupervised Learning of Morphology for Building Lexicon for a Highly Inflectional Language. In Proceedings of the ACL-02 Workshop on Morphological and Phonological Learning.
Comparison of FST Tools for Morphological Analysis
Guide: Amba Kulkarni (University of Hyderabad)
Mentors:
Team:
- Ashwini Vaidya, IIIT-Hyderabad, Sys-12
- Renjini Narendranath, IIIT-Hyderabad, Sys-12
- Gowri Dev, IIIT-Hyderabad, Sys-15
- Thennarasu Sakkan, University of Hyderabad, Sys-15
Resources:
Reading Assignments:
- Akshar Bharati, Vineet Chaitanya, and Rajeev Sangal. Natural Language Processing: A Paninian Perspective (Chapter on Morphological Analysis)
- APERTIUM Documentation
- FLAN Documentation
List of Experiments to be performed:
- Run FLAN as well as Apertium for the existing paradigms
- Compare the performance of the original morph analyzer, FLAN and Apertium on the following parameters:
  - Performance on random texts
  - How easy or difficult it is to adapt the FSTs to handle 'vowel harmony' (as in Telugu/Marathi) and derivational morphology (Telugu)
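The "performance on random texts" comparison could be organized with a small harness like the one below. It is a sketch under stated assumptions: each analyzer is wrapped as a hypothetical Python callable that returns an analysis string or `None`, and the toy analyzers stand in for the real tools, whose actual interfaces are not shown here. The harness draws one random sample of tokens and reports coverage and wall-clock time per analyzer on that same sample.

```python
import random
import time
from typing import Callable, Dict, List, Optional

# Hypothetical wrapper type: a token goes in, an analysis string or None comes out.
Analyzer = Callable[[str], Optional[str]]


def benchmark(analyzers: Dict[str, Analyzer],
              tokens: List[str],
              sample_size: int,
              seed: int = 0) -> Dict[str, Dict[str, float]]:
    """Run every analyzer on the same random sample of tokens."""
    rng = random.Random(seed)  # fixed seed so all runs see the same sample
    sample = rng.sample(tokens, min(sample_size, len(tokens)))

    results: Dict[str, Dict[str, float]] = {}
    for name, analyze in analyzers.items():
        start = time.perf_counter()
        analyzed = sum(1 for token in sample if analyze(token) is not None)
        elapsed = time.perf_counter() - start
        results[name] = {
            "coverage": analyzed / len(sample) if sample else 0.0,
            "seconds": elapsed,
        }
    return results


# Toy stand-ins for the real analyzers (assumed for illustration only).
toy_analyzers: Dict[str, Analyzer] = {
    "knows_everything": lambda token: token + "+analysis",
    "knows_nothing": lambda token: None,
}
toy_tokens = ["ladake", "kitaben", "gharon", "kamare", "mez"]

for name, stats in benchmark(toy_analyzers, toy_tokens, sample_size=3).items():
    print(name, stats)
```

Using one fixed sample for all tools keeps the coverage figures comparable; timing numbers from such a wrapper would also include any process start-up cost of the underlying tool.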