IIIT-Hyderabad Advanced School on Natural Language Processing
May 26th - June 9th, Hyderabad, India, Summer 2008

 

Projects

  • Parsing

    1. Machine learning on Treebanking

      The project involves exploiting the Hindi Treebank to develop parsers. The experiments will involve improving existing algorithms by trying out various features and modelling the treebank to maximize their performance. New algorithms will be explored based on the observations from these experiments.

    2. Shallow Parsing

      Shallow parsers for new languages will be initiated. This will require resource building such as POS and chunk annotation. Existing annotation guidelines for Hindi and Telugu will help in the initial phase; these guidelines will have to be modified for the new language as the annotation progresses. Some shallow parsing modules do not require an annotated corpus and can be built simultaneously. Other modules, which presume an annotated corpus, can be built once a sample of annotated corpus is available.
      Initial Reading: http://ltrc.iiit.ac.in/nlpai_contest06/icon2005_tutorial_sangal_sriram.ppt

    3. Constraint Parsing

      The project involves analyzing Hindi sentences to determine linguistic cues which can help us determine dependency relations. The aim of the project is to come up with robust rules which can automatically mark Paninian dependency relations given POS/chunk data. The project will not only help us determine the efficiency of these linguistic rules, but will also complement other machine learning projects and provide them with informed linguistic features. New approaches will be tried out to improve the performance of the parser.
      Initial Reading: http://www.iiit.net/techreports/2002_3.pdf

    4. Comparison, Evaluation and Improvement of POS/Chunking/NER tools

      Existing algorithms for POS tagging, chunking and NER will be compared and evaluated. Tuning of the models and experiments with new features will be tried out to improve the existing performance of these tools.
      http://www.iiit.net/techreports/2007_92.pdf
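
      As a minimal sketch of how two chunkers might be compared, chunk-level precision, recall and F1 could be computed as below. This assumes the tools emit BIO-style chunk tags, which the project description does not specify. The function names and the sample tag sequences are hypothetical and only serve as illustration.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into a set of (start, end, label) chunks.

    Assumption: tags look like "B-NP", "I-NP" or "O". An "I-" tag that does
    not continue an open chunk of the same label is treated as a chunk start.
    """
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # the current chunk simply continues
        if label is not None:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
    return spans


def chunk_prf(gold_tags, predicted_tags):
    """Return (precision, recall, f1) over exactly matching chunks."""
    gold = bio_to_spans(gold_tags)
    predicted = bio_to_spans(predicted_tags)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative data only: one sentence, gold tags versus one tool's output.
gold_tags = ["B-NP", "I-NP", "B-VGF", "B-NP"]
tool_tags = ["B-NP", "B-NP", "B-VGF", "B-NP"]
precision, recall, f1 = chunk_prf(gold_tags, tool_tags)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.5 0.667 0.571
```

      Scoring whole chunks rather than individual tags penalizes a tool for every boundary error, which is usually the behaviour one wants when comparing chunkers.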

    5. Inter-annotator agreement on POS and chunked corpus

      Consistency in the annotated corpus is crucial for successfully learning relevant patterns. The project aims to ascertain the inter-annotator agreement in the POS and chunked corpus. The observations might lead to minor modifications in the present guidelines.
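
      One common way to quantify agreement between two annotators is Cohen's kappa, which corrects the raw agreement rate for agreement expected by chance. The project description does not name a specific measure, so the choice of kappa is an assumption here. The sketch below is for illustration; the function name and the sample tags are hypothetical.

```python
from collections import Counter


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two annotators who tagged the same tokens."""
    if len(tags_a) != len(tags_b):
        raise ValueError("both annotators must tag the same tokens")
    n = len(tags_a)
    if n == 0:
        raise ValueError("no tokens to compare")

    # Observed agreement: fraction of tokens given the same tag.
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n

    # Chance agreement: computed from each annotator's own tag distribution.
    counts_a = Counter(tags_a)
    counts_b = Counter(tags_b)
    expected = sum(counts_a[tag] * counts_b[tag] for tag in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both annotators used one and the same tag throughout
    return (observed - expected) / (1 - expected)


# Illustrative data only: six tokens tagged by two annotators.
annotator_a = ["NN", "VM", "NN", "PSP", "NN", "VM"]
annotator_b = ["NN", "VM", "JJ", "PSP", "NN", "NN"]
print(round(cohen_kappa(annotator_a, annotator_b), 3))  # 0.5
```

      Here the annotators agree on 4 of 6 tokens (0.667), chance agreement is 12/36 (0.333), and kappa is (0.667 - 0.333) / (1 - 0.333) = 0.5.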

    6. Comparison of various English dependency parsers

      Different English dependency parsers will be evaluated based on certain parameters. One such parameter could be specific linguistic constructions. The observations from these experiments will shed light on the strengths and weaknesses of the parsers selected.
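
      A typical starting point for such a comparison is the attachment score: unlabeled (UAS) counts tokens whose head is correct, and labeled (LAS) also requires the correct relation label. The project description leaves the evaluation parameters open, so using UAS and LAS is an assumption. The sketch below is for illustration; the function name, the relation labels and the sample sentence are hypothetical.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for one parse.

    Each parse is a list with one (head_index, label) pair per token,
    where head_index 0 stands for the artificial root.
    """
    if len(gold) != len(predicted):
        raise ValueError("parses must cover the same tokens")
    n = len(gold)
    if n == 0:
        raise ValueError("empty parse")
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    full_hits = sum(g == p for g, p in zip(gold, predicted))
    return head_hits / n, full_hits / n


# Illustrative data only: "She ate the apple", tokens numbered from 1.
gold_parse = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "obj")]
parser_output = [(2, "nsubj"), (0, "root"), (2, "det"), (2, "iobj")]
uas, las = attachment_scores(gold_parse, parser_output)
print(uas, las)  # 0.75 0.5
```

      To study specific linguistic constructions, the same function could be applied only to the tokens that take part in the construction of interest.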

    7. Bi-directional dependency Parser (Hindi, Telugu)

      This project enables the participants to get familiar with the LTAG Spinal and dependency grammar formalisms and also provides a glimpse into statistical parsing. The project has two stages. The first stage involves converting the karaka-based dependency treebanks in Hindi and Telugu to LTAG spinal format. In the second stage, the Bidirectional parser is used to learn the LTAG dependencies by training on the converted treebanks. Prior experience in Java programming is welcome.

      -----------------------------------------------------------------------------------------------