IIIT-Hyderabad Advanced School on Natural Language Processing
May 26th - June 9th, Hyderabad, India, Summer 2008

 

Projects

  • Machine Translation

    1. Extraction and Evaluation of Basic transfer grammar from word-aligned corpus (words and local word groups)

      A transfer grammar maps the source syntactic structure to the target syntactic structure. This project involves extracting a transfer grammar from a word-aligned corpus. The syntactic parser of AnalGen 0.83 will be used to obtain the syntactic structures on the source side. The extracted transfer grammar will then be used by AnalGen 0.83 to translate sentences, and the translations will be evaluated for quality. The earlier version of the IIIT translation system, Shakti, can be accessed at:
      Shakti
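
      As a rough illustration of what extracting transfer patterns from a word-aligned corpus can look like, the sketch below counts whether adjacent source words keep or swap their order on the target side. It is a toy under stated assumptions, not the AnalGen procedure: the function name `extract_reordering_rules`, the POS tags, and the one-to-one alignment format are all hypothetical.

```python
from collections import Counter


def extract_reordering_rules(src_tags, alignment):
    """Count keep/swap patterns for adjacent source tag pairs.

    src_tags:  POS tags of the source words (hypothetical tag set).
    alignment: dict mapping source index -> target index
               (assumed one-to-one for simplicity).
    """
    rules = Counter()
    for i in range(len(src_tags) - 1):
        # Only consider pairs where both words are aligned.
        if i in alignment and i + 1 in alignment:
            order = "keep" if alignment[i] < alignment[i + 1] else "swap"
            rules[(src_tags[i], src_tags[i + 1], order)] += 1
    return rules


# Toy sentence pair: source order "Ram eats rice" (N V N),
# target order "Ram rice eats" (verb-final).
src_tags = ["N", "V", "N"]
alignment = {0: 0, 1: 2, 2: 1}
rules = extract_reordering_rules(src_tags, alignment)
for rule, count in sorted(rules.items()):
    print(rule, count)
```

      Counts aggregated over a whole corpus would indicate which local word groups tend to be reordered; a real transfer grammar would operate on parser output rather than flat tag pairs.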

    2. Creating parallel dependency treebank

      This project involves creating a parallel dependency treebank. The parallel treebank consists of the following:

      1. A sentence-aligned parallel corpus: each sentence in the source language has a corresponding sentence in the target language.
      2. Syntactic structures for the sentences in both texts, built using the same grammar formalism.
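
      One possible in-memory representation of such a treebank entry is sketched below. The class names `DepTree` and `ParallelEntry`, the head-index encoding, and the relation labels are assumptions made for illustration; the project does not prescribe a storage format here.

```python
from dataclasses import dataclass


@dataclass
class DepTree:
    tokens: list[str]
    heads: list[int]   # heads[i] is the index of token i's head, -1 for the root
    labels: list[str]  # dependency relation of token i to its head (hypothetical labels)

    def __post_init__(self):
        if not (len(self.tokens) == len(self.heads) == len(self.labels)):
            raise ValueError("tokens, heads and labels must have the same length")


@dataclass
class ParallelEntry:
    source: DepTree  # parse of the source-language sentence
    target: DepTree  # parse of the aligned target-language sentence


# Toy entry; the target side is shown as an English gloss in verb-final order.
entry = ParallelEntry(
    source=DepTree(["Ram", "eats", "rice"], [1, -1, 1], ["subj", "root", "obj"]),
    target=DepTree(["Ram", "rice", "eats"], [2, 2, -1], ["subj", "obj", "root"]),
)
print(entry.source.tokens, entry.target.tokens)
```

      Using one structure for both sides reflects the requirement that both parses follow the same grammar formalism.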

      The plan of the project is to:
      1. Explore a dependency parser for the source language (Stanford parser + Paninian labeler).
      2. Explore a dependency parser for the target language.
      3. Visualize the parses in both languages and make corrections.

      This project will also involve building tools that aid the development of the corpus. The resulting corpus will be very useful for learning a transfer grammar, which is much needed for a Machine Translation system.
      http://ufal.mff.cuni.cz/pcedt/doc/papers/ijcnlp2004.pdf

    3. Hierarchical Language Modeling

      This project involves generating a well-formed sentence from a bag of words. Language modeling techniques will be used to select the best target-language sentence. In addition to the language models, dependency links among the words will be provided. These links will be used as constraints when ordering the words in the bag. Suggested reading: the material on language modeling in the NLP book by Jurafsky and Martin.
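
      The idea can be sketched with a brute-force search: score every ordering of the bag with a bigram language model and keep the best one that satisfies the constraints. This is a minimal sketch, not the project's method. It assumes add-one smoothing, distinct words in the bag, and constraints given as hypothetical `(a, b)` pairs meaning "a must precede b"; how dependency links translate into such constraints is left open.

```python
import math
from collections import Counter
from itertools import permutations


def train_bigram(sentences):
    """Collect history and bigram counts from tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s + ["</s>"]
        unigrams.update(tokens[:-1])              # counts of bigram histories
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {w for s in sentences for w in s} | {"<s>", "</s>"}
    return unigrams, bigrams, len(vocab)


def log_prob(order, model):
    """Add-one smoothed bigram log-probability of a word sequence."""
    unigrams, bigrams, v = model
    tokens = ["<s>"] + list(order) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + v))
        for a, b in zip(tokens, tokens[1:])
    )


def best_order(bag, model, precedes=()):
    """Return the highest-scoring permutation that satisfies all constraints.

    Assumes the words in the bag are distinct. Exhaustive search is only
    feasible for very small bags.
    """
    def satisfies(order):
        pos = {w: i for i, w in enumerate(order)}
        return all(pos[a] < pos[b] for a, b in precedes)

    candidates = [p for p in permutations(bag) if satisfies(p)]
    return max(candidates, key=lambda p: log_prob(p, model))


corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"], ["a", "dog", "sleeps"]]
model = train_bigram(corpus)
print(best_order(["barks", "dog", "the"], model, precedes=[("dog", "barks")]))
```

      Exhaustive enumeration grows factorially with the bag size, so a practical system would need a guided search; the constraints help by pruning orderings early.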

    4. Generation using Supertags

      This project involves generating a well-formed sentence from a bag of words. The words in the bag will be associated with supertags. Supertags represent the syntactic properties of words and carry much more linguistic information than part-of-speech tags. For a verb, they contain information such as valency (i.e., transitive, intransitive, etc.). Supertags will be used to help reconstruct the sentence from the words in the bag.

      - http://citeseer.ist.psu.edu/bangalore99supertagging.html
      - http://www.research.att.com/~srini/Papers/Parsing/coling2002.ps
      - http://www.cis.upenn.edu/~xtag/

    5. Compute Relative Compositionality of Multi-word Expressions from a large corpus

      This project involves computing the relative compositionality of multi-word expressions from a large corpus. Various measures have been defined for computing the compositionality of multi-word expressions. We will work with some of these measures.
      - http://www.cis.upenn.edu/~sriramv/emnlp2005_sriram_joshi.pdf
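
      As one example of a corpus-based measure, the sketch below computes pointwise mutual information (PMI) for a two-word expression. PMI is a standard association measure and is used here only as an illustration; it is an assumption that it is among the measures the project works with. The function name `pmi` and the toy token list are hypothetical.

```python
import math
from collections import Counter


def pmi(bigram, tokens):
    """Pointwise mutual information (base 2) of an adjacent word pair.

    Probabilities are maximum-likelihood estimates from the token list.
    Returns -inf if the pair never occurs.
    """
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1

    w1, w2 = bigram
    p_xy = bigram_counts[bigram] / n_bigrams
    if p_xy == 0:
        return float("-inf")
    p_x = unigram_counts[w1] / n_unigrams
    p_y = unigram_counts[w2] / n_unigrams
    return math.log2(p_xy / (p_x * p_y))


tokens = "they take place here and they take place there".split()
print(pmi(("take", "place"), tokens))
```

      A high value means the words co-occur more often than chance would predict. Association strength alone does not establish non-compositionality, so measures like this are typically combined with others.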

      -----------------------------------------------------------------------------------------------