TCS NLP Winter School 2008

24 December, 2007 - 7 January, 2008

Collocated with IJCNLP 2008 at IIIT, Hyderabad, India



Dependency Parsing


Unsupervised Dependency Parsing

The goal of this project is to develop an unsupervised dependency parser for Hindi. The participants will be provided with a large Hindi corpus, and the goal is to learn the likelihood of a dependency link existing between any two words in a given sentential context. This likelihood is then used by a parsing algorithm to parse new sentences.
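A minimal sketch of the two steps described above, assuming one particular choice for each: the likelihood of a link is approximated by pointwise mutual information (PMI) between words that co-occur within a small window, and the parsing algorithm is taken to be a maximum spanning tree over the words of the sentence (Prim's algorithm), in the spirit of lexical-attraction models. Neither choice is prescribed by the project; `train_link_scores`, `parse`, and the `window` parameter are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def train_link_scores(corpus, window=3):
    """Hypothetical helper: build a link-score function from a tokenized corpus.

    Assumption: link likelihood is approximated by PMI between words that
    occur within `window` positions of each other in a sentence.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in corpus:
        word_counts.update(sentence)
        for i, j in combinations(range(len(sentence)), 2):
            if j - i <= window:
                pair_counts[frozenset((sentence[i], sentence[j]))] += 1
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())

    def score(w1, w2):
        """PMI of the pair; -inf if the pair was never seen together."""
        count = pair_counts.get(frozenset((w1, w2)), 0)
        if count == 0:
            return float("-inf")
        p_pair = count / total_pairs
        p1 = word_counts[w1] / total_words
        p2 = word_counts[w2] / total_words
        return math.log(p_pair / (p1 * p2))

    return score


def parse(sentence, score):
    """Hypothetical parser: maximum spanning tree over the words (Prim's algorithm).

    Returns n-1 undirected links as (tree_word, new_word) index pairs.
    """
    n = len(sentence)
    if n < 2:
        return []
    in_tree = {0}
    links = []
    while len(in_tree) < n:
        # Pick the highest-scoring link joining the current tree to a new word.
        tree_word, new_word = max(
            ((i, j) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: score(sentence[e[0]], sentence[e[1]]),
        )
        links.append((tree_word, new_word))
        in_tree.add(new_word)
    return links
```

The resulting tree is undirected; choosing heads and a root would need a further step, for example a directed spanning-tree algorithm over the same scores.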

Guide: Rajeev Sangal (IIIT-H)
Mentors: Karthik Gali (IIIT-H), Jagadeesh Gorla (Wisdom Tap, Bengaluru)

Team
Prajwal Rupakheti,
Madan Puraskar Pustakalay,
Katmandu,
Sys-1
Pradeep Dasigi,
VIT,
Sys-1
Balaji L.,
CEG, Anna University,
Sys-2
Bhuvaneshwari,
University of Hyderabad,
Sys-2

Resources:
  • 1.2 million words clean CIIL Hindi Corpus
  • POS tagger for Hindi
Reading Assignments:
  1. Klein, D. 2004. The Unsupervised Learning of Natural Language Structure. Ph.D. Thesis. Stanford University.
  2. Jianfeng Gao, Hisami Suzuki. 2003. Unsupervised Learning of Dependency Structure for Language Modeling. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, Sapporo, Japan
  3. Rens Bod. 2006. An All-Subtrees Approach to Unsupervised Parsing. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics. Sydney.
List of Experiments to be performed:
  1. * To be finalized
 

Training a Paninian parser using a treebank

The goal here is to train a Paninian parser using a dependency treebank. The parser will be designed not only to infer the dependency links between words but also to label these links with karaka relations. The karaka scheme was originally designed to explain the grammar of Sanskrit, and it also holds for most modern Indian languages.

One of the approaches to be tried in the school is an integer programming based parser. This parser uses a set of constraints to parse new sentences in Hindi; the weights associated with these constraints can be learned using a supervised learning model.
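The sketch below illustrates, under simplifying assumptions, the kind of constraints such a parser works with: a single verb per clause, noun chunks that each carry a vibhakti (case marker), a verb frame stating which karakas are mandatory, and a table of (karaka, vibhakti) weights standing in for the weights that would be learned. Instead of calling an integer-programming solver, it enumerates every label assignment, which is only feasible for toy clauses; an ILP formulation would express the same constraints as linear inequalities over binary variables, one per (noun, karaka) pair. `parse_clause` and its inputs are hypothetical.

```python
from itertools import product


def parse_clause(nouns, frame, weights):
    """Hypothetical constraint-based labeller for a single-verb clause.

    nouns:   list of (word, vibhakti) pairs for the clause's noun chunks
    frame:   dict karaka -> True if mandatory, False if optional
    weights: dict (karaka, vibhakti) -> score (assumed to come from training)
    Returns the best list of (word, karaka) arcs to the verb, or None if no
    assignment satisfies the constraints.
    """
    karakas = list(frame)
    best, best_score = None, float("-inf")
    # Constraint 1: every noun chunk receives exactly one karaka label.
    for labels in product(karakas, repeat=len(nouns)):
        # Constraint 2: no karaka is filled twice for the same verb.
        if len(set(labels)) != len(labels):
            continue
        # Constraint 3: every mandatory karaka in the frame is filled.
        if any(frame[k] and k not in labels for k in karakas):
            continue
        # Only (karaka, vibhakti) pairs that have a weight are permitted.
        pairs = [(k, v) for k, (_, v) in zip(labels, nouns)]
        if any(p not in weights for p in pairs):
            continue
        total = sum(weights[p] for p in pairs)
        if total > best_score:
            best = [(w, k) for (w, _), k in zip(nouns, labels)]
            best_score = total
    return best
```

For the clause "raam ne kitaab paDhii" (Ram read the book), with a frame demanding k1 and k2, a weight table that scores ('k1', 'ne') and ('k2', unmarked) highly would label "raam" as k1 and "kitaab" as k2.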

Guide: Rajeev Sangal (IIIT-H), Dipti Misra (IIIT-H), Samar Husain (IIIT-H)

Team 1
Srinivas Medimi,
IIT-Bombay,
Sys-3
Prof. Veeranna S. Wadi,
Gulbarga Univ.
Sambhav Jain,
IIIT-Hyderabad,
Sys-3
-

Team 2
Vijayakrishna,
AU-KBC, Chennai,
Sys-4
Pranava Swaroop,
NIT, Jaipur,
Sys-4
Bharat Ram,
IIIT-Hyderabad,
Sys-6
-

Resources:
  • Hindi Dependency Treebank containing 3000 sentences.
  • Publicly available code of Ryan McDonald for Unlabelled Dependency Parsing.
Reading Assignments:
  1. Rafiya Begum, Samar Husain, Arun Dhwaj, Dipti Misra Sharma, Lakshmi Bai, Rajeev Sangal. Dependency Annotation Scheme for Indian Languages. In Proceedings of The Third International Joint Conference on Natural Language Processing (IJCNLP). Hyderabad, India. 2008.
  2. Ryan McDonald. 2006. Discriminative Training and Spanning Tree Algorithms for Dependency Parsing. Ph.D. Thesis. University of Pennsylvania.
  3. Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryigit, G., Kübler, S., Marinov, S. and Marsi, E. (2007) MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2), 95-135.
List of Experiments to be performed:
  1. * To be finalized

 

Semi-supervised verb frame learning given seed frames

Verb frames define the argument structures of verbs. Such a resource is useful for developing parsers for any language. In this project, the goal is to learn verb frames in a semi-supervised fashion, starting from a set of seed frames.
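One possible semi-supervised scheme, sketched here purely as an illustration, is nearest-neighbour bootstrapping: each verb is described by counts of the case markers on the noun chunks it co-occurs with, and in every round the unlabeled verb most similar (by cosine similarity) to an already-framed verb inherits that verb's frame, provided the similarity clears a threshold. The feature design, `bootstrap_frames`, and `threshold` are assumptions, not the project's method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(value * b.get(f, 0.0) for f, value in a.items())
    norm = math.sqrt(sum(x * x for x in a.values())) * math.sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0


def bootstrap_frames(features, seed_frames, threshold=0.8):
    """Hypothetical nearest-neighbour bootstrapping of verb frames.

    features:    verb -> {vibhakti: count}, assumed to be gathered from a chunked corpus
    seed_frames: verb -> frame (any value, e.g. a tuple of karaka labels)
    Each round gives the single most similar unlabeled verb the frame of its
    nearest framed neighbour, stopping when no similarity reaches `threshold`.
    """
    frames = dict(seed_frames)
    unlabeled = set(features) - set(frames)
    while unlabeled:
        best = None  # (similarity, verb, frame)
        for verb in unlabeled:
            for known, frame in frames.items():
                sim = cosine(features[verb], features.get(known, {}))
                if best is None or sim > best[0]:
                    best = (sim, verb, frame)
        if best is None or best[0] < threshold:
            break
        _, verb, frame = best
        frames[verb] = frame
        unlabeled.discard(verb)
    return frames
```

Raising `threshold` trades coverage for precision, since verbs with similar case-marker profiles (for example a transitive and a ditransitive verb) can otherwise inherit each other's frames.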

Guide: Rajeev Sangal (IIIT-H), Dipti Misra (IIIT-H), Samar Husain (IIIT-H)

Team 1
Ram Raj Lohani,
Tribhuvan University,
Sys-7
K S Anish Shankar,
NIT-Jaipur,
Sys-7
Harsh Vardhan,
IIIT-Hyderabad,
Sys-8
Ganeshwar Rao,
IIIT-Hyderabad,
Sys-8

Team 2
Kiran Kumar,
Fuji Academy,
Sys-9
Ananth Ramakrishnan,
AU-KBC, Chennai,
Sys-9
G.V.Sivakumar Reddy,
IIIT-Hyderabad,
Sys-13
Abhilash I,
IIIT-Hyderabad,
Sys-13

Resources:
  • 1.2 million words clean CIIL Hindi Corpus
  • Verb frames covering 200 verbs
  • Morphological Analyzer for Hindi.
Reading Assignments:
  1. Rafiya Begum, Samar Husain, Arun Dhwaj, Dipti Misra Sharma, Lakshmi Bai, Rajeev Sangal. Dependency Annotation Scheme for Indian Languages. In Proceedings of The Third International Joint Conference on Natural Language Processing (IJCNLP). Hyderabad, India. 2008.
  2. Hoa Trang Dang, Karin Kipper, Martha Palmer. Integrating compositional semantics into a verb lexicon. COLING-2000 Eighteenth International Conference on Computational Linguistics, Germany
  3. Sabine Schulte im Walde. Experiments on the Choice of Features for Learning Verb Classes. Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics. 2003. Budapest, Hungary
Additional papers to read:
  1. FrameNet
  2. WordNet
List of Experiments to be performed:
  1. * To be finalized

Dependency Relations with Rules

The project involves analyzing Hindi sentences to identify linguistic cues that help determine dependency relations. The aim is to develop robust rules that can automatically mark Paninian dependency relations given POS-tagged and chunked data. The project also assumes that the input sentences are marked with morphological information.
The project will not only help us assess the effectiveness of these linguistic rules, but will also complement other machine learning projects by providing them with informed linguistic features.
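As an illustration of what such rules might look like, the sketch below maps case markers (vibhaktis) on Hindi noun chunks to karaka labels using a few familiar tendencies: 'ne' marks the karta (k1), 'meM' and 'par' mark location (k7), 'se' is treated as marking an instrument (k3) although it can also mark a source (k5), and 'ko' marks either the object (k2) or the recipient (k4). The rule set, the chunk representation, and `mark_relations` are hypothetical simplifications; real rules would also consult the verb's TAM, agreement, and verb frame.

```python
# Hypothetical vibhakti -> karaka cues (a simplification for illustration).
CASE_RULES = {"ne": "k1", "se": "k3", "meM": "k7", "par": "k7"}


def mark_relations(chunks):
    """Hypothetical rule-based marker of karaka relations.

    chunks: list of (chunk_tag, head_word, vibhakti) tuples, with vibhakti "0"
            for an unmarked chunk; assumed to contain one finite verb chunk.
    Returns (head_word, label) arcs from each NP chunk to the verb.
    """
    nps = [(w, v) for tag, w, v in chunks if tag == "NP"]
    has_ergative = any(v == "ne" for _, v in nps)
    unmarked = [i for i, (_, v) in enumerate(nps) if v == "0"]
    # With 'ne' the agent is already marked, so unmarked NPs are objects;
    # otherwise the first unmarked NP is taken to be the agent (k1).
    objects = set(unmarked if has_ergative else unmarked[1:])
    arcs = []
    for i, (w, v) in enumerate(nps):
        if v in CASE_RULES:
            label = CASE_RULES[v]
        elif v == "ko":
            # 'ko' is read as a recipient if an unmarked object is also present.
            label = "k4" if objects else "k2"
        elif v == "0":
            label = "k2" if i in objects else "k1"
        else:
            label = "unknown"  # no rule fires; left to other components
        arcs.append((w, label))
    return arcs
```

For "raam ne siitaa ko kitaab dii" (Ram gave Sita a book) these rules yield k1 for "raam", k4 for "siitaa", and k2 for "kitaab", while in "raam ne siitaa ko dekhaa" (Ram saw Sita) the 'ko' chunk is read as k2.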

Guide: Dipti Misra (IIIT-H), Rajeev Sangal (IIIT-H)
Mentors: Rafiya Begum (IIIT-H), Samar Husain (IIIT-H)

Team
Vineet Yadav,
IIIT-Hyderabad
Geeta Katkar,
IIIT-Hyderabad
Dilip Singh,
IIIT-Hyderabad
Sapna,
IIIT-Hyderabad
Itishree Jena,
IIIT-Hyderabad
Y Vishwanatha Naidu,
IIIT-Hyderabad
--

Resources:
  • -
  • -
Reading Assignments:
  1. -
  2. -
List of Experiments to be performed:
  1. * To be finalized
 

Constraint based Dependency Parser

The aim of the project is to bootstrap a Telugu parser using the existing machinery of a constraint-based Hindi parser. The project will involve building Telugu verb frames and adapting the existing parser to handle basic Telugu intra-clausal relations.
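A verb frame can be pictured as a set of karaka demands, each stating whether the karaka is mandatory and which case markers it may surface with; a constraint-based parser can use these demands to propose candidate arcs from a verb to the noun chunks of its clause and to detect frame violations. The sketch below shows only that candidate-generation step. The `KarakaDemand` type, `candidate_arcs`, and the illustrative Telugu markers (unmarked '0' for k1, '0' or accusative 'ni' for k2, dative 'ki'/'ku' for k4) are assumptions, not the frames the project will build.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KarakaDemand:
    """Hypothetical verb-frame entry for one karaka."""
    mandatory: bool
    vibhaktis: frozenset  # case markers the karaka may surface with ("0" = unmarked)


def candidate_arcs(frame, nouns):
    """Hypothetical candidate-arc generator for one verb.

    frame: dict karaka -> KarakaDemand
    nouns: list of (word, vibhakti) pairs for the noun chunks in the verb's clause
    Returns (candidates, missing): candidates maps each karaka to the nouns that
    could fill it; missing lists mandatory karakas with no candidate at all.
    """
    candidates = {k: [w for w, v in nouns if v in d.vibhaktis] for k, d in frame.items()}
    missing = [k for k, d in frame.items() if d.mandatory and not candidates[k]]
    return candidates, missing
```

When two unmarked nouns appear in a clause, both become candidates for k1 and k2; resolving that ambiguity is left to the parser's remaining constraints.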

Guide: Rajeev Sangal (IIIT-H), Dipti Misra (IIIT-H), Samar Husain (IIIT-H)

Team
SRP Chaitanya,
IIIT-Hyderabad,
Sys-11
N.S.Chandra Prasad,
IIIT-Hyderabad,
Sys-10
Ravi Kiran,
IIIT-Hyderabad,
Sys-10
Bindu Madhavi,
University of Hyderabad

Resources:
  • Constraint Based Hindi Parser
  • Telugu Morph Analyzer, Tagger and Chunker
Reading Assignments:
  1. Akshar Bharati, Rajeev Sangal, T Papi Reddy. 2002. A Constraint Based Parser Using Integer Programming. In Proc. of ICON-2002: International Conference on Natural Language Processing
  2. -
List of Experiments to be performed:
  1. * To be finalized