PRE-CONFERENCE TUTORIALS
The conference will feature two tutorials on December 19, 2003.
-
Title: Search Engine Technologies
Speaker: Vasudeva Varma, IIIT Hyderabad, India
Half day or Full day: Half day
Abstract:
This tutorial discusses why search has become the single most important
internet technology challenge and covers the ongoing innovations and
competition among search-based corporate houses. It discusses the
infrastructure required for building search engines, including web crawling,
indexing, and information retrieval and extraction. The tutorial aims to
give the audience an understanding of the following concepts:
- Index- and crawling-based search (first-generation search)
- Directory-based search (Yahoo-like)
- Hypertext analysis or algorithm-based search (Google-like, covering
  topics such as page ranking, latent semantic indexing, document transformation, etc.)
- Personalized or customizable search techniques
Finally, the tutorial will discuss the importance of ontology, summarization,
categorization, and named entity extraction technologies in the context of building
search engines.
-
Title: Shakti-Kit: A Kit for Building a Retargetable Machine Translation System
Authors: Akshar Bharati, Rajeev Sangal, Dipti Mishra Sharma, Prashant Reddy,
Bhavani Sankar, Rajni Moona, IIIT Hyderabad, India
Speaker: Rajni Moona, IIIT Hyderabad, India
Half day or Full day: Full day
Abstract:
The Shakti Machine Translation Kit is a do-it-yourself kit for building an
MT system from English to your language. It is based on the Shakti Machine
Translation system being developed at the Language Technologies Research Center, IIIT
Hyderabad. The Shakti system presently works for English-Hindi, English-Telugu and
English-Marathi, with a smaller version for English-Amharic (an
Ethiopian language).
The architecture of Shakti is highly modular. The complex problem of MT has been
broken into smaller sub-problems. Every sub-problem is a task handled by
an independent module. The modules are put together using a common extensive
representation based on trees and features, the Standard Shakti Format (SSF).
Modules are pipelined: the output of each module becomes the input of the next.
Since the format is fixed, any of the modules can be unplugged and Shakti will
still operate, albeit with a slight degradation. Hence, the system is designed to
be developer friendly.
SSF itself is a highly readable, transparent format for linguists and computer scientists
alike. The inputs and outputs of all the modules are available to the developer for
inspection. Analysis at every level is represented in readable SSF so that the
developer can see and analyse where the system needs improvement.
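The pipelined, pluggable design described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Node` class is only a stand-in for a tree-with-features representation and is not the real SSF, and the module names (`tokenize`, `tag_capitalized`, `add_lemma`) are hypothetical and do not correspond to actual Shakti modules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Node:
    """A tree node carrying a feature dictionary (illustrative stand-in, not real SSF)."""
    label: str
    features: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


# Every module reads and writes the same representation.
Module = Callable[[Node], Node]


def tokenize(root: Node) -> Node:
    # Hypothetical module: split the raw sentence into one child node per word.
    root.children = [Node(label=w) for w in root.features.get("text", "").split()]
    return root


def tag_capitalized(root: Node) -> Node:
    # Hypothetical module: add a feature to words that start with a capital letter.
    for child in root.children:
        if child.label[:1].isupper():
            child.features["cap"] = "yes"
    return root


def add_lemma(root: Node) -> Node:
    # Hypothetical module: record a lowercased form of each word as a feature.
    for child in root.children:
        child.features["lemma"] = child.label.lower()
    return root


def dump(node: Node, depth: int = 0) -> str:
    """Render the tree as indented, human-readable text for inspection."""
    feats = " ".join(f"{k}={v}" for k, v in sorted(node.features.items()))
    line = "  " * depth + node.label + (f" <{feats}>" if feats else "")
    return "\n".join([line] + [dump(c, depth + 1) for c in node.children])


def run_pipeline(root: Node, modules: List[Module],
                 trace: Optional[List[Tuple[str, str]]] = None) -> Node:
    """Feed each module's output to the next; optionally record every intermediate state."""
    for module in modules:
        root = module(root)
        if trace is not None:
            trace.append((module.__name__, dump(root)))
    return root


# Full pipeline, then the same pipeline with one module unplugged.
full = run_pipeline(Node("S", {"text": "Shakti translates English"}),
                    [tokenize, tag_capitalized, add_lemma])
reduced = run_pipeline(Node("S", {"text": "Shakti translates English"}),
                       [tokenize, add_lemma])
print(dump(full))
print(dump(reduced))
```

Because every module consumes and produces the same structure, dropping `tag_capitalized` only removes the `cap` feature from the result; the remaining modules still run unchanged. Passing a list as `trace` captures a readable dump after each stage, which mirrors the idea of inspecting module inputs and outputs.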
In this tutorial, the Shakti machine translation system and the Standard Shakti Format
will be introduced. We will outline the modules of Shakti and show how to analyse the outputs
of the various modules. The lexical resources and tools (WordNet, parsers, etc.)
will be touched upon. Interested developers will be guided in developing
a machine translation system from English to their desired Indian language using the
Shakti Kit. The format in which target-language data must be supplied to
build an MT system from English to that language will also be discussed in detail.
PANEL DISCUSSION: "Machine Translation: A Road Map to Future".
In addition, the annual meeting of the NLP Association of India and a meeting
on Indo-French Joint Collaboration will also be held during the conference.
© LTRC. All rights reserved