ANUSAARAKA: OVERCOMING THE LANGUAGE BARRIER
1  WHAT IS ANUSAARAKA 
2  HOW DOES IT WORK 
3  USER HELP 
4  APPLICATION SCENARIOS 
5  CURRENT STATUS 
6  YOU CAN HELP - INTERNET ACCESS (E-MAIL SERVER) 
7  CONTACT ADDRESS 



1 WHAT IS ANUSAARAKA

Anusaaraka is computer software that renders text from one Indian
language into another. It produces output which is comprehensible to
the reader, although at times it might not be grammatical. For example,
a Telugu to Hindi anusaaraka can take a Telugu text and produce output
in Hindi which can be understood by a Hindi reader, but which is not
fully grammatical. The reader therefore needs some training to read
the output.

It is widely realized that machine translation systems are extremely
difficult to build as they require encyclopaedic knowledge to be
put in the machine. A text leaves many things unsaid, and it is
extremely difficult for the machine to infer them reliably, in all
contexts. Therefore, the work load has to be shared between man
and machine.

Anusaaraka is the result of a new way of looking at the problem.
Instead of treating the problem as one of translation, it treats
it as one of overcoming the language barrier. Thus,
the task is to allow a reader to access information from another
language. Anusaaraka analyzes the source language text and presents
exactly the same information in a language close to the target
language. It does not try to "guess" using world knowledge etc.,
because such inferences often go wrong. Instead, it tries to
preserve information from the input to the output text. For this
task, grammaticality is relaxed and special purpose notation is
devised wherever necessary.


2 HOW DOES IT WORK

At the heart of anusaaraka is the concept of information and its
preservation. This concept has been adapted from Paninian grammar.
Anusaaraka operates on the source language text, morpheme by
morpheme, and presents equivalent morphemes in the target language.
Since the word order rules are basically the same across the Indian
languages, this works very well. For a construction in the source
language for which there is no equivalent construction in the
target language, a close equivalent is chosen and augmented by
additional notation. A major example of this is the conversion of
adjectival participles in the south Indian languages to a relative
clause construction augmented with '*' in Hindi. (For building an
anusaaraka system from English to an Indian language, some amount
of reordering of morphemes and other units may also be needed.)
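
The morpheme-by-morpheme approach described above can be illustrated
with a minimal sketch. It is not the anusaaraka implementation: the
table names, the render function and all morpheme labels are
hypothetical placeholders (no real language data is used), and passing
unknown morphemes through unchanged is an assumption made here for
illustration.

```python
from typing import Dict, List

# Hypothetical placeholder labels standing in for real morphemes.
# Morphemes that have a direct equivalent in the target language.
DIRECT_MAP: Dict[str, str] = {
    "SRC_ROOT": "TGT_ROOT",
    "SRC_PAST": "TGT_PAST",
    "SRC_PLURAL": "TGT_PLURAL",
}

# Constructions with no exact equivalent: a close equivalent is used
# and flagged with extra notation, as with the '*' mentioned in the text.
CLOSE_MAP: Dict[str, str] = {
    "SRC_ADJ_PARTICIPLE": "TGT_RELATIVE_CLAUSE",
}
NOTATION = "*"


def render(morphemes: List[str]) -> List[str]:
    """Map source morphemes to target morphemes, keeping their order."""
    output = []
    for morpheme in morphemes:
        if morpheme in DIRECT_MAP:
            output.append(DIRECT_MAP[morpheme])
        elif morpheme in CLOSE_MAP:
            # Close equivalent, marked so the reader knows it is approximate.
            output.append(CLOSE_MAP[morpheme] + NOTATION)
        else:
            # Assumption: do not guess; pass the morpheme through as-is.
            output.append(morpheme)
    return output


rendered = render(["SRC_ROOT", "SRC_ADJ_PARTICIPLE", "SRC_UNKNOWN"])
```

Because the output keeps the input order, a sketch like this only
suits language pairs with similar word order; the reordering mentioned
for English would need an additional step.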


3 USER HELP

When a user is reading the output, his main concern is to understand
the material, and help is provided for this purpose. In that case it
might not be necessary to produce grammatically correct and
stylistically polished output. However, when a document is going to be distributed
in large numbers, it would normally be post-edited by a person
before distribution or publication. A post-editing interface helps
in this task. Similarly, the input text can also be pre-edited if it
contains non-standard spellings, sandhi, non-standard usages, etc.
Such interfaces are being developed further.


4 APPLICATION SCENARIOS

Anusaaraka can be used in various scenarios. For example, a reader
might be browsing web sites containing Indian language texts. He
comes across a site of interest, and wants to read material on
it. However, he does not know the language. He can run anusaaraka
and read the text. Normally, the reader's motivation is high and he
is willing to put in some effort.

In another scenario, an editor of a magazine is looking for essays
and articles to publish. He uses anusaaraka to get the gist of the
relevant material in other languages. Once he shortlists an article,
it can be run through anusaaraka and post-edited by trained
post-editors for publication.

In another situation, a scholar might want to find out what
an original work or epic actually says, where the original is in
a language which he does not know. A translation is available, but
he wants to see for himself what the epic says and what the
translator has interpreted. He can read the epic directly through
anusaaraka. As the machine does not interpret, and presents an
image of the contents, he is able to see the original without the
translator's interpretation.

Thus, anusaaraka can be used by a reader to read and understand a text
in another Indian language immediately without waiting for human
help to be available. The output can also be grammatically corrected
and made stylistically more acceptable by human post-editors.


5 CURRENT STATUS

Anusaarakas have been built from Telugu, Kannada, Bengali, Marathi,
and Punjabi to Hindi. Alpha versions of all of these have been released
so that they can be field tested. The beta version is
expected to be released soon.


6 YOU CAN HELP - INTERNET ACCESS (E-MAIL SERVER)

We welcome interested individuals to participate in the development
of anusaaraka by contributing their effort. For this, we have adopted
the "free" software model. The systems with complete source code and
language data are available as "free" software.

1. An e-mail server has been established for the anusaarakas. It
currently holds alpha versions of anusaarakas from Telugu,
Kannada, Marathi, Bengali, and Punjabi into Hindi. The purpose of
the alpha version is to invite people to join in and work on its
improvement. You can run it on your text and give feedback regarding
problems. To run the anusaaraka on a given text, send the text by
e-mail to:
   nandi@iiit.net	       OR
   nandi@anu.tdil.gov.in       OR
   nandi@anu.uohyd.ernet.in
with the name of the language in the subject line. For example, if 
you put 'telugu' in the subject line, this will automatically run the
Telugu to Hindi anusaaraka. The output produced will be sent back
to the sender. A copy will be kept by the machine for later study
of the results. The text should be in 7-bit ISCII coding.

If mail is sent to the above with 'help' in the subject line,
the machine will send back help. In case of any difficulty or any
special requirements, mail can be sent to:
   sangal@iiit.net

2. The anusaarakas are available as free software, and the system with
complete source code can be downloaded using anonymous ftp from:
   VishwaBharat.tdil.gov.in   OR
   anu.uohyd.ernet.in     OR     202.41.85.21

We hope that the source code and the language data are useful for
other purposes as well. We encourage you to use them freely and build
new applications. In turn, you should provide your applications with
source code to others (as described in the license at the ftp site).
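
The e-mail request described in item 1 above can be sketched as
follows. This is only an illustration using Python's standard email
package. The function name build_request and the sender address are
hypothetical, and the sketch assumes the text has already been
converted to 7-bit ISCII and is held as a string of 7-bit characters.
Only the 'telugu' and 'help' subject lines are taken from the text
above; the exact subject strings for other languages are not
specified here.

```python
from email.message import EmailMessage

# One of the server addresses listed above; the others could be used instead.
SERVER_ADDRESS = "nandi@iiit.net"


def build_request(subject: str, text: str, sender: str) -> EmailMessage:
    """Build (but do not send) a request for the anusaaraka e-mail server."""
    # Assumption: the body is already 7-bit ISCII, so every character
    # must fall in the 7-bit range.
    if not text.isascii():
        raise ValueError("text must be converted to 7-bit coding first")
    message = EmailMessage()
    message["From"] = sender            # hypothetical sender address
    message["To"] = SERVER_ADDRESS
    message["Subject"] = subject        # e.g. 'telugu' or 'help'
    message.set_content(text, charset="us-ascii", cte="7bit")
    return message


# Placeholder body; real input would be the 7-bit ISCII text itself.
request = build_request("telugu", "<7-bit ISCII text>\n", "reader@example.org")
help_request = build_request("help", "", "reader@example.org")
```

The message can then be handed to whatever mail relay is available
locally, for example with smtplib.SMTP(...).send_message(request); the
relay host is site-specific and is deliberately left out of the sketch.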



7 CONTACT ADDRESS

Akshar Bharati group (Attn: Prof Rajeev Sangal), 
Language Technologies Research Centre
International Institute of Information Technology
Gachibowli, Hyderabad 500 019

Tel: 0(40) 3001967 Ext 144, or 3001412 //UoH lab 3010161,
Fax: +(91) (40) 3001413
email: sangal@iiit.net

   Web: http://202.41.85.21
   ALSO hosted at: http://vishwabharat.tdil.gov.in
               OR  http://www.tdil.gov.in


            ANUSAARAKAS HAVE BEEN JOINTLY DEVELOPED BY: 
            I.I.T. Kanpur and University of Hyderabad
       Project funded by Dept. of Electronics, Govt. of India

       (Now, Satyam Computers and International Institute of Information
        Technology Hyderabad have also joined in the effort.)

                                                      Oct 1999