ANUSAARAKA SYSTEM

1. ANUSAARAKA APPROACH
2. APPLICATIONS

1 ANUSAARAKA APPROACH

Machine translation (MT) systems are extremely difficult to build. Translation is a creative process in which the translator has to interpret the text, something that is very hard for a machine to do. In spite of the difficulty of MT, the anusaaraka can be used to overcome the language barrier in India today. Anusaaraka systems among Indian languages are designed around the following two key observations:
1. In the anusaaraka approach, the load is divided between the reader and the machine so that the aspects that are difficult for the reader are handled by the machine, and the aspects that are easy for the reader are left to him. Specifically, a reader would have difficulty learning the vocabulary of a new language, but would be good at using the general background knowledge needed to interpret any text. The machine, on the other hand, is good at "memorising" an entire dictionary, grammar rules, etc., but poor at using background knowledge. The work is therefore divided: the language-based analysis of the text is carried out by the machine, and the knowledge-based analysis, or interpretation, is left to the reader.

2. Among Indian languages, which share vocabulary, grammar, pragmatics, etc., the task is easier. In general, the words of a language are ambiguous, but if two languages are close to each other, one is likely to find a one-to-one correspondence between words, in which the meaning is carried across from the source language to the target language. For example, for 80 percent of the Kannada words in the current anusaaraka dictionary of 30,000 root words, there is a single equivalent Hindi word that covers the senses of the original Kannada word.
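
The dictionary point above can be illustrated with a minimal sketch, assuming a root dictionary that maps each Kannada root to a list of Hindi equivalents. The entries, the ad hoc ASCII romanization, and the names `ROOT_DICT`, `lookup` and `single_equivalent_share` are all hypothetical and do not reflect the actual anusaaraka dictionary or its format. Where a root has more than one equivalent, the sketch shows every candidate and leaves the choice to the reader; this is one plausible way of keeping interpretation with the reader, not a description of the real system.

```python
from fractions import Fraction

# Hypothetical toy excerpt of a Kannada-to-Hindi root dictionary, written in an
# ad hoc ASCII romanization. It is NOT the actual anusaaraka data or format.
ROOT_DICT: dict[str, list[str]] = {
    "mane": ["ghar"],            # house
    "huduga": ["ladka"],         # boy
    "neeru": ["paani"],          # water
    "pustaka": ["kitaab"],       # book
    "kaalu": ["paanv", "paav"],  # leg; quarter (two senses, two Hindi words)
}


def lookup(root: str) -> str:
    """Return the Hindi equivalent of a Kannada root.

    A single equivalent is used directly. If there are several, all are shown,
    joined by '/', so that the reader rather than the machine picks the sense
    (an assumption made for this sketch).
    """
    candidates = ROOT_DICT.get(root)
    if candidates is None:
        return f"[{root}]"  # unknown root: pass it through, bracketed
    return "/".join(candidates)


def single_equivalent_share(dictionary: dict[str, list[str]]) -> Fraction:
    """Fraction of roots that have exactly one target-language equivalent."""
    singles = sum(1 for targets in dictionary.values() if len(targets) == 1)
    return Fraction(singles, len(dictionary))


if __name__ == "__main__":
    print(lookup("mane"))                       # ghar
    print(lookup("kaalu"))                      # paanv/paav
    print(single_equivalent_share(ROOT_DICT))   # 4/5
```

With these five made-up entries, four have a single equivalent, so `single_equivalent_share` returns 4/5; the 80 percent figure quoted above comes from the real 30,000-word dictionary, not from this toy.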

In the anusaaraka approach, the reader is given an image of the source text in the target language by faithfully representing whatever is actually contained in the source language text. The task therefore boils down to presenting this information to the reader in an appropriate form. We relax the requirement that the output in the target language be grammatical, and the emphasis shifts to comprehensibility. This is achieved by deviating from the target language in a systematic manner.

First, new notation is invented and incorporated. For example, Hindi has the postposition 'ko', which functions both as an accusative marker and as a dative marker. We distinguish between the two uses by adding a diacritic mark (a backquote). In this way, existing words in the target language may be given a wider or narrower meaning.
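
A minimal sketch of the 'ko' notation, assuming the source noun has already been analysed into a root and a case (the morphological analysis itself is not modelled here). The names `Case`, `POSTPOSITION`, `NOUNS` and `render_noun`, the romanized entries, and the choice to put the backquote on the dative rather than the accusative are all assumptions made for illustration.

```python
from enum import Enum


class Case(Enum):
    """Source-language cases relevant to this toy example."""
    NOMINATIVE = "nom"
    ACCUSATIVE = "acc"
    DATIVE = "dat"


# Hypothetical case-to-postposition table. Standard Hindi uses 'ko' for both
# accusative and dative; the backquote keeps the two apart in the output.
# Which of the two carries the backquote is an assumption of this sketch.
POSTPOSITION: dict[Case, str] = {
    Case.NOMINATIVE: "",
    Case.ACCUSATIVE: "ko",
    Case.DATIVE: "ko`",
}

# Hypothetical noun equivalents in an ad hoc ASCII romanization.
NOUNS: dict[str, str] = {"huduga": "ladka", "pustaka": "kitaab"}


def render_noun(root: str, case: Case) -> str:
    """Render an already-analysed source noun (root plus case) in the target
    language, appending the postposition that marks its case."""
    word = NOUNS.get(root, f"[{root}]")  # unknown roots pass through bracketed
    marker = POSTPOSITION[case]
    return f"{word} {marker}" if marker else word


if __name__ == "__main__":
    print(render_noun("huduga", Case.DATIVE))       # ladka ko`
    print(render_noun("pustaka", Case.ACCUSATIVE))  # kitaab ko
    print(render_noun("huduga", Case.NOMINATIVE))   # ladka
```

Keeping the case-to-marker choice in one table means that a new distinction, such as the backquoted 'ko', is a single-entry change rather than a change to the rendering logic.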

Second, we may relax some of the constraints of the target language. For example, we give up agreement in our "dialect" of the target language. The principle behind these systematic deviations is simple: the output follows the grammar of the source language. In the case of agreement, to state it more precisely, the output follows the agreement rules of the source language; as a result, when read as target-language text, the output appears to be without agreement. Some constructions of the source language may also be introduced into the target language. (Since constructions are largely shared between the two languages, a new construction is noticed only when the source language has a construction that differs somewhat from that of the target language.)

Sometimes, language bridges need to be built for constructions in the source language that do not exist in the target language. A different construction that can express the same information in the target language is chosen, with some additional notation if necessary. For example, adjectival participial phrases in the South Indian languages are mapped to relative clauses with the 'jo*' notation.

For these reasons, the reader will need some training to read and understand the output. This training covers the notation and some salient features of the source language, and is likely to take about 10% of the time needed to learn a new language. Among Indian languages, for example, it could last a few weeks, depending on the proficiency desired. Training could also take place informally as the reader uses the system and reads its output, so the formal part of the training could be small.

2 APPLICATIONS

Anusaaraka can be used in a variety of situations. Here we give some examples:

1. A reader wants to read an e-mail message or a document quickly, to get the gist of its contents.

The reader can run the anusaaraka on the source and read the output directly. He might not be proficient in using the anusaaraka, but since his motivation is high, he is likely to be willing to put in the effort, using the online help.

2. A publisher wants to translate a literary work and publish it.

The anusaaraka output will have to be post-edited by a person to make it grammatically correct, stylistically appropriate, etc. The post-edited output can then be published. (In fact, the anusaaraka group is planning to bring out two books by well-known Kannada authors, which have already been translated into Hindi with the help of the anusaaraka.)

3. A scholar wants to find out what an original work or epic actually says, where the original is in a language that he does not know.

A translation is available, but he wants to see for himself what the epic says and how the translator has interpreted it. He can read the epic directly through the anusaaraka. Because the machine does not interpret but presents an image of the contents, he can see the original without the translator's interpretation.
