1. How to install Telugu-Hindi anusAraka?

a) Run the command auto_install.pl at the shell prompt.
(Make sure that auto_install.pl has execute (+x) permission.)

Installation takes about 15 minutes.

b) After the installation is over, you will see a message
*** Add the following line to the PATH variable in .bash_profile ***
$INSTALL_DIR/amba/bin
where $INSTALL_DIR is the directory where the Telugu-Hindi anusAraka
is installed.

c) To make the new PATH take effect,
either log out and log in again, or
type
source ~/.bash_profile
at the prompt.
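Step b only names the directory to add. The sketch below shows one way to append it to a profile without creating duplicate entries on repeated installs. It assumes Python 3.9+ and the standard bash `export PATH="$PATH:..."` form; the function name add_amba_to_path is hypothetical and not part of anusAraka.

```python
from pathlib import Path


def add_amba_to_path(install_dir: str, profile: Path) -> bool:
    """Append an export line for <install_dir>/amba/bin to a shell profile.

    Hypothetical helper. Returns True if the line was added and False if an
    identical line was already present.
    """
    line = f'export PATH="$PATH:{install_dir}/amba/bin"'
    text = profile.read_text() if profile.exists() else ""
    if line in text.splitlines():
        return False  # already configured; avoid duplicate PATH entries
    with profile.open("a") as fh:
        # Start on a fresh line if the profile does not end with a newline.
        if text and not text.endswith("\n"):
            fh.write("\n")
        fh.write(line + "\n")
    return True
```

For example, `add_amba_to_path("/home/user/anu", Path.home() / ".bash_profile")` (the install path is a placeholder), followed by `source ~/.bash_profile` as in step c.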

Now you are ready to use Telugu-Hindi anusAraka.

/** CAUTION : IF YOU WANT TO TEST ANY OF THE FOLLOWING COMMANDS, CREATE
A NEW DIRECTORY AND WORK WITHIN IT **/

2. How to run anusAraka?
Let 'f1' be a file containing the Telugu text, in ISCII (Indian Standard
Code for Information Interchange), on which you want to run anusAraka.
The command is

anu_tlg_gur < f1 > f1.out

f1.out will contain the anusAraka output. The output will be in ISCII.
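A minimal Python sketch (3.9+) of the same command that follows the CAUTION note by creating a fresh working directory for each run, so that tmp_anu_dir and other files the tools write land there. It assumes anu_tlg_gur is on the PATH; the function name run_in_fresh_dir is hypothetical, and the cmd parameter exists only so the wrapper can be tried with a stand-in such as cat.

```python
import subprocess
import tempfile
from pathlib import Path


def run_in_fresh_dir(input_file: str, cmd: str = "anu_tlg_gur") -> Path:
    """Run `cmd < input_file > <name>.out` inside a newly created directory.

    Hypothetical wrapper. The directory is left in place so the output and
    any intermediate files can be inspected afterwards.
    """
    src = Path(input_file).resolve()
    workdir = Path(tempfile.mkdtemp(prefix="anu_run_"))
    out_path = workdir / (src.name + ".out")
    with src.open("rb") as fin, out_path.open("wb") as fout:
        # ISCII is an 8-bit encoding, so pass raw bytes through untouched.
        subprocess.run([cmd], stdin=fin, stdout=fout, cwd=workdir, check=True)
    return out_path
```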

3. How to run the Telugu morph?

The Telugu morph has two layers.
The first is the core Telugu morph, which handles both inflectional
and derivational morphology. The second layer handles spelling
variations and, to some extent, sandhi.

a) The first layer can be run either interactively or in batch mode.
In batch mode, the input file must contain one word per line.
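Running text usually has to be reshaped before batch mode. A minimal sketch, assuming words are separated by whitespace and that surrounding ASCII punctuation is not part of a word; the function name one_word_per_line is hypothetical. For ISCII files, read and write with encoding "latin-1" so every byte passes through unchanged.

```python
def one_word_per_line(text: str) -> str:
    """Reformat `text` so that each word sits on its own line.

    Hypothetical helper: splits on whitespace and strips ASCII punctuation
    from both ends of each token; empty tokens are dropped.
    """
    words = []
    for token in text.split():
        word = token.strip(".,;:!?\"'()[]")
        if word:
            words.append(word)
    return "\n".join(words) + ("\n" if words else "")
```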

The command for batch mode is

o moh_gur.sh wrds_file.mo
or
o mohu.sh wrds_file.mo (for user-friendly output)

where wrds_file contains a single Telugu word per line, in w-x notation.
The output is stored in wrds_file.mo, also in w-x notation.

Another batch-mode command, which takes ISCII input, is

o moh_gur_batch.sh wrds_file

where wrds_file contains a single Telugu word per line, in ISCII.
The output is stored in wrds_file.mo, in w-x notation.

To run the morph in interactive mode, enter moh_gur.sh at the shell
prompt and then type Telugu words in w-x notation, one after another.
To end the session, type 'NW' (mnemonic for NULL WORD).

The words not recognised by the core morphological analyser are stored
in the file 'uword' in the current working directory.

b) The second layer cannot be run in isolation. The following command
runs the core morph followed by the sandhi splitting and spelling
variation modules:

tlg_mo_total filename

The input should be in ISCII.
The output is stored in filename_mo.out, in w-x notation.
Words that are still unrecognised after both layers are stored in the
file 'rem1'.
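Both 'uword' (core morph) and 'rem1' (tlg_mo_total) point to gaps worth fixing first. A minimal sketch that ranks their entries by frequency, assuming one word per line; bytes are decoded as Latin-1 so ISCII or w-x content passes through unchanged. The function name unrecognised_summary is hypothetical.

```python
from collections import Counter
from pathlib import Path


def unrecognised_summary(path: Path, top: int = 10) -> list[tuple[str, int]]:
    """Return the `top` most frequent words in an unrecognised-word file
    such as 'uword' or 'rem1' (assumed: one word per line)."""
    if not path.exists():
        return []  # no such file in this directory
    words = [ln.strip() for ln in path.read_text(encoding="latin-1").splitlines()]
    # Counter keeps first-seen order among ties.
    return Counter(w for w in words if w).most_common(top)
```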

4. How to convert an ISCII file into w-x notation and vice versa?
The programs 'ir' and 'ri' do the required conversions.
(These programs are available in $INSTALL_DIR/amba/bin.)
ir : converts from ISCII to w-x roman notation.
ri : converts from w-x roman notation to ISCII.

Usage :
ir < input_file_name > output_file_name
ri < input_file_name > output_file_name

These can also be used in interactive mode.
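A quick sanity check is to pipe a file through ir and then ri and compare the bytes. This document does not state that the conversion is lossless for every input, so treat a mismatch as something to inspect. The sketch assumes ir and ri are on the PATH; the function name round_trips is hypothetical, and the converter names are parameters so it can be tried with stand-ins such as cat.

```python
import subprocess
from pathlib import Path


def round_trips(iscii_file: Path, to_wx: str = "ir", to_iscii: str = "ri") -> bool:
    """Return True if `to_iscii < (to_wx < file)` reproduces the file exactly."""
    original = iscii_file.read_bytes()
    wx = subprocess.run([to_wx], input=original, capture_output=True, check=True).stdout
    back = subprocess.run([to_iscii], input=wx, capture_output=True, check=True).stdout
    return back == original
```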

5. What are the steps involved in pre-editing the text? (or)
6. How to use the Spell Checker for Telugu?

The input file should be in ISCII.
Run the following commands in the specified order.

o mark_ajFAwa filename

(This runs the Telugu morphological analyser on the whole text and
marks the words it does not recognise with the tag <ajFAwa>.)

o suggest_gur.sh filename.uw

(This consults the existing database and suggests variant spellings or
possible splits for the words tagged <ajFAwa>.)

o pre_edit.sh filename.uw.mrk

(This command opens filename.uw.mrk for editing.

After correcting a word tagged <ajFAwa>, press 'g' to check whether
the correction is accepted.

Some words are also offered with alternative spellings in angle
brackets; select an alternative spelling by pressing the '.' key.)
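The three pre-editing commands must run in order, and each later step reads a file the earlier one produced. A minimal driver sketch: the intermediate names filename.uw and filename.uw.mrk are inferred from the arguments of the later commands, which is an assumption. The function names are hypothetical, and the driver only prints the commands unless dry_run is False.

```python
import subprocess


def pre_edit_commands(filename: str) -> list[list[str]]:
    """The pre-editing steps, in the order given above (names assumed)."""
    return [
        ["mark_ajFAwa", filename],
        ["suggest_gur.sh", f"{filename}.uw"],
        ["pre_edit.sh", f"{filename}.uw.mrk"],
    ]


def run_pre_edit(filename: str, dry_run: bool = True) -> None:
    for cmd in pre_edit_commands(filename):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence on failure, so a later step never
            # runs on a missing or stale intermediate file.
            subprocess.run(cmd, check=True)
```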

7. What are the steps involved in anusAraka?

o Takes the source text from the user, in ISCII.
o Runs the core morphological analyser.
o Passes the words not recognised by the morphological analyser on to
the sandhi splitter and the spelling variation modules.
o Groups the words that form a meaningful unit.
o Maps the source language text to the target language text using
bilingual dictionaries.
o Generates the text in the target language.
o Aligns the source and target language texts to produce the anusAraka
output.
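The stage order above can be sketched schematically. This is not anusAraka's implementation: every stage is a caller-supplied stand-in, and the function name anusaraka_like is hypothetical. It only illustrates the routing of unrecognised words to the fallback stage and the side-by-side alignment at the end.

```python
from typing import Callable, Optional


def anusaraka_like(
    words: list[str],
    analyse: Callable[[str], Optional[str]],
    fallback: Callable[[str], str],
    group: Callable[[list[str]], list[str]],
    translate: Callable[[str], str],
    generate: Callable[[str], str],
) -> list[tuple[str, str]]:
    """Schematic pipeline; returns aligned (source unit, target text) pairs."""
    analysed = []
    for w in words:
        a = analyse(w)  # stand-in for the core morph; None = not recognised
        # Unrecognised words go to the sandhi/spelling stand-in.
        analysed.append(a if a is not None else fallback(w))
    units = group(analysed)  # group words forming one meaningful unit
    # Map each unit, generate target text, keep source and target side by side.
    return [(u, generate(translate(u))) for u in units]
```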

The anusAraka output consists of pairs of lines separated by dotted lines.
In each pair, the first line contains the source language text and the
second line contains the target language text.
For more guidelines on reading anusAraka output, refer to anu_pATa.
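To post-process the output, for example to extract only the target lines, the pairs can be read back as below. A minimal sketch assuming each dotted separator is a line of at least three '.' characters; check a real output file and adjust SEPARATOR if the dotted lines differ. Read ISCII output with encoding "latin-1". The function name read_pairs is hypothetical.

```python
import re

# Assumption: a separator line is three or more '.' characters, optionally
# surrounded by whitespace.
SEPARATOR = re.compile(r"^\s*\.{3,}\s*$")


def read_pairs(output_text: str) -> list[tuple[str, str]]:
    """Split anusAraka output into (source line, target line) pairs.

    Raises ValueError on a block that is neither empty nor exactly two
    non-blank lines, so a format mismatch is not silently dropped.
    """
    pairs = []
    block: list[str] = []
    for line in output_text.splitlines() + ["..."]:  # sentinel flushes last block
        if SEPARATOR.match(line):
            content = [ln for ln in block if ln.strip()]
            if len(content) == 2:
                pairs.append((content[0], content[1]))
            elif content:
                raise ValueError(f"unexpected block of {len(content)} lines")
            block = []
        else:
            block.append(line)
    return pairs
```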

8. Where can I find the intermediate output of the anusAraka?

All intermediate files are stored in the subdirectory tmp_anu_dir.
This directory is created in the directory from which you run
anusAraka.
NOTE : You can delete any of the files in the directory tmp_anu_dir,
if necessary, without any harm.
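A small sketch for inspecting and clearing the intermediate files. In line with the NOTE, it deletes only the files inside tmp_anu_dir and leaves the directory itself in place. The function names are hypothetical.

```python
from pathlib import Path


def intermediate_files(run_dir: Path) -> list[tuple[str, int]]:
    """List (file name, size in bytes) for files in run_dir/tmp_anu_dir."""
    tmp = run_dir / "tmp_anu_dir"
    if not tmp.is_dir():
        return []
    return sorted((p.name, p.stat().st_size) for p in tmp.iterdir() if p.is_file())


def clear_intermediate(run_dir: Path) -> int:
    """Delete the files in run_dir/tmp_anu_dir; return how many were removed."""
    tmp = run_dir / "tmp_anu_dir"
    removed = 0
    if tmp.is_dir():
        for p in tmp.iterdir():
            if p.is_file():
                p.unlink()
                removed += 1
    return removed
```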

9. What are the prerequisites for installing anusAraka?

o Linux Operating System
o Perl
o Perl-enabled vim
o GDBM
o Flex
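Before running auto_install.pl, the listed prerequisites can be checked quickly. A minimal sketch: it relies on `vim --version` listing compiled-in features as +perl (or +perl/dyn) versus -perl. GDBM is a library rather than a command, so it is not checked here. The function name check_prerequisites is hypothetical.

```python
import platform
import shutil
import subprocess


def check_prerequisites() -> dict[str, bool]:
    """Report which of the listed prerequisites are found on this machine."""
    found = {
        "linux": platform.system() == "Linux",
        "perl": shutil.which("perl") is not None,
        "flex": shutil.which("flex") is not None,
        "vim+perl": False,
    }
    vim = shutil.which("vim")
    if vim:
        # `vim --version` prints feature flags such as +perl or -perl.
        out = subprocess.run([vim, "--version"], capture_output=True,
                             text=True, timeout=10).stdout
        found["vim+perl"] = any(tok.startswith("+perl") for tok in out.split())
    return found
```

For example, `print(check_prerequisites())` shows which items still need installing.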