1. How to install Telugu-Hindi anusAraka?

   a) Give the command

          auto_install.pl

      at the prompt. (Make sure that auto_install.pl has +x permission.)
      Installation takes around 15 minutes.

   b) After the installation is over, you will see the message

          *** Add the following line to the PATH variable in .bash_profile ***
          $INSTALL_DIR/amba/bin

      where $INSTALL_DIR is the directory where the Telugu-Hindi
      anusAraka is installed.

   c) To make the path effective, either log out and log in again,
      or type

          source .bash_profile

      at the prompt.

   Now you are ready to use Telugu-Hindi anusAraka.

/** CAUTION : IF YOU WANT TO TEST ANY OF THE FOLLOWING COMMANDS,
    CREATE A NEW DIRECTORY AND WORK WITHIN IT **/

2. How to run anusAraka?

   Let 'f1' be a file containing the Telugu text, in ISCII (Indian
   Standard Code for Information Interchange), on which you want to
   run anusAraka. The command is

          anu_tlg_gur < f1 > f1.out

   f1.out will contain the anusAraka output. The output will be in
   ISCII.

3. How to run telugu morph?

   Telugu morph has two layers. The first is the core Telugu morph,
   which handles inflectional as well as derivational morphology.
   The second layer handles spelling variations and, to some extent,
   sandhi.

   a) The first layer can be run either interactively or in batch
      mode.

      In batch mode, the input file should contain one word per line.
      The command for batch mode is

          o moh_gur.sh < wrds_file > wrds_file.mo
        or
          o mohu.sh < wrds_file > wrds_file.mo  (for user-friendly output)

      where wrds_file contains a single Telugu word per line.
      The words should be in w-x notation.
      The output will be stored in wrds_file.mo (in w-x notation).

      Another command for batch mode is

          o moh_gur_batch.sh wrds_file

      where wrds_file contains a single Telugu word per line.
      The words should be in ISCII.
      The output will be stored in wrds_file.mo (in w-x notation).

      To run the morph in interactive mode, just enter moh_gur.sh at
      the command level, and then enter the Telugu words in w-x
      notation one after another.
      To end, type 'NW' (mnemonic for NULL WORD).

      The words not recognised by the core morphological analyser are
      stored in the file 'uword' in the current working directory.

   b) You can't run the second layer in isolation. The following
      command runs the core morph followed by the sandhi split and
      spelling variation module:

          tlg_mo_total filename

      The input should be in ISCII.
      The output will be stored in filename_mo.out (in w-x notation).
      The words that remain unrecognised at the end will be stored in
      the file 'rem1'.

4. How to convert an ISCII file into w-x notation and vice versa?

   The programs 'ir' and 'ri' do the required conversions.
   (These programs are available in PATH_TO_ANU_INSTALL_DIR/amba/bin)

          ir : converts from ISCII to w-x roman notation
          ri : converts from w-x roman notation to ISCII

   Usage :

          ir < input_file_name > output_file_name
          ri < input_file_name > output_file_name

   These can also be used in interactive mode.

5. What are the steps involved in pre-editing the text? (or)
6. How to use the Spell Checker for Telugu?

   The input file should be in ISCII. Run the following commands in
   the specified order.

   o mark_ajFAwa filename
     (This runs the Telugu morphological analyser on the whole text
     and marks the words that are not recognised with the tag
     <ajFAwa>, i.e. the word ajFAwa, "unknown", written in ISCII.)

   o suggest_gur.sh filename.uw
     (This refers to the existing database and suggests variant
     spellings or possible splittings for the ajFAwa words.)

   o pre_edit.sh filename.uw.mrk
     (This command lets you edit filename.uw.mrk. After correcting a
     word that carries the <ajFAwa> tag, press 'g' to check whether
     it is accepted or not. Alternatively, some words are provided
     with an alternate spelling in angular brackets; here the user
     can select the alternate spelling by pressing '.')

7. What are the steps involved in the anusAraka?

   o Takes text from the user (source text) in ISCII.
   o Runs the core morphological analyser.
   o Passes the words not recognised by the morphological analyser
     on to the sandhi splitter and the spelling variation modules.
   o Groups the words that form a meaningful unit.
   o Maps the source language text to target language text using
     bilingual dictionaries.
   o Generates the text in the target language.
   o Aligns the source and target language texts to produce the
     anusAraka output.

   The anusAraka output consists of pairs of lines separated by
   dotted lines. In each pair, the first line contains the source
   language text and the second line contains the target language
   text. For more guidelines on reading anusAraka output, refer to
   anu_pATa.

8. Where can I find the intermediate output of the anusAraka?

   All intermediate files are stored in the subdirectory

          tmp_anu_dir

   This directory is created in the directory from which you run
   anusAraka.

   NOTE : You can delete any of the files in tmp_anu_dir if
   necessary, without any harm.

9. What are the prerequisites for installation of anusAraka?

   o Linux Operating System
   o Perl
   o Perl enabled vim
   o GDBM
   o Flex
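
   The prerequisites listed under question 9 can be checked before
   running auto_install.pl with a small shell sketch like the one
   below. It is not part of the anusAraka distribution: the function
   names check_cmd and check_vim_perl are made up for illustration,
   and it assumes the tools are installed under the command names
   perl, flex and vim. GDBM is a library rather than a command, so
   it is not checked here.

```shell
#!/bin/sh
# Hypothetical prerequisite check; NOT shipped with anusAraka.

# Report whether a command name can be found on PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found   : $1"
    else
        echo "MISSING : $1"
    fi
}

# vim lists "+perl" in its feature list when built with Perl support.
check_vim_perl() {
    if command -v vim >/dev/null 2>&1 &&
       vim --version | grep -F -q '+perl'; then
        echo "found   : Perl enabled vim"
    else
        echo "MISSING : Perl enabled vim"
    fi
}

# Kernel name; anusAraka lists Linux as the required operating system.
echo "kernel  : $(uname -s)"
check_cmd perl
check_cmd flex
check_vim_perl
# GDBM is a library, not a command: check it with your package manager.
```

   Any line that starts with MISSING points to a prerequisite that
   should be installed before the installation is attempted.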