A Customizable, Self-Learnable Parameterized Machine Translation System Achieved via Two-Way Training
Dr. Keh-Yih Su
Behavior Design Corporation
2F, No.5, Industry E. Rd. IV,
Science-Based Industrial Park,
Hsinchu, Taiwan 30077, R.O.C.
Traditionally, Machine Translation systems adopt rule-based approaches and are designed around a general-purpose kernel in which only the dictionaries change when the domain is switched; it is hoped that wide coverage and high quality can be obtained at the same time. Such approaches, however, suffer from the problem of dealing with non-deterministic knowledge, and have great difficulty in acquiring the huge amount of fine-grained knowledge required. A Parameterized MT architecture, which allows self-learning and customization in a specific domain with high translation quality, is thus greatly desired. About fifteen years ago, IBM proposed a purely statistical approach to handle the problems mentioned above. However, without adopting any linguistic or AI models, this approach fails to handle long-distance dependencies within the context, and requires a very large parameter space.
In this talk, the major problems of current machine translation systems are first outlined. The characteristics of natural language processing are then described. Afterwards, a new direction, highlighting the capability of the system to be self-learnable and customizable, is proposed for attacking the problems described above, which mainly result from the intrinsic complexity of natural languages. The proposed solution first builds a stochastic language model on top of linguistic models, and then adopts an unsupervised two-way training mechanism and a parameterized architecture to automatically acquire the non-deterministic knowledge required, so that the system can be easily adapted to different domains and to the various preferences of individual users.
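To make the idea of a parameterized architecture concrete, the following is a minimal, hypothetical sketch (not the system described in the talk): translation candidates are scored by a log-linear combination of linguistic feature scores, and the combination weights are the tunable parameters that could be re-estimated for a new domain or user. All names, feature labels, and numbers here are illustrative assumptions.

```python
# Hypothetical sketch of a "parameterized" scorer: linguistic models
# supply feature scores; per-domain weights are the learnable parameters.

def score(features, weights):
    """Log-linear combination of linguistic feature scores."""
    return sum(weights[name] * value for name, value in features.items())

def pick_best(candidates, weights):
    """Choose the translation candidate with the highest combined score."""
    return max(candidates, key=lambda c: score(c["features"], weights))

# Two hypothetical candidate translations, each described by
# feature scores (e.g., syntactic and lexical-choice log-probabilities).
candidates = [
    {"text": "candidate A", "features": {"syntax": -1.2, "lexicon": -0.5}},
    {"text": "candidate B", "features": {"syntax": -0.8, "lexicon": -1.5}},
]

# Domain- or user-specific weights: adapting the system would mean
# re-estimating these values from in-domain data.
weights = {"syntax": 1.0, "lexicon": 2.0}
best = pick_best(candidates, weights)
```

Under these weights, candidate A scores -2.2 and candidate B scores -3.8, so A is selected; retuning the weights for a different domain could reverse that preference without changing the kernel.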