Technical Programme - ICON 2003 - International Conference On Natural Language Processing


Keynote Address Abstract

Topic
A Customizable, Self-Learnable Parameterized Machine Translation System Achieved via Two-Way Training

Speaker:
Dr. Keh-Yih Su
Behavior Design Corporation
2F, No.5, Industry E. Rd. IV,
Science-Based Industrial Park,
Hsinchu, Taiwan 30077, R.O.C.

Abstract:

Traditionally, machine translation systems adopt rule-based approaches and are designed around a general-purpose kernel, with only the dictionaries changed when the domain is switched, in the hope that wide coverage and high quality can be obtained at the same time. Such approaches, however, have trouble handling non-deterministic knowledge and have great difficulty acquiring the large amount of fine-grained knowledge required. A parameterized MT architecture that allows self-learning and customization in a specific domain, with high translation quality, is thus greatly desired. About fifteen years ago, IBM proposed a purely statistical approach to handle the problems mentioned above. However, without adopting any linguistic or AI models, this approach fails to handle long-distance dependencies within the context and has a very large parameter space.

In this talk, the major problems of current machine translation systems are first outlined. The characteristics of NLP are then given. Afterwards, a new direction, highlighting the system's capability to be self-learnable and customizable, is proposed for attacking the previously described problems, which mainly result from the intrinsic complexity of natural languages. The proposed solution first builds a stochastic language model on top of linguistic models, and then adopts an unsupervised two-way training mechanism and a parameterized architecture to automatically acquire the non-deterministic knowledge required, so that the system can be easily adapted to different domains and to the preferences of individual users.