Thivaharan, S and Bharath Kumaar, K S and Sudharsan, S (2022) Machine Comprehension System in Tamil and English based on BERT. In: 2022 3rd International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India.
Full text not available from this repository.

Abstract
Question answering is a branch of information retrieval. It is a machine reading comprehension task and provides a means of assessing the ability of machines to understand human language. A question answering system is often preferable to a search engine, which returns a collection of documents rather than a direct answer. The high-level design of any question answering system consists of three primary components: question analysis, context retrieval, and answer extraction. With the availability of training resources and Transformer-based models trained on huge English corpora, the precision and accuracy of English question answering systems have increased dramatically over the years. Such large datasets, however, are not available for low-resource languages like Tamil. For low-resource languages, multilingual BERT (mBERT) models are used. Translations from languages in the same family are used to supplement the available data and fine-tune the mBERT-based model. Cross-lingual learning applies zero-shot and few-shot techniques to transfer the knowledge of a QA model trained on many source-language examples to a target language with less training data. Pre-training and fine-tuning the model improve performance, according to model tests.
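The three-component design named in the abstract (question analysis, context retrieval, answer extraction) can be illustrated with a minimal sketch. This is not the authors' BERT-based system: it swaps in plain keyword overlap for every stage, and the stopword list, function names, and sample passages are all assumptions made up for illustration. The toy tokenizer is only meant for English text.

```python
import re

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "on", "to",
             "what", "which", "who", "where", "when", "do", "does"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"\w+", text.lower())

def analyze_question(question):
    """Stage 1, question analysis: keep the content words as keywords."""
    return {tok for tok in tokenize(question) if tok not in STOPWORDS}

def retrieve_context(keywords, passages):
    """Stage 2, context retrieval: pick the passage sharing the most keywords."""
    return max(passages, key=lambda p: len(keywords & set(tokenize(p))))

def extract_answer(keywords, context):
    """Stage 3, answer extraction: return the sentence sharing the most keywords."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    return max(sentences, key=lambda s: len(keywords & set(tokenize(s))))

def answer(question, passages):
    """Run the three stages in order and return the selected sentence."""
    keywords = analyze_question(question)
    context = retrieve_context(keywords, passages)
    return extract_answer(keywords, context)

# Sample passages invented for this sketch.
PASSAGES = [
    "Tamil is a Dravidian language. It is spoken mainly in Tamil Nadu and Sri Lanka.",
    "BERT is a Transformer-based language model. It was introduced by researchers at Google in 2018.",
]
```

Calling `answer("Where is Tamil spoken?", PASSAGES)` selects the first passage and returns its second sentence. In a system like the one the abstract describes, the extraction stage would instead be a fine-tuned mBERT model predicting an answer span within the retrieved context.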
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Uncontrolled Keywords: | BERT; Language model; Low resource languages; LSTM; Pytorch; Question Answering; Question answering systems; Reading comprehension; SQuAD; Transformer |
Subjects: | C Computer Science and Engineering > Artificial Intelligence |
Divisions: | Computer Science and Engineering |
Date Deposited: | 27 Jun 2024 08:22 |
Last Modified: | 27 Jun 2024 08:23 |
URI: | https://ir.psgitech.ac.in/id/eprint/643 |