eCite Digital Repository
ETNLP: A visual-aided systematic approach to select pre-trained embeddings for a downstream task
Citation
Vu, X-S and Vu, T and Tran, SN and Jiang, L, ETNLP: A visual-aided systematic approach to select pre-trained embeddings for a downstream task, Proceedings of the 2019 Recent Advances in Natural Language Processing International Conference, 2-4 September 2019, Varna, Bulgaria, pp. 1285-1294. ISSN 1313-8502 (2019) [Refereed Conference Paper]
Copyright Statement
Copyright unknown
DOI: 10.26615/978-954-452-056-4_147
Abstract
© 2019 Association for Computational Linguistics (ACL). All rights reserved. Given the many advanced embedding models released recently, selecting the pre-trained word embedding (a.k.a. word representation) models that best fit a specific downstream task is non-trivial. In this paper, we propose a systematic approach, called ETNLP, for extracting, evaluating, and visualizing multiple sets of pre-trained word embeddings to determine which embeddings should be used in a downstream task. We demonstrate the effectiveness of the proposed approach on our pre-trained word embedding models in Vietnamese to select which models are suitable for a named entity recognition (NER) task. Specifically, we create a large Vietnamese word analogy list to evaluate and select the pre-trained embedding models for the task. We then utilize the selected embeddings for the NER task and achieve new state-of-the-art results on the task benchmark dataset. We also apply the approach to another downstream task of privacy-guaranteed embedding selection, and show that it helps users quickly select the most suitable embeddings. In addition, we create an open-source system using the proposed systematic approach to facilitate similar studies on other NLP tasks. The source code and data are available at https://github.com/vietnlp/etnlp.
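To illustrate the kind of word-analogy evaluation the abstract describes, here is a minimal sketch in Python. It is not the ETNLP implementation or its API. It assumes the common vector-offset rule (often called 3CosAdd) for answering analogies, and every name in it (`normalize`, `solve_analogy`, `analogy_accuracy`, `model_a`, `model_b`) as well as the toy 3-dimensional vectors are hypothetical and chosen only for illustration.

```python
import numpy as np


def normalize(vectors):
    """Scale every vector to unit length so dot products act as cosine similarity."""
    return {word: vec / np.linalg.norm(vec) for word, vec in vectors.items()}


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset (3CosAdd) rule."""
    target = vectors[b] - vectors[a] + vectors[c]
    best_word, best_score = None, -np.inf
    for word, vec in vectors.items():
        if word in (a, b, c):  # the query words are not valid answers
            continue
        score = float(np.dot(vec, target))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


def analogy_accuracy(vectors, questions):
    """Fraction of in-vocabulary questions (a, b, c, d) answered with d."""
    unit = normalize(vectors)
    covered = [q for q in questions if all(w in unit for w in q)]
    if not covered:
        return 0.0
    hits = sum(solve_analogy(unit, a, b, c) == d for a, b, c, d in covered)
    return hits / len(covered)


# Hypothetical toy data: two made-up embedding sets over the same vocabulary.
questions = [("man", "woman", "king", "queen")]
candidates = {
    "model_a": {
        "man": np.array([1.0, 0.0, 0.0]),
        "woman": np.array([1.0, 1.0, 0.0]),
        "king": np.array([1.0, 0.0, 1.0]),
        "queen": np.array([1.0, 1.0, 1.0]),
        "apple": np.array([0.0, 0.0, -1.0]),
    },
    "model_b": {
        "man": np.array([1.0, 0.0, 0.0]),
        "woman": np.array([1.0, 1.0, 0.0]),
        "king": np.array([1.0, 0.0, 1.0]),
        "queen": np.array([0.0, 0.0, -1.0]),
        "apple": np.array([1.0, 1.0, 1.0]),
    },
}

# Rank the candidate embedding sets by analogy accuracy, best first.
ranking = sorted(
    candidates,
    key=lambda name: analogy_accuracy(candidates[name], questions),
    reverse=True,
)
```

In practice the toy dictionaries would be replaced by vectors loaded from the pre-trained embedding files, and `questions` by a full analogy list for the target language. The resulting ranking is only one signal for choosing embeddings for a downstream task.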
Item Details
Item Type: | Refereed Conference Paper |
---|---|
Keywords: | word embedding |
Research Division: | Information and Computing Sciences |
Research Group: | Artificial intelligence |
Research Field: | Natural language processing |
Objective Division: | Information and Communication Services |
Objective Group: | Information systems, technologies and services |
Objective Field: | Information systems, technologies and services not elsewhere classified |
UTAS Author: | Tran, SN (Dr Son Tran) |
ID Code: | 139114 |
Year Published: | 2019 |
Deposited By: | Information and Communication Technology |
Deposited On: | 2020-05-27 |
Last Modified: | 2020-06-16 |
Downloads: | 0 |