eCite Digital Repository

ETNLP: A visual-aided systematic approach to select pre-trained embeddings for a downstream task

Citation

Vu, X-S and Vu, T and Tran, SN and Jiang, L, ETNLP: A visual-aided systematic approach to select pre-trained embeddings for a downstream task, Proceedings of the 2019 Recent Advances in Natural Language Processing International Conference, 2-4 September 2019, Varna, Bulgaria, pp. 1285-1294. ISSN 1313-8502 (2019) [Refereed Conference Paper]



Copyright Statement

Copyright unknown

DOI: 10.26615/978-954-452-056-4_147

Abstract

© 2019 Association for Computational Linguistics (ACL). All rights reserved.

Given the many advanced embedding models proposed recently, selecting the pre-trained word embedding (a.k.a. word representation) models that best fit a specific downstream task is non-trivial. In this paper, we propose a systematic approach, called ETNLP, for extracting, evaluating, and visualizing multiple sets of pre-trained word embeddings to determine which embeddings should be used in a downstream task. We demonstrate the effectiveness of the proposed approach on our pre-trained Vietnamese word embedding models by selecting the models that suit a named entity recognition (NER) task. Specifically, we create a large Vietnamese word analogy list to evaluate and select the pre-trained embedding models for the task. We then use the selected embeddings for the NER task and achieve new state-of-the-art results on the task's benchmark dataset. We also apply the approach to another downstream task, privacy-guaranteed embedding selection, and show that it helps users quickly select the most suitable embeddings. In addition, we build an open-source system based on the proposed approach to facilitate similar studies on other NLP tasks. The source code and data are available at https://github.com/vietnlp/etnlp.
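The abstract describes scoring candidate embeddings on a word analogy list and keeping the best-scoring model. As a rough illustration only, the Python sketch below answers each analogy "a : b :: c : d" with the common 3CosAdd rule (pick the word closest to b - a + c). Using 3CosAdd is an assumption, since this record does not say which rule ETNLP uses. The sketch then selects the candidate model with the highest accuracy. All function names, words and vectors are hypothetical, and this is not ETNLP's implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Embeddings = Dict[str, Vector]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def solve_analogy(emb: Embeddings, a: str, b: str, c: str) -> str:
    """Assumed 3CosAdd rule: the word maximising cos(w, b - a + c), excluding a, b, c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb: Embeddings, questions: List[Tuple[str, str, str, str]]) -> float:
    """Share of answerable questions solved correctly; out-of-vocabulary questions are skipped."""
    answered = correct = 0
    for a, b, c, d in questions:
        if not all(w in emb for w in (a, b, c, d)):
            continue  # the model cannot represent this question
        answered += 1
        if solve_analogy(emb, a, b, c) == d:
            correct += 1
    return correct / answered if answered else 0.0


# Hypothetical 2-D toy embeddings for two candidate models.
models = {
    "model_a": {"man": [1.0, 0.0], "woman": [1.0, 1.0], "king": [3.0, 0.0],
                "queen": [3.0, 1.0], "apple": [0.0, -1.0]},
    "model_b": {"man": [1.0, 0.0], "woman": [1.0, 1.0], "king": [3.0, 0.0],
                "queen": [-1.0, 2.0], "apple": [3.0, 1.2]},
}
questions = [("man", "woman", "king", "queen")]

# Keep the candidate with the highest analogy accuracy.
best = max(models, key=lambda name: analogy_accuracy(models[name], questions))
print(best)  # prints "model_a"
```

A real analogy list would contain many thousands of questions grouped by relation type, and a selection step could weigh per-category scores rather than a single overall accuracy.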

Item Details

Item Type: Refereed Conference Paper
Keywords: word embedding
Research Division: Information and Computing Sciences
Research Group: Artificial intelligence
Research Field: Natural language processing
Objective Division: Information and Communication Services
Objective Group: Information systems, technologies and services
Objective Field: Information systems, technologies and services not elsewhere classified
UTAS Author: Tran, SN (Dr Son Tran)
ID Code: 139114
Year Published: 2019
Deposited By: Information and Communication Technology
Deposited On: 2020-05-27
Last Modified: 2020-06-16
Downloads: 0
