eCite Digital Repository

GUDM: automatic generation of unified datasets for learning and reasoning in healthcare


Ali, R and Siddiqi, MH and Ahmed, MI and Ali, T and Hussain, S and Huh, EN and Kang, BH and Lee, S, GUDM: automatic generation of unified datasets for learning and reasoning in healthcare, Sensors, 15, (7) pp. 15772-15798. ISSN 1424-8220 (2015) [Refereed Article]


Copyright Statement

Copyright 2015 The Authors. Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0)

DOI: 10.3390/s150715772


A wide array of biomedical data are generated and made available to healthcare experts. However, because these data are so diverse, it is difficult to predict outcomes from them. It is therefore necessary to combine these diverse data sources into a single unified dataset. This paper proposes a global unified data model (GUDM) that provides a global unified data structure for all data sources and generates a unified dataset using a "data modeler" tool. The proposed tool implements a user-centric, priority-based approach that resolves the problems of unified data modeling and of overlapping attributes across multiple datasets. The tool is illustrated using sample diabetes mellitus data. The diverse data sources used to generate the unified dataset for diabetes mellitus include clinical trial information, a social media interaction dataset, and physical activity data collected using different sensors. To demonstrate the significance of the unified dataset, we applied a well-known rough-set-theory-based rule creation process to derive rules from it. An evaluation of the tool on six different sets of locally created diverse datasets shows that, on average, it reduces the time and effort required of the experts and the knowledge engineer by 94.1% when creating unified datasets.
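
As a rough illustration of what a priority-based resolution of overlapping attributes could look like, the following Python sketch merges records from several sources into one unified record. It is not the paper's data modeler tool: the function name `merge_by_priority`, the source names, the attribute names, and the sample values are all hypothetical, and the rule "the highest-priority source with a non-missing value wins" is an assumption made for the example.

```python
from typing import Any


def merge_by_priority(
    sources: dict[str, dict[str, Any]], priority: list[str]
) -> dict[str, Any]:
    """Merge per-source records into one unified record.

    When an attribute appears in several sources, the value from the
    source listed earliest in `priority` wins (assumed rule, not the
    paper's). Missing values (None) never override an existing value.
    """
    unified: dict[str, Any] = {}
    # Walk from lowest to highest priority so that higher-priority
    # sources overwrite values written by lower-priority ones.
    for name in reversed(priority):
        for attr, value in sources.get(name, {}).items():
            if value is not None:
                unified[attr] = value
    return unified


# Hypothetical sample records for one patient from three sources.
sources = {
    "clinical": {"patient_id": 1, "weight_kg": 82.0, "hba1c": 7.1},
    "sensor": {"patient_id": 1, "weight_kg": 80.5, "steps": 6400},
    "social": {"patient_id": 1, "mood": "positive", "weight_kg": None},
}

# The user ranks the clinical source above the sensor and social sources.
unified = merge_by_priority(sources, ["clinical", "sensor", "social"])
print(unified)
```

Here `weight_kg` overlaps across all three sources, and the clinical value is kept because the user ranked that source first. Attributes that appear in only one source, such as `steps` or `mood`, pass through unchanged.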

Item Details

Item Type: Refereed Article
Keywords: unified dataset, data fusion, data model, rough set theory, knowledge acquisition, reasoning, clinical trials, social media, sensors
Research Division: Information and Computing Sciences
Research Group: Data management and data science
Research Field: Data engineering and data science
Objective Division: Information and Communication Services
Objective Group: Information services
Objective Field: Information services not elsewhere classified
UTAS Author: Kang, BH (Professor Byeong Kang)
ID Code: 107236
Year Published: 2015
Web of Science® Times Cited: 12
Deposited By: Information and Communication Technology
Deposited On: 2016-03-08
Last Modified: 2017-11-13
Downloads: 179
