eCite Digital Repository

Detecting potential labeling errors for bioinformatics by multiple voting


Guan, D and Yuan, W and Ma, T and Lee, S, Detecting potential labeling errors for bioinformatics by multiple voting, Knowledge-Based Systems, 66 pp. 28-35. ISSN 0950-7051 (2014) [Refereed Article]

Copyright Statement

Copyright 2014 Elsevier B.V.

DOI: 10.1016/j.knosys.2014.04.013


Classification techniques are important in bioinformatics analysis because they can separate various kinds of bioinformatics data into distinct groups. Obtaining good classifiers requires accurately labeled training data. However, labels in practical bioinformatics applications may be erroneous for various reasons. To identify such mislabeled data, an ensemble-learning-based scheme, single-voting, has been widely used. It generates multiple classifiers and uses their votes to detect mislabeled data. The single-voting scheme consists mainly of two components: a data partitioning component that generates the multiple classifiers, and a mislabeled detection component that identifies mislabeled data. Existing work in this field focuses mainly on the mislabeled detection component and neglects data partitioning. However, our analysis shows that data partitioning plays an important role in the single-voting scheme. This analysis leads us to propose a novel multiple-voting scheme, which is superior to traditional single-voting because it reduces the unreliable influence of data partitioning. Empirical and theoretical evaluations on a set of bioinformatics datasets illustrate the utility of the proposed scheme.
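The two schemes described in the abstract can be illustrated with a small, stdlib-only Python sketch. It is not the authors' implementation: the abstract does not specify the base classifiers, the number of folds, or how multiple-voting combines its votes. Everything here is an assumption for illustration: the learners (nearest centroid, 1-NN, 3-NN), the helper names `single_vote`, `multiple_vote` and `flag`, the majority threshold, and the rule of averaging misclassification fractions over several random partitions. Single-voting follows the common ensemble-filter pattern: one random k-fold partition, where each held-out fold is classified by learners trained on the remaining folds, and an instance is flagged when most of them disagree with its label. Multiple-voting repeats this over several partitions so that no single partition decides the outcome.

```python
import math
import random
from collections import Counter

# Hypothetical base learners: each takes training data and returns a predict function.
def nearest_centroid(train_X, train_y):
    counts = Counter(train_y)
    sums = {}
    for x, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    # Predict the label whose centroid is closest (Euclidean distance).
    return lambda q: min(centroids, key=lambda lab: math.dist(q, centroids[lab]))

def knn(k):
    def fit(train_X, train_y):
        def predict(q):
            order = sorted(range(len(train_X)), key=lambda i: math.dist(q, train_X[i]))
            votes = Counter(train_y[i] for i in order[:k])
            return votes.most_common(1)[0][0]
        return predict
    return fit

LEARNERS = [nearest_centroid, knn(1), knn(3)]  # assumed ensemble, not from the paper

def single_vote(X, y, n_folds=5, seed=0, learners=LEARNERS):
    """One random k-fold partition. Returns, per instance, the fraction of
    learners (trained on the other folds) that misclassify it."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    wrong = [0.0] * len(X)
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        predictors = [fit(train_X, train_y) for fit in learners]
        for i in fold:
            misses = sum(p(X[i]) != y[i] for p in predictors)
            wrong[i] = misses / len(predictors)
    return wrong

def flag(scores, threshold=0.5):
    """Majority rule: flag instances that more than `threshold` of the votes reject."""
    return [i for i, s in enumerate(scores) if s > threshold]

def multiple_vote(X, y, n_runs=10, n_folds=5, threshold=0.5):
    """Assumed multiple-voting rule: average the single-voting scores over
    several random partitions, then apply the majority rule once."""
    runs = [single_vote(X, y, n_folds, seed=r) for r in range(n_runs)]
    avg = [sum(run[i] for run in runs) / n_runs for i in range(len(X))]
    return flag(avg, threshold)

# Toy data: two well-separated clusters; the last point is deliberately mislabeled.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5), (0.2, 0.2)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

print(flag(single_vote(X, y, seed=3)))  # [10]
print(multiple_vote(X, y))              # [10]
```

On this clean toy set every partition gives the same answer. On noisier data, a single partition can flag or miss different instances depending on how the folds fall, and that partition-dependent variability is what averaging over several partitions is meant to smooth out in this sketch.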

Item Details

Item Type:Refereed Article
Keywords:bioinformatics analysis, mislabeled data detection, single-voting, multiple-voting, classification
Research Division:Information and Computing Sciences
Research Group:Library and information studies
Research Field:Human information interaction and retrieval
Objective Division:Information and Communication Services
Objective Group:Information systems, technologies and services
Objective Field:Information systems, technologies and services not elsewhere classified
UTAS Author:Lee, S (Professor Sungyoung Lee)
ID Code:122917
Year Published:2014
Web of Science® Times Cited:11
Deposited By:Information and Communication Technology
Deposited On:2017-12-06
Last Modified:2018-05-04
