eCite Digital Repository

Machine learning for mining imbalanced data


Arafat, MY and Hoque, S and Xu, S and Farid, DM, Machine learning for mining imbalanced data, IAENG International Journal of Computer Science, 46, (2) pp. 332-348. ISSN 1819-656X (2019) [Refereed Article]


Copyright Statement

Copyright 2019 The Author(s) Licensed under Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)


© International Association of Engineers. Mining imbalanced data, also known as the class imbalance problem, is one of the most challenging tasks in machine learning for data mining applications. Achieving accurate overall performance in imbalanced classification with machine learning techniques is difficult because majority class instances heavily outnumber minority class instances. Unequal class distributions are very common in real-world high-dimensional datasets, where binary classification is more frequent than multi-class classification. Most existing machine learning algorithms focus on classifying majority class instances while ignoring or misclassifying minority class instances. Several techniques have been introduced in recent decades for imbalanced data classification, each with its own advantages and disadvantages. In this paper, we study and compare 12 widely used imbalanced data classification methods (SMOTE, AdaBoost, RUSBoost, EUSBoost, SMOTEBoost, MSMOTEBoost, DataBoost, Easy Ensemble, BalanceCascade, OverBagging, UnderBagging, and SMOTEBagging) to extract their characteristics and performance on 27 imbalanced datasets. In general, combining over-sampling and under-sampling techniques with ensemble classifiers such as bagging and boosting achieves the highest accuracy for classifying both majority and minority class instances. Additionally, an extensive and critical review of existing imbalanced learning algorithms is provided with detailed discussion. Based on our findings and the reviewed papers, we offer practical suggestions and directions for further research on imbalanced learning.

Item Details

Item Type: Refereed Article
Keywords: imbalanced data, machine learning, data sampling, ensemble learning, random sampling
Research Division: Information and Computing Sciences
Research Group: Artificial intelligence
Research Field: Knowledge representation and reasoning
Objective Division: Information and Communication Services
Objective Group: Information systems, technologies and services
Objective Field: Information systems, technologies and services not elsewhere classified
UTAS Author: Xu, S (Dr Shuxiang Xu)
ID Code: 136017
Year Published: 2019
Deposited By: Information and Communication Technology
Deposited On: 2019-11-26
Last Modified: 2020-05-20
Downloads: 4
