eCite Digital Repository

MRPack: multi-algorithm execution using compute-intensive approach in MapReduce

Citation

Idris, M and Hussain, S and Siddiqi, MH and Hassan, W and Bilal, HSM and Lee, S, MRPack: multi-algorithm execution using compute-intensive approach in MapReduce, PLoS One, 10, (8) Article e0136259. ISSN 1932-6203 (2015) [Refereed Article]


Preview
PDF
3Mb
  

Copyright Statement

Copyright 2015 Idris et al. Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/

DOI: 10.1371/journal.pone.0136259

Abstract

Large quantities of data have been generated from multiple sources at exponential rates in the last few years. These data are generated at high velocity, as real-time and streaming data, in a variety of formats. These characteristics create challenges in modeling, computing, and processing the data. Hadoop MapReduce (MR) is a well-known data-intensive distributed processing framework for Big Data that uses a distributed file system (DFS). Current implementations of MR support the execution of only a single algorithm across the entire Hadoop cluster. In this paper, we propose MapReducePack (MRPack), a variation of MR that supports the execution of a set of related algorithms in a single MR job. We exploit the computational capability of a cluster by increasing the compute-intensiveness of MapReduce while maintaining its data-intensive approach. MRPack uses the available computing resources by dynamically managing task assignment and intermediate data. Intermediate data from multiple algorithms are managed using multi-key and skew mitigation strategies. The performance study of the proposed system shows that it is time-, I/O-, and memory-efficient compared to default MapReduce. The proposed approach reduces the execution time by 200% with an approximately 50% decrease in I/O cost. Complexity and qualitative analyses of the results show significant performance improvement.
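The multi-key idea mentioned in the abstract can be illustrated with a small in-memory sketch. This is not the authors' MRPack implementation and not a Hadoop API: the names `packed_job`, `word_count_map`, and `char_count_map` are hypothetical, and the CRC32-based partitioner is a plain hash partitioner, not the paper's skew mitigation strategy. The sketch shows each input record being read once and passed to every packed algorithm, with intermediate pairs tagged by a composite (algorithm ID, key) so that outputs from different algorithms never collide during grouping.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical type aliases: a mapper turns one record into (key, value) pairs,
# a reducer folds all values for one key into a single result.
Mapper = Callable[[str], Iterable[Tuple[str, int]]]
Reducer = Callable[[List[int]], int]


def word_count_map(line: str) -> Iterable[Tuple[str, int]]:
    """Hypothetical algorithm 1: emit (word, 1) for every word."""
    for word in line.split():
        yield word, 1


def char_count_map(line: str) -> Iterable[Tuple[str, int]]:
    """Hypothetical algorithm 2: emit (character, 1) for every non-space char."""
    for ch in line.replace(" ", ""):
        yield ch, 1


def partition_of(algo_id: str, key: str, num_reducers: int) -> int:
    # Plain, deterministic hash partitioner on the composite key
    # (an assumption; NOT the paper's skew mitigation strategy).
    return zlib.crc32(f"{algo_id}\x00{key}".encode("utf-8")) % num_reducers


def packed_job(records: Iterable[str],
               algorithms: Dict[str, Tuple[Mapper, Reducer]],
               num_reducers: int = 2) -> Dict[Tuple[str, str], int]:
    """Run several algorithms over one pass of the input (illustrative sketch)."""
    partitions: List[Dict[Tuple[str, str], List[int]]] = [
        defaultdict(list) for _ in range(num_reducers)
    ]
    # Map phase: each record is read once and fed to every packed algorithm.
    for record in records:
        for algo_id, (mapper, _) in algorithms.items():
            for key, value in mapper(record):
                composite = (algo_id, key)  # multi-key: algorithm ID + original key
                partitions[partition_of(algo_id, key, num_reducers)][composite].append(value)
    # Reduce phase: each group is reduced with its own algorithm's reducer.
    results: Dict[Tuple[str, str], int] = {}
    for part in partitions:
        for (algo_id, key), values in part.items():
            _, reducer = algorithms[algo_id]
            results[(algo_id, key)] = reducer(values)
    return results
```

As a usage example, `packed_job(["apple banana apple", "banana cherry"], {"wc": (word_count_map, sum), "cc": (char_count_map, sum)})` returns word counts under keys such as `("wc", "apple")` and character counts under keys such as `("cc", "a")` from a single pass over the records. Because the algorithm ID is part of the key, both algorithms can share the same shuffle without their intermediate data being mixed.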

Item Details

Item Type:Refereed Article
Keywords:multi-algorithm, MapReduce
Research Division:Information and Computing Sciences
Research Group:Distributed Computing
Research Field:Distributed and Grid Systems
Objective Division:Information and Communication Services
Objective Group:Communication Networks and Services
Objective Field:Mobile Data Networks and Services
Author:Lee, S (Professor Sungyoung Lee)
ID Code:122920
Year Published:2015
Web of Science® Times Cited:3
Deposited By:Information and Communication Technology
Deposited On:2017-12-06
Last Modified:2018-02-08
Downloads:19
