Ranjan, R and Garg, S and Khoskbar, AR and Solaiman, E and James, P and Georgakopoulos, D, Orchestrating BigData analysis workflows, IEEE Cloud Computing, 4(3), pp. 20-28. ISSN 2325-6095 (2017) [Refereed Article]
Data analytics has become not only an essential part of day-to-day decision making but also a driver of long-term strategic decisions. Whether it is real-time fraud detection, resource management, tracking and prevention of disease outbreaks, natural disaster management, or intelligent traffic management, the extraction and exploitation of insightful information from unparalleled quantities of data (BigData) is now a fundamental part of decision-making processes. Smart decisions based on BigData analysis are possible because of improved analytical capabilities, increased access to diverse data sources, and cheaper, more powerful computing in the form of cloud computing. However, BigData analysis is far more complicated than recent publicity suggests. One myth, for example, is that BigData analysis is driven purely by the invention of new data mining and machine learning algorithms. While such algorithmic innovation is critical, it is only one aspect of producing BigData analysis solutions. Like many other software solutions, BigData analysis solutions are not monolithic pieces of software developed from scratch for every application. Instead, they typically combine and reuse existing trusted software components, each performing a necessary data analysis step. Furthermore, to cope with the variety, volume, and velocity of BigData, they must exploit the elasticity of cloud and edge datacenter computation and storage resources as needed to meet their owners' requirements.
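The abstract's point that BigData analysis solutions are assembled from reusable components, each performing one analysis step, can be sketched as a minimal workflow in Python. This is an illustrative sketch only, not code from the article; the step names (`ingest`, `clean`, `aggregate`) and the sample data are hypothetical.

```python
# Hypothetical sketch: composing reusable, trusted analysis steps
# into a workflow, as the abstract describes. Each function is a
# self-contained step; the orchestrator chains them together.

def ingest(records):
    """Reusable step: parse raw "sensor,reading" strings into pairs."""
    return [tuple(r.split(",")) for r in records]

def clean(pairs):
    """Reusable step: drop malformed (non-numeric) readings."""
    out = []
    for sensor, reading in pairs:
        try:
            out.append((sensor, float(reading)))
        except ValueError:
            pass  # discard readings that cannot be parsed
    return out

def aggregate(pairs):
    """Reusable step: average the readings per sensor."""
    sums, counts = {}, {}
    for sensor, value in pairs:
        sums[sensor] = sums.get(sensor, 0.0) + value
        counts[sensor] = counts.get(sensor, 0) + 1
    return {s: sums[s] / counts[s] for s in sums}

def run_workflow(raw, steps):
    """Orchestrator: feed each step's output into the next step."""
    data = raw
    for step in steps:
        data = step(data)
    return data

raw = ["s1,10", "s1,20", "s2,bad", "s2,5"]
result = run_workflow(raw, [ingest, clean, aggregate])
print(result)  # {'s1': 15.0, 's2': 5.0}
```

In a real deployment the orchestrator, not the hand-written loop above, would schedule each step on elastic cloud or edge resources; the point here is only that the workflow is a composition of independent, reusable steps.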