Cloud Computing, Smart Grid and Innovative Frontiers in Telecommunications. 9th EAI International Conference, CloudComp 2019, and 4th EAI International Conference, SmartGIFT 2019, Beijing, China, December 4-5, 2019, and December 21-22, 2019

Research Article

Near-Data Prediction Based Speculative Optimization in a Distribution Environment

  • @INPROCEEDINGS{10.1007/978-3-030-48513-9_9,
        author={Mingxu Sun and Xueyan Wu and Dandan Jin and Xiaolong Xu and Qi Liu and Xiaodong Liu},
        title={Near-Data Prediction Based Speculative Optimization in a Distribution Environment},
        proceedings={Cloud Computing, Smart Grid and Innovative Frontiers in Telecommunications. 9th EAI International Conference, CloudComp 2019, and 4th EAI International Conference, SmartGIFT 2019, Beijing, China, December 4-5, 2019, and December 21-22, 2019},
        proceedings_a={CLOUDCOMP},
        year={2020},
        month={6},
        keywords={Distributed systems, Hadoop, Speculative execution, Locally weighted regression, Near data prediction},
        doi={10.1007/978-3-030-48513-9_9}
    }
    
  • Mingxu Sun
    Xueyan Wu
    Dandan Jin
    Xiaolong Xu
    Qi Liu
    Xiaodong Liu
    Year: 2020
    Near-Data Prediction Based Speculative Optimization in a Distribution Environment
    CLOUDCOMP
    Springer
    DOI: 10.1007/978-3-030-48513-9_9
Mingxu Sun1, Xueyan Wu2, Dandan Jin2,*, Xiaolong Xu2, Qi Liu3, Xiaodong Liu4
  • 1: University of Jinan
  • 2: Nanjing University of Information Science and Technology
  • 3: Shandong Beiming Medical Technology Co., Ltd.
  • 4: Napier University Edinburgh
*Contact email: 18751971087@163.com

Abstract

Apache Hadoop is an open-source software framework, distributed under the Apache License 2.0, that supports data-intensive distributed applications. In the cloud setting it serves, consumers no longer deal with complex software and hardware configuration but simply pay for cloud services on demand, so the performance of the cloud platform becomes increasingly important in such a consumer-centric environment. Because tasks are unevenly distributed, some of them run slowly, and these straggling tasks can significantly degrade the performance of the Hadoop framework. Speculative execution (SE) addresses this problem: it monitors the real-time progress of tasks and launches copies (backup tasks) of potential stragglers on different nodes, increasing the probability that a backup finishes before the original task. However, the current SE policy misjudges stragglers and selects inappropriate backup nodes, which leaves the performance of the Hadoop system unsatisfactory. This paper proposes an optimized SE strategy based on near-data prediction. In the first step, the strategy gathers real-time task execution information and predicts each task's remaining runtime with a local prediction method (locally weighted regression). In the second step, it chooses a suitable backup node according to the proximity of the data ("near data") and the actual demand. In addition, the strategy includes a cost-effectiveness model so that SE delivers its maximum benefit. The results show that applying this strategy in Hadoop effectively improves the accuracy of backup-task selection and performs better in heterogeneous Hadoop environments under various conditions, benefiting both consumers and the cloud platform.
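The two steps described in the abstract (predict remaining runtime, then pick a backup node only when it pays off) can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the paper: remaining time is estimated by fitting a Gaussian-kernel locally weighted linear regression to a task's (elapsed seconds, progress) history; a node that already holds a replica of the input block ("near data") has zero fetch cost; and a backup is launched only if its estimated completion time beats the original task's predicted remaining time. The bandwidth `tau`, the `Node` fields, the cost formula and every function name are hypothetical illustrations, not the authors' actual model.

```python
import math
from dataclasses import dataclass


def lwr_progress_rate(samples, t_query, tau):
    """Locally weighted linear regression of progress on elapsed time.

    samples: list of (elapsed_seconds, progress in [0, 1]) pairs.
    Each sample gets a Gaussian kernel weight centred on t_query, so recent
    behaviour dominates the fit. Returns (fitted progress at t_query, slope).
    """
    weights = [math.exp(-((t - t_query) ** 2) / (2 * tau ** 2)) for t, _ in samples]
    w_sum = sum(weights)
    t_mean = sum(w * t for w, (t, _) in zip(weights, samples)) / w_sum
    p_mean = sum(w * p for w, (_, p) in zip(weights, samples)) / w_sum
    # Weighted least squares for a single feature (closed form).
    sxx = sum(w * (t - t_mean) ** 2 for w, (t, _) in zip(weights, samples))
    sxy = sum(w * (t - t_mean) * (p - p_mean) for w, (t, p) in zip(weights, samples))
    slope = sxy / sxx if sxx > 0 else 0.0
    return p_mean + slope * (t_query - t_mean), slope


def predict_remaining(samples, tau=30.0):
    """Predicted seconds until the task reaches progress 1.0 (inf if stalled)."""
    t_now = samples[-1][0]
    p_hat, rate = lwr_progress_rate(samples, t_now, tau)
    if rate <= 0:
        return math.inf
    return max(0.0, 1.0 - p_hat) / rate


@dataclass
class Node:
    # Hypothetical node description, for illustration only.
    name: str
    speed: float          # relative compute speed (1.0 = baseline)
    has_replica: bool     # node already stores the task's input block ("near data")
    transfer_cost: float  # seconds to fetch the input block when not local


def backup_time(node, base_duration):
    """Assumed cost model: data fetch time plus scaled full-task runtime."""
    fetch = 0.0 if node.has_replica else node.transfer_cost
    return fetch + base_duration / node.speed


def choose_backup(remaining, nodes, base_duration):
    """Return the fastest backup node, or None if no backup would finish first."""
    best = min(nodes, key=lambda n: backup_time(n, base_duration))
    return best if backup_time(best, base_duration) < remaining else None
```

For example, a task that has advanced linearly to 50% progress after 100 seconds has a predicted remaining time of about 100 seconds; with a 120-second baseline task duration, a remote node twice as fast (20 s fetch + 60 s run) beats the original, whereas a task with only 50 seconds left gets no backup. A kernel-weighted fit is used here so that a task that slows down late in its run is judged by its recent progress rate rather than its average.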