Research Article
Circumventing Stragglers and Staleness in Distributed CNN using LSTM
@ARTICLE{10.4108/eetiot.5119,
  author={Aswathy Ravikumar and Harini Sriraman and Saddikuti Lokesh and Jitendra Sai},
  title={Circumventing Stragglers and Staleness in Distributed CNN using LSTM},
  journal={EAI Endorsed Transactions on Internet of Things},
  volume={10},
  number={1},
  publisher={EAI},
  journal_a={IOT},
  year={2024},
  month={2},
  keywords={Convolutional Neural Network, AWS Sage Maker, Distributed Framework, Parameter Server, Exa-Scale Computing, Distributed Autotuning},
  doi={10.4108/eetiot.5119}
}
Aswathy Ravikumar
Harini Sriraman
Saddikuti Lokesh
Jitendra Sai
Year: 2024
Circumventing Stragglers and Staleness in Distributed CNN using LSTM
IOT
EAI
DOI: 10.4108/eetiot.5119
Abstract
INTRODUCTION: Using neural networks for inherently distributed applications is challenging and time-consuming. There is a crucial need for a framework that supports distributed deep neural networks and yields accurate results in accelerated time.

METHODS: In the proposed framework, any experienced or novice user can execute neural network models in a distributed manner with automated hyperparameter tuning. In addition, the framework is deployed on AWS SageMaker to scale the distribution and approach exascale FLOPS. We benchmarked the framework's performance by applying it to a medical dataset.

RESULTS: The maximum performance is achieved with a speedup of 6.59 on 5 nodes. The model encourages expert and novice neural network users to apply neural network models on the distributed platform and obtain enhanced results with accelerated training time. There has been extensive research on improving the training time of Convolutional Neural Networks (CNNs) using distributed models, with particular emphasis on automating the hyperparameter tuning process. The study shows that training times can be reduced across the board not only by manual hyperparameter tuning but also by using L2 regularization, a dropout layer, and ConvLSTM for automatic hyperparameter adjustment.

CONCLUSION: The proposed method improved training speed by 1.4% for model-parallel execution and by 2.206% for data-parallel execution. Data-parallel execution achieved a top accuracy of 93.3825%, whereas model-parallel execution achieved a top accuracy of 89.59%.
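The abstract attributes part of the improvement to combining L2 regularization, a dropout layer, and ConvLSTM within the CNN. The full architecture is not reproduced here, so the fragment below is only a minimal illustrative sketch, assuming a Keras/TensorFlow setting; the input shape, layer sizes, and regularization strength are hypothetical placeholders rather than values taken from the paper.

import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def build_convlstm_classifier(input_shape=(8, 64, 64, 1), num_classes=2):
    """Illustrative sketch: ConvLSTM front end with L2 regularization and dropout.

    Shapes and hyperparameters are assumptions for demonstration only.
    """
    model = models.Sequential([
        layers.Input(shape=input_shape),                      # (time steps, H, W, channels)
        layers.ConvLSTM2D(32, kernel_size=3, padding="same",
                          kernel_regularizer=regularizers.l2(1e-4),
                          return_sequences=False),            # collapses the time dimension
        layers.BatchNormalization(),
        layers.Conv2D(64, 3, padding="same", activation="relu",
                      kernel_regularizer=regularizers.l2(1e-4)),
        layers.MaxPooling2D(),
        layers.Dropout(0.5),                                  # dropout layer noted in the abstract
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model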
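The framework is reported to run on AWS SageMaker and to reach its peak speedup on 5 nodes. As a rough, non-authoritative sketch of how such a data-parallel job could be launched with the SageMaker Python SDK (assuming a TensorFlow estimator and SageMaker's distributed data parallel library), the entry-point script, IAM role, instance type, and S3 URI below are placeholders, not details from the paper.

from sagemaker.tensorflow import TensorFlow

# Hypothetical launcher for a 5-node data-parallel training job on SageMaker.
estimator = TensorFlow(
    entry_point="train_convlstm.py",                          # placeholder training script
    role="arn:aws:iam::123456789012:role/SageMakerRole",      # placeholder IAM role
    instance_count=5,                                         # the paper reports peak speedup on 5 nodes
    instance_type="ml.p3.16xlarge",                           # assumed GPU instance type
    framework_version="2.9",
    py_version="py39",
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)

# Placeholder S3 location for the training data.
estimator.fit({"training": "s3://example-bucket/medical-dataset/"})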
Copyright © 2024 A. Ravikumar et al., licensed to EAI. This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.