EAI Endorsed Transactions on AI and Robotics (AIRO), Volume 4, Issue 1 (2025)

Research Article

P2PLLMEdge: Peer-to-Peer Framework for Localized Large Language Models using CPU only Resource-Constrained Edge

Cite (BibTeX)
    @ARTICLE{10.4108/airo.9292,
        author={Partha Pratim Ray and Mohan Pratap Pradhan},
        title={P2PLLMEdge: Peer-to-Peer Framework for Localized Large Language Models using CPU only Resource-Constrained Edge},
        journal={EAI Endorsed Transactions on AI and Robotics},
        volume={4},
        number={1},
        publisher={EAI},
        journal_a={AIRO},
        year={2025},
        month={7},
        keywords={Peer-to-peer, Edge computing, Quantized LLMs, Resource-constrained edge, Decentralized generative AI, Web frameworks},
        doi={10.4108/airo.9292}
    }
Partha Pratim Ray 1,*, Mohan Pratap Pradhan 1
  • 1: Sikkim University
*Contact email: parthapratimray1986@gmail.com

Abstract

In this research, we present \textit{P2PLLMEdge}, a pioneering peer-to-peer framework designed to enable localized Large Language Models (LLMs) to operate efficiently in resource-constrained edge environments, exemplified by devices such as the Raspberry Pi 4B and CPU-only laptops. The framework addresses critical challenges, including limited computational capacity, network overhead, and scalability, by leveraging lightweight RESTful communication protocols, model-specific quantization, and decentralized task distribution. Key results demonstrate that \textit{P2PLLMEdge} achieves substantial performance improvements. On average, Peer 2 (CPU-only laptop) achieves a 44.7\% reduction in total duration ($t_{\text{peer2, total}} = 15.87 \times 10^9 \ \mathrm{ns}$) compared to Peer 1 (Raspberry Pi 4B, $t_{\text{peer1, total}} = 28.18 \times 10^9 \ \mathrm{ns}$). The framework processes tokens at a rate of 21.77 tokens/second on advanced LLMs like \texttt{Granite3.1-moe:1b}, significantly outperforming the baseline. Peer 1, employing quantized LLMs such as \texttt{smollm2:360m-instruct-q8_0}, reduces prompt evaluation duration by 23.2\% ($t_{\text{peer1, prompt\_eval}} = 0.76 \times 10^9 \ \mathrm{ns}$) compared to larger models like \texttt{qwen2.5:0.5b-instruct} ($t_{\text{peer1, prompt\_eval}} = 0.99 \times 10^9 \ \mathrm{ns}$). Peer 2 demonstrates superior summarization capability, with evaluation durations reduced by 72.8\% ($t_{\text{peer2, eval}} = 5.15 \times 10^9 \ \mathrm{ns}$) for explanation-type prompts relative to Peer 1 ($t_{\text{peer1, eval}} = 18.93 \times 10^9 \ \mathrm{ns}$). The framework also achieves significant network efficiency, reducing inter-peer communication durations by up to 44.9\% ($t_{\text{peer2, network}} = 25.83 \times 10^9 \ \mathrm{ns}$ vs. $t_{\text{peer1, network}} = 46.92 \times 10^9 \ \mathrm{ns}$). Peer-to-peer synergy ensures seamless task execution: Peer 1 generates text and offloads computationally intensive summarization tasks to Peer 2, balancing performance and resource utilization. The novelty of \textit{P2PLLMEdge} lies in its ability to seamlessly integrate lightweight LLMs with decentralized edge devices, achieving advanced natural language processing functionality entirely on edge devices traditionally deemed unsuitable for such tasks. The framework provides an adaptable and cost-effective approach for deploying quantized LLM-driven applications. Future directions include scaling the framework to multi-peer environments, optimizing task scheduling algorithms, and exploring integration with heterogeneous LLM-enabled systems. The code is available at https://github.com/ParthaPRay/peer_to_peer_local_llm_interaction.
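
To make the generate-then-offload workflow concrete, below is a minimal sketch of the exchange from Peer 1's side, assuming each peer serves its local model through Ollama's REST API (POST /api/generate, which reports durations in nanoseconds) and that Peer 2 exposes a simple /summarize relay. The hostnames, port, endpoint path, and payload fields are illustrative assumptions, not taken from the paper or its repository.

    import requests

    OLLAMA = "http://localhost:11434/api/generate"  # Ollama's local REST endpoint
    PEER2 = "http://192.168.1.20:8000/summarize"    # hypothetical Peer 2 relay URL

    def generate_locally(prompt, model="smollm2:360m-instruct-q8_0"):
        # Peer 1 (Raspberry Pi 4B) runs a quantized LLM on-device via Ollama.
        resp = requests.post(
            OLLAMA,
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=600,
        )
        resp.raise_for_status()
        # Response includes 'response', 'eval_count', 'eval_duration' (ns), ...
        return resp.json()

    def offload_summarization(text):
        # Offload the heavier summarization step to Peer 2 (CPU-only laptop).
        resp = requests.post(PEER2, json={"text": text}, timeout=600)
        resp.raise_for_status()
        return resp.json()["summary"]

    if __name__ == "__main__":
        gen = generate_locally("Explain edge computing in two short paragraphs.")
        tok_per_s = gen["eval_count"] / (gen["eval_duration"] / 1e9)  # ns -> tokens/s
        print(f"Peer 1: {gen['eval_count']} tokens at {tok_per_s:.2f} tokens/s")
        print("Peer 2 summary:", offload_summarization(gen["response"]))

On the receiving side, Peer 2's /summarize handler would wrap the same /api/generate call against its own Ollama instance (e.g., with Granite3.1-moe:1b) and return the summary, mirroring the task-offloading pattern whose durations the paper measures.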

Keywords
Peer-to-peer, Edge computing, Quantized LLMs, Resource-constrained edge, Decentralized generative AI, Web frameworks
Received: 2025-05-11
Accepted: 2025-06-25
Published: 2025-07-08
Publisher: EAI
DOI: http://dx.doi.org/10.4108/airo.9292

Copyright © 2025 Partha Pratim Ray et al., licensed to EAI. This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (https://creativecommons.org/licenses/by-nc-sa/4.0/), which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.

Indexed in: EBSCO, ProQuest, DBLP, DOAJ, Portico