EAI Endorsed Transactions on AI and Robotics (AIRO), Volume 4, Issue 1 (2025)

Research Article

P2PLLMEdge: Peer-to-Peer Framework for Localized Large Language Models using CPU only Resource-Constrained Edge

Cite (BibTeX)
    @ARTICLE{10.4108/airo.9292,
        author={Partha Pratim Ray and Mohan Pratap Pradhan},
        title={P2PLLMEdge: Peer-to-Peer Framework for Localized Large Language Models using CPU only Resource-Constrained Edge},
        journal={EAI Endorsed Transactions on AI and Robotics},
        volume={4},
        number={1},
        publisher={EAI},
        journal_a={AIRO},
        year={2025},
        month={7},
        keywords={Peer-to-peer, Edge computing, Quantized LLMs, Resource-constrained edge, Decentralized generative AI, Web frameworks},
        doi={10.4108/airo.9292}
    }
Partha Pratim Ray 1,*, Mohan Pratap Pradhan 1
  • 1: Sikkim University
*Contact email: parthapratimray1986@gmail.com

Abstract

In this research, we present \textit{P2PLLMEdge}, a pioneering peer-to-peer framework designed to enable localized Large Language Models (LLMs) to operate efficiently in resource-constrained edge environments, exemplified by devices such as the Raspberry Pi 4B and CPU-only laptops. The framework addresses critical challenges, including limited computational capacity, network overhead, and scalability, by leveraging lightweight RESTful communication protocols, model-specific quantization, and decentralized task distribution. Key results demonstrate that \textit{P2PLLMEdge} achieves substantial performance improvements. On average, Peer 2 (CPU-only laptop) achieves a 44.7\% reduction in total duration ($t_{\text{peer2, total}} = 15.87 \times 10^9 \ \mathrm{ns}$) compared to Peer 1 (Raspberry Pi 4B, $t_{\text{peer1, total}} = 28.18 \times 10^9 \ \mathrm{ns}$). The framework processes tokens at a rate of 21.77 tokens/second on advanced LLMs like \texttt{Granite3.1-moe:1b}, significantly outperforming the baseline. Peer 1, employing quantized LLMs such as \texttt{smollm2:360m-instruct-q8_0}, reduces prompt evaluation duration by 23.2\% ($t_{\text{peer1, prompt\_eval}} = 0.76 \times 10^9 \ \mathrm{ns}$) compared to larger models like \texttt{qwen2.5:0.5b-instruct} ($t_{\text{peer1, prompt\_eval}} = 0.99 \times 10^9 \ \mathrm{ns}$). Peer 2 demonstrates superior summarization capability, with evaluation durations reduced by 72.8\% ($t_{\text{peer2, eval}} = 5.15 \times 10^9 \ \mathrm{ns}$) for explanation-type prompts relative to Peer 1 ($t_{\text{peer1, eval}} = 18.93 \times 10^9 \ \mathrm{ns}$). The framework also achieves significant network efficiency, reducing inter-peer communication durations by up to 44.9\% ($t_{\text{peer2, network}} = 25.83 \times 10^9 \ \mathrm{ns}$ vs. $t_{\text{peer1, network}} = 46.92 \times 10^9 \ \mathrm{ns}$). Peer-to-peer synergy ensures seamless task execution: Peer 1 generates text and offloads computationally intensive summarization tasks to Peer 2, balancing performance and resource utilization. The novelty of \textit{P2PLLMEdge} lies in its ability to seamlessly integrate lightweight LLMs with decentralized edge devices, achieving advanced natural language processing functionality entirely on edge devices traditionally deemed unsuitable for such tasks. The framework provides an adaptable and cost-effective approach for deploying quantized LLM-driven applications. Future directions include scaling the framework to multi-peer environments, optimizing task scheduling algorithms, and exploring integration with heterogeneous LLM-enabled systems. The code is available at https://github.com/ParthaPRay/peer_to_peer_local_llm_interaction.
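
To make the generate-then-offload workflow concrete, below is a minimal sketch of the exchange from Peer 1's side, assuming each peer serves its local model through Ollama's REST API (POST /api/generate, which reports durations in nanoseconds) and that Peer 2 exposes a simple /summarize relay. The hostnames, port, endpoint path, and payload fields are illustrative assumptions, not taken from the paper or its repository.

    import requests

    OLLAMA = "http://localhost:11434/api/generate"  # Ollama's local REST endpoint
    PEER2 = "http://192.168.1.20:8000/summarize"    # hypothetical Peer 2 relay URL

    def generate_locally(prompt, model="smollm2:360m-instruct-q8_0"):
        # Peer 1 (Raspberry Pi 4B) runs a quantized LLM on-device via Ollama.
        resp = requests.post(
            OLLAMA,
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=600,
        )
        resp.raise_for_status()
        # Response includes 'response', 'eval_count', 'eval_duration' (ns), ...
        return resp.json()

    def offload_summarization(text):
        # Offload the heavier summarization step to Peer 2 (CPU-only laptop).
        resp = requests.post(PEER2, json={"text": text}, timeout=600)
        resp.raise_for_status()
        return resp.json()["summary"]

    if __name__ == "__main__":
        gen = generate_locally("Explain edge computing in two short paragraphs.")
        tok_per_s = gen["eval_count"] / (gen["eval_duration"] / 1e9)  # ns -> tokens/s
        print(f"Peer 1: {gen['eval_count']} tokens at {tok_per_s:.2f} tokens/s")
        print("Peer 2 summary:", offload_summarization(gen["response"]))

On the receiving side, Peer 2's /summarize handler would wrap the same /api/generate call against its own Ollama instance (e.g., with Granite3.1-moe:1b) and return the summary, mirroring the task-offloading pattern whose durations the paper measures.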

Keywords
Peer-to-peer, Edge computing, Quantized LLMs, Resource-constrained edge, Decentralized generative AI, Web frameworks
Received: 2025-05-11
Accepted: 2025-06-25
Published: 2025-07-08
Publisher: EAI
DOI: http://dx.doi.org/10.4108/airo.9292

Copyright © 2025 Partha Pratim Ray et al., licensed to EAI. This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (https://creativecommons.org/licenses/by-nc-sa/4.0/), which permits copying, redistributing, remixing, transforming, and building upon the material in any medium so long as the original work is properly cited.

Indexed in: EBSCO, ProQuest, DBLP, DOAJ, Portico