Efficient Public Blockchain Client for Lightweight Users

Public blockchains provide a decentralized method for storing transaction data and have many applications in different sectors. A simple way for users to track transactions is to keep a local copy of the entire public ledger. Since the size of the ledger keeps growing, this method becomes increasingly impractical, especially for lightweight users such as IoT devices and smartphones. To cope with this problem, several solutions have been proposed to reduce the storage burden. However, existing solutions either achieve only a limited storage reduction (e.g., simplified payment verification) or rely on strong security assumptions (e.g., the use of a trusted server). In this paper, we propose a new approach to solving the problem. Specifically, we propose an efficient verification protocol for public blockchains, or EPBC for short. EPBC is particularly suitable for lightweight users, who only need to store a small amount of data that is independent of the size of the blockchain. We analyze EPBC's performance and security, and discuss its integration with existing public ledger systems. Experimental results confirm that EPBC is practical for lightweight users.


I. INTRODUCTION
A public blockchain or ledger consists of a set of blocks that are linked together, where each block contains a set of transactions. A public blockchain is maintained by a group of users, who run a consensus protocol (e.g., proof-of-work with the longest-chain rule) to resolve disagreements regarding the blockchain. In a simple realization of a public blockchain, each user keeps a local copy of the entire blockchain, meaning that each user has access to all historic activities and can easily test whether a new transaction is consistent with the existing transactions. This explains why a public ledger does not have to rely on any centralized party. This technique is central to many popular applications, such as Bitcoin [1].
Although keeping a local copy of the blockchain simplifies many operations (e.g., transaction searching and balance calculation), it imposes a substantial storage overhead because the blockchain keeps growing. For example, the Bitcoin blockchain included 472,483 blocks, or 120 GB, in June 2017. This overhead may not be a problem for modern servers and PCs, but it is prohibitive for lightweight users such as mobile and IoT devices. In general, this hinders the development of applications that are meant to be built on top of blockchains (e.g., smart contract systems [2]). At the same time, smartphones are the main way to get online in some areas, especially in underdeveloped countries, and there is a strong need for mobile and lightweight users to use blockchains [3]. Therefore, it is urgent to reduce the storage overhead, especially for lightweight users.
A preliminary version of this paper was published in SERIAL '17.
Indeed, Nakamoto proposed the simplified payment verification (SPV) protocol in the very first Bitcoin paper [1]; it requires a client to store only part of the blockchain while still being able to check the validity of transactions recorded in it. This technique is also widely used in many blockchain-based applications, such as smart contract systems [2]. The basic idea underlying the SPV protocol is that each user only needs to keep the headers of blocks, rather than the blocks themselves. This means that the local storage overhead still increases linearly with the number of blocks, which grows over time and can quickly become prohibitive for lightweight users. An alternative approach is for a lightweight user to trust some nodes in the blockchain system. However, this practice sacrifices the most appealing feature of blockchains, namely the absence of any trusted third party. Moreover, this approach can be vulnerable to, for example, Sybil attacks [4].
In this paper, we propose an efficient verification protocol for public blockchains, dubbed EPBC. The core of EPBC is a succinct blockchain verification protocol that "compresses" the whole chain into a constant-size summary using a cryptographic accumulator [5]. A lightweight user only needs to store the most recent summary, which is sufficient for the user to verify the validity of transactions. EPBC can be incorporated into existing blockchains as a middle-layer service, or seamlessly built into new blockchain systems.
In summary, our contributions in this work include:
• We design a novel scheme, based on a cryptographic accumulator, that allows lightweight users to use public blockchains.
• We analyze the security and asymptotic performance of the scheme, including its storage cost.
• We report a prototype implementation of the core protocol of EPBC and measure its performance. Experimental results show that the scheme is practical for lightweight users.
The rest of the paper is organized as follows. In Section II we briefly review the background of public blockchains and the simplified payment verification protocol. In Section III we describe the design of the core component of EPBC, i.e., efficient block verification, and analyze its security. Section IV describes two common operations for blockchain-based applications using the core component of EPBC, and Section V provides an architecture for integrating EPBC with existing blockchain systems. Experimental results are given in Section VI to demonstrate the practicability of EPBC, and Section VII discusses related prior work. We conclude the paper in Section VIII.

II. BACKGROUND OF PUBLIC BLOCKCHAIN
A blockchain is a distributed ledger that has been used by Bitcoin and other applications to store their transaction data, where a transaction can be a payment operation, a smart contract submission, or a smart contract execution result submission. There are different approaches to constructing blockchains. In this work, we focus on the class of blockchains built on the principle of proof-of-work (PoW) [6]. This class of blockchains has low throughput and high latency, but has the desirable properties of fairness and being expensive to attack. Furthermore, there are many efforts to improve their performance [7], [8] and to characterize their security properties [9].
Fig. 1. In the SPV scheme, a user stores the headers of the blocks, rather than the blocks themselves. A header contains the relevant metadata (e.g., the root of the Merkle tree whose leaves are the transactions contained in a block). This allows a user to verify whether a given block is valid or not.
Since a blockchain is immutable and append-only, its size keeps growing, and there are several proposals for coping with this issue. A straightforward approach is to trust some other user, who checks the validity of transactions on the lightweight user's behalf; this approach assumes that the lightweight user always knows who can be trusted. Another approach is the SPV protocol mentioned above [1]. In this scheme, as highlighted in Fig. 1, a user only needs to store the block headers, each of which contains the root of the Merkle tree of the transactions in the corresponding block. When a user needs to verify a transaction, it requests the corresponding block from the system and checks the transaction against the Merkle root stored in the locally kept header.
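To make the SPV check concrete, the following Python sketch builds a Merkle tree over a block's transactions and verifies one transaction against the root kept in a header, using a Merkle branch (the sibling hashes along the leaf-to-root path) instead of the whole block. It is a simplified illustration under stated assumptions: it uses single SHA-256 and duplicates the last node on odd-sized levels, whereas Bitcoin's actual format (double SHA-256, byte order, transaction serialization) differs. The helper names are hypothetical.

```python
import hashlib
from typing import List, Tuple

def h(data: bytes) -> bytes:
    """Single SHA-256 (a simplification; Bitcoin uses double SHA-256)."""
    return hashlib.sha256(data).digest()

def _next_level(level: List[bytes]) -> List[bytes]:
    """Hash adjacent pairs; duplicate the last node when the level is odd."""
    if len(level) % 2 == 1:
        level = level + [level[-1]]
    return [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]

def merkle_root(txs: List[bytes]) -> bytes:
    """Root that a block header would commit to."""
    level = [h(tx) for tx in txs]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]

def merkle_branch(txs: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf `index` up to the root, served by a full node.

    Each entry is (sibling_hash, sibling_is_left).
    """
    level = [h(tx) for tx in txs]
    branch = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        sibling = index ^ 1
        branch.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return branch

def verify_branch(tx: bytes, branch: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """SPV client: recompute the path to the root using only the branch."""
    node = h(tx)
    for sibling, sibling_is_left in branch:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root
```

The client stores only `merkle_root(...)` (inside the header); the branch it receives has one hash per tree level, so it grows logarithmically with the number of transactions in the block.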

A. Design Objective and Assumption
The objective of EPBC is to allow lightweight users to participate in applications that use public blockchains. By "lightweight users" we mean users whose devices have limited computation and storage capacities, such as IoT devices and smartphones. Specifically, EPBC aims to allow lightweight users to achieve the following:
• Efficient storage: A user does not have to store or download the entire blockchain. Instead, the amount of storage a user needs is ideally independent of the size of the blockchain.
• Verifiability of transactions: A user can verify whether a transaction has been accepted by the blockchain or not.
Like any public blockchain constructed according to proof-of-work, we assume that the majority of the users are honest.

Fig. 2. Illustration of the blockchain verification protocol between a lightweight user and the blockchain system: the user listens to the blockchain network to learn the latest summary and sends a request for a proof of a block; the system generates a proof of the block; the user verifies the proof and takes action. The nodes in the blockchain system with larger storage capacities can keep a full copy of the blockchain. These nodes interact with the lightweight users to help the latter verify the validity of blocks.
In what follows, we first describe the block verification protocol, which is the core component of EPBC. Then, we describe how to use this protocol to construct the EPBC scheme.

B. The Block Verification Protocol
The blockchain verification protocol of EPBC consists of the following four algorithms:
• Setup: This algorithm is executed once by the creator of the blockchain. It generates the public parameters that are needed by the other algorithms.
• Block and summary construction: This algorithm generates blocks and a summary of the current blockchain. Anyone participating in the mining competition to build new blocks is responsible for calculating the summary of the current blockchain. The summary depends on the content of the current blockchain and the public parameters.
• Proof generation: This algorithm generates a proof for a given block. The proof may depend on, among other things, the entire blockchain.
• Proof verification: Given the summary of a blockchain and a proof for a single block, this algorithm verifies whether the proof is valid or not.
With this protocol, a lightweight user keeps the latest summary of the blockchain. When the user wants to verify a specific block, it can ask the parties involved in a transaction for a proof for the block, which is generated by running the proof generation algorithm. The user then executes the proof verification algorithm to determine whether to accept the block. In what follows, we describe the details of these algorithms.
a) Setup: The creator of the blockchain selects two large prime numbers $p, q$ and calculates $N = pq$, as in the RSA accumulator. $N$ is embedded into the first block and disclosed to the public, and then $p, q$ are discarded. The creator also selects a random value $g \in \mathbb{Z}_N^*$. Each block is labelled with an integer, with the "genesis" block (i.e., the first block on the blockchain) having the label 1.
b) Block and summary construction: Each block contains, in addition to the standard attributes (e.g., transaction information and proof-of-work nonce), a new attribute $S$, which is the summary of the current blockchain. For the $i$-th block, denoted by $blk_i$, the attribute $S_i$ is calculated and stored with $blk_i$ as follows:
$$S_i = S_{i-1}^{\,\mathrm{hash}(blk_i \| i)} \bmod N, \qquad S_0 = g,$$
so that $S_i = g^{\prod_{k=1}^{i} \mathrm{hash}(blk_k \| k)} \bmod N$. If the current blockchain contains $n$ blocks, $S_n$ is the summary of the current blockchain. The block position $i$ is included in the hash input to prevent an attacker from manipulating the position of a block. After the newly generated block is broadcast to the blockchain system, the following two algorithms can be executed.
c) Proof generation: To generate a proof that block $blk_i$ is the $i$-th block on the blockchain with summary $S_n$, where $i \le n$, the prover calculates $p_i = (p_i^{(1)}, p_i^{(2)})$ as follows:
$$p_i^{(1)} = \mathrm{hash}(blk_i \| i), \qquad p_i^{(2)} = g^{\prod_{k=1,\,k \ne i}^{n} \mathrm{hash}(blk_k \| k)} \bmod N.$$
Note that the proof is generated by a user who keeps the entire blockchain and therefore can compute $p_i^{(2)}$ directly from $g$ and the hash values of all other blocks, without knowing the factorization of $N$.
d) Proof verification: Given a block $blk$, a claimed proof $p = (p^{(1)}, p^{(2)})$, and a blockchain summary $S_n$, a user can verify that $blk$ is indeed the $i$-th block on the blockchain with summary $S_n$, where $i \le n$, by checking
$$p^{(1)} = \mathrm{hash}(blk \| i), \qquad \left(p^{(2)}\right)^{p^{(1)}} \bmod N = S_n.$$
If both equations hold, the user accepts $p$ as a valid proof for $blk$; otherwise, the user rejects the block.
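The four algorithms can be sketched in a few lines of Python. This is a toy illustration, not the authors' implementation: it assumes SHA-256 for hash(), encodes the position $i$ as an 8-byte big-endian integer appended to the block bytes, and uses a tiny modulus with known factors, which gives no security at all (a real $N$ must be large and generated so that nobody knows its factorization). The function names are hypothetical.

```python
import hashlib

# Toy setup (hypothetical values, NOT secure): small primes whose product is N.
P, Q = 1019, 1021
N = P * Q
G = 2  # public base g, coprime to N

def block_hash(block: bytes, i: int) -> int:
    """r_i = hash(blk_i || i): SHA-256 over the block bytes and its position."""
    data = block + i.to_bytes(8, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def build_summaries(blocks):
    """S_0 = g and S_i = S_{i-1}^{r_i} mod N, so S_n = g^{r_1 ... r_n} mod N."""
    summaries = [G]
    for i, blk in enumerate(blocks, start=1):
        summaries.append(pow(summaries[-1], block_hash(blk, i), N))
    return summaries

def generate_proof(blocks, i):
    """Full node: p^(1) = r_i and p^(2) = g^{prod_{k != i} r_k} mod N."""
    exponent = 1
    for k, blk in enumerate(blocks, start=1):
        if k != i:
            exponent *= block_hash(blk, k)
    return block_hash(blocks[i - 1], i), pow(G, exponent, N)

def verify_proof(block, i, proof, s_n):
    """Lightweight user: check both equations using only the summary S_n."""
    p1, p2 = proof
    return p1 == block_hash(block, i) and pow(p2, p1, N) == s_n
```

A full node calls `generate_proof`, whose cost grows with the chain; a lightweight client needs only `s_n = build_summaries(blocks)[-1]` to run `verify_proof`, which performs one hash and one modular exponentiation regardless of the chain length.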

C. Parameter Initialization
One of the key steps in the blockchain verification protocol is parameter initialization, i.e., selecting $p$ and $q$ to generate the modulus $N$. If $p$ or $q$ is exposed, the protocol is clearly insecure. This issue can be addressed by generating $N$ with a multi-party protocol, and many protocols exist for this purpose. For example, the protocol proposed by Cocks [10] works as follows. Suppose that at the beginning there are $\ell$ users who work together to generate the first block.
1) Each user $i$, $1 \le i \le \ell$, selects his/her own prime numbers $p_i, q_i$.
2) Each user $i$, $1 \le i \le \ell$, calculates $N = (p_1 + p_2 + \cdots + p_\ell)(q_1 + q_2 + \cdots + q_\ell)$. By leveraging the protocol given in [10], user $i$ can calculate $N$ without knowing the two factors of $N$.
3) Each user tests whether $N$ is a product of two prime numbers. Specifically, the system selects a random number $x$, and each user $i$ calculates $x^{p_i+q_i} \bmod N$. If $\prod_{i=1}^{\ell} x^{p_i+q_i} \equiv x^{N+1} \pmod N$, $N$ passes the test; this holds whenever $p = \sum_i p_i$ and $q = \sum_i q_i$ are prime, because then $x^{N+1-(p+q)} = x^{\varphi(N)} \equiv 1 \pmod N$. Carmichael numbers that can pass this test can be further eliminated by methods given in [11].
4) If the current $N$ passes all tests, the users work together to embed it in the genesis block. Otherwise, they repeat the process until an appropriate $N$ is found.
Since $N$ only needs to be generated once, the cost of parameter initialization is not a big concern.
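The arithmetic behind the biprimality test in step 3 can be checked with a small Python simulation. It runs every user's computation in one process, so it shows only why the test works, not the secrecy that the protocol of [10] provides; the toy shares and the function name are hypothetical. The identity used is that for $N = pq$ with $p = \sum_i p_i$ and $q = \sum_i q_i$ prime and $\gcd(x, N) = 1$, Euler's theorem gives $x^{N+1} = x^{\varphi(N) + p + q} \equiv x^{p+q} \pmod N$.

```python
from functools import reduce

def biprimality_test(p_shares, q_shares, x):
    """Simulate step 3: compare prod_i x^(p_i + q_i) with x^(N + 1) modulo N.

    In the real protocol N is computed jointly and no party learns p or q;
    here the sums are formed locally only to keep the example short.
    """
    p, q = sum(p_shares), sum(q_shares)
    n = p * q
    # Each user i would publish x^(p_i + q_i) mod N; anyone can combine them.
    partials = [pow(x, p_i + q_i, n) for p_i, q_i in zip(p_shares, q_shares)]
    combined = reduce(lambda a, b: (a * b) % n, partials, 1)
    return combined == pow(x, n + 1, n)
```

With shares summing to the primes 1019 and 1021, the test passes; if the second sum is instead 1023 = 3 · 11 · 31, it fails for $x = 2$. Composite values can also pass this check for some choices, which is why step 3 relies on [11] for further filtering.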
D. Security and Performance of the Block Verification Protocol
It is straightforward to verify that the protocol is correct, meaning that any legitimate proof will be accepted as valid.
The following theorem shows that, for a given summary $S$ of blockchain $BC$, no attacker can generate a valid proof for a forged block $blk'$ that is not contained in $BC$ under the Strong RSA assumption.
Theorem 1: Given a summary $S_n$ of blockchain $BC$, there is no probabilistic polynomial-time attacker $\mathcal{A}$ that can forge a block $blk'$ and an accompanying proof $P$ that $blk'$ is a valid block on blockchain $BC$ in the random oracle model; otherwise, the Strong RSA assumption is broken.
Proof: Suppose hash() behaves like a random oracle. Let $r_i = \mathrm{hash}(blk_i \| i)$, where $blk_i$ is the $i$-th block on $BC$, and $S_n = g^{\prod_{k=1}^{n} r_k} \bmod N$. We consider two attack scenarios:
• The attacker knows the summary $S_n$ but not the blockchain. Suppose the attacker chooses a block $blk'$ and a position $i'$ for it. Then the attacker needs to compute $y \in \mathbb{Z}_N^*$ such that $y^{\mathrm{hash}(blk' \| i')} \bmod N = S_n$. This immediately breaks the Strong RSA assumption.
• The attacker knows both the blockchain and the summary $S_n$. In this case, the attacker knows all valid proofs for blocks in $BC$, i.e., $(r_i, S_n^{1/r_i} \bmod N)$ for $i = 1, \dots, n$. Suppose the attacker can generate a valid proof for a forged block $blk'$ at some position $i'$, and let $r' = \mathrm{hash}(blk' \| i')$. If $r' \mid \prod_{i=1}^{n} r_i$, the attacker can make a valid proof for $blk'$ at position $i'$, because it can compute $(r', g^{(\prod_{i=1}^{n} r_i)/r'} \bmod N)$. Because the attacker cannot control the output of hash(), the probability of success equals the probability that a random number $r'$ is a factor of another random number $R = \prod_{i=1}^{n} r_i$. According to the Erdős-Kac theorem [12] and its extension counting multiplicities [13], the number of prime factors of $R$ counting multiplicity is $O(\log\log R)$. By the binomial theorem, the total number of divisors of $R$ is $O(2^{\log\log R}) = O(\log R)$, and $\lim_{R\to\infty} \frac{\log R}{R} = 0$. Therefore, the probability that an attacker can find such an $r'$ is negligible when $R$ is large enough. As long as the attacker cannot find such an $r'$, a successful attack implies that the Strong RSA assumption is broken.
In summary, there is no practical attack against the protocol in the random oracle model unless the Strong RSA assumption is broken.
Performance of the major algorithms is analyzed as follows.
• Block construction. Compared with the straightforward method in which each user keeps the entire blockchain, our method incurs some extra work in the block construction algorithm. The extra work consists of two parts: evaluating the hash value of the new block and calculating the new summary. The computation overhead is constant (i.e., one hash evaluation and one modular exponentiation), and so is the storage overhead (i.e., one element of $\mathbb{Z}_N$ for the summary). The summary also incurs an extra communication cost, which is small (e.g., 2,048 bits for a 2,048-bit $N$).
• Proof generation. The proof generation algorithm does not incur extra storage. Its computational cost depends on the length of the current blockchain (i.e., the number of blocks in the chain) and the position of the block. Suppose the length of the blockchain is $n$ and the proof of the $i$-th block needs to be generated, where $i \le n$. The prover needs to conduct one hash evaluation of the $i$-th block and calculate the product of the hash values of blocks $1, \dots, i-1, i+1, \dots, n$. In summary, the prover calculates $n + 1 - i$ hashes, $n - 1$ multiplications, and one modular exponentiation. Since nodes with sufficient storage capacity (rather than lightweight users) are supposed to generate proofs, the protocol is practical.
• Proof verification. The computational cost of verifying the proof of a block is one hash evaluation and one modular exponentiation, which is constant. This explains why the protocol is suitable for lightweight users who only keep the summary of the blockchain.

E. Reducing Cost of Proof Generation
Although both the cost of updating the summary of a blockchain and the cost of verifying a block are constant, the computational complexity for the prover to generate a proof is $O(n)$, where $n$ is the number of current blocks on the blockchain (i.e., $n$ keeps increasing). In the worst-case scenario, the prover needs to traverse all of the blocks on the blockchain to calculate the second part of the proof, namely $g^{\left(\prod_{k=1}^{n} \mathrm{hash}(blk_k \| k)\right)/\mathrm{hash}(blk_i \| i)} \bmod N$.
To reduce this computational complexity, we design a scheme that improves computational efficiency at the price of a slight increase in storage.
a) Proof generation with lower computational complexity: The basic idea underlying the scheme is to let the prover maintain a binary tree $T$. As illustrated in Fig. 3, the binary tree stores intermediate results that can be used to generate a proof for a given block. Specifically, each leaf stores the hash value of a corresponding block, and each internal node stores the product of its two children. This way, the root node stores the product of the hash values of all of the blocks on the blockchain. The height of $T$ is pre-determined. If a leaf is empty (i.e., there is currently no corresponding block on the blockchain), its value is set to 1 so that it does not contribute to the value stored at the root node.
Suppose the height of tree $T$ is $h$ and the number of current blocks on the blockchain is $n$, where $n \le 2^{h-1}$. To calculate a proof for block $blk_i$, where $1 \le i \le n$, the prover leverages the information stored in $T$ as follows:
• Find the product of all of the values to the right of $blk_i$ (the blockchain grows from left to right):
$$r = \prod_{k=i+1}^{n} \mathrm{hash}(blk_k \| k). \qquad (1)$$
Instead of multiplying these values one by one, the prover combines the partial products stored in $T$ to accelerate the computation.
• Calculate $LR \leftarrow S_{i-1}^{\,r} \bmod N$, where $S_{i-1}$ is the summary stored with block $blk_{i-1}$ (and $S_0 = g$).
• Set the proof as $P \leftarrow (\mathrm{hash}(blk_i \| i), LR)$.
Fig. 3. The storage structure that can be used by a prover to reduce its computational complexity when generating proofs. Each leaf $h_i$ stores the hash value of a block, and each internal node stores the product of the values stored at its two children.
Note that the height of $T$ determines the number of blocks it can accommodate and is therefore a pre-determined public parameter. If the height of $T$ is $h$, it can accommodate $2^{h-1}$ blocks. This is not a significant constraint, because a relatively small $h$ can accommodate a large number of blocks. For example, when $h = 32$, the structure can accommodate $2^{31}$ = 2,147,483,648 blocks, which is more than 4,000 times the number of blocks on the Bitcoin network as of April 2017.
b) Analysis of the improved scheme: The improved scheme uses a binary tree $T$ to store information that can be used for generating proofs. Let $\mathrm{height}(T) = h$, so that $n = 2^{h-1}$ is the number of leaves, and let $|\mathrm{hash}()| = \ell$. At the leaf level (i.e., the first level), each node has size $\ell$. Each node at the $j$-th level stores the product of $2^{j-1}$ hash values and thus occupies about $2^{j-1} \cdot \ell$ bits, so the root node occupies about $2^{h-1} \cdot \ell = n \cdot \ell$ bits. Therefore, the size of $T$ is
$$\underbrace{n \cdot \ell}_{\text{first level}} + \cdots + \underbrace{\frac{n}{2^{j-1}} \cdot \left(2^{j-1}\ell\right)}_{j\text{-th level}} + \cdots + \underbrace{1 \cdot (n\ell)}_{\text{root}} = h \cdot n \cdot \ell.$$
With intermediate results stored in $T$, computing $r$ requires only $O(h) = O(\log n)$ multiplications of stored values, followed by a single modular exponentiation.
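The tree-based proof generation can be sketched in Python as a product tree in array (heap) layout, where a range query combines $O(h)$ stored nodes to form $r = \prod_{k=i+1}^{n} r_k$ and the proof value is then $S_{i-1}^{\,r} \bmod N$. This is an illustrative sketch under assumptions, not the authors' code: the heap layout, the SHA-256 position encoding, the toy modulus, and all names are hypothetical. It reduces the number of big-integer multiplications; the final exponent still has a length that grows with $n - i$.

```python
import hashlib

# Toy parameters (hypothetical, NOT secure): small modulus with known factors.
N = 1019 * 1021
G = 2

def block_hash(block: bytes, i: int) -> int:
    """r_i = hash(blk_i || i) as an integer (SHA-256, 8-byte big-endian position)."""
    return int.from_bytes(hashlib.sha256(block + i.to_bytes(8, "big")).digest(), "big")

class ProductTree:
    """Binary tree T: leaves hold r_k, internal nodes hold the product of their children.

    Leaves without a block hold 1, so they do not change any product.
    A tree of height h has 2**(h - 1) leaves, matching the text.
    """

    def __init__(self, height: int):
        self.leaves = 1 << (height - 1)
        self.node = [1] * (2 * self.leaves)  # heap layout: node[1] is the root

    def set_leaf(self, k: int, value: int) -> None:
        """Store r_k at leaf k (1-based) and refresh the products on its path."""
        pos = self.leaves + k - 1
        self.node[pos] = value
        pos //= 2
        while pos >= 1:
            self.node[pos] = self.node[2 * pos] * self.node[2 * pos + 1]
            pos //= 2

    def range_product(self, lo: int, hi: int) -> int:
        """Product of leaves lo..hi (1-based, inclusive) from O(h) stored nodes."""
        result = 1
        left, right = self.leaves + lo - 1, self.leaves + hi  # half-open [left, right)
        while left < right:
            if left & 1:
                result *= self.node[left]
                left += 1
            if right & 1:
                right -= 1
                result *= self.node[right]
            left //= 2
            right //= 2
        return result

def tree_proof(tree: ProductTree, summaries: list, blocks: list, i: int):
    """Proof for blk_i: r = r_{i+1} ... r_n from T, then LR = S_{i-1}^r mod N."""
    r = tree.range_product(i + 1, len(blocks))  # empty product (= 1) when i == n
    lr = pow(summaries[i - 1], r, N)            # summaries[0] is S_0 = g
    return block_hash(blocks[i - 1], i), lr
```

Usage: as each block is accepted, the prover calls `tree.set_leaf(k, block_hash(blk, k))` and appends the new summary; `tree_proof` then returns a pair that the verification check $(LR)^{\mathrm{hash}(blk_i \| i)} \bmod N = S_n$ accepts.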
More generally, if each internal node in Fig. 3 has $m$ children, the height of $T$ is reduced to $\log_m n + 1$. A similar analysis shows that the total size of $T$ is $(\log_m n + 1) \cdot n \cdot \ell$, which is the amount of storage a prover keeps locally. Calculating $r$, as defined in Equation (1), requires about $\log_m n + m$ multiplication operations in the worst case, where $m$ is the number of multiplications incurred at an internal node at the second level of $T$. To select the value of $m$ that minimizes the overall computational complexity, we calculate the derivative:
$$\frac{d}{dm}\left(\log_m n + m\right) = \frac{d}{dm}\left(\frac{\ln n}{\ln m} + m\right) = 1 - \frac{\ln n}{m \ln^2 m},$$
which monotonically increases with respect to $m$. Therefore, the minimum is attained when $m \ln^2 m = \ln n$; for example, $n = 2^{31}$ gives $m \approx 6.3$, which is much smaller than $\ln n \approx 21.5$. In practice, we can set the number of branches to a small constant integer so as to reduce the computational complexity of the prover.
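The choice of the branching factor can be checked numerically. The Python sketch below evaluates the cost model stated above, $\log_m n + m$, finds the best integer $m$ by brute force, and solves $m \ln^2 m = \ln n$ by bisection; it only reproduces that cost model, and the function names are hypothetical.

```python
import math

def branching_cost(m: int, n: int) -> float:
    """Worst-case multiplications to form r in an m-ary tree, per the text: log_m n + m."""
    return math.log(n) / math.log(m) + m

def best_branching(n: int, max_m: int = 64) -> int:
    """Integer branching factor m >= 2 that minimizes log_m n + m (brute force)."""
    return min(range(2, max_m + 1), key=lambda m: branching_cost(m, n))

def stationary_point(n: int, iterations: int = 100) -> float:
    """Real m where the derivative vanishes, i.e. m * (ln m)**2 = ln n, by bisection."""
    target = math.log(n)
    lo, hi = 1.0, max(3.0, target)  # m * ln(m)**2 is increasing for m >= 1
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid * math.log(mid) ** 2 < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For $n = 2^{31}$ (a tree of height 32), `best_branching` returns 6 and `stationary_point` lies between 6 and 7, consistent with choosing a small constant number of branches.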

IV. USING THE BLOCK VERIFICATION PROTOCOL TO CONSTRUCT EPBC
In this section, we discuss the construction of higher-layer operations based on the verification protocol described in Section III. Specifically, we focus on two basic protocols: blockchain identification and transaction verification.
a) Blockchain identification: When a lightweight user needs to join a blockchain-based application, it needs to obtain the current summary of the blockchain. Protocol 1 serves this purpose.
Protocol 1 Blockchain identification.
1: The lightweight user randomly selects a group of $\ell$ users, denoted by $G_u$, from the blockchain network;
2: for all $u \in G_u$ do
3:   The lightweight user queries $u$ to get the summary value $S^{(u)}$;
4:   The lightweight user interacts with $u$ to verify the validity of $S^{(u)}$ with respect to a random set of blocks chosen by the lightweight user;
5: end for
6: The lightweight user calculates $S \leftarrow \mathrm{SummaryDetermination}(S^{(1)}, \dots, S^{(\ell)})$, which returns the summary provided by the majority of the users; $S$ is the final summary of the blockchain.
Note that as long as the attacker does not control the majority of the users, the protocol is secure. The lightweight user can also adopt other strategies to determine the summary, e.g., giving different weights to the selected users and including this information when making the decision.
b) Transaction verification: A transaction is valid if and only if the block it belongs to is accepted by the majority of users, i.e., lies on the longest branch of the blockchain. Therefore, verifying a transaction reduces to checking the validity of a block and its position (i.e., block number). A lightweight user can use the block verification protocol to verify the block in question and that it indeed contains the transaction in question. Then, the lightweight user can check the number of blocks that have been added after the verified block. Similar to the Bitcoin system [1], if more than 6 blocks have been added to the blockchain after the block under consideration, the transaction in question can be accepted with high confidence.
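Step 6 of Protocol 1 and the confirmation rule of transaction verification can be sketched in Python as follows. The strict-majority rule (returning `None` when no summary receives more than half of the votes), the default depth of 6, and the function names are assumptions for illustration; the per-user check of step 4 is omitted.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

def summary_determination(reported: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the summary reported by a strict majority of the queried users, else None."""
    if not reported:
        return None
    value, votes = Counter(reported).most_common(1)[0]
    return value if 2 * votes > len(reported) else None

def transaction_accepted(block_position: int, chain_length: int, depth: int = 6) -> bool:
    """Accept once more than `depth` blocks have been added after the verified block."""
    return chain_length - block_position > depth
```

Returning `None` instead of guessing lets the client query a fresh group $G_u$ when the reported summaries disagree; a weighted vote, as mentioned above, would replace the plain `Counter`.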
If the transaction is a smart contract submission or a one-time smart contract execution result submission, the above method is sufficient. However, if the transaction is a payment operation or the submission of a multiple-time smart contract execution result, freshness becomes a concern. For example, the attacker can provide a proof of an old block that contains a previous payment of the same value. To prevent such attacks, the lightweight user can maintain a local counter and include the counter in its transactions.

V. INTEGRATION WITH EXISTING BLOCKCHAIN SYSTEMS
Because many public blockchain applications have already been developed, it is useful to enable EPBC for these systems without modifying their existing data structures and clients. To achieve this goal, EPBC can work as a separate service layer on top of existing blockchain systems. Fig. 4 demonstrates the relationship between the existing blockchain system and the newly added EPBC service.
Specifically, a separate EPBC client with embedded parameters can be distributed to users who maintain the blockchain and play the role of a prover. Here, the parameters are the values used for blockchain summary construction. Summaries of the blockchain are not involved in mining, and users can use the existing client to produce new blocks and achieve consensus on the blockchain. After the user decides to accept a new block, the EPBC client produces a new summary based on the previous summary value and the new block, and stores the new summary locally. Note that summaries are determined by the blockchain itself, so the EPBC client does not need to run any consensus mechanism. To reduce the time complexity of generating a proof, the EPBC client can also maintain the tree structure described in Section III-E.
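The service-layer idea can be sketched as a small Python class that an unmodified client notifies whenever it accepts a block. Everything here is a hypothetical illustration: the class and callback names, SHA-256 as hash(), and the 8-byte position encoding are assumptions. The sketch assumes blocks arrive in chain order and does not handle chain reorganizations.

```python
import hashlib

class EPBCSidecar:
    """Hypothetical EPBC service layer running next to an unmodified client.

    It never takes part in mining or consensus; it only folds each block the
    existing client has accepted into the running summary S_i = S_{i-1}^{r_i} mod N.
    """

    def __init__(self, modulus: int, base: int):
        self.modulus = modulus  # embedded parameter N
        self.summary = base     # S_0 = g (embedded parameter)
        self.height = 0         # number of blocks folded in so far

    @staticmethod
    def block_hash(block: bytes, i: int) -> int:
        """r_i = hash(blk_i || i); SHA-256 and 8-byte position encoding are assumptions."""
        return int.from_bytes(hashlib.sha256(block + i.to_bytes(8, "big")).digest(), "big")

    def on_block_accepted(self, block: bytes) -> int:
        """Callback for the existing client: update, store, and return the new summary."""
        self.height += 1
        r = self.block_hash(block, self.height)
        self.summary = pow(self.summary, r, self.modulus)
        return self.summary
```

Each update costs one hash and one modular exponentiation, independent of the chain length, which matches the constant block-construction overhead analyzed in Section III.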

VI. EXPERIMENTS AND EVALUATION
In this section, we describe the implementation and provide preliminary experimental results of EPBC. We focus on the block verification protocol because it is the core of EPBC.
a) Implementation and parameters: We implemented a prototype of the block verification protocol based on the MIRACL crypto library [14]. Since the security of the protocol depends on the Strong RSA assumption, we chose a 1,024-bit N in the implementation. SHA-256 was used for hash(). We also set the height of T to 32. When a leaf is empty, its value is set to 1 and there is no need to store it. The experimental results are shown in Fig. 5, which shows that although the cost of proof generation depends on the size of the blockchain, the cost of proof verification is independent of the blockchain size. As discussed in Section IV, some high-level operations like balance checking require the lightweight client to verify more than one block. This is not a problem in practice for lightweight clients, because verifying one block takes only about 0.02 seconds.

VII. RELATED WORKS
EPBC only provides a mechanism for checking the validity of a given block and the transactions contained in the block. It does not consider how to determine which block(s) should be checked. BIP 37 proposes using a Bloom filter to select potentially related blocks for verification [16]. The Bitcoin community also proposes the UTXO (unspent transaction output) approach, which requires the user to store unspent transaction output information instead of full transaction information. This reduces the storage cost but does not change the order of storage complexity [17].
Cryptographic accumulators were first developed by Benaloh and De Mare to achieve decentralized digital signatures [5]. Barić and Pfitzmann developed a collision-free accumulator and used it for fail-stop signatures without using any tree structure [18]. Cryptographic accumulators have other uses as well, e.g., constructing group signatures [19]. Dynamic cryptographic accumulators further support adding and removing members [20]. These schemes do not consider the features of blockchains, namely that every user has the privilege to construct blocks and generate proofs, and that lightweight users have very limited computational capability. Recently, e-cash systems such as ZeroCoin have also utilized cryptographic accumulators, but for the different purpose of information hiding [21].
Another line of related research is storage verification in the cloud environment, and several related concepts were proposed, e.g., provable data possession [22] and proof of retrievability [23]. These schemes cannot be applied in our scenario because the lightweight users do not know the blockchain in advance and the blockchain keeps growing as new blocks are created and appended to it.
Both EPBC and SPV assume that the records embedded in blocks are correct if the corresponding blocks are valid. Some techniques that are applicable to SPV, such as Bloom filters [24], are also applicable to EPBC. Nevertheless, EPBC incurs only a constant amount of storage for the lightweight client. This advantage is less significant if the client only cares about the most recent transactions, because storing several block headers might be cheaper than storing the summary value.

VIII. CONCLUSION
We have presented EPBC, a scheme that allows lightweight users to use blockchain-based applications without storing the entire blockchain while still being able to verify the validity of blocks and transactions. The basic idea is to "compress" a blockchain into a constant-size summary, which is the only data item a lightweight client needs to keep. We analyzed the security of EPBC, and preliminary experiments showed that it is practical. EPBC can be adopted for blockchain-based applications, such as e-cash and smart contract systems.