Outsourced Auditing With Data Integrity Verification Scheme (OA-DIV) and Dynamic Operations for Cloud Data with Multi-Copies

Data storage in the cloud is widely utilized by commercial, educational, scientific, healthcare and many other applications. Apart from data security, the major concern is data integrity, and our work focuses on data integrity. In spite of the Service Level Agreement (SLA) between the data owner and the CSP, data integrity may still be affected; moreover, today's cloud computing platforms process more real-time data, which involves dynamic data operations. Most of the existing works intend to verify cloud data integrity; however, issues related to data replicas are not usually considered, and most existing cloud techniques fail to address dynamic data operations. This article presents an Outsourced Auditing with Data Integrity Verification (OA-DIV) scheme that can handle multiple copies of cloud data and supports dynamic data operations such as data insertion, deletion and update. The performance of the proposed work is evaluated with respect to communication cost and processing time, and the results are compared with existing approaches.


Introduction
Cloud storage services bring numerous attractive benefits to organizations and other business communities, which is why they are so widely employed. Though the hassles involved in data storage are handled by the cloud service provider, the major challenging concerns are data integrity and privacy. Data owners are concerned about the consistency of the outsourced data, as any modification or tampering may lead to serious problems, in addition to the difficulty of data recovery. There are several related problems, such as maintaining the integrity of multiple copies of cloud data and performing dynamic operations over cloud data.
Hence, data owners perform data integrity checks at regular intervals in order to confirm the correctness of the data. There are various techniques in the existing literature for checking data integrity. One of the well-known verification techniques for cloud data integrity is Provable Data Possession (PDP) [1,2]. PDP-based techniques employ a sampling process for data integrity verification, and data inconsistency can be detected immediately without downloading the complete data. However, this method does not perform well when the data volume is large and incurs high computational cost.
Proof of Retrievability (PoR) is another technique that helps verify data integrity; based on it, the authors of [3,4] proposed a data possession proof with retrievability. However, this approach requires 'sentinels' and supports only a fixed number of audits. Additionally, most of the existing works focus on the audit operation alone, and data organization or data replica management is rarely discussed.
In order to increase data availability, multiple copies of the outsourced data are stored on remote cloud servers located in different geographical regions. The main challenge associated with multiple data copies is that when the parent data is modified, all the replicas must also be modified. Failing to do so leads to data inconsistency, which seriously affects the performance of the cloud system. Real-time scenarios demand that the cloud environment be more realistic by supporting dynamic operations, which include data insertion, update and deletion [5][6][7][8][9][10][11]. However, most of the existing literature is based on static data and does not support dynamic operations. The proposed work addresses this issue by presenting a scheme for the cloud computing environment that can handle dynamic data operations such as data insertion, deletion and update [40][41][42][43][44][45].
Understanding the necessity of data organization and replica management while performing data audits, this paper presents an OA-DIV scheme that focuses on both of the above-mentioned issues. The highlights of this work are listed as follows.
• This paper considers multiple replicas of the data stored in the cloud.
• Data organization is taken care of by means of a tree-based structure.
• Dynamic operations such as data insertion, deletion and update are supported.
• The audit effectiveness of the work is shown to be better than that of the compared schemes.
The remainder of this paper is arranged in the following manner. Section 2 discusses the related literature, Section 3 presents the proposed auditing scheme and Section 4 presents the dynamic operations. The performance of the proposed work is evaluated in Section 5 and the conclusions are summarized in Section 6.

Literature Review
This section reviews the existing literature with respect to data integrity preservation and dynamic operations in the cloud computing environment.
In [11], a user revocation scheme is presented for identity-based cloud storage auditing aimed at big data. This work is based on efficient key generation and the update of private keys. User revocation is carried out by updating the private keys of non-revoked users without touching the authenticators. This work does not require any complex certificate management, as in standard Public Key Infrastructure (PKI) systems.
A data integrity scheme for the cloud computing environment based on distributed machine learning is presented in [16]. Initially, a Provable Data Possession (PDP) sampling-based auditing algorithm is presented for verifying data integrity. A random number is then generated and the Discrete Logarithm Problem (DLP) is utilized for proof formation. Finally, identity-based cryptography is employed for generating the public/private keys.
In [13], a trustworthy data integrity scheme is presented for cloud storage systems by employing a collaborative auditing blockchain mechanism. The authors of this work intend to handle the trust issue between data owners and Cloud Service Providers (CSP). Here, all the nodes perform auditing assignments and store them. An attribute-based cloud data integrity auditing scheme is proposed in [14] for cloud storage. This work defines a specific attribute set for enabling users to upload data. It claims to provide privacy-preserving attributes and to avoid collusion attacks.
A data integrity auditing scheme based on algebraic signatures is presented in [15]. This work focuses on confidentiality through a batch auditing process. It supports dynamic operations on data as well, and a security analysis of the work is performed. In [16], a privacy-preserving audit scheme is proposed for cloud computing, which can also support dynamic data operations for Unmanned Aerial Vehicles (UAV). This technique works on the basis of a distributed string equality check protocol in addition to a Merkle hash tree. Initially, a Third Party Auditor (TPA) is included that takes care of digital signing, integrity checks and dynamic data operations. The security is further improved by means of the equality check protocol.
In [17], a publicly verifiable data deletion scheme is proposed for cloud computing. This work focuses on public and private verifiability of data storage and deletion with the help of an invertible Bloom filter. When the cloud server does not behave normally, the users can detect the abnormality with a certain probability. Auditing services for cloud storage systems based on the Merkle hash tree are presented in [18]. This work detects suspicious entities while supporting dynamic updates.
An integrity check method for cloud storage with multi-copy data is presented in [19]. The key generation process is performed by bilinear mapping and the multi-copy data is handled by a multi-branch authentication tree. The correlation of tasks is represented by a Directed Acyclic Graph (DAG) that considers the Quality of Service (QoS) demand.
In [20], a lattice-based verifier auditing scheme is presented for electronic medical data in cloud storage. This lattice-based scheme allows a patient to employ a verifier for performing data audits. A certificateless Provable Data Possession (PDP) protocol for handling multiple copies in the cloud is proposed in [21]. This work presents an identity-based PDP protocol that can maintain multiple data copies on multiple cloud servers. The security model is built first and then the protocol is constructed.
In [22], a lightweight verifiable data aggregation scheme that ensures privacy preservation is proposed. A Paillier homomorphic encryption scheme with a signature technique is employed for privacy preservation. Data integrity is provided under the q-Strong Diffie-Hellman (q-SDH) assumption. A PDP scheme for dynamic multiple copies of data in cloud storage is presented in [23]. Vector dot products are utilized in the PDP and the dynamic data structure is based on a Divided Address Version Mapping Table (DAVMT). A certificate-based auditing scheme based on asymmetric bilinear pairing is presented in [24] for cloud storage. This work is implemented on the type D curve of the pairing library and the authors claim that the tag generation cost is minimized. In [25], a data integrity scheme based on short signatures is proposed for cloud-based Internet of Things (IoT). The short signature algorithm verifies data integrity and preserves privacy, while public auditing is performed by the TPA.
In [26], a data integrity verification scheme based on a cryptographic accumulator is proposed. This work permits the data owner to perform several rounds of integrity checks. Dynamic data operations are supported, and data leakage and replay attacks are tackled by this work. In [27], a dual access control and data integrity verification scheme is proposed for cloud computing applications. A time tree is formed for attribute-based encryption by employing hierarchical identity-based encryption to fix the access and decryption times for the keys and data. Data decryption is possible when the data owner's access conditions are met. The data verification tree is built as a Merkle hash tree.
An identity-based data integrity auditing scheme with sensitive data hiding is proposed in [28]. Here, all the sensitive data blocks are sanitized and the signatures of these blocks are transformed. The signatures are employed for verifying data integrity. In [29], a fuzzy identity-based data auditing method is proposed. This work is based on a Merkle hash tree and an Index Logic Table (ILT), and it can support dynamic operations.
The authors of [30] improved the basic PDP approach and suggested an accurate and stable PDP scheme. This work employs symmetric encryption and its security can be verified under the Random Oracle model; it also covers dynamic data operations, i.e. data changes such as deletion and appending. When the scheme is initialized, the user sets the total challenge count along with the challenge content and saves the answer as metadata. However, public auditing is not supported by this approach because of the use of symmetric cryptography. Despite the improvement, this work pays little attention to storage and computing performance, as well as to batch auditing and privacy.
The authors of [31] suggested two PDP schemes that can support full dynamic data operations. The first employs an authenticated skip list, and the second is based on an RSA tree structure. The objective is to support dynamics, especially insertion, and the scheme therefore incurs high computation on the cloud servers. Yet this work focuses on communication efficiency and dynamic operations alone.
A data protection and public auditing system that is more practical in terms of computation and communication overhead is discussed in [32]. Asymmetric pairing is used to improve performance and to support dynamic operations in their scheme. Their scheme demonstrated increased efficiency compared to previous works. However, this paper does not address problems such as storage performance, the number of verifications, batch auditing and provable security.
In [33], a technique known as a balanced update tree is utilized. The authors concentrate on the data update process with respect to the relevant parameters. The average computation and communication of this scheme is lower than that of the existing schemes.
In [34], the authors suggested a new dynamic PDP scheme based on a secure signature scheme and a large branching tree (LBT). Their framework supports fully dynamic modifications including update, insertion and deletion. By adapting the Merkle Hash Tree (MHT) into the LBT structure, the communication cost efficiency is improved. They use a secure signature algorithm that significantly reduces the computation cost of both the CSP and the client.
The authors of [35] suggested the first dynamic PoR system in order to handle dynamic data securely. This work presents a new PoR property for the dynamic data setting, known as fairness, which prevents dishonest users from exploiting an honest cloud storage server. They also introduced two new tools: a 2-3 tree authenticated data structure (rb23Tree) and an incremental signature method called Hash-Compress-And-Sign (HCS). Support for dynamic operations is the main focus of this work.
The issue of performance is the focus of [36]. The authors extended the static PoR to a dynamic setting in which a user can make changes that include insert, delete and modify operations. They built a new variant of the B+ tree combined with a Merkle hash tree, an authenticated data structure called the Cloud Merkle B+ tree (CMBT), and presented a dynamic PoR scheme by combining the CMBT with BLS signatures.
In [37], a dynamic PoR scheme is suggested that allows clients to read and write arbitrarily at any location by using an efficient write protocol and audit protocols that ensure the server maintains the latest version of the data. The computation and communication complexity of their protocol is only polylogarithmic in the data size. The idea is to divide the data into several small blocks and to encode every block separately, so that an update to a data block affects only a small number of codeword symbols. Computation and communication complexity were significantly reduced in their protocol. Other requirements were not considered apart from computational efficiency and dynamic operation support.
The efficiency concern is addressed in a dynamic PoR scheme using homomorphic checks with constant client-side storage in [38]. The authors also demonstrated how their scheme can be publicly verified. In [39], a new cloud storage scheme called OPoR is proposed, which improves the PoR model to support dynamic data operations and to withstand reset attacks on the cloud storage server during the upload phase. OPoR offloads the heavy tag generation computation to a cloud audit server and removes user involvement in the auditing and pre-processing phases. However, this article considers only data dynamics and security. Motivated by these works, this article presents a data integrity verification scheme that can handle multiple copies of the cloud data along with dynamic data operations.

Proposed OA-DIV Cloud Data Auditing Scheme
The proposed OA-DIV cloud auditing scheme is elaborated in this section, including all the involved modules and sub-modules. The scheme is split into three modules: the Cloud Data Owner (CDO), the Third Party Auditor (TPA) and the CSP. Certain sub-modules are assigned to each module and are overviewed in this section. The CDO module comprises phases such as key establishment, data copy creation and signature creation. The audit strategy formation and integrity challenge placement come under the control of the TPA. The CSP responds to the challenge with the proof. The overall flow of the work is shown in Figure 1. All these modules are outlined in the following sub-sections.

CDO
The CDO is the owner of the outsourced data and is the entity most concerned about data integrity and security. The integrity of the data should be maintained as per the Service Level Agreement (SLA); however, in certain cases the SLA is violated by the CSP. Thus, there is a need to check the data integrity at timely intervals. The CDO is responsible for carrying out the following.
• Key Establishment - The CDO runs the key establishment algorithm to produce the public (pb) and private (pr) keys.
• Data Copy Creation - A copy of the data is created to ensure better data availability, and this algorithm generates the copies. Whenever a data file (D) is passed as input, a group of copied data blocks (D') is formed.
• Signature Creation - The signature creation algorithm creates signature blocks in order to provide authentication. The inputs of this algorithm are pr and D, while it returns the signature block β and the signature value of the root SV. The processed data blocks D, along with β and SV, are passed to the storage server. A sketch of these three phases is given below.
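To make these phases concrete, the following Python sketch illustrates one possible CDO-side flow under simplifying assumptions: a random secret plus its hash stand in for the (pr, pb) key pair, replicas are distinguished by a copy index, and HMAC values stand in for the scheme's block signatures β and root value SV. The function names and block size are illustrative, not the paper's notation.

```python
import hashlib
import hmac
import os
from typing import List, Tuple

BLOCK_SIZE = 4096  # illustrative block size, not specified in the paper


def establish_keys() -> Tuple[bytes, bytes]:
    """Key establishment: a random secret stands in for the private key pr;
    its hash plays the role of the public value pb (a simplification of the
    scheme's (pb, pr) pair)."""
    pr = os.urandom(32)
    pb = hashlib.sha256(pr).digest()
    return pb, pr


def split_blocks(data: bytes) -> List[bytes]:
    """Split the (already encrypted) data D into fixed-size blocks Db_1..Db_n."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def create_copies(blocks: List[bytes], copies: int) -> List[List[bytes]]:
    """Data copy creation: derive `copies` distinguishable replicas D' of the
    block set by tagging each block with its copy index."""
    return [[bytes([c]) + b for b in blocks] for c in range(copies)]


def sign_blocks(pr: bytes, blocks: List[bytes]) -> Tuple[List[bytes], bytes]:
    """Signature creation: produce a signature beta_i per block and a signature
    SV over the concatenated block digests (standing in for the root value)."""
    beta = [hmac.new(pr, b, hashlib.sha256).digest() for b in blocks]
    root = hashlib.sha256(b"".join(beta)).digest()
    sv = hmac.new(pr, root, hashlib.sha256).digest()
    return beta, sv


if __name__ == "__main__":
    pb, pr = establish_keys()
    data = os.urandom(3 * BLOCK_SIZE + 100)       # stands in for the encrypted file E(D)
    blocks = split_blocks(data)
    replicas = create_copies(blocks, copies=3)    # D'
    beta, sv = sign_blocks(pr, blocks)
    print(len(blocks), "blocks,", len(replicas), "replicas,", len(beta), "signatures")
```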

TPA
The TPA is responsible for checking data integrity by passing a challenge to the CSP in order to verify the correctness of the data. The activities performed by the TPA are audit strategy formation and integrity challenge placement.
• Audit strategy formation - This algorithm, executed by the TPA, forms the audit strategy. It accepts a set of data attributes DA and returns the audit strategy QA, which includes the probability of file tampering (pb), the data access rate (ar) and a timestamp (TS).
• Integrity challenge placement - The challenge details are formed in this phase, which accepts the group of data blocks that need to be tested and a random number RN = (i, q_i). The challenge message is forwarded to the cloud storage server. A minimal sketch of both TPA phases follows.
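The sketch below assumes the audit strategy QA is a simple dictionary and the challenge is a set of (i, q_i) pairs; the attribute names and weighting are hypothetical, since the paper only states which fields QA contains.

```python
import random
import time
from typing import Dict, List, Tuple


def form_audit_strategy(data_attributes: Dict[str, float]) -> Dict[str, float]:
    """Audit strategy formation: derive QA from the data attributes DA.
    The mapping below is illustrative; the paper only states that QA holds
    the tamper probability, access rate and a timestamp."""
    return {
        "tamper_probability": data_attributes.get("sensitivity", 0.5),
        "access_rate": data_attributes.get("access_rate", 0.0),
        "timestamp": time.time(),
    }


def place_challenge(num_blocks: int, sample_size: int) -> List[Tuple[int, int]]:
    """Integrity challenge placement: pick block indices i and random
    coefficients q_i, giving the challenge set {(i, q_i)}."""
    indices = random.sample(range(num_blocks), min(sample_size, num_blocks))
    return [(i, random.randint(1, 2**32)) for i in indices]


if __name__ == "__main__":
    qa = form_audit_strategy({"sensitivity": 0.8, "access_rate": 12.5})
    chal = place_challenge(num_blocks=1000, sample_size=5)
    print(qa)
    print("CHAL:", chal)
```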

CSP
The CSP is the entity that provides the user with cloud storage space in accordance with the SLA. Whenever the CSP is challenged by the TPA for proof of data integrity, the CSP has to respond with the proof.
• Proof Formation - This phase presents the data integrity proof for the data being challenged. The inputs of this phase are the data D', β and CHAL, for which the algorithm returns the proof P. As soon as the proof is received, the TPA verifies P and returns either TRUE or FALSE, where TRUE indicates that the proof is verified and FALSE indicates that it is not.

The CDO generates β and the root node's signature, where every parent node authenticates its child nodes and each child node authenticates the data. This idea makes it simple to maintain the data replicas and to check the integrity of the data; a sketch of this hierarchical authentication is given below. All the phases are explained as follows. Compute u_j = g_1^{x_j}, where g_1 ∈ G_2 and j ∈ {0, 1, 2, ..., s}, in order to form the public and private keys, denoted by pb = (g_1, u_0, u_1, ..., u_s) and pr = (x_0, x_1, x_2, ..., x_s).
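The parent-authenticates-child idea can be sketched with a binary hash tree over the data blocks (the paper does not fix the tree arity, so the binary structure below is an assumption). The root value plays the role of SV, and the sibling path is what a server would return to show that a block belongs under that root.

```python
import hashlib
from typing import List


def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()


def build_tree(blocks: List[bytes]) -> List[List[bytes]]:
    """Build a binary hash tree: leaves authenticate the data blocks and every
    parent authenticates its children, so the root value covers all blocks."""
    level = [h(b) for b in blocks]
    tree = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            right = level[i + 1] if i + 1 < len(level) else level[i]
            nxt.append(h(level[i] + right))
        tree.append(nxt)
        level = nxt
    return tree  # tree[-1][0] is the root value (the SV analogue)


def auth_path(tree: List[List[bytes]], leaf_index: int) -> List[bytes]:
    """Collect the sibling hashes needed to re-derive the root for one block."""
    path, idx = [], leaf_index
    for level in tree[:-1]:
        sib = idx ^ 1
        path.append(level[sib] if sib < len(level) else level[idx])
        idx //= 2
    return path


def verify(block: bytes, leaf_index: int, path: List[bytes], root: bytes) -> bool:
    """Recompute the root from a block and its sibling path and compare."""
    node, idx = h(block), leaf_index
    for sibling in path:
        node = h(node + sibling) if idx % 2 == 0 else h(sibling + node)
        idx //= 2
    return node == root


if __name__ == "__main__":
    blocks = [b"block-%d" % i for i in range(8)]
    tree = build_tree(blocks)
    root = tree[-1][0]
    path = auth_path(tree, 3)
    print("block 3 verified:", verify(blocks[3], 3, path, root))
```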

Proposed Work
The data (D) is encrypted by applying the Advanced Encryption Standard (AES) algorithm, and the data blocks (Db_i) are formed from the encrypted data E(D), such that E(D) = (Db_1, Db_2, ..., Db_n). The CDO then creates a number of replicas of the data, which is represented by the following.
A random masking technique is then applied for altering the data of each copy, and the condition is given as follows. Here, the masking value is generated by a pseudo-random number generator; a sketch of this replica masking is given below.
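As an illustration of the masking step, the sketch below derives a mask per (copy, block) pair from a pseudo-random generator and XORs it onto the already-encrypted blocks. SHA-256 in counter mode is used only as a stand-in PRNG, and the seed handling is an assumption, since the paper does not specify these details.

```python
import hashlib
import os
from typing import List


def prng_mask(seed: bytes, copy_id: int, block_id: int, length: int) -> bytes:
    """Pseudo-random mask for copy k, block i (SHA-256 in counter mode as a
    stand-in for the scheme's PRNG)."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(seed + bytes([copy_id]) + block_id.to_bytes(4, "big")
                              + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def make_replicas(enc_blocks: List[bytes], copies: int, seed: bytes) -> List[List[bytes]]:
    """Apply the random mask to every encrypted block so each replica D'_k is
    distinguishable while remaining recoverable by re-deriving the mask."""
    replicas = []
    for k in range(copies):
        replicas.append([bytes(a ^ b for a, b in zip(blk, prng_mask(seed, k, i, len(blk))))
                         for i, blk in enumerate(enc_blocks)])
    return replicas


if __name__ == "__main__":
    seed = os.urandom(16)
    enc_blocks = [os.urandom(64) for _ in range(4)]   # stands in for AES-encrypted blocks
    reps = make_replicas(enc_blocks, copies=3, seed=seed)
    # Unmasking replica 1, block 2 recovers the original encrypted block.
    recovered = bytes(a ^ b for a, b in zip(reps[1][2], prng_mask(seed, 1, 2, 64)))
    print(recovered == enc_blocks[2])
```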
All the data copies are maintained in a tree structure, and when a challenge is forwarded to the CSP, the following actions take place. Whenever a CHAL message is received by the CSP, the child nodes of the specified data blocks are checked and the required information is obtained.
This is followed by the identification of the copies of the specific data file D'_1 and the value of the root node. The CSP then passes the CHAL message to the (c - 1) servers that hold copies of the data, as follows.
The k-th storage server then computes its local value σ'_k, and the proofs ℳ'_k and φ'_k are formed for the challenged blocks.
The proofs of all the data copies {σ'_k, ℳ'_k, φ'_k} are forwarded to the central server that issued the challenge request. As soon as the proofs are obtained, the correctness of the data is checked using {σ'_k}.
The central server verifies the condition of the equation and computes the following.
If these equations are not satisfied, it is declared that the data integrity is affected. The verification evidence is computed as follows.
The corresponding server sends the computed proof P = {σ_0, ℳ, φ} to the TPA. In this way, the integrity of the data is checked, and the organization of the replicas helps in carrying out dynamic operations. A simplified end-to-end sketch of this multi-copy challenge and proof flow is given below.
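Putting the challenge and proof steps together, the following simplified sketch shows a verifier holding per-block tags, replica servers answering a challenge with block digests, and a central check over every copy. HMAC tags and plain digests stand in for the scheme's proof elements {σ, ℳ, φ}, so this is a structural illustration only, not the actual pairing-based construction.

```python
import hashlib
import hmac
import os
from typing import Dict, List, Tuple


def tag_blocks(key: bytes, blocks: List[bytes]) -> List[bytes]:
    """Per-block verification tags kept by the verifier (a stand-in for beta)."""
    return [hmac.new(key, i.to_bytes(4, "big") + hashlib.sha256(b).digest(),
                     hashlib.sha256).digest() for i, b in enumerate(blocks)]


def server_prove(blocks: List[bytes], chal: List[Tuple[int, int]]) -> Dict[int, bytes]:
    """Each replica server answers the challenge with the digests of the
    challenged blocks (standing in for its proof {sigma', M', phi'})."""
    return {i: hashlib.sha256(blocks[i]).digest() for i, _q in chal}


def central_verify(key: bytes, tags: List[bytes], proofs: List[Dict[int, bytes]],
                   chal: List[Tuple[int, int]]) -> bool:
    """The central server checks every replica's proof against the stored tags;
    any mismatch means the integrity of some copy is affected."""
    for proof in proofs:
        for i, _q in chal:
            expected = hmac.new(key, i.to_bytes(4, "big") + proof[i],
                                hashlib.sha256).digest()
            if not hmac.compare_digest(expected, tags[i]):
                return False
    return True


if __name__ == "__main__":
    key = os.urandom(32)
    blocks = [os.urandom(64) for _ in range(10)]
    tags = tag_blocks(key, blocks)
    chal = [(2, 17), (7, 42)]                        # challenge set {(i, q_i)}
    replica_proofs = [server_prove(blocks, chal) for _ in range(3)]
    print("all copies intact:", central_verify(key, tags, replica_proofs, chal))
```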

Dynamic Operations over Outsourced Cloud Data
This section elaborates the proposed approach that supports dynamic operations such as data insertion, deletion and modification over outsourced cloud data. The dynamic operations are described one after the other, with the proposed algorithms, in the following sub-sections.

Data Insertion (DI)
This process inserts a fresh data block at a particular location of the data file 'DF', and the insertion does not affect the logical structure of the original data. Whenever the data owner needs to insert a new data block after a specific data block Db_i, the following steps are carried out. The overall algorithm of the data insertion operation is shown as follows.
Initially, the data owner creates a new version of the data along with its timestamp (TS_new), which is represented as follows.
Here, DO is the data owner that sends the 'Data Insert' request to the Third Party Auditor (TPA); the request includes the new version of the data (DF_new) along with the timestamp of the new data block (TS_new). Figures 2 and 3 show the structure before and after the data insertion operation. The TPA then inserts the new data block at the end of the data file DF by creating a new data block and incrementing the pointer by one. The user creates a new signature for the new data block and forwards the data update request to the CSP, which is represented as follows. Upon receipt of this request, the CSP creates the fresh version of the file, represented as DF_new, along with the tag set Φ_new = Sig_pr(DF_new || n). The signature is composed of the newly formed file and the total count of data blocks, and these parameters are signed with the secret key. Hence, the process of data insertion has been discussed; a minimal sketch of the insertion flow is given below, and the next section presents the details of data deletion.
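The sketch below assumes a simple in-memory block table with (block id, data, timestamp) fields and a keyed hash over the new file version standing in for the tag set; the class and function names are hypothetical.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    block_id: int
    data: bytes
    timestamp: float


@dataclass
class DataFile:
    blocks: List[Block] = field(default_factory=list)
    version: int = 0

    def tag(self, key: bytes) -> bytes:
        """Stand-in for the tag set: bind the file version, block count and
        block digests under the secret key."""
        body = b"".join(hashlib.sha256(b.data).digest() for b in self.blocks)
        return hashlib.sha256(key + self.version.to_bytes(4, "big")
                              + len(self.blocks).to_bytes(4, "big") + body).digest()


def insert_block(df: DataFile, after_id: int, data: bytes) -> Block:
    """Insert a new block after block `after_id`, bump the file version (the
    new data version with its own timestamp) and shift later block ids."""
    pos = next(i for i, b in enumerate(df.blocks) if b.block_id == after_id) + 1
    new_block = Block(block_id=after_id + 1, data=data, timestamp=time.time())
    for b in df.blocks[pos:]:
        b.block_id += 1                      # pointer/index incremented by one
    df.blocks.insert(pos, new_block)
    df.version += 1
    return new_block


if __name__ == "__main__":
    df = DataFile([Block(i, b"db%d" % i, time.time()) for i in range(5)])
    insert_block(df, after_id=2, data=b"new-block")
    print([b.block_id for b in df.blocks], "version", df.version)
    print("new tag:", df.tag(b"secret").hex()[:16])
```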

Data Deletion
When a specific data block has to be deleted, the succeeding blocks need to be shifted accordingly. The data deletion procedure of this work is presented as follows. When the data owner wishes to delete a specific data block from the data file DF, the following request is sent to the TPA.
Here, the data deletion (DD) request sent to the TPA consists of the data file (DF) and the specific data block to be deleted (Db_i). After performing the deletion, the pointer is decremented by one. The updated file is then reflected on the cloud server when the DO forwards the update request to the CSP, as follows. The data deletion algorithm is presented as follows, and a minimal sketch of the deletion flow is given below.
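A corresponding sketch of the deletion flow, under the same in-memory block table assumption as the insertion sketch: the block ids after the deleted block are shifted down by one.

```python
import time
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    block_id: int
    data: bytes
    timestamp: float


def delete_block(blocks: List[Block], block_id: int) -> List[Block]:
    """Remove block `block_id` and shift the ids of the succeeding blocks down
    by one (the pointer is decremented by one), as in the DD request flow."""
    kept = [b for b in blocks if b.block_id != block_id]
    for b in kept:
        if b.block_id > block_id:
            b.block_id -= 1
    return kept


if __name__ == "__main__":
    blocks = [Block(i, b"db%d" % i, time.time()) for i in range(5)]
    blocks = delete_block(blocks, block_id=2)
    print([b.block_id for b in blocks])      # ids reindexed: 0, 1, 2, 3
```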

(Block table structure: Sl. No, Block ID, Db_i, TS_i.) The cloud server updates the new version of the file while generating the following tag set.
The signature contains the newly updated file, which reflects the performed deletion, together with the total count of data blocks, and these parameters are signed with the secret key. Hence, the process of data deletion has been explained; the next section considers the process of data update.

Data Update
Data update is the most important dynamic operation among all the operations; it replaces a particular block with another block. When the data owner intends to update a specific data block with another block, the update request is forwarded to the TPA. The algorithm for the data update operation is presented as follows.

Algorithm 3 Data Update
This data update (DU) request consists of the file to be updated (DF), the block number (i), the updated block (Db'_i) and the timestamp of the data block (TS'_i). As soon as the update request is received from the data owner, the corresponding data file is retrieved and the specific data block is located. The data update request is then sent to the CSP, which is represented as follows.
The data owner generates a new signature β' and the update request is forwarded to the CSP, as follows. The CSP then replaces the old file with the new file (DF'). The tag generation is represented below, and a minimal sketch of the update flow follows.
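Finally, a sketch of the update flow: the targeted block is replaced with Db', its timestamp refreshed to TS', and a new signature β' produced; an HMAC again stands in for the scheme's signature algorithm, and the names are illustrative.

```python
import hashlib
import hmac
import time
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    block_id: int
    data: bytes
    timestamp: float


def update_block(blocks: List[Block], sk: bytes, block_id: int,
                 new_data: bytes) -> bytes:
    """Replace block `block_id` with the new content Db', refresh its
    timestamp TS' and return the new signature beta' to send with the
    update request."""
    blk = next(b for b in blocks if b.block_id == block_id)
    blk.data = new_data
    blk.timestamp = time.time()
    return hmac.new(sk, block_id.to_bytes(4, "big") + new_data,
                    hashlib.sha256).digest()


if __name__ == "__main__":
    blocks = [Block(i, b"db%d" % i, time.time()) for i in range(5)]
    beta_new = update_block(blocks, sk=b"secret-key", block_id=3, new_data=b"db3-v2")
    print(blocks[3].data, beta_new.hex()[:16])
```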
Hence, the dynamic operations of data insertion, deletion and update have been presented in this section; the next section discusses the experimental results attained by the proposed work.

Results and Discussion
The performance of the proposed OA-DIV scheme is simulated with the help of the open-source Eucalyptus tool. Eucalyptus FastStart version 3.4.1 is deployed on a standalone computer with a 500 GB hard disk and 8 GB of RAM. The experimental results attained by this work are compared with the existing approaches and discussed in this section. The comparison is performed in terms of communication cost, processing time and storage cost. Figures 4 to 6 present the communication cost and the processing time with respect to the block size and the verification phase of the proposed work. The results show that the proposed data integrity verification scheme incurs lower communication cost and processing time than the compared approaches. The main reason for attaining better results is the better organization of the data blocks and the data copy creation and management. The communication cost of the proposed scheme for the verification phase is shown in Table 1. Here, n is the total count of blocks in the data and vb is the count of challenged data blocks.
The effectiveness of the dynamic operation is examined with respect to the data storage and retrieval time.
The results indicate that the data storage time is lower than the data retrieval time. The reason is that the data are segregated and stored on different cloud servers. Figure 7 shows the time consumption for data storage and retrieval.
The data retrieval time is measured by retrieving the complete data, and thus the retrieval time is higher. Besides this, the sequence has to be maintained while retrieving the data. On the other hand, when a particular data block alone is retrieved, the retrieval is completed in a single pass. Both data storage and retrieval times increase with the count of data blocks. The results of the time consumption for data block insertion and deletion are shown in Figures 8 and 9, respectively. The analysis varies the volume of data from 5 to 20 GB and the time consumption is analysed. The experimental analysis confirms that the block insertion and deletion operations consume more time as the data size grows, yet the time remains reasonable.
The security analysis of the proposed work is discussed as follows.

Security analysis
It is computationally difficult to forge the proof submitted to the TPA.
Proof: When the challenge is placed by the TPA, the proposed work considers all the replicas of the data, and the challenge CHAL = {i, q_i, k}, with 1 ≤ i ≤ n and 2 ≤ k ≤ c, is passed to all the servers that hold a copy. The CSP responds to the challenge with the proof P = {σ_0, ℳ, φ}. When the verification equation holds, the data integrity is proven to be maintained; otherwise it is not. Hence, the proposed work maintains the data integrity of the replicas as well, and it is difficult to forge the proof submitted to the TPA. On the whole, the proposed work provides a better verification scheme for maintaining the integrity of data. The conclusions of this work are presented as follows.

Conclusion
This article presents an Outsourced Auditing with Data Integrity Verification (OA-DIV) scheme for the cloud computing environment, which also considers the data copies during integrity checks. This work relies on three modules: the client (data owner), the Third Party Auditor (TPA) and the CSP. The client uploads the encrypted data blocks to the CSP. Copies of the data are created and maintained on different cloud storage servers.
Whenever a challenge message is submitted to the CSP, all the servers that possess a copy of the data are also challenged, and the integrity of all the copies is verified. The copies are organized in a tree-like structure, which supports carrying out dynamic operations over the outsourced cloud data. The dynamic operations include data insertion, deletion and modification. This work handles these dynamic operations by segregating the data into several data blocks, which makes the process easier and more efficient. The effectiveness of the proposed work is assessed in terms of time consumption and data storage and retrieval times. The results indicate the effectiveness of the proposed work, and this approach can be extended by considering a real-time cloud environment.