CPAR: Cloud-Assisted Privacy-preserving Image Annotation with Randomized KD-Forest

With the explosive growth in the number of pictures taken by smart phones, organizing and searching pictures have become important tasks. To efficiently fulfill these tasks, the key enabler is annotating images with proper keywords, with which keywords-based searching and organizing become available for images. Currently, smart phones usually synchronize photo albums with cloud storage platforms, and have their images annotated with the help of cloud computing. However, the "offloading-to-cloud" solution may cause privacy breaches, since photos from smart phones contain various sensitive information. For privacy protection, our preliminary research made an effort to support cloud-based image annotation on encrypted images by utilizing cryptographic primitives. Nevertheless, for each annotation, it requires the cloud to perform linear checking on the large-scale encrypted dataset with high computational cost. This paper addresses the challenge and proposes a cloud-assisted privacy-preserving image annotation scheme with randomized kd-forest, namely CPAR. A novel privacy-preserving randomized kd-forest structure is proposed in CPAR as a secure and efficient index for the dataset, with which CPAR significantly improves the annotation performance. Thorough analysis is carried out to demonstrate the security of CPAR. Experimental evaluation on the well-known IAPR TC-12 dataset validates the efficiency and effectiveness of CPAR.


I. INTRODUCTION
THE widespread use of smart phones has caused a photography boom in recent years. According to a recent report from Forever's Strategy & Business Development team [2], the number of photos taken by smart phones was over 4 trillion in 2017 and the total number of photos to be taken in 2018 is estimated to be 8.8 trillion. To facilitate the storage of photos, the majority of smart phones today synchronize their photo albums with cloud storage, such as Apple's iCloud, Samsung Cloud, and Google Photos. Besides the storage service, these cloud storage platforms also help annotate users' photos with proper keywords, which is the key enabler for users to perform popular keywords-based search and organization over their photos. Although cloud storage offers a set of decent features, it also raises privacy concerns since many users' photos may contain sensitive information, such as personal identities, locations, and financial information [3]. To protect the privacy of photos, encrypting them with standard encryption algorithms, e.g., AES, is still the major approach for privacy protection in cloud storage [4], [5]. However, this kind of encryption also sacrifices many other attractive functionalities of cloud storage, especially keywords-based search and management for imagery files.
The preliminary version of this paper appeared in the 5th IEEE Conference on Communications and Network Security (IEEE CNS 2017) [1].
Yifan Tian and Jiawei Yuan are with the Department of EC-SSE, Embry-Riddle Aeronautical University, e-mail: tiany1@my.erau.edu, yuanj@erau.edu. Yantian Hou is with the Computer Science Department, Boise State University, e-mail: yantianhou@boisestate.edu.
In order to enable keywords-based search and management on encrypted data in the cloud, keywords-based searchable encryption (SE) has been widely investigated in recent years [6]-[11]. An SE scheme typically provides encrypted search indexes constructed based on proper keywords assigned to data files. With these encrypted indexes, the data owner can submit encrypted keywords-based search requests to search their data over ciphertexts. Unfortunately, these SE schemes all assume that keywords are already available for the files to be processed, which rarely holds for photos taken by smart phones. Specifically, unlike text files that support automatic keyword extraction from their contents, keyword assignment for imagery files relies on manual description or automatic annotation based on a large-scale pre-annotated image dataset. From the perspective of user experience, manually annotating each image from users' devices is clearly an impractical choice. Meanwhile, automatic image annotation that involves large-scale image datasets is too resource-consuming to be deployed on smart phones. Although several cloud storage platforms currently offer image annotation services [12], [13], these platforms require access to unencrypted images. Therefore, how to provide efficient and privacy-preserving automatic annotation for smart phones' photos becomes the foundation of applying SE schemes on them for further search and management. To address this problem, the preliminary version of this paper made the first research attempt and proposed a scheme called CAPIA [1]. By tailoring homomorphic encryption over vector space, CAPIA offloads the image annotation process to the public cloud in a privacy-preserving manner. Nevertheless, for every single annotation request, CAPIA requires the linear processing of all encrypted records in a large-scale dataset, which becomes its performance bottleneck for practical usage.
Different from CAPIA, which has to check all encrypted records for each annotation request, this paper proposes CPAR, which integrates randomized kd-forest [14], [15] as the search index to promote annotation efficiency in a privacy-preserving manner. To be specific, CPAR designs privacy-preserving comparison schemes for L1 distance (PL1C) and Kullback-Leibler (KL) Divergence (PKLC), which are further combined with order-preserving encryption to support all operations required in image annotation and randomized kd-forest. Our PL1C, PKLC, and privacy-preserving randomized kd-forest design can also be used as independent tools for other related fields, especially those that require similarity measurement on encrypted data. Moreover, as the same keyword may have different importance for the semantic description of different images, CPAR also introduces a privacy-preserving design for real-time keywords ranking. To evaluate CPAR, thorough security analysis and numerical analysis are carried out. Then, we implement a prototype of CPAR and conduct extensive experimental evaluation over the well-known IAPR TC-12 dataset [16]. Our evaluation results demonstrate that CPAR can speed up CAPIA by more than 10× and 23× while achieving about 90% and 80% of the accuracy of CAPIA respectively.
The rest of this paper is organized as follows: In Section II, we present the system model and threat model of CPAR. Section III introduces the background of automatic image annotation and technical preliminaries for CPAR. The detailed construction of CPAR is provided in Section IV. We analyze the security of CPAR in Section V. Section VI evaluates the performance of CPAR. We review and discuss related works in Section VII and conclude this paper in Section VIII.

A. System Model
As shown in Fig. 1, CPAR is composed of two entities: a Cloud Server and a User. The user stores his/her images on the cloud, and the cloud helps the user to annotate his/her images without learning the contents and keywords of the images. In CPAR, the user first performs a one-time system setup that constructs an encrypted randomized kd-forest with a pre-annotated image dataset. This encrypted randomized kd-forest is offloaded to the cloud server to assist future privacy-preserving image annotation. For resource-constrained mobile devices, this one-time setup process can be performed using desktops. Later on, when the user has a new image to annotate, he/she generates an encrypted request and sends it to the cloud. After processing the encrypted request, the cloud returns ciphertexts of top related keywords and auxiliary information to the user. Finally, the user decrypts all keywords and ranks them based on their real-time weights to select the final keywords.

B. Threat Model
In CPAR, we consider the cloud server to be "curious-but-honest", i.e., the cloud server will follow our scheme to perform storage and annotation services correctly, but it may try to learn sensitive information in the user's data. The cloud server has access to all encrypted images, encrypted image features, encrypted keywords, the encrypted RKDF, the user's encrypted requests, and encrypted annotation results. We also assume the user's devices are fully trusted and will not be compromised. The research on protecting user devices is orthogonal to this work. These assumptions are consistent with major research works that focus on search over encrypted data on public cloud [8]-[11]. CPAR focuses on preventing the cloud server from learning the following information: 1) contents of the user's images; 2) features extracted and keywords annotated for each image; 3) request linkability, i.e., whether multiple annotation requests are from the same image.

A. Image Feature Extraction
In this paper, we adopt the global low-level image features that are utilized in the baseline image annotation technique [17], because they can be applied to general images without complex models and subsequent training. Color features of an image are extracted in three different color spaces: RGB, HSV, and LAB. In particular, the RGB feature is computed as a normalized 3D histogram of RGB pixels, in which each channel (R, G, B) has 16 bins that divide the color space values from 0 to 255. The HSV and LAB features can be processed similarly to RGB, and thus we can construct three feature vectors for RGB, HSV and LAB respectively as $V_{RGB}$, $V_{HSV}$, and $V_{LAB}$. Texture features of an image are extracted using Gabor and Haar wavelets. Specifically, an image is first filtered with Gabor wavelets at three scales and four orientations, resulting in twelve response images. Each response image is then divided into non-overlapping rectangular blocks. Finally, mean filter response magnitudes from each block over all response images are concatenated into a feature vector, denoted as $V_G$. Meanwhile, a quantized Gabor feature of an image is generated using the mean Gabor response phase angle in non-overlapping blocks in each response image. These quantized values are concatenated into a feature vector, denoted as $V_{GQ}$. The Haar feature of an image is extracted similarly to Gabor, but based on differently configured Haar wavelets. HaarQ stands for the quantized version of the Haar feature, which quantizes Haar features into [0, -1, 1] if the signs of the Haar response values are zero, negative, and positive respectively. We denote the feature vectors of Haar and HaarQ as $V_H$ and $V_{HQ}$ respectively. Therefore, given an image, seven feature vectors will be extracted as $[V_{RGB}, V_{HSV}, V_{LAB}, V_G, V_{GQ}, V_H, V_{HQ}]$. For more details about the adopted image feature extraction, please refer to ref [17].

B. Integer Vector Encryption (IVE)
In this section, we describe a homomorphic encryption scheme designed for integer vectors [18], which will be tailored in our construction to achieve privacy-preserving image annotation. For expression simplicity, the following definitions will be used in the rest of this paper:
• For a vector V (or a matrix M), define |max(V)| (or |max(M)|) to be the maximum absolute value of its elements.
• For $a \in \mathbb{R}$, define $\lceil a \rfloor$ to be the nearest integer of a, and $\lceil a \rfloor_q$ to be the nearest integer of a with modulus q.
• For a matrix $M \in \mathbb{R}^{n \times m}$, define vec(M) to be an nm-dimensional vector obtained by concatenating the transpose of each column of M.
Encryption: Given an m-dimensional vector $V \in \mathbb{Z}_p^m$ and the secret key matrix $S \in \mathbb{Z}_q^{m \times m}$, output the ciphertext of V as
$C(V) = S^{-1}(wV + e)^T$
where $S^{-1}$ is the inverse matrix of S, T is the transpose operator, e is a random error vector, w is an integer parameter, $q \gg p$, and $w > 2|max(e)|$.
Decryption: Given the ciphertext C(V), it can be decrypted using S and w as
$V = \lceil (S\,C(V))^T / w \rfloor_q$.
Inner Product: Given two ciphertexts $C(V_1)$, $C(V_2)$ of $V_1$, $V_2$ and their corresponding secret keys $S_1$ and $S_2$, the inner product operation of $V_1$ and $V_2$ over ciphertexts can be performed as
$vec(S_1^T S_2) \lceil vec(C(V_1) C(V_2)^T) / w \rfloor_q = w V_1 V_2^T + e$.
To this end, $vec(S_1^T S_2)$ becomes the new secret key and $\lceil vec(C(V_1) C(V_2)^T) / w \rfloor_q$ becomes the new ciphertext of $V_1 V_2^T$. More details about this IVE encryption algorithm and its security proof are available in ref [18].
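The encryption and decryption steps above can be illustrated with a minimal sketch. It is a toy over real-valued arithmetic, not the scheme of [18]: the modulus q is omitted, the key entries and error range are small illustrative choices, and the helper names (`keygen`, `encrypt`, `decrypt`) are hypothetical. The last lines evaluate the bilinear form $C(V_1)^T S_1^T S_2 C(V_2)$ directly instead of going through the vec(·) and rounding form, which is enough to show why the product $S_1^T S_2$ supports inner products without revealing $S_1$ or $S_2$ individually.

```python
import numpy as np

def keygen(m, rng):
    """Random invertible integer key matrix (toy: small entries, no modulus q)."""
    while True:
        S = rng.integers(-5, 6, size=(m, m)).astype(float)
        if abs(np.linalg.det(S)) > 0.5:  # integer matrix, so a nonzero det is at least 1
            return S

def encrypt(V, S, w, rng, e_max=2):
    """C(V) = S^{-1} (w V + e)^T with a fresh small error vector e."""
    e = rng.integers(-e_max, e_max + 1, size=V.shape)
    return np.linalg.solve(S, (w * V + e).astype(float))

def decrypt(C, S, w):
    """V = round((S C)^T / w); correct because w > 2 * max|e|."""
    return np.rint(S @ C / w).astype(int)

rng = np.random.default_rng(1)
m, w = 4, 1000
V1 = rng.integers(0, 10, size=m)
V2 = rng.integers(0, 10, size=m)
S1, S2 = keygen(m, rng), keygen(m, rng)
C1, C2 = encrypt(V1, S1, w, rng), encrypt(V2, S2, w, rng)

recovered = decrypt(C1, S1, w)

# With only S1^T S2, one can evaluate (S1 C1) . (S2 C2) = (w V1 + e1) . (w V2 + e2),
# i.e. w^2 <V1, V2> plus noise that is small relative to w^2 for these toy ranges.
joint_key = S1.T @ S2
inner = int(np.rint(C1 @ joint_key @ C2 / w**2))
```

In this toy setting `recovered` equals `V1` and `inner` equals the plaintext inner product, because the accumulated error stays below half a unit after scaling.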

C. Order-Preserving Encryption (OPE)
Order-preserving symmetric encryption (OPE) is a deterministic encryption scheme whose encryption function preserves the numerical ordering of the plaintexts. Given two integers a and b in which a < b, by encrypting with OPE, the order of a and b is preserved as OPE(a) < OPE(b). More details about this OPE encryption scheme and its security proof are available in ref [19], [20].
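The order-preserving property can be illustrated with a toy mapping. The sketch below is not the construction of [19], [20]; it only builds a keyed, strictly increasing lookup table over a small integer domain, and the function name `make_ope` and its parameters are hypothetical.

```python
import random

def make_ope(key, domain_size, max_gap=1000):
    """Toy order-preserving map: a keyed, strictly increasing lookup table."""
    rnd = random.Random(key)
    table, current = [], 0
    for _ in range(domain_size):
        current += rnd.randint(1, max_gap)  # strictly positive gap keeps the order
        table.append(current)
    return lambda x: table[x]

ope = make_ope(key=42, domain_size=256)
left_first = ope(3) < ope(7)  # ordering of plaintexts carries over to ciphertexts
```

Because the mapping is deterministic for a fixed key, a party holding only ciphertexts can compare two split field values without learning the values themselves, which is the property CPAR relies on in the top-down traversal.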

A. Scheme Overview
The core idea of automatic image annotation is built on the hypothesis that images containing similar objects are likely to share keywords. The distance between the feature vectors of two images is used to measure the probability that they contain similar objects [17]. Given a large-scale pre-annotated image dataset, the annotation process for a new image can be treated as a process of finding a set of images with shared objects and transferring keywords from those images. As a result, the annotation efficiency becomes heavily dependent on the performance of finding the images with shared objects. To boost the search efficiency, CPAR adopts randomized kd-forest as the search index [14], [15]. In addition, novel privacy-preserving schemes are designed to address the privacy concerns when integrating the randomized kd-forest into CPAR. Different from many other index structures that are only efficient for low-dimensional data, randomized kd-forest (RKDF) is featured by its performance in handling high-dimensional data. In CPAR, data vectors have over 1300 dimensions, which makes RKDF an effective selection.
As depicted in Fig. 2, a RKDF is composed of a set of parallel kd-trees. Each $Node_i$ in a kd-tree [21] stores a feature vector $V_i$ of dataset image $I_i$. In addition, each non-leaf node also stores a split field $s_i$ to generate a hyperplane that divides the vector space into two parts. Each $Node_j$ in the left sub-tree of $Node_i$ has $Node_j[s_i] \le Node_i[s_i]$ and vice versa, as described in ref [21]. To search nodes that store vectors with the top-smallest distances to a request vector $V_{req}$, a parallel search among all trees in the forest is performed. Specifically, each tree is traversed in a top-down manner by comparing the split field values of $V_{req}$ and the vector $V_i$ stored in each $Node_i$, as in the example shown in Fig. 2(a). The traversal selects the left branch to continue if $V_{req}[s_i] \le V_i[s_i]$ and vice versa. Once the traversal reaches a leaf node, the vector stored in that leaf node is pushed into a priority queue Queue as a current close candidate to $V_{req}$. The queue push process is shown in Fig. 2(c). Note that during the search process, this Queue keeps updating to hold the L closest vectors to $V_{req}$ and is shared by all trees in the forest. After that, a back trace search starts by iterating over all the nodes in the path from the parent of the current node to the root node, as in the example shown in Fig. 2(b). When reaching a $Node_i$ during the back trace, the same queue push is executed to judge whether to add $Node_i$ to Queue, as illustrated in Fig. 2(c). For each $Node_i$ in this path, a comparison between $Dis(V_{req}, H_i)$ and $Dis(V_{req}, V_{qL})$ is performed, where $Dis(V_{req}, H_i)$ is the distance between $V_{req}$ and $Node_i$'s hyperplane. $H_i$ can be considered as the projection vector of $V_{req}$ on $Node_i$'s hyperplane. $V_{qL}$ is the Lth vector in Queue. If $Dis(V_{req}, H_i) > Dis(V_{req}, V_{qL})$, the back trace continues to the next node in this path. Otherwise, the sibling branch of $Node_i$ needs to be searched using the top-down traversal. In RKDF, once a node has been searched in one kd-tree, it will be marked and does not need to be checked again in the other trees. To further enhance the search efficiency of a RKDF, an approximated search strategy can be adopted. In particular, based on the hypothesis that feature vectors of similar images are likely to be grouped in the same branch, there is a high probability that the targeted optimal top similar vectors will be visited well before visiting all nodes in each kd-tree. In Section VI, we will evaluate the relationship among the approximation strength, accuracy, and efficiency. The detailed search of a RKDF is provided in Algorithm 1. For more details about the RKDF, please refer to ref [14], [15].
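The plaintext search procedure described above can be sketched as follows. This is an illustrative simplification rather than Algorithm 1: it uses the L1 distance alone for both the queue push and the hyperplane test, performs an exact (non-approximated) search, and picks each split field at random among the highest-variance fields. All names (`Node`, `build`, `search`, `forest_search`, `top_k`) are hypothetical.

```python
import heapq
import numpy as np

class Node:
    def __init__(self, idx, split=None, left=None, right=None):
        self.idx, self.split, self.left, self.right = idx, split, left, right

def build(data, idxs, rng, top_k=5):
    """Randomized kd-tree: split on a random one of the top_k highest-variance fields."""
    if len(idxs) == 0:
        return None
    if len(idxs) == 1:
        return Node(int(idxs[0]))
    variances = data[idxs].var(axis=0)
    s = int(rng.choice(np.argsort(variances)[-top_k:]))
    idxs = idxs[np.argsort(data[idxs, s], kind="stable")]
    mid = len(idxs) // 2
    return Node(int(idxs[mid]), s,
                build(data, idxs[:mid], rng, top_k),
                build(data, idxs[mid + 1:], rng, top_k))

def l1(a, b):
    return float(np.abs(a - b).sum())

def search(node, data, req, L, queue, seen):
    """Top-down traversal, queue push, then back trace with the hyperplane test."""
    if node is None:
        return
    vec, s = data[node.idx], node.split
    near, far = None, None
    if s is not None:
        near, far = (node.left, node.right) if req[s] <= vec[s] else (node.right, node.left)
    search(near, data, req, L, queue, seen)
    if node.idx not in seen:  # a node checked in one tree is skipped in the others
        seen.add(node.idx)
        d = l1(req, vec)
        if len(queue) < L:
            heapq.heappush(queue, (-d, node.idx))     # max-heap via negated distance
        elif d < -queue[0][0]:
            heapq.heapreplace(queue, (-d, node.idx))  # evict the current L-th candidate
    # Search the sibling branch unless the hyperplane is farther than the L-th candidate.
    if far is not None and (len(queue) < L or abs(req[s] - vec[s]) <= -queue[0][0]):
        search(far, data, req, L, queue, seen)

def forest_search(data, req, L, n_trees=4, seed=0):
    rng = np.random.default_rng(seed)
    queue, seen = [], set()  # both are shared by all trees in the forest
    for _ in range(n_trees):
        root = build(data, np.arange(len(data)), rng)
        search(root, data, req, L, queue, seen)
    return sorted(idx for _, idx in queue)

rng = np.random.default_rng(7)
data, req = rng.random((300, 8)), rng.random(8)
nearest = forest_search(data, req, L=5)
```

Since the sketch performs no approximation, `nearest` coincides with a brute-force scan. An approximated variant would stop each tree after a fixed fraction of its nodes has been checked.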
To protect the privacy of the user's data during the cloud-based annotation, the image data associated with the RKDF need to be encrypted. Furthermore, these encrypted data shall support the corresponding search operations in RKDF, which include:
• The comparison between the split field values $V_{req}[s_i]$ and $V_i[s_i]$ during the top-down traversal.
• The comparison between $Dis(V_{req}, H_i)$ and $Dis(V_{req}, V_{qL})$ during the back trace process.
• The comparison between $Dis(V_{req}, V_a)$ and $Dis(V_{req}, V_b)$, i.e., distances from the request vector to two different images' feature vectors, which is used in the queue push process.
The distance Dis(·) between two vectors is calculated with a combination of L1 distance and KL-Divergence [17]. Specifically, the distance $Dis_{ab}$ of two vectors is computed as
$Dis_{ab} = DL1^{RGB}_{ab} + DL1^{HSV}_{ab} + DKL^{LAB}_{ab} + DL1^{G}_{ab} + DL1^{GQ}_{ab} + DL1^{H}_{ab} + DL1^{HQ}_{ab}$
where each vector has seven low-level color and texture feature vectors as discussed in Section III-A, and DL1 and DKL denote the L1 distance and KL-Divergence of two vectors after data normalization.
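As a plaintext illustration of this combined distance, the sketch below assumes an unweighted sum of the six L1 distances and the KL-Divergence of the LAB feature, and represents an image as a dictionary of feature vectors. The dictionary layout and the function names are assumptions made for the example only.

```python
import numpy as np

L1_FEATURES = ("RGB", "HSV", "G", "GQ", "H", "HQ")

def l1_distance(a, b):
    return float(np.abs(a - b).sum())

def kl_divergence(a, b):
    """KL(a || b) for strictly positive vectors."""
    return float(np.sum(a * (np.log(a) - np.log(b))))

def image_distance(fa, fb):
    """Sum the L1 distances of six features and the KL-Divergence of LAB."""
    d = sum(l1_distance(fa[k], fb[k]) for k in L1_FEATURES)
    return d + kl_divergence(fa["LAB"], fb["LAB"])

def random_features(rng, dim=16):
    feats = {}
    for name in L1_FEATURES + ("LAB",):
        v = rng.random(dim) + 0.1  # strictly positive entries
        feats[name] = v / v.sum()  # normalized like a histogram
    return feats

rng = np.random.default_rng(0)
fa, fb = random_features(rng), random_features(rng)
dist_ab = image_distance(fa, fb)
```

With normalized, strictly positive feature vectors, the distance of an image to itself is zero and the distance between two distinct images is positive.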
In order to address the privacy challenges while utilizing RKDF for cloud-assisted automatic image annotation, we first design a privacy-preserving L1 distance comparison scheme (namely, PL1C) and a privacy-preserving KL-Divergence comparison scheme (namely, PKLC). PL1C and PKLC enable the privacy-preserving distance comparison in the back trace process and queue push process of RKDF. In addition, we integrate order-preserving encryption [19], [20] into CPAR to protect the comparison of split field values in the top-down traversal of RKDF.

B. PL1C: Privacy-preserving L1 Distance Comparison
In PL1C, we consider two types of L1 distance comparison that are required in the queue push and back trace processes of RKDF: 1) $DL1_{ac}$ and $DL1_{bc}$ for three image feature vectors $V_i$, $i \in \{a, b, c\}$; 2) $DL1_{hc}$ and $DL1_{bc}$ for a hyperplane projected vector $H_a$ and two image feature vectors $V_b$, $V_c$. The L1 distance $DL1_{hc}$ between an image feature vector and a hyperplane is measured by the L1 distance between $H_a[s_a]$ and $V_c[s_a]$, where $s_a$ is the split field of $Node_a$. To be more specific, $DL1_{hc}$ is calculated by projecting $V_c$ on $Node_a$'s hyperplane and then calculating the L1 distance between $V_c$ and the projected vector $H_a$.
Data Preparation: Given an image feature vector $V_i = [v_{i1}, \ldots, v_{im}]$, each element $v_{ij}$ is converted to a β-dimensional binary vector such that the first $v_{ij}$ terms are 1 and the remaining $\beta - v_{ij}$ terms are 0. The resulting mβ-dimensional vector is denoted as $\tilde{V}_i$. The L1 distance between $V_a$ and $V_b$ now can be calculated as
$DL1_{ab} = \|\tilde{V}_a - \tilde{V}_b\|_2^2$.
Then, the approximation introduced in ref [22] is applied to $\tilde{V}_i$ to update its dimension from mβ to $\hat{m} = \alpha m \log_{\gamma}(\beta + 1)$ based on the Johnson-Lindenstrauss (JL) Lemma [23]. By denoting the approximated vector as $\hat{V}_i$, we have $DL1_{ab} \approx \|\hat{V}_a - \hat{V}_b\|_2^2$. The correctness and accuracy of such an approximation have been proved in ref [22]. According to our experimental evaluation in Section VI, we set α = 1 and γ = 100 in CPAR to balance accuracy and efficiency.
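The unary conversion in the Data Preparation step can be checked with a small example. The sketch below covers only the exact conversion (not the JL-based approximation of ref [22]) and assumes integer elements in [0, β]; the helper name `unary_encode` is hypothetical.

```python
import numpy as np

def unary_encode(V, beta):
    """Map each integer v in [0, beta] to beta bits: v ones followed by beta - v zeros."""
    return (np.arange(beta)[None, :] < np.asarray(V)[:, None]).astype(int).ravel()

beta = 8
Va = np.array([3, 0, 8, 5])
Vb = np.array([1, 4, 6, 5])
ua, ub = unary_encode(Va, beta), unary_encode(Vb, beta)

l1 = int(np.abs(Va - Vb).sum())      # L1 distance of the original vectors
sq_l2 = int(((ua - ub) ** 2).sum())  # squared L2 distance of the unary vectors
```

Both quantities are equal because two unary codes differ in exactly $|v_{aj} - v_{bj}|$ positions per element. This turns an L1 distance into a quantity that can be expressed through inner products, which is what the IVE scheme supports over ciphertexts.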
The detailed construction of the remaining stages in PL1C is presented in Fig. 3. The user first encrypts the image feature vectors and their corresponding hyperplane projected vectors (if they exist), and then stores them in the cloud. Later on, the user can generate an encrypted L1 distance comparison request and ask the cloud to conduct the privacy-preserving comparison. On receiving the request, the cloud can conduct the two types of L1 distance comparison using ciphertexts only, according to the user's request.
It is worth noting that PL1C is only interested in which distance is smaller during the comparison. Therefore, instead of letting the cloud get exact

C. PKLC: Privacy-preserving KL-Divergence Comparison
In PKLC, we also consider two types of KL-Divergence comparison similar to PL1C, i.e., 1) $DKL_{ac}$ and $DKL_{bc}$ for three image feature vectors $V_i$, $i \in \{a, b, c\}$; 2) $DKL_{hc}$ and $DKL_{bc}$ for a hyperplane projected vector $H_a$ and two image feature vectors $V_b$, $V_c$. Given two m-dimensional vectors $V_i$, $i \in \{a, b\}$, their KL-Divergence $DKL_{ab}$ is calculated as
$DKL_{ab} = \sum_{j=1}^{m} v_{aj} \log(v_{aj} / v_{bj})$
where $\log(v_{aj} / v_{bj}) = \log(v_{aj}) - \log(v_{bj})$. In addition, the KL-Divergence $DKL_{hc}$ between an image feature vector and a hyperplane is measured by the KL-Divergence between $H_a[s_a]$ and $V_c[s_a]$, where $s_a$ is the split field of $Node_a$. Similar to PL1C, $DKL_{hc}$ is calculated by projecting $V_c$ on $Node_a$'s hyperplane and then calculating the KL-Divergence between $V_c$ and the projected vector $H_a$.
The detailed construction of PKLC is presented in Fig. 4. In the data encryption stage, the image feature vectors and their corresponding hyperplane projected vectors (if they exist) are encrypted and stored in the cloud. On receiving the encrypted KL-Divergence comparison request from the user, the cloud conducts the two types of privacy-preserving KL-Divergence comparison using ciphertexts only, according to the user's request. Similar to our PL1C construction, we have $r_c > 0$ and $r_c \gg (\epsilon_b - \epsilon_a)$. Therefore, the cloud can figure out which KL-Divergence is smaller based on the scaled and obfuscated comparison result.

Fig. 3. Construction of PL1C
Data Encryption:
1) Append 3 elements to an approximated vector $\hat{V}_i$, using a random number r and a small random noise $\epsilon_i$.
2) If $\hat{V}_i$ is stored in a non-leaf node, generate a $(2\hat{m} + 2)$-dimensional hyperplane projected vector $\hat{H}_i$ whose $(\hat{m} + 1)$th and $(\hat{m} + 2 + s_i)$th elements are set by the construction, the latter to −1, where $s_i$ is the split field of node i.
3) Encrypt $\hat{V}_i$ and $\hat{H}_i$ using the Encryption algorithm of IVE. The ciphertexts and w are outsourced to the cloud.
Request Generation:
1) Append approximated request vector …

Fig. 4. Construction of PKLC
Data Encryption:
1) Given an image feature vector $V_i$, append m + 2 elements to it, using a random number r and a small random noise $\epsilon_i$. If $V_i$ is stored in a non-leaf node in RKDF, its corresponding hyperplane projected vector $H_i$ is processed such that its $s_i$th, $(m + s_i)$th and $(2m + 1)$th elements are set, the last one to r, where $s_i$ is the split field of the node.
2) Encrypt $V_i$ and $H_i$ with the Encryption algorithm of IVE.
Request Generation:
1) Given the request image feature vector $V_c$, replace its elements $v_{cj}$ with $-r_c \times \log(v_{cj})$ and append m + 2 elements to it, where $r_c$ is a positive random number changing for every request.
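The idea behind replacing the request elements with $-r_c \times \log(v_{cj})$ can be illustrated on plaintext vectors: once the data vector carries its own $\sum_j v_j \log(v_j)$ term, a single inner product yields $r_c \times DKL$ plus a small noise, so the sign of a difference of two such results reveals which KL-Divergence is smaller. The vector layout below (two appended elements instead of m + 2) is a simplifying assumption and not the construction of Fig. 4, and the sketch omits the IVE encryption layer entirely.

```python
import numpy as np

def extend_data(V, eps):
    """Dataset side (assumed layout): [V, sum_j v_j log v_j, eps]."""
    return np.concatenate([V, [np.sum(V * np.log(V)), eps]])

def extend_request(Vc, r_c):
    """Request side (assumed layout): [-r_c * log(Vc), r_c, 1]."""
    return np.concatenate([-r_c * np.log(Vc), [r_c, 1.0]])

def kl(a, b):
    return float(np.sum(a * (np.log(a) - np.log(b))))

rng = np.random.default_rng(3)
raw = rng.random((3, 16)) + 0.1                # strictly positive entries
Va, Vb, Vc = (v / v.sum() for v in raw)        # normalized feature vectors

r_c = 1e6 * (1 + rng.random())                 # fresh positive scale per request
eps_a, eps_b = rng.uniform(-1e-3, 1e-3, size=2)

q = extend_request(Vc, r_c)
comp_a = float(extend_data(Va, eps_a) @ q)     # r_c * DKL_ac + eps_a
comp_b = float(extend_data(Vb, eps_b) @ q)     # r_c * DKL_bc + eps_b
a_is_closer = comp_a < comp_b
```

Because $r_c$ is much larger than the noise terms, `a_is_closer` agrees with the comparison of the true KL-Divergences, while the evaluating party sees only scaled and obfuscated values.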

D. Detailed Construction of CPAR
CPAR consists of five major procedures. In the System Setup, the user selects system parameters, extracts and pre-processes feature vectors of images in a pre-annotated dataset, and uses these feature vectors to build a RKDF. Then, the user executes the RKDF Encryption procedure to encrypt all data associated with nodes in the RKDF. Both the System Setup procedure and the RKDF Encryption procedure are one-time costs in CPAR. Later on, the user can use the Secure Annotation Request procedure to generate an encrypted annotation request. On receiving the request, the cloud server performs the Privacy-preserving Annotation on Cloud procedure to return encrypted keywords for the requested image. At the end, the user obtains the final keywords by executing the Final Keyword Selection procedure.
1) System Setup: To perform the one-time setup of the CPAR system, the user first prepares a pre-annotated image dataset with n images, which can be obtained from public sources, such as IAPR TC-12 [16], LabelMe [24], etc. For each image $I_i$ in the dataset, the user extracts seven feature vectors $[V_{i,RGB}, V_{i,HSV}, V_{i,LAB}, V_{i,G}, V_{i,GQ}, V_{i,H}, V_{i,HQ}]$. Compared with the other five feature vectors that have dimensions up to 256, $V_{i,H}$ and $V_{i,HQ}$ have a high dimension of 4096. To guarantee the efficiency while processing feature vectors, Principal Component Analysis (PCA) [25] is utilized to reduce the dimension of $V_{i,H}$ and $V_{i,HQ}$. According to our experimental evaluation in Section VI-C, PCA-based dimension reduction with a proper setting can significantly improve the efficiency of CPAR with slight accuracy loss. After that, L1 normalization is performed for each feature vector, which normalizes the elements in these vectors to [-1, 1]. Except for $V_{i,LAB}$, the user also increases each element in $V_{i,k}$, $k \in \{RGB, HSV, G, GQ, H, HQ\}$ as $v_{i,k,j} = v_{i,k,j} + 1$ to avoid negative values. The six feature vectors that use L1 distance for similarity measurement are concatenated as an $m_{L1}$-dimensional vector $V_{i,L1}$. $V_{i,LAB}$ is denoted as an $m_{KL}$-dimensional vector $V_{i,KL}$ for expression simplicity. It is easy to verify that $DL1^{L1}_{ab} = DL1^{RGB}_{ab} + DL1^{HSV}_{ab} + DL1^{G}_{ab} + DL1^{GQ}_{ab} + DL1^{H}_{ab} + DL1^{HQ}_{ab}$. After that, a RKDF is constructed with the feature vector space $\{V_i\}_{1 \le i \le n}$, in which each node in a single tree is associated with one $V_i$. For each non-leaf node in the RKDF, its split field element $V_i[s_i]$ is stored in a set SF. In CPAR, the RKDF contains ten parallel kd-trees.
2) RKDF Encryption: Given an image $I_i$ in the pre-annotated dataset, its keywords $\{K_{i,t}\}$ are first encrypted using AES. Then, its processed feature vectors $V_{i,L1}$, $V_{i,KL}$ are encrypted with our PL1C and PKLC schemes as $C(V_{i,L1})$ and $C(V_{i,KL})$ respectively. $C(V_{i,L1})$ and $C(V_{i,KL})$ are then stored in the corresponding $Node_i$ of the RKDF. For each non-leaf node, encrypted hyperplane projected vectors $C(H_{i,L1})$, $C(H_{i,KL})$ are generated and added into $Node_i$ using the data encryption processes described in our PL1C and PKLC. In addition, for the split field element $V_i[s_i]$ of each non-leaf node, an order-preserving encryption is executed and the ciphertext $OPE(V_i[s_i])$ is stored in $Node_i$. After the encryption, each node in the RKDF contains only encrypted data, i.e., $C(V_{i,L1})$, $C(V_{i,KL})$, the encrypted keywords and, for non-leaf nodes, $C(H_{i,L1})$, $C(H_{i,KL})$, and $OPE(V_i[s_i])$. During the encryption process, the same secret keys $S_{L1}$, $S'_{L1}$, $S_{KL}$, public parameter w, and random number r are used for all images. However, different error vectors $e_i$, $e'_i$ and noise terms $\epsilon_i$, $\epsilon'_i$ are generated for each image $I_i$ correspondingly. The user also computes $S^T_{L1} S_{s,L1}$, $S'^T_{L1} S'_{s,L1}$ and $S^T_{KL} S_{s,KL}$, in which $S_{s,L1}$, $S'_{s,L1}$ and $S_{s,KL}$ are the secret keys for the encryption of later annotation requests. The encrypted RKDF, $S^T_{L1} S_{s,L1}$, $S'^T_{L1} S'_{s,L1}$ and $S^T_{KL} S_{s,KL}$ are outsourced to the cloud.
3) Secure Annotation Request: When the user has a new image $I_s$ for annotation, he/she first extracts its seven feature vectors (RGB, HSV, LAB, G, GQ, H, HQ). These vectors are processed to output $V_{s,L1}$ and $V_{s,KL}$ as in the System Setup procedure. $V_{s,L1}$ and $V_{s,KL}$ are encrypted as $C(V_{s,L1})$, $C(H_{s,L1})$, and $C(V_{s,KL})$ using the Request Generation of the PL1C and PKLC schemes respectively. For each annotation request, the user generates a new positive random number $r_s$ and new error vectors $e_s$, $e'_s$. Meanwhile, for each element $sf_j$ in the split field set SF generated in System Setup, the user encrypts $V_s[sf_j]$ using order-preserving encryption as $OPE(V_s[sf_j])$. $C(V_{s,L1})$, $C(H_{s,L1})$, $C(V_{s,KL})$ and $\{OPE(V_s[sf_j])\}$ are sent to the cloud as the annotation request.
4) Privacy-preserving Annotation on Cloud: On receiving the encrypted request, the cloud first performs a privacy-preserving search over the encrypted RKDF. As described in Algorithm 1, the cloud conducts a parallel search over each encrypted tree in the RKDF. There are three places that require the cloud to conduct privacy-preserving computation over encrypted data:
• During the top-down traversal, as the split field element of each non-leaf node is encrypted using order-preserving encryption, the cloud can directly compare their ciphertexts (line 7) to determine which node is to be checked next.
• In the back trace process, the cloud needs to perform a privacy-preserving comparison to determine whether the current node's sibling branch needs to be searched (line 24 to 29). In particular, given $C(V_{s,L1})$, $C(H_{s,L1})$, $C(V_{qL,L1})$, $C(H_{parent,L1})$, $C(V_{s,KL})$, $C(V_{qL,KL})$, and $C(H_{parent,KL})$, the cloud first uses the type-2 distance comparison in PL1C and PKLC to compute the scaled and obfuscated distances $Comp_h$ and $Comp_{qL}$. Then, the distance comparison is executed on $Comp_{qL} - Comp_h$, where $V_{qL}$ is the least close vector to $V_{req}$ in the priority queue Queue. As $r_s$ is a positive value and $r_s \gg (\epsilon_{parent} - \epsilon_{qL})$, the sign of $Comp_{qL} - Comp_h$ is consistent with $Dis(V_{qL}, V_s) - Dis(H_{parent}, V_s)$.
• In the Queue push process (line 36-38), a privacy-preserving distance comparison is needed to determine whether a new node shall be added. Specifically, given $C(V_{s,L1})$, $C(V_{Node,L1})$, $C(V_{qL,L1})$, $C(V_{s,KL})$, $C(V_{Node,KL})$, $C(V_{qL,KL})$, the cloud uses the type-1 distance comparison in PL1C and PKLC to compare the distances from the request to the new node and to $V_{qL}$.
To this end, the cloud is able to perform all operations required by a RKDF search in a privacy-preserving manner, and obtain a Queue of nodes that store data of the top related images to the request. The cloud returns the distance comparison candidates (type-1 distance) $Comp_i$, $i \in Queue$, as well as the corresponding encrypted keywords back to the user.

5) Final Keyword Selection: The user first decrypts the encrypted keywords and obtains $K_{i,t}$, $i \in Queue$, where $K_{i,t}$ is the t-th pre-annotated keyword in image $I_i$. Then, the user computes the distances $Dis(V_i, V_s)$, $i \in Queue$, from the returned candidates $Comp_i$. To achieve higher accuracy in keywords selection, we consider that keywords in images that have a smaller distance to the requested one are more relevant. Thus, we define a real-time weight $W_t$ for each keyword based on the distances $Dis(V_i, V_s)$. Specifically, we first figure out the weight $W_{I_i}$ of each image according to its distance-based similarity. As defined in Eq. 7, images with a smaller distance receive a larger weight value. Then, considering that the same keyword can appear in multiple images, the final weight $W_t$ of a keyword $K_{i,t}$ is generated by adding the weights of the images that contain this keyword. Finally, the user selects keywords for his/her image according to their ranking by weight $W_t$.
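The two-step weighting described above (a weight per image, then a sum per keyword) can be sketched as follows. The image weight used here, one minus the image's share of the total distance, is an illustrative assumption that merely satisfies "smaller distance, larger weight"; it should not be read as the paper's Eq. 7. The function and variable names are hypothetical.

```python
from collections import defaultdict

def rank_keywords(distances, keywords):
    """distances: {image_id: Dis(V_i, V_s)}; keywords: {image_id: [keyword, ...]}."""
    total = sum(distances.values()) or 1.0
    # Assumed image weight: a smaller distance gives a larger weight.
    image_weight = {i: 1.0 - d / total for i, d in distances.items()}
    keyword_weight = defaultdict(float)
    for i, kws in keywords.items():
        for kw in set(kws):  # count each keyword once per image
            keyword_weight[kw] += image_weight[i]
    # Highest weight first; ties are broken alphabetically for determinism.
    return sorted(keyword_weight.items(), key=lambda kv: (-kv[1], kv[0]))

distances = {1: 0.2, 2: 0.3, 3: 0.5}
keywords = {1: ["sky", "tree"], 2: ["sky", "lake"], 3: ["tree"]}
ranked = rank_keywords(distances, keywords)
```

In this example "sky" appears in the two closest images and is ranked first, followed by "tree" and "lake".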

V. SECURITY ANALYSIS
In CPAR, we have the following privacy-related data: feature vectors $\{V_{i,L1}, V_{i,KL}\}_{1 \le i \le n}$, hyperplane projected vectors $H_{i,L1}$, $H_{i,KL}$ of each non-leaf node associated with $V_{i,L1}$, $V_{i,KL}$, the split field element of each non-leaf node, keywords of image $I_i$ in the pre-annotated dataset, and feature vectors $V_{s,L1}$, $H_{s,L1}$, $V_{s,KL}$ of the image requested for annotation. As keywords are encrypted using standard AES encryption, we consider them secure against the cloud server as well as outside adversaries. The split field element of each non-leaf node is encrypted using the order-preserving encryption [19], [20], which has been proved to be secure. With regards to the feature vectors and hyperplane projected vectors, they are encrypted using the encryption scheme of IVE [18] after pre-processing as presented in our PL1C and PKLC schemes. The IVE scheme [18] has been proved to be secure based on the well-known Learning with Errors (LWE) hard problem [26]. Thus, given the ciphertexts only, it is computationally infeasible for the cloud server or outside adversaries to recover the corresponding feature vectors.

A. Security of Outsourcing $S^T_{L1} S_{s,L1}$, $S'^T_{L1} S'_{s,L1}$ and $S^T_{KL} S_{s,KL}$
As $S^T_{L1} S_{s,L1}$, $S'^T_{L1} S'_{s,L1}$, and $S^T_{KL} S_{s,KL}$ are used in the same manner, we use $S^T S_s$ to denote them for expression simplicity. Different from the original Encryption algorithm of IVE, the user in CPAR also outsources $S^T S_s$ to the cloud besides the ciphertexts. As all elements in S and $S_s$ are randomly selected, the elements in their multiplication $S^T S_s$ have the same distribution as the elements in S and $S_s$ [27]. Thus, given $S^T S_s$, the cloud server is not able to extract S or $S_s$ directly and use them to decrypt ciphertexts. By combining $S^T S_s$ with the ciphertexts $C(V_{i,L1})$ and $C(V_{s,L1})$ (the same applies to the other ciphertexts), we have
$S^T S_s C(V_{i,L1}) = S^T S_s S^{-1} (w V_{i,L1} + e_i)^T$
$S^T S_s C(V_{s,L1}) = S^T S_s S_s^{-1} (w V_{s,L1} + e_s)^T = S^T (w V_{s,L1} + e_s)^T$
From the above two equations, it is clear that the combinations of $S^T S_s$ with $C(V_{i,L1})$ and of $S^T S_s$ with $C(V_{s,L1})$ only transform them into ciphertexts of $V_{i,L1}$ and $V_{s,L1}$ that are encrypted using the IVE scheme with the new keys $S^T S_s S^{-1}$ and $S^T$ respectively. As $S^T S_s S^{-1}$ and $S^T$ are random keys and unknown to the cloud, recovering $V_{i,L1}$, $V_{s,L1}$ from $S^T S_s C(V_{i,L1})$, $S^T S_s C(V_{s,L1})$ still reduces to the LWE problem as proved in ref [18]. To this end, $S^T S_s$ only helps the cloud to perform distance comparison in CPAR, but does not bring additional advantages for recovering feature vectors compared with the ciphertext-only scenario.

B. Request Unlinkability
The request unlinkability in CPAR is guaranteed by the randomization of each request. Specifically, each query request $\{V_{s,L1}, H_{s,L1}, V_{s,KL}\}$ is element-wise obfuscated with different random error terms $e_s$, $e'_s$ and a random number $r_s$ during the encryption, which makes the obfuscated $V_{s,L1}$, $H_{s,L1}$, $V_{s,KL}$ have the same distribution as the random values $e_s$, $e'_s$ and $r_s$ [27]. Thus, by changing $e_s$, $e'_s$ and $r_s$ during the encryption of different requests, CPAR outputs different random ciphertexts, even for requests generated from the same image.

VI. EVALUATION
To evaluate the performance of CPAR, we implemented a prototype using Python 2.7. In our implementation, Numpy [28] is used to support efficient multi-dimensional array operations. OpenCV [29] is used to extract the color-space features of the images and to build the filter kernels that generate the Gabor filter results. Pywt [30] is adopted to perform the Haar wavelet transform and obtain the corresponding Haar results. Sklearn [31] is used to perform the PCA transformation. The FLANN library [14] serves as the non-private randomized kd-forest for comparison. We use the well-known IAPR TC-12 [16] as the pre-annotated dataset, which contains 20,000 annotated images with an average of 5.7 keywords per image. All tests are performed on a 3.1 GHz Intel Core i7 MacBook Pro with OS X 10.13.3 installed.
In the rest of this section, $n$ is the total number of images in the pre-annotated dataset, $m_{L1}$ is the dimension of the pre-processed feature vectors $V_{i,L1}$, $m_{KL}$ is the dimension of the pre-processed feature vectors $V_{i,KL}$ and their corresponding hyperplane projected vectors $H_{i,KL}$, and $m'_{L1}$ is the dimension of the hyperplane projected vectors $H_{i,L1}$. We also use $DOT_m$ to denote a dot product operation between two $m$-dimensional vectors. AP-X denotes the approximation power during the RKDF search, which indicates that X% of the nodes will be checked in each tree of the RKDF. PCA-X denotes the strength of the PCA transformation applied to $V_{i,H}$ and $V_{i,HQ}$ in $V_{i,L1}$, which compresses their dimensions from 4096 to $4096/X$. PCA-128, PCA-64, PCA-32, PCA-16, and PCA-8 are evaluated in our experiments to balance the efficiency and accuracy of CPAR.
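The two parameter conventions above reduce to simple arithmetic, sketched below. The helper names `pca_dim` and `nodes_checked` are illustrative, and the per-tree node count of 20,000 used in the example is an assumption, since the number of nodes per tree is not stated here.

```python
FULL_DIM = 4096  # original dimension of V_{i,H} and V_{i,HQ}

def pca_dim(x):
    """Dimension of V_{i,H} (or V_{i,HQ}) after PCA-X, i.e. 4096 / X."""
    return FULL_DIM // x

def nodes_checked(ap_percent, nodes_per_tree):
    """Number of nodes visited in one tree under AP-X (X% of its nodes)."""
    return int(nodes_per_tree * ap_percent / 100)

# The five PCA strengths evaluated in the experiments.
for x in (8, 16, 32, 64, 128):
    print("PCA-%d -> %d dimensions" % (x, pca_dim(x)))

# Hypothetical tree size, for illustration only.
print("AP-10 on a 20000-node tree checks %d nodes" % nodes_checked(10, 20000))
```

A larger X in PCA-X means stronger compression, whereas a smaller X in AP-X means fewer nodes are checked and hence a more aggressive approximation.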
In our evaluation, we first provide numerical analysis as well as experimental evaluation for each stage of CPAR. Then, we compare CPAR with CAPIA proposed in ref [1] in terms of efficiency and accuracy.

A. System Parameter Selection
To perform the one-time setup in CPAR, the user pre-processes the feature vectors of each image in the pre-annotated image dataset. Specifically, the user first performs JL-Lemma based approximation over $V_{i,L1}$ to make them compatible with our PL1C. As discussed in Section IV-B, there is a trade-off between the approximation accuracy of the $L_1$ distance and the length of the approximated vector, which determines the efficiency of the follow-up privacy-preserving operations. To balance this trade-off, we evaluate different approximation parameters as shown in Fig. 5 (a)-(d). According to our results, we suggest setting $\alpha = 1$ and $\gamma = 100$, which introduces a 3.61% error rate for $L_1$ distance computation and extends the dimension of $V_{i,L1}$ from 864 to 1296 under the PCA-32 setting. Specifically, the error rate drops fast when $\alpha < 1$ and becomes relatively stable when $\alpha > 1$. Meanwhile, the dimension of the approximated vector increases linearly with the value of $\alpha$. With regard to $\gamma$, the dimension of the approximated vector becomes relatively stable when $\gamma > 100$; however, the error rate still increases when $\gamma > 100$.

With regard to the selection of the PCA parameter, it is clear that better efficiency of CPAR is achieved by increasing the strength of PCA. However, a stronger PCA setting also causes accuracy loss due to the information lost during compression. To balance efficiency and accuracy, we evaluate the accuracy loss of annotation under different PCA settings. Compared with the No-PCA setting, Fig. 6 shows that the accuracy loss for PCA-8, PCA-16, and PCA-32 is stable and stays within 0.5%. In contrast, PCA-64 and PCA-128 rapidly increase the accuracy loss. Therefore, PCA-32 is adopted by CPAR.

B. RKDF Construction and Encryption
To construct an encrypted RKDF, the user first constructs an unencrypted RKDF using the 20,000 pre-annotated images, and then replaces the data of each node in the RKDF with the corresponding ciphertexts. The construction of an unencrypted RKDF with 10 kd-trees costs 28.56 seconds. Then, for the pre-processed feature vectors $V_{i,L1}$ and $V_{i,KL}$ of each image, the user encrypts them using PL1C and PKLC with $(m_{L1})DOT_{m_{L1}}$ and $(m_{KL})DOT_{m_{KL}}$ operations respectively, which costs 8.4ms in total in our implementation. If an image is associated with a non-leaf node in any tree of the RKDF, the user also encrypts the hyperplane projected vectors $H_{i,L1}$ and $H_{i,KL}$ with $(m'_{L1})DOT_{m'_{L1}}$ and $(m_{KL})DOT_{m_{KL}}$ operations respectively, which costs 54.7ms in total. In addition, each non-leaf node requires an order-preserving encryption of its split field, which costs 1.4ms per node. Therefore, building a 10-tree encrypted RKDF over the 20,000-image pre-annotated dataset takes 74.78 minutes in our implementation. It is noteworthy that the encrypted RKDF construction is a one-time offline cost, which does not impact the performance of the subsequent real-time privacy-preserving image annotation.

C. Real-time Image Annotation
Request Generation: To annotate a new image in a privacy-preserving manner, the user pre-processes and encrypts its feature vectors $V_{s,L1}$ and $V_{s,KL}$ using PL1C and PKLC. Specifically, the encryption of $V_{s,L1}$ requires $(m_{L1})DOT_{m_{L1}} + (m'_{L1})DOT_{m'_{L1}}$ operations as shown in Fig. 3, and the encryption of $V_{s,KL}$ requires $(m_{KL})DOT_{m_{KL}}$ operations as shown in Fig. 4. In addition, for each element $sf_j$ in the split field element set $SF$, which has a size of 348 in our implementation, order-preserving encryption is executed for $V_s[sf_j]$. As a result, the encrypted request can be efficiently generated in only 534.16ms.

... $+ (m_{KL} + 1)DOT_{m_{KL}}$ operations respectively. With regard to the queue push process, privacy-preserving type-1 distance comparisons are executed using PL1C and PKLC, which requires $2(m_{L1} + 1)DOT_{m_{L1}} + 2(m_{KL} + 1)DOT_{m_{KL}}$ operations in total. Another important parameter that affects the search efficiency is the approximation power AP-X. As depicted in Fig. 7, by increasing the approximation power from AP-100 to AP-2.5, the time cost of privacy-preserving annotation using the encrypted RKDF drops from 190.77 seconds to 3.8 seconds. Compared with CAPIA [1], which requires 250 seconds for one privacy-preserving annotation on the cloud, CPAR can significantly speed it up, as depicted in Fig. 8.
Final Keyword Selection: This process only involves AES decryption and weight generation, which requires only a small number of additions. As a result, the final keyword selection can be completed by the user within 318ms.

In our evaluation, annotation requests for 50 different images are submitted, in which each requested image has two or more related images in the pre-annotated dataset. As shown in Fig. 9, the accuracy of CPAR drops from 88.42% to 67.59% when the approximation power increases from AP-100 to AP-2.5. Compared with CAPIA [1], our scheme achieves the same accuracy when the approximation power is set to AP-100. While increasing the approximation power reduces the accuracy of CPAR to some extent, it also boosts the efficiency significantly, as shown in Fig. 7. Compared with CAPIA, Fig. 10 shows that CPAR achieves speedups of 2×, 5.5×, 12×, 16.5×, and 23.88× while retaining 97.7%, 91.4%, 88.9%, 84.7%, and 80.3% of CAPIA's accuracy respectively. Therefore, CPAR can greatly improve on the efficiency of CAPIA while retaining comparable accuracy. To balance the efficiency speedup and annotation accuracy of CPAR, we suggest setting the approximation power to AP-10, which achieves 88.9% of CAPIA's accuracy with a 12× speedup.
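The speedup and accuracy pairs reported above can be read as a small lookup problem: given a minimum acceptable accuracy relative to CAPIA, pick the fastest reported setting. The sketch below encodes only the five reported pairs; the helper name `best_speedup` is illustrative and is not part of CPAR.

```python
# (speedup over CAPIA, accuracy relative to CAPIA in %), as reported for Fig. 10.
REPORTED = [(2.0, 97.7), (5.5, 91.4), (12.0, 88.9), (16.5, 84.7), (23.88, 80.3)]

def best_speedup(min_relative_accuracy):
    """Return the fastest reported (speedup, accuracy) pair meeting the floor.

    Returns None when no reported setting is accurate enough.
    """
    candidates = [pair for pair in REPORTED if pair[1] >= min_relative_accuracy]
    # Tuples compare by their first field, so max() picks the largest speedup.
    return max(candidates) if candidates else None

print(best_speedup(88.0))
```

With a floor of 88% relative accuracy the selection is the 12× setting, which matches the AP-10 recommendation.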
In Table I, we present samples of images automatically annotated by CPAR with the approximation power set to AP-10. On the one hand, CPAR is highly likely to assign correct keywords to images when compared with human annotation. This observation also confirms the high average recall rate of CPAR, since the ground-truth annotations are likely to be covered by CPAR. On the other hand, CPAR also introduces additional keywords that frequently appear together with these accurate keywords in the top related images. These additional keywords are typically not directly included in human annotations, but are potentially related to the correct keywords. Overall, our evaluation results demonstrate that although CPAR cannot always provide perfect keyword selection compared with human annotation, it is still promising for automatically assigning keywords to images.

Communication Cost: The communication cost in CPAR comes from two major parts: the annotation request and the encrypted results returned from the cloud server. The encrypted request consists of an $m_{L1}$-dimensional vector $C(V_{s,L1})$, an $m'_{L1}$-dimensional vector $C(H_{s,L1})$, an $m_{KL}$-dimensional vector $C(V_{s,KL})$, and a set of encrypted split field elements $SF$. In the PCA-32 setting, the total communication cost for a request is 80KB, of which 26KB is for $C(V_s)$, 48KB for $C(H_s)$, and 4KB for $SF$. Meanwhile, the returned results contain the encrypted keywords and distance comparison candidates of the top 10 related images. Using AES-256 for keyword encryption, the total size of the returned result is 488 bytes, given an average of 5.7 keywords per pre-annotated image. Therefore, the communication cost of each privacy-preserving annotation can be efficiently handled in today's Internet environment.

VII. RELATED WORKS
To solve the problem of how to search over encrypted data, the idea of keywords-based searchable encryption (SE) was first introduced by Song et al. in ref [6]. Later on, with the widespread use of cloud storage services, the idea of SE received increasing attention from researchers. In ref [7], [8], SE schemes with enhanced search efficiency are proposed based on novel index constructions. After that, SE schemes supporting multiple keywords and conjunctive keywords are investigated in ref [9], [10], making the search more accurate and flexible. Recently, fuzzy keyword search is considered in ref [11], which enables SE schemes to tolerate misspelled keywords during the search process. While these SE schemes offer decent features for keywords-based search, their application to images is limited by the question of how the keywords of images can be efficiently extracted with privacy protection. It is impractical for cloud storage users to manually annotate their images.
To automate the keyword extraction process for images, a number of research works have been proposed under the concept of "automatic image annotation" [17], [32]-[34]. Chapelle et al. [35] trained support vector machine (SVM) classifiers to achieve high annotation accuracy where the only available image features are high-dimensional histograms. In ref [36], [37], SVM was used to learn regional information and to assist the segmentation and classification processes simultaneously. Different from SVM, which works by finding a hyperplane to separate vector spaces, a Bayesian network accomplishes the annotation task by modeling the conditional probabilities from training samples. In ref [38], [39], Bayesian networks were built by clustering global image features to calculate the conditional probabilities. Another widely used technique is the artificial neural network (ANN). Taking ref [40] as an instance, based on the assumption that, after image segmentation, the largest part of an image significantly characterizes the entire image, Park et al. annotated images using a 3-layer ANN. With the flourishing of deeper ANN structures, such as the convolutional neural network (CNN), in various vision tasks [41]-[43], these deeper frameworks have also been applied to image annotation tasks. In ref [44], Yunchao et al. proposed to solve the image annotation problem by training a CNN with rankings. Jian et al. [45] combined a CNN with a recurrent neural network (RNN) to address the problem of keyword dependency during annotation. However, all of these image annotation works raise privacy issues when delegated to the cloud, since unencrypted images need to be outsourced. Therefore, to address such privacy concerns, this paper proposes CPAR, which utilizes the power of cloud computing to perform automatic image annotation for users while only providing encrypted image information to the cloud.

VIII. CONCLUSION
In this paper, we propose CPAR, which enables privacy-preserving image annotation using public cloud servers. CPAR uniquely integrates the randomized kd-forest with a privacy-preserving design, thus boosting the annotation efficiency using the cloud. Specifically, CPAR proposes the lightweight privacy-preserving $L_1$ distance (PL1C) and KL-Divergence (PKLC) comparison schemes, and then utilizes them together with order-preserving encryption to support all required operations in image annotation and randomized kd-forest search. Our PL1C, PKLC, and privacy-preserving randomized kd-forest can also be utilized as independent tools in other related fields, especially for efficient similarity measurement on encrypted data. Thorough security analysis is provided to show that CPAR is secure in the defined threat model. Extensive numerical analysis as well as a prototype implementation over the well-known IAPR TC-12 dataset demonstrate the practical performance of CPAR in terms of efficiency and accuracy.

Fig. 2. $V_{req}$ is the request vector and each $V_i$ is stored in tree node $i$. $Dis(\cdot)$ is an arbitrary distance calculation function and $Dis(V_{req}, H_i)$ is the distance between the request vector $V_{req}$ and the hyperplane of $Node_i$. $V_{qL}$ is the least close vector to $V_{req}$ in the priority queue $Queue$. (a) represents top-down traversal; (b) represents back trace search; and (c) represents the queue push process.
$L_1$ distances for comparison, PL1C adopts an approximated distance comparison result that is scaled and obfuscated by $r_c$, $\epsilon_b - \epsilon_a$ and $\epsilon_b - \epsilon'_a$, as shown in Section IV-B. As $r_c$ is a positive random number, the signs of $\frac{r_c}{2}(DL1_{ac} - DL1_{bc})$ and $\frac{r_c}{2}(DL1_{hc} - DL1_{bc})$ are consistent with those of $DL1_{ac} - DL1_{bc}$ and $DL1_{hc} - DL1_{bc}$ respectively. Meanwhile, since $r_c \gg \epsilon_b - \epsilon_a$ and $r_c \gg \epsilon_b - \epsilon'_a$, the added noise term has negligible influence on the sign of $DL1_{ac} - DL1_{bc}$ or $DL1_{hc} - DL1_{bc}$ unless the two distances are very close to each other. Fortunately, instead of finding only the single most related image, our CPAR design utilizes PL1C to identify the top 10 related candidates during the comparison. Such a design prevents important candidates (say, the top 5 out of the top 10) from being bypassed due to the error introduced by $\epsilon_b - \epsilon_a$ and $\epsilon_b - \epsilon'_a$. This hypothesis is further validated by our experimental results in Section VI.
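The sign-consistency argument can be checked numerically with a small sketch. The magnitudes of the scaling factor and the noise bound are hypothetical, and the noise is drawn uniformly purely for illustration. The code evaluates the obfuscated difference for a pair of well-separated distances and for a near tie.

```python
import random

def obfuscated_difference(d_ac, d_bc, r_c, noise_bound, rng):
    """Return r_c / 2 * (d_ac - d_bc) + (eps_b - eps_a) with small random noise."""
    eps_a = rng.uniform(0.0, noise_bound)
    eps_b = rng.uniform(0.0, noise_bound)
    return r_c / 2.0 * (d_ac - d_bc) + (eps_b - eps_a)

rng = random.Random(7)
R_C, NOISE_BOUND, TRIALS = 1e6, 1.0, 1000  # hypothetical values with R_C >> noise

# Well-separated distances: the scaled gap dominates the noise term.
far_ok = all(obfuscated_difference(10.0, 9.0, R_C, NOISE_BOUND, rng) > 0
             for _ in range(TRIALS))

# Near tie: the scaled gap (0.05) is smaller than the noise, so the sign can flip.
near_flips = sum(obfuscated_difference(1e-7, 0.0, R_C, NOISE_BOUND, rng) <= 0
                 for _ in range(TRIALS))

print("sign preserved for separated distances: %s" % far_ok)
print("sign flips among near ties: %d of %d" % (near_flips, TRIALS))
```

The sign is always preserved when the distances are clearly separated, while near ties can be misordered, which is why returning a set of top candidates rather than a single best match tolerates this error.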

2) Use the Encryption algorithm of IVE to encrypt $V_c$ as $C(V_c) = S_c^{-1}(wV_c + e_c)^T$. $C(V_c)$ and $S^T S_c$ are sent to the cloud as the request.

KL-Divergence Comparison:

Type-1: Compare $DKL_{ac}$, $DKL_{bc}$
1) Compute $\left\lceil \frac{vec(C(V_a)C(V_c)^T)}{w} \right\rfloor_q$ and $\left\lceil \frac{vec(C(V_b)C(V_c)^T)}{w} \right\rfloor_q$, and decrypt them as $V_a V_c^T$ and $V_b V_c^T$ using the Decryption of IVE in Section III-B.
2) Compare the KL divergences as $V_a V_c^T - V_b V_c^T = r_c(DKL_{ac} - DKL_{bc}) + (\epsilon_b - \epsilon_a)$.

Type-2: Compare $DKL_{hc}$, $DKL_{bc}$
1) Compute $\left\lceil \frac{vec(C(H_a)C(V_c)^T)}{w} \right\rfloor_q$ and $\left\lceil \frac{vec(C(V_b)C(V_c)^T)}{w} \right\rfloor_q$, and decrypt them as $H_a V_c^T$ and $V_b V_c^T$ using the Decryption of IVE as in Eq. 2.
2) Compare the KL divergences as $H_a V_c^T - V_b V_c^T = r_c(DKL_{hc} - DKL_{bc}) + (\epsilon_b - \epsilon'_a)$.
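The Type-1 relation can be reproduced on plaintext vectors to see why a difference of two inner products reveals which KL divergence is smaller. The pre-processing below is an assumption made for illustration (stored vectors become negative logarithms with a noise slot, the request is scaled by a random factor with a constant slot), and the divergence is taken as D(c || x). The actual PKLC pre-processing may differ, and no encryption is involved here.

```python
import math

def kl(p, q):
    """KL divergence D(p || q) for strictly positive distributions."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q))

def prep_stored(x, eps):
    """Hypothetical pre-processing of a stored vector: [-log x_j ..., -eps]."""
    return [-math.log(xj) for xj in x] + [-eps]

def prep_request(c, r_c):
    """Hypothetical pre-processing of a request vector: [r_c * c_j ..., 1]."""
    return [r_c * cj for cj in c] + [1.0]

def dot(u, v):
    return sum(uj * vj for uj, vj in zip(u, v))

a = [0.2, 0.3, 0.5]    # stored distribution a
b = [0.4, 0.4, 0.2]    # stored distribution b
c = [0.25, 0.25, 0.5]  # requested distribution c
r_c, eps_a, eps_b = 1000.0, 0.3, 0.7  # illustrative randomness, r_c >> noise

request = prep_request(c, r_c)
lhs = dot(prep_stored(a, eps_a), request) - dot(prep_stored(b, eps_b), request)
rhs = r_c * (kl(c, a) - kl(c, b)) + (eps_b - eps_a)

print(abs(lhs - rhs) < 1e-6)  # the two sides agree up to rounding
print(lhs < 0)                # negative: c is closer to a than to b
```

The request's own entropy term cancels in the difference, so only inner products with the stored vectors are needed, and the noise slot contributes the additive term while the scaling factor hides the true magnitude of the gap.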

Fig. 9. Accuracy (Recall) of CPAR with Different Approximation Power

Algorithm 1: Privacy-preserving RKDF Search
Input: Encrypted Search Request (Req) for V_s, Encrypted RKDF with a set of Trees {T_k}, approximation power AP-X
Output: Encrypted Nodes Associated with Top Related Images to the Request.
Initialization: Queue = [], Path = [] (Searched Path), Vis = [] (Visited Nodes), Node_k = T_k.root;
Each tree T_k executes topDownTraversal() and backTraceSearch() in parallel; Queue and Vis are shared among all trees;
Function topDownTraversal(Req, Node_k):
    ...
    return Node_k;
Function backTraceSearch(Req, Node_k):
    ...
    if Path is not null then
        parent ← Path.pop();
        if parent ∉ Vis then
            Vis.push(parent);
            ...
Function Queue.push(Node):
    // Each Node_q in Queue is ordered by Dis(V_s, V_Node_q)
    if Queue.length() < Defined Size L then
        Add Node into Queue by order;
    else
        if Node_qL in Queue has Dis(V_s, V_Node) < Dis(V_s, V_qL) then
            Remove Node_qL from Queue;
            Add Node into Queue by order;
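The Queue.push step of Algorithm 1 can be sketched in plain Python as a bounded, ordered queue. This is an unencrypted illustration only: the class name `BoundedQueue` and the use of plaintext distances are assumptions, whereas in CPAR the cloud only learns comparison results obtained through PL1C and PKLC.

```python
import bisect

class BoundedQueue(object):
    """Keeps the `size` nodes closest to the request, ordered by distance."""

    def __init__(self, size):
        self.size = size
        self.items = []  # sorted list of (distance, node_id), closest first

    def push(self, distance, node_id):
        if len(self.items) < self.size:
            # Queue not full yet: insert in order.
            bisect.insort(self.items, (distance, node_id))
        elif distance < self.items[-1][0]:
            # Closer than the farthest queued node: replace that node.
            self.items.pop()
            bisect.insort(self.items, (distance, node_id))

queue = BoundedQueue(size=3)
for node_id, dist in enumerate([5.0, 1.0, 4.0, 3.0, 9.0, 2.0]):
    queue.push(dist, node_id)

print(queue.items)  # [(1.0, 1), (2.0, 5), (3.0, 3)]
```

Only the comparison against the farthest queued node decides whether a new node enters the queue, which is the single comparison that the privacy-preserving schemes need to support at this step.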