Exploring the Privacy Bound for Differential Privacy: From Theory to Practice

Data privacy has attracted significant interest in both the database theory and security communities in the past few decades. Differential privacy has emerged as a new paradigm for rigorous privacy protection regardless of adversaries' prior knowledge. However, the meaning of the privacy bound ε and how to select an appropriate ε may still be unclear to general data owners. More recently, some approaches have been proposed to derive upper bounds of ε for specified privacy risks. Unfortunately, these upper bounds suffer from some deficiencies (e.g., the bound relies on the data size, or might be too large), which greatly limit their applicability. To remedy this problem, we propose a novel approach that converts the privacy bound ε in differential privacy into privacy risks understandable to generic users, and present an in-depth theoretical analysis of it. Finally, we conduct experiments to demonstrate the effectiveness of our model.


Introduction
Large volumes of sensitive personal data (e.g., medical records, transaction history) are being ubiquitously collected by assorted organizations (e.g., hospitals, retailers). To further unlock the utility of a wide variety of datasets, these organizations often need to provide public access to such datasets or share the data with a third party (e.g., researchers or data analysts). This may pose great threats to the privacy of individuals.
To this end, data privacy research has attracted significant interest in both the database and security communities in the past few decades. As a major branch of the existing works, privacy preserving data publishing techniques [9,27] have been proposed to sanitize specific datasets so that the output data can be published while satisfying a pre-defined privacy notion. More specifically, such privacy preserving techniques can be categorized into two types: (1) anonymization [18,19,27], and (2) differential privacy [1,4].
Anonymization. In this case, a trusted curator computes and publishes an anonymized output. Figure 1(a) shows such an example (Table T1). A sufficient degree of anonymization is achieved by hiding each record in a crowd of records that share the same values. In such a non-interactive setting [16,18,19,27], data recipients passively receive the anonymized table and then issue queries over the anonymized data.
Differential Privacy. In this case, a curator sits between the data recipients and the database. Data recipients issue statistical queries over the database (as the utility of the data) and can infer information from the database via their queries. Thus, to protect the privacy of the respondents, the curator can modify the statistical queries issued by the data recipients and/or the responses to these queries. Without loss of generality, we adopt count queries as our running example, as illustrated in Example 1 (over Table T1). Inferences from other types of queries, such as min, max, mean or sum, can be handled in a similar way.
Example 1. Consider an adversary who has some background knowledge of the individuals (e.g., Alex, with age 20 and Zipcode 15k) and would like to infer whether Alex is involved in Table T1 (which is used for analysis) in Figure 1(a).
• q1: Select Count(*) From T1 Group By Age, Zipcode HAVING Age = 20 AND Zipcode = 15k
The response of q1 is the number of individuals with age 20 and zipcode 15k, so the adversary can infer from the query result whether Alex is involved in the analysis (query). An effective approach to address such inferences (via the queries) is output perturbation [2,5], which injects a small amount of random noise into each query result. Numerous output perturbation techniques are available in the statistics literature; however, they are not based on a rigorous definition of privacy.
Recently, differential privacy [1,4] has emerged as a breakthrough in this field: it provides a strong privacy guarantee that prevents adversaries from inferring, through their queries, the status of any individual in the database (included or not). Roughly speaking, differential privacy ensures that the removal or addition of a single tuple in the input data (e.g., T1 in Figure 1(a)) does not substantially affect the outcome of any analysis. Therefore, the privacy risks are bounded regardless of adversaries' background knowledge: the query results are indistinguishable whether the queries are posed over one or the other of any two neighboring databases (with or without any individual's data).

Motivation: (1) Interpreting the Privacy Bound for Differential Privacy to Practical Privacy Risks.
Although the notion of differential privacy has successfully achieved the objectives of privacy protection in many applications [10,12,21], the meaning of the privacy budget ε is still unclear to real users in practice. In general, the degree of privacy preservation provided by ε-differentially private algorithms is anti-monotonic in the privacy bound ε: the smaller ε is, the stronger the privacy protection.
From practitioners' point of view, since ε originates in the mathematical and probabilistic domain, it is difficult to quantitatively measure the strength of the privacy protection that differential privacy provides for any specific ε. This makes it hard for ordinary users to select values of ε that protect privacy while maximizing the utility of the analytical results produced by different privacy mechanisms. Cynthia Dwork and Adam Smith raised related questions in [6]: "What does it mean not to provide differential privacy? Failure to provide ε-differential privacy might result in 2ε-differential privacy. How bad is this? Can this suggest a useful weakening? How much residual uncertainty is enough?" The main idea underlying these questions is how to interpret the degree of protection provided by ε-differential privacy in terms of practical privacy risks (e.g., the upper bound of the probability that any individual's data is included in the analysis).
Interpretive Inference Model. A practical approach to interpreting the privacy bound ε is to propose an interpretive inference model on ε-differentially private query results: the probability that any individual tuple (viz. an individual's data) is inferred to be included in the input database (denoted as the "inference probability" for simplicity). Such inference probabilities are directly understandable to generic data owners. Note that the interpretive inference model serves to translate ε into practical privacy risks for individual tuples.

Motivation: (2) How to Choose an Appropriate ε
Then, once the degree of privacy guarantee is understood through a given interpretive inference model, the next open question is how to choose appropriate values of ε based on the data owners' required privacy risks (e.g., the probability that any individual's data is included). Using the interpretive inference model to convert ε into inference probabilities, this question can be restated as follows: given a maximum inference probability, what is the maximum (upper bound) ε tolerable in differential privacy that still satisfies the practical privacy guarantee (represented as the maximum inference probability)?
To this end, Jaewoo Lee et al. [15] proposed an interpretive inference model that can be used to derive upper bounds of ε for any given maximum inference probability. We illustrate such upper bounds in the following example.
Example 2. We utilize a real database CENSUS that is frequently studied in privacy research [17,28,32,34].It contains the demographic information of 600k American adults.Each tuple has eight attributes: Age, Gender, Education, Marital, Race, Work-class, Country, and Income.
For instance, we apply count queries on the database. Suppose that we want to bound the adversary's probability of successfully identifying individuals from the count queries by ρ ≤ 1/10 (the maximum inference probability). To achieve this protection goal with the existing theoretical result [15] (more details are given in Section 3), the differential privacy budget must satisfy
$$\epsilon \le \frac{\Delta f}{\Delta v} \ln \frac{(n-1)\rho}{1-\rho},$$
where n is the number of records, ∆f is the sensitivity of the query, ∆v is the maximum distance between function values of every possible world (the same information needed to calculate global sensitivity) [15], and ρ is the maximum inference probability. Then, given n = 600,000 records and taking ∆f = ∆v = 1 for count queries (∆v is no greater than 1 for count queries), the bound yields
$$\epsilon \le \ln \frac{(600{,}000 - 1) \times 0.1}{1 - 0.1} \approx 11.1.$$
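As a quick check of the arithmetic above, the following Python sketch evaluates the bound from [15]. The function name `lee_clifton_epsilon_bound` is hypothetical, and the defaults ∆f = ∆v = 1 are an assumption for count queries; when the argument of the logarithm is zero (n = 1) the sketch returns −∞.

```python
import math

def lee_clifton_epsilon_bound(n, rho, delta_f=1.0, delta_v=1.0):
    """Upper bound on epsilon in the model of [15]:
    eps <= (delta_f / delta_v) * ln((n - 1) * rho / (1 - rho)).
    delta_f = delta_v = 1 is an assumption for count queries."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    arg = (n - 1) * rho / (1.0 - rho)
    if arg <= 0.0:
        return float("-inf")  # n == 1: ln(0) diverges to -infinity
    return (delta_f / delta_v) * math.log(arg)

# Example 2: n = 600,000 records, rho = 1/10.
print(round(lee_clifton_epsilon_bound(600_000, 0.1), 2))  # 11.11

# The bound depends on the data size: negative for small n, growing with ln(n).
for n in (2, 5, 1_000, 600_000, 10**9):
    print(n, round(lee_clifton_epsilon_bound(n, 0.1), 3))
```

The loop over n illustrates the dependence on data size discussed below: the bound turns negative for small n and keeps growing, without limit, as n increases.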
In the above example, the upper bound of ε would be 11.1, which may exceed expectations. In other words, such a large ε satisfies ρ ≤ 1/10 in their interpretive inference model but can be vulnerable in other cases: for instance, in our proposed interpretive inference model, ε = 11.1 would result in ρ > 1/10, i.e., higher privacy risks than the data owners demand.
Furthermore, in the interpretive inference model proposed in [15], the bound on ε grows with ln(n), where n is the data size: as n increases, the upper bound of ε also increases. For very large or very small n, the derived bound becomes meaningless (unbounded or negative). These examples show that existing solutions have inherent drawbacks. Motivated by these observations, we propose a novel interpretive inference model that evaluates the probability (or confidence) that the adversary can identify any individual from the noise-injected queries over the dataset. This allows the privacy implications of differentially private techniques to be understood much more clearly.

Our Contributions
The major contributions of this paper are summarized as follows.
• This paper presents a novel interpretive inference model to convert the privacy bound in differential privacy to inference probabilities that any individual is included in the input data (for queries).The proposed interpretive inference model and converted inference probabilities have addressed the drawbacks of the existing models [15,24].
• Based on the proposed interpretive inference model, we present an instantiation for choosing an appropriate ε (the maximum privacy bound in differential privacy) that effectively bounds the risk of inferring the presence/absence of individuals (given the maximum inference probability) in generic differentially private algorithms.
• An in-depth theoretical analysis of our approach is provided, and a set of experiments is conducted to confirm its effectiveness.
The rest of the paper is organized as follows. In Section 2, we describe the preliminaries of differential privacy. In Section 3, we analyze two representative existing works. Then, in Section 4, we propose our interpretive inference model and the upper bound of ε in differential privacy (given the maximum inference probability). Section 5 presents the experimental results, and Section 6 reviews related work. Finally, Section 7 gives concluding remarks.

Preliminaries
In this section, we will first describe the basic mechanism of differential privacy, and then present the Laplace distribution which contributes to a generic differentially private approach.

Differential Privacy
The most commonly used definition of differential privacy is ε-differential privacy, which guarantees that any individual tuple has negligible influence on the published statistical results, in a probabilistic sense. Specifically, a randomized algorithm A satisfies ε-differential privacy if and only if, for any two databases D1, D2 that differ in exactly one record and any possible output O of A, the ratio between the probability that A outputs O on D1 and the probability that A outputs O on D2 is bounded by a constant. Formally, we have
$$\Pr[A(D_1) = O] \le e^{\epsilon} \cdot \Pr[A(D_2) = O], \qquad (1)$$
where ε is a constant specified by the user, D1 and D2 differ in at most one element, and e is the base of the natural logarithm. Intuitively, given the output O of A, it is hard for the adversary to infer whether the original data is D1 or D2 if the parameter ε is sufficiently small. Likewise, ε-differential privacy provides any individual with plausible deniability that her/his record was in the database.
The earliest and most widely adopted approach for enforcing ε-differential privacy is the Laplace mechanism [4], which injects random noise x ∼ Lap(λ), drawn from a Laplace distribution with scale λ = ∆f/ε, into the original output O, so that the deterministic algorithm A obtains its randomized version O + x.
Definition 1 (Sensitivity). The sensitivity ∆f of the query function f is defined as the maximal L1-norm distance between the exact answers of the query q (i.e., q(D1) and q(D2)) on any neighboring databases:
$$\Delta f = \max_{D_1, D_2} \lVert q(D_1) - q(D_2) \rVert_1. \qquad (2)$$
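A minimal sketch of the Laplace mechanism in Python, using only the standard library. It assumes the scale λ = ∆f/ε and samples Laplace noise as the difference of two independent Exp(1) draws (a standard identity), since Python's `random` module has no built-in Laplace sampler. The function names are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """One draw from Lap(0, scale).

    The difference of two independent Exp(1) variables is Laplace(0, 1), so
    scaling it gives Laplace(0, scale); the stdlib random module has no direct
    Laplace sampler."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def laplace_mechanism(true_answer, sensitivity, epsilon, rng):
    """Release true_answer + x with x ~ Lap(sensitivity / epsilon)."""
    return true_answer + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(7)
# A count query such as q1 has sensitivity 1: adding or removing one
# record changes the count by at most 1.
noisy = laplace_mechanism(1, sensitivity=1.0, epsilon=1.0, rng=rng)
print(noisy)
```

Smaller ε means a larger scale ∆f/ε and therefore noisier answers, which is the anti-monotonic relationship between ε and privacy described earlier.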

The Laplace Distribution
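A random variable has a Laplace(µ, b) distribution if its probability density function is
$$f(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x-\mu|}{b}\right). \qquad (3)$$
The Laplace density is straightforward to integrate; its cumulative distribution function is
$$F(x) = \begin{cases} \frac{1}{2}\exp\left(\frac{x-\mu}{b}\right), & x < \mu, \\ 1 - \frac{1}{2}\exp\left(-\frac{x-\mu}{b}\right), & x \ge \mu. \end{cases} \qquad (4)$$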
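The Laplace density and cumulative distribution function above translate directly into code. The sketch below (function names are hypothetical) also checks numerically, for an assumed ε = 0.5 and b = 1/ε, that the output densities for two neighboring count answers, 0 and 1, never differ by more than a factor e^ε, which is the inequality in the definition of ε-differential privacy.

```python
import math

def laplace_pdf(x, mu, b):
    """Density of Laplace(mu, b)."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)

def laplace_cdf(x, mu, b):
    """Cumulative distribution function of Laplace(mu, b)."""
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)

# Neighboring count answers 0 and 1 (sensitivity 1), noise scale b = 1/epsilon.
epsilon = 0.5
b = 1.0 / epsilon
grid = [i / 10 for i in range(-100, 101)]  # candidate outputs O in [-10, 10]
worst_ratio = max(laplace_pdf(o, 0.0, b) / laplace_pdf(o, 1.0, b) for o in grid)
print(worst_ratio <= math.exp(epsilon) + 1e-12)  # True
```

The worst-case ratio is attained for any output O ≤ 0, where it equals exactly e^ε.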

Inferences on the Privacy Bound
In this section, we first introduce a key concept for describing the adversary's inference ability: the Potential Input Set. Then, we provide an in-depth analysis of two existing works [15,24] that convert the privacy budget ε into inference probabilities.

Potential Input Set
Definition 2 (Potential Input Set). Given any output S of a differentially private algorithm A, the potential input set Ψ is the set of all candidate inputs Di that could have produced S.
Note that our interpretive inference model can be applied to different kinds of queries, e.g., count, sum, min and max. We use sum queries in the following example.
Example 3. In Example 2, if sum queries are given, there are n = 600,000 possibilities for the potential input set Ψ: any subset X ⊆ CENSUS with n − 1 tuples sampled from CENSUS can be a possible input Di. In Example 1, by contrast, there are only 2 possibilities for query q1: Ψ = {∅, {Alex}}.
The cardinality of the potential input set can be very large or extremely small, which depends on external knowledge.Basically, differential privacy hides the presence of any individual in the database from data users by making any two output distributions in Ψ (one is with individual and the other is without) computationally indistinguishable.The adversary's goal is to figure out whether D i ∈ Ψ is true or not.

Lee and Clifton's Interpretive Inference Model [15]
To bound ε, Lee and Clifton [15] make a few assumptions, e.g., that the adversary has a database D consisting of n tuples, has infinite computational power, and knows everything about the universe except which individual is missing from the database. For simplicity, they assume that α is a uniform prior, i.e., ∀w ∈ Ψ, α(w) = 1/n. They refer to each possible combination w in Ψ as a possible world. Given a query response r, the posterior belief β is defined as
$$\beta(w) = \frac{\Pr[A_f(w) = r]\,\alpha(w)}{\sum_{w'' \in \Psi} \Pr[A_f(w'') = r]\,\alpha(w'')}, \qquad (5)$$
where A_f is an ε-differentially private mechanism for query function f. Given the best guess w', the confidence of the adversary's guess is calculated as β(w') − α(w'). The authors treat the adversary's posterior belief on each possible world as the risk of disclosure. Starting from Equation (5), after several steps of deduction, the upper bound of ε can be derived as follows:
$$\epsilon \le \frac{\Delta f}{\Delta v} \ln \frac{(n-1)\rho}{1-\rho}. \qquad (6)$$
Although this study is the state of the art, the upper bound in Equation (6) has two drawbacks that greatly limit its practical applicability.
• First, the upper bound grows with ln(n), where n is the size of the potential input set. As illustrated in Example 2, n is a crucial factor in determining ε, so the solution may not be suitable when n is too small or too large. Moreover, the upper bound degenerates to −∞ (the logarithm of 0) if the potential input set contains only a single tuple. Therefore, the upper bound described above is not always applicable and has inherent disadvantages.
• Second, this solution still cannot estimate the probability that a certain tuple is included or not. Specifically, when the differentially private algorithm A answers a query, the response by itself says nothing about the privacy risks of individual tuples, regardless of whether the answer is large or small.
Let us illustrate these with Example 4.

Example 4.
With the purpose of identifying the victim Alex, the adversary issues query q1 as shown in Example 1. As a running example, the ε-differentially private algorithm A perturbs the true count 1 with noise −1.1 and returns A(q1) = −0.1 to the adversary. No matter what the answer is, the adversary cannot assert whether Alex has contributed to the result or not.

Naldi et al.'s Interpretive Inference Model [24]
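In [24], Naldi et al. provided a different approach for choosing ε. With respect to the above notation, and with the guess X taken as the noisy Laplace response q(D) + Lap(∆f/ε) for q(D) > 0, the equation for choosing ε can be given as follows:
$$\epsilon = \frac{\Delta f}{w\, q(D)} \ln \frac{1}{1-\rho}. \qquad (7)$$
The parameter w > 0 indicates that the true value of q(D) is within the interval [q(D)(1 − w), q(D)(1 + w)] with probability ρ. Equation (7) is derived from
$$\Pr\left[q(D)(1-w) \le X \le q(D)(1+w)\right] = \rho, \qquad (8)$$
where X is the guess value.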
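The exact form of the expression in [24] should be checked against that paper. As a sketch only, assuming the guess X is simply the noisy response q(D) + Lap(∆f/ε) with q(D) > 0, the requirement that X fall in [q(D)(1 − w), q(D)(1 + w)] with probability ρ can be solved for ε in closed form, because P(|Lap(b)| ≤ t) = 1 − e^{−t/b}. The function names and the values q = 100, w = 0.05, ρ = 0.9 are illustrative assumptions.

```python
import math
import random

def interval_probability(epsilon, q, w, delta_f=1.0):
    """P(q(1 - w) <= X <= q(1 + w)) for X = q + Lap(delta_f / epsilon), q > 0.
    Uses P(|Lap(b)| <= t) = 1 - exp(-t / b) with t = w * q."""
    return 1.0 - math.exp(-epsilon * w * q / delta_f)

def epsilon_for_interval(rho, q, w, delta_f=1.0):
    """Invert interval_probability: the epsilon at which the probability equals rho."""
    return -delta_f * math.log(1.0 - rho) / (w * q)

q, w, rho = 100.0, 0.05, 0.9
eps = epsilon_for_interval(rho, q, w)

# Monte Carlo check with delta_f = 1, i.e. noise scale b = 1 / eps.
rng = random.Random(1)
b = 1.0 / eps
trials = 100_000
hits = sum(abs(b * (rng.expovariate(1.0) - rng.expovariate(1.0))) <= w * q
           for _ in range(trials))
print(round(eps, 4), hits / trials)
```

Note how ε depends on the query answer q(D) itself, so this choice of ε changes from query to query.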
Examples 4 and 5 show that, although choosing a proper ε for differential privacy has attracted much attention and some upper bounds have been derived, the state-of-the-art upper bounds derived via Equations (6) or (7) do not function properly in many cases. Based on this observation, we claim that the problem of choosing proper values of ε remains open.

Novel Interpretive Inference Model on the Privacy Bound
In this section, we present our interpretive inference model and show how to choose ε.

Inferring Query Results
First, for each noise variable x added by algorithm A, which follows the Laplace distribution f(µ, b), it is impossible for the adversary to guess x exactly. However, the adversary can guess x to within a short interval. That is, for each noise variable x, the adversary generates a guess x', which is also governed by the Laplace distribution f(µ, b). If |x − x'| < L, we consider the inference successful. The length L depends on the query type; for example, for count queries, L equals 0.5.
Since the guess x' is also generated according to the Laplace distribution, the confidence of the adversary's guess can be calculated using Definition 3.
Definition 3. Given a guess x', the adversary's confidence in guessing x to within L is defined as
$$\mathrm{Prob}(|x - x'| < L) = F(x + L) - F(x - L), \qquad (9)$$
where F is defined in Equation (4).
We note that the probability Prob(|x − x'| < L) is not a fixed value but varies with x, which is sampled from a Laplace distribution. The closer x is to the location µ, the larger Prob(|x − x'| < L) is; conversely, when x is drawn far from µ, Prob(|x − x'| < L) becomes much smaller.
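Therefore, it makes little sense to calculate this probability for each individual noise variable x, since such noise can be sampled indefinitely. Moreover, the adversary cannot infer the exact x after observing the response q(D) + x. Instead, we can compute the mathematical expectation of this probability, which reflects its average level.
Theorem 1. Given a response q(D) + x and an interval length L, the mathematical expectation of the probability that the adversary's guess falls into the interval [q(D) − L, q(D) + L] is
$$E[p(x)] = 1 - \left(1 + \frac{\epsilon L}{2\Delta f}\right) e^{-\epsilon L/\Delta f}.$$
Proof. For simplicity, let p(x) denote F(x + L) − F(x − L). By the definition of mathematical expectation, we have
$$E[p(x)] = \int_{-\infty}^{\infty} p(x) f(x)\,dx, \qquad (10)$$
where f(x) is given in Equation (3). To calculate the final result, the integral in Equation (10) is divided into 4 parts according to the breakpoints of F(x + L) and F(x − L):
$$E[p(x)] = \int_{-\infty}^{\mu-L} p(x) f(x)\,dx + \int_{\mu-L}^{\mu} p(x) f(x)\,dx + \int_{\mu}^{\mu+L} p(x) f(x)\,dx + \int_{\mu+L}^{\infty} p(x) f(x)\,dx. \qquad (11)$$
Substituting Equations (3), (4) and (9) into Equation (11), we have
$$\begin{aligned} E[p(x)] ={}& \int_{-\infty}^{\mu-L} \frac{1}{2}\left(e^{(x+L-\mu)/b} - e^{(x-L-\mu)/b}\right) \frac{1}{2b} e^{(x-\mu)/b}\,dx \\ &+ \int_{\mu-L}^{\mu} \left(1 - \frac{1}{2}e^{-(x+L-\mu)/b} - \frac{1}{2}e^{(x-L-\mu)/b}\right) \frac{1}{2b} e^{(x-\mu)/b}\,dx \\ &+ \int_{\mu}^{\mu+L} \left(1 - \frac{1}{2}e^{-(x+L-\mu)/b} - \frac{1}{2}e^{(x-L-\mu)/b}\right) \frac{1}{2b} e^{-(x-\mu)/b}\,dx \\ &+ \int_{\mu+L}^{\infty} \frac{1}{2}\left(e^{-(x-L-\mu)/b} - e^{-(x+L-\mu)/b}\right) \frac{1}{2b} e^{-(x-\mu)/b}\,dx. \end{aligned} \qquad (12)$$
Now we integrate the four parts in Equation (12) separately. The first and fourth parts each equal $\frac{1}{8}\left(e^{-L/b} - e^{-3L/b}\right)$, and the second and third parts each equal $\frac{1}{2} - \left(\frac{5}{8} + \frac{L}{4b}\right)e^{-L/b} + \frac{1}{8}e^{-3L/b}$. Summing them gives $E[p(x)] = 1 - \left(1 + \frac{L}{2b}\right)e^{-L/b}$.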
Recall that, to implement ε-differential privacy, the noise is governed by Lap(∆f/ε). Therefore, replacing b with ∆f/ε proves the theorem.
Example 6. Let ε = 1 and suppose the adversary submits count queries (∆f = 1, L = 0.5). Then the probability of successful inference that the adversary can achieve is 1 − 1.25e^{−0.5} ≈ 24.18%. For example, if the adversary submits a workload of 10,000 queries, the value of q(D) could be correctly inferred for about 2,418 of them.
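The number in Example 6 can be reproduced with a few lines of Python. This sketch assumes the closed form 1 − (1 + εL/(2∆f))e^{−εL/∆f}, i.e., the expectation of F(x + L) − F(x − L) with b = ∆f/ε, which is also the probability that two independent Lap(0, b) draws x and x' satisfy |x − x'| < L. The function name is hypothetical.

```python
import math

def expected_inference_probability(epsilon, L, delta_f=1.0):
    """Closed form 1 - (1 + L/(2b)) * exp(-L/b) with b = delta_f / epsilon.

    This equals P(|x - x'| < L) for independent x, x' ~ Lap(0, b)."""
    b = delta_f / epsilon
    t = L / b
    return 1.0 - (1.0 + t / 2.0) * math.exp(-t)

# Example 6: count queries, epsilon = 1, L = 0.5.
p = expected_inference_probability(epsilon=1.0, L=0.5)
print(f"{p:.4f}")         # 0.2418
print(round(10_000 * p))  # 2418 expected successes out of 10,000 queries
```

Because the expression depends on ε, L and ∆f only through εL/∆f, halving ε has the same effect as halving L.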

Inferring Record Status: Present or Absent
Recall that, given a query answer q(D) + x, the adversary can derive the exact answer q(D) with a certain probability. However, this is not the ultimate goal: what differential privacy protects is whether a certain tuple has contributed to the query result. Hence, we need to extend Theorem 1 in a direction aligned with the objectives of differential privacy.
Example 7. Let us continue with Example 1 and assume that ε = 1. Suppose the adversary tries to identify the victim Alex and issues query q1 as shown in Example 1. The random noise drawn from the Laplace distribution with mean 0 and scale factor b = 1 is 0.7, so the ε-differentially private algorithm A produces the response A(q1) = 1 + 0.7 = 1.7.
The adversary then generates a guess x' of the noise. If |0.7 − x'| < 0.5, that is, 0.2 < x' < 1.2, the inference is successful and the adversary can infer that Alex's data is included. Otherwise, the inference fails.
This example suggests that whether a certain tuple is present or absent can be determined by extrapolating the query result with sufficient background knowledge. In general, let q be a count query such that q(D) = c, and suppose the adversary knows that q(D) falls into the potential input set Ψ = {c1, c2, ..., cn}, where c1, ..., cn are integer constants (the gap between two consecutive constants can be greater than 1). The adversary's main task is to determine which ci ∈ Ψ is closest to the value y = x + q(D) − x'. In other words, the adversary picks the answer that minimizes |ci − y|, formally
$$\arg\min_{i \in \{1, \ldots, n\}} |c_i - y|. \qquad (15)$$
Based on this observation, we propose an inference algorithm for count queries, shown in Algorithm 1. The adversary issues a query q(D) that restricts the query to a specific victim (e.g., query q1 in Example 1). The differentially private mechanism A generates noise x, governed by the Laplace distribution f(µ, b), and returns x + q(D) to the adversary. The adversary generates a guess x' and computes y = x + q(D) − x'. Finally, if the constant selected by Equation (15) equals c, the inference is successful; otherwise it fails.
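The count-query inference just described can be simulated with a small Monte Carlo sketch. It is a hypothetical rendering, not the authors' implementation of Algorithm 1: it assumes ∆f = 1 (so b = 1/ε), draws the adversary's guess x' from the same Laplace distribution, and picks the candidate closest to y as in the arg min rule. With consecutive integer candidates around the true count, the adversary succeeds exactly when |x − x'| < 0.5, so for ε = 1 the rate should be close to the 24.18% of Example 6.

```python
import random

def lap(scale, rng):
    """Laplace(0, scale) sample: scale times the difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def infer_count(response, candidates, b, rng):
    """Adversary side: draw a guess x' ~ Lap(0, b), form y = response - x',
    and return the candidate c_i that minimises |c_i - y|."""
    y = response - lap(b, rng)
    return min(candidates, key=lambda c: abs(c - y))

def success_rate(true_count, candidates, epsilon, trials, seed=0):
    """Fraction of trials in which the adversary recovers the true count."""
    rng = random.Random(seed)
    b = 1.0 / epsilon  # count query: sensitivity 1, so b = 1/epsilon
    hits = 0
    for _ in range(trials):
        response = true_count + lap(b, rng)  # output of the epsilon-DP mechanism
        hits += infer_count(response, candidates, b, rng) == true_count
    return hits / trials

# Candidates are consecutive integers around the true count 0.
rate = success_rate(0, list(range(-10, 11)), epsilon=1.0, trials=50_000)
print(rate)
```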

Example 8.
As for min (max or sum) queries, the adversary may know that the victim Alex is involved and that his income is the minimum in CENSUS. Then, the adversary issues query q3: Select Min(Income) From CENSUS. This is a case where the potential input set has |Ψ| = 1, unlike the count query.
For a count query, it suffices for the adversary to determine whether the victim is present; for a min query, the situation changes. The adversary can only determine that the victim's income falls into a small interval [q(D) − L, q(D) + L], where L is half the length of the interval. We extend the above approach to other types of queries, such as mean, sum, min and max. The main difference is the predicate condition (see Line 3 in Algorithm 2).
Algorithm 2 Inference for Min, Max or Sum Query
Input: A(q(D)) = x + q(D)
Output: Success or Failure
/* Lap(µ, b): Laplace distribution with location µ and scale factor b */
1: The adversary generates a noise variable x', which is governed by the Laplace distribution f(µ, b)
2: y ← x + q(D) − x'
3: if |y − q(D)| < L then
4:   return Success
5: else
6:   return Failure
7: end if
Theorem 3. For min (or, by extension, max, sum, etc.) queries, Algorithm 2 can correctly infer whether a certain victim is in the interval [q(D) − L, q(D) + L] with the probability given in Theorem 1. Note that Theorem 3 can be directly derived from Theorem 1.
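Algorithm 2 can likewise be sketched in Python. The function below is a hypothetical rendering of the listing: the true value q(D) is passed in only to score the outcome (the adversary does not know it), and the sensitivity of 1,000, ε = 1 (so b = 1,000) and L = 500 for the Min(Income) query are illustrative assumptions. Since εL/∆f = 0.5, the success rate should again be close to 24.18%, in line with Theorem 3.

```python
import random

def algorithm2(response, true_value, b, L, rng):
    """Hypothetical rendering of Algorithm 2 for min/max/sum queries.

    response   -- the released answer q(D) + x
    true_value -- q(D), used here only to score the outcome (unknown to the adversary)
    b, L       -- Laplace scale of the guess and half-length of the target interval
    """
    x_guess = b * (rng.expovariate(1.0) - rng.expovariate(1.0))  # line 1: x' ~ Lap(0, b)
    y = response - x_guess                                       # line 2: y = x + q(D) - x'
    if abs(y - true_value) < L:                                  # line 3: predicate
        return "Success"
    return "Failure"

# Illustrative Min(Income) query: assumed sensitivity 1,000, epsilon = 1, L = 500.
rng = random.Random(42)
b, L, true_min = 1000.0, 500.0, 20_000.0
trials = 50_000
hits = 0
for _ in range(trials):
    response = true_min + b * (rng.expovariate(1.0) - rng.expovariate(1.0))  # mechanism output
    hits += algorithm2(response, true_min, b, L, rng) == "Success"
rate = hits / trials
print(rate)
```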

Choice of
From the discussion in Section 4.2, we can draw the following conclusion. Theorem 1 can be used to estimate the risk of disclosing the presence/absence of any individual in the database (under the interpretive inference model), as stated in the following corollary.
Corollary 1. Let ρ be the maximum tolerable probability that an individual is identified as present in the database. Then the privacy parameter ε must be chosen so that the adversary's success probability satisfies the following constraints:
• for count queries (L = 0.5): $\rho \ge 1 - \left(1 + \frac{\epsilon}{4\Delta f}\right) e^{-\epsilon/(2\Delta f)}$;
• for min (or, by extension, max, sum) queries: $\rho \ge 1 - \left(1 + \frac{\epsilon L}{2\Delta f}\right) e^{-\epsilon L/\Delta f}$.
Thus, we can choose ε to prevent the adversary from successfully identifying an individual (given the maximum inference probability ρ). It is challenging to calculate the inverse functions of the above formulas directly. However, since the formulas in Corollary 1 are monotonically increasing in ε, we can approximate the inverse with a binary search. We pre-compute some pairs $\langle \rho_i, \epsilon_i \rangle$. Given ρ, if we find $\rho_i = \rho$, we return $\epsilon_i$. Otherwise, we find $\epsilon_k$ and $\epsilon_{k+1}$ such that $\rho_k \le \rho \le \rho_{k+1}$. Next, we compute the pair $\langle \rho_j, \epsilon_j \rangle$ using the formulas in Corollary 1 with $\epsilon_j = (\epsilon_k + \epsilon_{k+1})/2$. If $\rho_j > \rho$, the target lies between $\epsilon_k$ and $\epsilon_j$, so we next evaluate $(\epsilon_k + \epsilon_j)/2$; otherwise, we evaluate $(\epsilon_j + \epsilon_{k+1})/2$. The procedure repeats until $\rho_j$ approximates ρ within a marginal error, with time complexity O(log n).
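A sketch of this binary search in Python, assuming the Corollary 1 expression 1 − (1 + εL/(2∆f))e^{−εL/∆f}. Instead of a pre-computed table of ⟨ρ_i, ε_i⟩ pairs, it brackets the answer by doubling an initial upper guess, which is a simplification of the procedure described above. The function names and the tolerance are hypothetical choices.

```python
import math

def inference_probability(epsilon, L, delta_f=1.0):
    """Right-hand side of Corollary 1; increasing in epsilon."""
    t = epsilon * L / delta_f
    return 1.0 - (1.0 + t / 2.0) * math.exp(-t)

def max_epsilon(rho, L, delta_f=1.0, tol=1e-9):
    """Largest epsilon (within tol) whose inference probability does not exceed rho."""
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0  # inference_probability(0) = 0 <= rho
    while inference_probability(hi, L, delta_f) <= rho:  # grow until bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if inference_probability(mid, L, delta_f) > rho:
            hi = mid  # too risky: the answer lies below mid
        else:
            lo = mid  # within the budget: a larger epsilon may still be allowed
    return lo

# Count queries (L = 0.5, sensitivity 1) with maximum inference probability 1/10.
eps = max_epsilon(rho=0.1, L=0.5)
print(round(eps, 3), round(inference_probability(eps, 0.5), 6))
```

Under these assumptions, ρ = 1/10 for count queries yields ε of about 0.40, far below the 11.1 permitted by [15] in Example 2.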

Experimental Evaluation
In this section, we conduct experiments to evaluate our proposed approach. Note that the proposed approach cannot be directly compared experimentally with [15] and [24], since the three approaches have different sets of parameters (which may result in biased comparisons).
Specifically, by comparing the experimental and theoretical inference probabilities for various values of ε and L, we show that our model is consistent with the actual tests. In the experiments, our theoretical model is denoted by THE and the actual tests are abbreviated as ACT. The ACT is conducted by randomly generating n pairs of noise (x1, x'1), (x2, x'2), ..., (xn, x'n). Denoting the number of noise

Results on Varying
This set of experiments is designed to study the influence of ε on the probabilities of successful inference in THE and ACT. The results with length L = 0.5 and L = 0.2 are shown in Figures 2(a) and 2(b). In all the experiments, the probabilities produced by our model are very close to the actual tests. Specifically, as ε varies from 0.001 to 1, the differences between THE and ACT are 0.00003, −0.00026, 0.00004 and 0.000073 with L = 0.5 (see Figure 2(a)). When L = 0.2, the differences are −0.00004, −0.00011, −0.000151 and 0.000146, respectively (see Figure 2(b)). These results further confirm the correctness of our interpretive inference model.
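The THE/ACT comparison can be reproduced in spirit with a short simulation. This is a sketch under assumptions (∆f = 1, standard-library Laplace sampling, a fixed seed, n = 100,000 pairs); it is not the authors' experimental code, and its output will not match the exact differences reported above.

```python
import math
import random

def theoretical(epsilon, L, delta_f=1.0):
    """THE: Theorem 1 closed form with b = delta_f / epsilon."""
    t = epsilon * L / delta_f
    return 1.0 - (1.0 + t / 2.0) * math.exp(-t)

def actual(epsilon, L, n, seed=0, delta_f=1.0):
    """ACT: draw n pairs (x_i, x'_i) from Lap(0, delta_f / epsilon), let m be the
    number of pairs with |x_i - x'_i| < L, and return m / n."""
    rng = random.Random(seed)
    b = delta_f / epsilon

    def lap():
        # Laplace(0, b) as b times the difference of two Exp(1) draws.
        return b * (rng.expovariate(1.0) - rng.expovariate(1.0))

    m = sum(abs(lap() - lap()) < L for _ in range(n))
    return m / n

for eps in (0.001, 0.01, 0.1, 1.0):
    the = theoretical(eps, 0.5)
    act = actual(eps, 0.5, n=100_000)
    print(f"eps={eps:<6} THE={the:.5f} ACT={act:.5f} diff={the - act:+.5f}")
```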

Results on Varying Length of Interval L
Next, we compare THE and ACT by varying the length of the interval L from 0.1 to 1. Figures 3(a) and 3(b) show the probabilities of successful inference for THE and ACT with ε = 1 and ε = 0.5, respectively. In general, the probabilities increase with L, because a larger L makes it easier for the adversary to obtain a successful inference. The figures show that the probabilities generated by THE and ACT differ only slightly across the entire range of L from 0.1 to 1.
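The monotonic growth with L follows directly from the closed form. The sketch below, which assumes the Theorem 1 expression with ∆f = 1, tabulates THE for L = 0.1, ..., 1.0 at ε = 1 and ε = 0.5, and asserts that each curve is strictly increasing; the derivative with respect to t = εL/∆f is ((1 + t)/2)e^{−t}, which is always positive.

```python
import math

def theoretical(epsilon, L, delta_f=1.0):
    """Theorem 1 closed form, assuming b = delta_f / epsilon."""
    t = epsilon * L / delta_f
    return 1.0 - (1.0 + t / 2.0) * math.exp(-t)

lengths = [k / 10 for k in range(1, 11)]  # L = 0.1, 0.2, ..., 1.0
for eps in (1.0, 0.5):
    probs = [theoretical(eps, L) for L in lengths]
    # d/dt [1 - (1 + t/2) e^{-t}] = (1 + t)/2 * e^{-t} > 0, so each curve rises with L.
    assert all(p < q for p, q in zip(probs, probs[1:]))
    print(eps, [f"{p:.3f}" for p in probs])
```

Because the expression depends on ε and L only through their product, the ε = 0.5 curve at length L coincides with the ε = 1 curve at length L/2.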

Results on Varying Workload
Finally, we demonstrate how the workload affects THE and ACT. Specifically, the workload of tests varies from 100 to 1,000k. Figures 4(a) and 4(b) plot THE and ACT as a function of workload, with ε = 1 and ε = 0.5, respectively. The test values become increasingly close to the theoretical values as the workload grows, since more tests lead to more accurate empirical estimates. For large workloads, the theoretical values are consistently close to the actual tests.

Summary
The experimental results in Figures 2 to 4 confirm that the main result of our model, i.e., Theorem 1, is correct: our model produces only negligible errors compared with the actual tests. Note that the experimental results are derived from noisy queries (e.g., count and sum) by comparing the actual noise generated for queries with the theoretical results derived from Theorem 1. Thus, the experimental results are independent of the experimental datasets; in other words, they would be consistent with the results derived from any dataset (using the same noise).

Related Work
Since Dwork [4] proposed the seminal mechanism of ε-differential privacy, there has been a large body of work on it. The literature on differential privacy can be classified into three main categories. The first category studies the properties of differential privacy and its variants. For example, (ε, δ)-differential privacy [6], a natural relaxation of differential privacy, was proposed, under which better accuracy (a smaller magnitude of added noise) and generally more flexibility can often be achieved. The authors of [20] reported the design and implementation of the Privacy Integrated Queries (PINQ) platform for privacy-preserving data analysis. Complementary to the Laplace mechanism, McSherry and Talwar [22] proposed the exponential mechanism, which works for any query whose output space is discrete. This enables differentially private solutions for various interesting problems where the outputs are not real numbers.
The second category studies new differentially private methods with improved accuracy, without compromising privacy preservation. Privelet [33] is a data publishing technique that ensures ε-differential privacy while providing accurate answers for range-count queries, i.e., count queries where the predicate on each attribute is a range. The core of this solution is a framework that applies wavelet transforms on the data before adding noise to it. The technique in [1] was designed for releasing marginals, i.e., the projections of a frequency matrix on various subsets of the dimensions. iReduct [31] was designed to compute answers with reduced relative errors. The basic idea of iReduct is to inject different amounts of noise into different query results, so that smaller (larger) values are more likely to receive less (more) noise.
The third category includes algorithms for enforcing ε-differential privacy in the publication of various types of data, such as relational tables [4,11,33], data mining results [8,21,29], and histograms [35]. Specifically, Xu et al. [35] investigated how the counts in one-dimensional histograms can be released in a differentially private manner. Barak et al. [1] proposed a solution for releasing marginals, each of which contains the counts pertinent to a projection of the original dataset onto a subset of its attributes. Rastogi and Nath [26] studied the publication of time series in a distributed setting. In some contexts (e.g., search logs [10,13,14]), besides ε-differential privacy, the relaxed notion of (ε, δ)-differential privacy has been proposed to bound (by δ) the probability that an output generated from one of two neighboring inputs D and D' cannot be generated from the other (since a zero probability cannot serve as the denominator of a ratio bounded by e^ε). Similar to ε-differential privacy, Mohammady et al. [23] proposed a privacy notion, ε-indistinguishability, for different views of outsourced network trace data.
Yang et al. [36] listed some open problems that we believe are important and deserve additional attention from researchers. The first problem concerns the actual/physical meaning of the privacy budget ε. The prior work most relevant to ours is [15], which considers the probability of identifying any particular individual as being in the database and demonstrates the challenge of setting proper values of ε given the goal of protecting individuals in the database with some fixed inference probability. The details of their techniques are discussed in Section 3.
Recently, differential privacy models have been extended to local differential privacy (LDP) [3], in which each user locally perturbs their data before disclosing it to untrusted data recipients. State-of-the-art LDP techniques have been proposed to sanitize statistical data for generating histograms/heavy hitters [7], social graphs [25] and frequent itemset mining [30]. We intend to extend our approach to local differential privacy in the future.
Finally, privacy bounds have been studied outside the differential privacy community. For instance, Zhang et al. [37] studied the privacy bound for identifying which intermediate data sets need to be encrypted and which do not in cloud computing.

Conclusion
Although the mechanism of differential privacy has received considerable attention in the past decade, few efforts have been dedicated to studying the practical implications of its privacy bound (e.g., ε) and applying it in practice. In addition, despite its apparent real-world importance, the choice of an appropriate value of ε (based on a required quantitative probability that any individual can be identified from the input data) has not been well studied in the literature.
Prior works suffer from some limitations. To address these deficiencies, we have presented a novel interpretive inference model that converts the differential privacy bound ε into the probability of identifying any individual from the input database. Our inference model also makes it possible to determine an appropriate value of the privacy bound for any desired privacy guarantee (i.e., given a limited probability of identification). We have also shown that the upper bound for differential privacy suggested by prior models is too large, which leaves those models vulnerable to the inferences performed by adversaries under our model. We have theoretically and experimentally validated the effectiveness of our model.

Figure 1. Example of Anonymization.
The adversary maintains a set of tuples ⟨w, α, β⟩ for each possible combination w of D', with n − 1 records sampled from D (i.e., D' ⊂ D and |D'| = |D| − 1). From the discussion in Section 2, either D = D1, D' = D2 or D = D2, D' = D1 holds. Let Ψ denote the set of all possible combinations of D' (|Ψ| = n). α and β are the adversary's prior belief and posterior belief (after observing a query response) on w = D', respectively.

Exploring the Privacy Bound for Differential Privacy: From Theory to Practice EAI Endorsed Transactions on Security and Safety 12 2018 -01 2019 | Volume 5 | Issue 18 | e2 X. He, Y. Hong, Y. Chen

pairs that satisfy the inequality |xi − x'i| < L, 1 ≤ i ≤ n, as m, the probability of successful inference is measured by the ratio m/n. The THE values, on the other hand, are computed directly from Theorem 1.

Figure 3. Probability of Successful Inference vs. Length L

Figure 4. The Probability of Success vs. Workload
