Privacy-Preserving Multi-Party Directory Services

In the era of big data, the data-processing pipeline becomes increasingly distributed among multiple sites. To connect data consumers with remote producers, a public directory service is essential, as evidenced by adoption in emerging applications such as electronic healthcare. This work systematically studies the privacy preservation and security hardening of a public directory service. First, we address the privacy preservation of serving a directory over the Internet. With Internet eavesdroppers performing attacks with background knowledge, the directory service has to be privacy-preserving to comply with data-protection laws (e.g., HIPAA). We propose techniques to adaptively inject noises into the public directory in a way that is aware of the application-level data schema, effectively preserving privacy while achieving high search recall. Second, we tackle the problem of securely constructing the directory among distrusting data producers. For provable security, we model the directory-construction problem as a secure multi-party computation (MPC). For efficiency, we propose a pre-computation framework that minimizes the private computation and conducts aggressive pre-computation on public data. In addition, we tackle systems-level efficiency by exploiting data-level parallelism on general-purpose graphics processing units (GPGPU). We apply the proposed scheme to real healthcare scenarios for constructing patient-locator services in emerging Health Information Exchange (HIE) networks. For privacy evaluation, we conduct extensive analysis of our noise-injection techniques against various background-knowledge attacks; experiments on real-world datasets demonstrate a low attack success rate, confirming the effectiveness of the protection. For performance evaluation, we implement our MPC optimization techniques on open-source MPC software.
Through experiments on local and geo-distributed settings, our performance results show that the proposed pre-computation achieves a speedup of more than an order of magnitude without security loss.


Introduction
In the era of big data, personal data is produced, collected, and consumed in digital forms, bringing unprecedented convenience to society. As data production and consumption are distributed across different sites, sharing person-specific data over the Internet calls for a public directory that maintains producer-location information and connects data consumers and producers. For instance, in electronic healthcare, Health Information Exchange (HIE) is an emerging data-sharing platform [4,9] where a directory called the locator service [1,5,8,11] helps a doctor (data consumer) find the electronic medical records (EMR) of a visiting patient (data producer). The data-location information ("which hospitals a patient has visited") may reveal privacy-sensitive facts; for instance, knowing that a celebrity visited a rehabilitation center, one can infer that s/he may have a drug problem.
A naive way of constructing the directory is for each data producer to directly publish its list of associated people (e.g., the list of patients who have visited a hospital). However, this approach discloses the private data-location information to network adversaries performing traffic analysis. Such disclosure leaks "identifiable information" and would violate data-protection laws (e.g., HIPAA in the USA [6], EC95-46 in the European Union [3], and various privacy laws in Asian countries [49]) that govern data sharing across borders in regulatory domains.
This work aims at providing a practical public directory service over the Internet. We address two salient features: 1) privacy preservation against background-knowledge attacks, and 2) secure and efficient multi-party construction of the directory among distrusting data producers. In the following, we elaborate on the two features and formulate the research.
First, with the directory data bearing privacy and being exposed to the public, it is imperative to preserve privacy. Existing data-privacy and anonymization techniques are inadequate for the public directory. Differential privacy [31] is designed for statistics databases and is insufficient for protecting directory data, where the published data retains individual records (not statistical digests). Data-anonymization notions such as k-anonymity [59], l-diversity [59], and t-closeness [45] preserve privacy at a finer granularity, but are not specific to the background-knowledge attacks in domain applications. In addition, the random noises used in existing techniques are insufficient, as they lack the models needed to make the noises indistinguishable (from true positives) in the presence of background knowledge.
We propose to model the background knowledge exploited in the domain applications and accordingly select noises such that they are indistinguishable from true-positive records. We propose a top-k similarity algorithm for selecting noises that remain indistinguishable under background-knowledge attacks. Specifically, we aim at making the distribution of noises similar to that of true positives. The similarity is measured on the dimension of external, public knowledge; for instance, in HIE, the similarity between hospitals (producers) can be defined by hospital specialties and geographic locations. The top-k computation strikes a balance between practical computational complexity and the need to counter background-knowledge attacks of exponentially growing possibilities. While our background-knowledge modeling may not be exhaustive, our noise-selection approach is effective in HIE and other relevant scenarios, as evaluated in our study based on real-world data.
Second, we address the multi-party construction of the directory service from data producers. A defining characteristic is that data producers operate autonomously and generally do not trust each other (e.g., hospitals competing for the same customer base). Yet, producers need to publish their private data to construct the directory service. To guarantee provable security, one can model the directory-construction problem as a secure Multi-Party Computation (MPC) problem [24,26,34,46,64], where a joint computation with inputs private to different parties is evaluated securely. A naive instantiation of directory publication embeds the entire publication logic in an MPC protocol, which causes high overhead and is impractical because of the expensive cryptographic primitives used in constructing an MPC. A conventional remedy is to identify the private part of the computation (e.g., by data-flow analysis [17,52]) and to map only this part to the MPC. Unfortunately, this approach is not effective in our problem, as the private and public data flows of the directory-construction logic are entangled and separating them is difficult.
We tackle the efficiency of secure multi-party directory construction at both the protocol level, by proposing pre-computation techniques, and the systems level, by exploiting data-level parallelism. Concretely, we propose an aggressive pre-computation technique that minimizes (instead of separating) the private computation for multi-party directory publication: we conduct the pre-computation by considering all possible values of the private data, and then apply the expensive MPC only to a simple selection logic, that is, selecting from the list of pre-computed results by the actual value of the private data. At first glance, this optimization may seem counter-intuitive, as the pre-computation augments the input space exponentially. In practice, particular to our directory-construction problem, its effectiveness relies on an application characteristic: the public computation is usually bulky while the private identity data is much smaller. For instance, achieving the privacy of t-closeness [45] entails complex computation on the public background knowledge, such as similarity/distance calculation. In addition, we propose several policies that vary in the degree of pre-computation aggressiveness. To improve system-level efficiency, we exploit the data-level parallelism native to multi-party construction and implement the pre-computation on General-Purpose Graphics Processing Units (GPGPU). We implement our design on real MPC software [24] and conduct performance evaluation in both local and geo-distributed settings. Our evaluation verifies that pre-computation yields a speedup of more than an order of magnitude over conventional approaches. Through evaluation on real-world datasets, the assurance of privacy preservation is also verified.
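The pre-computation idea can be illustrated with a minimal sketch (the names and data are ours, not the paper's protocol): the expensive computation over public data is evaluated in the clear for every candidate value of the small private input, so the MPC step reduces to a cheap oblivious 1-of-N selection. Here the MPC selection is stubbed out as a plain lookup; a real deployment would replace `oblivious_select` with an MPC primitive such as a garbled-circuit multiplexer.

```python
def expensive_public_computation(public_knowledge, candidate_private_value):
    """Stands in for bulky work on public data (e.g., similarity ranking)."""
    return sorted(public_knowledge, key=lambda h: abs(h - candidate_private_value))

def precompute_all(public_knowledge, private_domain):
    # Runs entirely in the clear: one result per possible private input.
    return {v: expensive_public_computation(public_knowledge, v)
            for v in private_domain}

def oblivious_select(table, actual_private_value):
    # Placeholder for the only step that must run inside MPC:
    # select the row indexed by the true private value.
    return table[actual_private_value]

public_knowledge = [10, 42, 7, 99]   # e.g., public hospital profiles
private_domain = range(5)            # small domain of private identities
table = precompute_all(public_knowledge, private_domain)
result = oblivious_select(table, 3)  # 3 is the party's secret input
```

The trade-off is exactly the one discussed above: the table grows with the private-input domain, which pays off only when that domain is small relative to the public computation.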
The contributions of this work are listed as follows:
• We address the privacy preservation of directory data under background-knowledge attacks. We model the background knowledge and propose techniques to generate noises that remain indistinguishable even under background knowledge.
• We analyze our noising techniques against various background-knowledge attacks. We also conduct an empirical evaluation on real-world datasets and demonstrate the effectiveness of the protection.
• We address security and efficiency in multi-party directory construction. We propose secure multi-party pre-computation and tailor it for directory construction. The observation is that the public background knowledge in directory publication can be isolated from the expensive multi-party computation (MPC). We implement this optimization design on real MPC software.
• We propose a systems-level optimization technique for efficient directory construction, by conducting data-parallel pre-computation and implementing it on GPGPU.
• We conduct performance evaluation and demonstrate an order-of-magnitude performance speedup.
The rest of the paper is organized as follows: § 2 formulates the research problem. § 3 presents background-knowledge attacks and § 4 the noising-based protection techniques. Efficient directory construction is presented in § 5, and performance evaluation in § 6. § 7 surveys the related work and § 8 concludes the paper.

Research Formulation
This section presents the system and threat model, the security goals, a survey of existing techniques, and preliminaries on privacy-preserving data publication algorithms.

System Model
The target ecosystem involves three roles: data producers, data consumers, and the host of the directory service. Each data producer owns a table of personal records, where each record is keyed by the identity of its owner. Given a person of interest, a data consumer wants to find his/her records at all producer sites. The directory service helps the consumer "discover" the relevant data producers who maintain the result records.
Formally, sharing personal records in our system works in two steps: First, a data consumer interested in a person's records poses a query to the directory service and looks up the list of producers who have this person's records. Then, the consumer contacts individual producers and searches the records locally there. In this process, the query is based on a personal identity, which we assume is known globally. In practice, this global identity can be maintained physically by an identity-management server or constructed virtually, such as by patient record linkage in healthcare [40,62].
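The two-step lookup can be sketched as follows (a minimal illustration with hypothetical names and data, not the system's actual API): step 1 resolves the global identity to a list of producers; step 2 searches each producer locally.

```python
# Hypothetical in-memory stand-ins for the directory and the producers'
# local stores, keyed by a globally known personal identity.
directory = {"alice": ["H1", "H2"], "bob": ["H2"]}
local_records = {"H1": {"alice": ["record-a1"]},
                 "H2": {"alice": ["record-a2"], "bob": ["record-b1"]}}

def lookup(person_id):
    # Step 1: directory lookup by global identity -> candidate producers.
    producers = directory.get(person_id, [])
    # Step 2: contact each producer and search its records locally.
    return {p: local_records[p].get(person_id, []) for p in producers}

results = lookup("alice")
```

In the privacy-preserving directory described later, the producer list returned in step 1 also contains injected false positives, which step 2 naturally filters out.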

Figure 1. System model of the public directory: Two data producers share three people's records. In the directory, the value one means presence and zero means absence (e.g., producer H1 does not have the gray person's records). The underscored one in red is a false positive, in the sense that producer H2 does not have the white person's record but the directory records the opposite (for the sake of privacy preservation).
We assume each data producer locally has a data-protection mechanism in place (e.g., user authentication and authorization) that prevents an external party from accessing the records without the data owner's consent. Figure 1 illustrates the abstract model of our system. The model is applicable to data-sharing applications in regulatory domains; a concrete scenario is sharing patient electronic medical records (EMR) in healthcare information exchange networks, where data producers are hospitals, personal data are patients' EMRs, and consumers can be physicians diagnosing patients. The details of the scenario are elaborated in § 2.4.
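The Figure 1 model can be sketched as a presence matrix (the visit assignments below are illustrative, chosen only to reproduce the caption's two highlighted cells): a 1 in the directory may be a true positive or an injected false positive.

```python
# Hypothetical ground-truth visits and one injected noise entry, chosen to
# match Figure 1's caption: H1 lacks the gray person; H2 lacks the white
# person but the directory still publishes a 1 there (the false positive).
true_visits = {"H1": {"black", "white"}, "H2": {"black", "gray"}}
injected_noise = {("H2", "white")}

people = ["black", "gray", "white"]
producers = ["H1", "H2"]

# Directory entry = 1 iff the producer truly has the record OR the cell was
# injected as noise for privacy preservation.
directory = {h: {p: int(p in true_visits[h] or (h, p) in injected_noise)
                 for p in people} for h in producers}
```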
The target computation of this work is building the directory. A baseline is that each data producer sends its local access-control list to the third-party directory, which enforces the access control when serving directory requests. This baseline becomes problematic when the directory host is untrustworthy (e.g., hosted on third-party clouds): First, enforcing access control with integrity entails user authentication and authorization being done by a trusted party. Second, the local access-control list reveals the binding between a person and her data producers, which can be privacy-sensitive in many applications. For instance, in healthcare scenarios, the binding between a patient and a rehabilitation center can reveal that this person may have a drug problem. Even when the directory is protected by the host, an adversary can easily recover the binding by performing network traffic analysis and extracting this information from the side channel of the consumer access trace.
The life cycle of the public directory has two stages: serving consumers' online requests and being constructed from multiple data producers. In the following, we present the security/privacy goals in the two stages respectively.

Privacy of Serving the Directory
The directory is served in the public with all contained data exposed. As mentioned, the directory data bears privacy and can be a target attracting adversaries. While there are existing data-privacy definitions, such as k-anonymity [59], l-diversity [59], and t-closeness [45], we consider background-knowledge attacks. Such an attack leverages external knowledge about the data producers and personal records to distinguish the noises injected under these privacy notions. In § 3 we present the detailed data model and the background knowledge used in attacking the public directory service.
The privacy preservation of a public directory under background knowledge requires that the false-positive producers be indistinguishable from the true positives, in the sense that the distribution of true positives is similar to that of false positives. Notably, the similarity is measured on the dimension of external, public knowledge. For instance, in HIE, the similarity between hospitals (producers) can be defined by hospital specialties and geographic locations. In this work, we mainly use the notion of ε-privacy [61] to drive further presentation. The main idea of ε-privacy is to inject enough noises, or false positives, into the published list of producers that an attack's success rate is bounded by a percentage ε.
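As a back-of-the-envelope sketch (our reading of the ε-privacy bound derived later in the paper, not code from it): to cap a random-guess success rate TP/(TP + FP) at ε, the directory needs at least TP · (1/ε − 1) false-positive producers.

```python
import math

def min_false_positives(num_true_positives, epsilon):
    # FP >= TP * (1/epsilon - 1)  <=>  TP / (TP + FP) <= epsilon.
    return math.ceil(num_true_positives * (1.0 / epsilon - 1.0))

fp = min_false_positives(2, 0.2)  # 2 true positives, epsilon = 20% -> FP >= 8
```

With 2 true positives and 8 noises, a uniformly guessing adversary succeeds with probability 2/10 = 0.2, exactly the ε bound.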

Security of Multi-Party Directory Construction
Constructing the directory involves multiple mutually distrusting producers. In our problem, a data producer runs autonomously and distrusts external parties, including peer producers. Data producers engage in a distributed computation for publishing the privacy-preserving directory, during which they exchange information with each other.
In the threat model, an adversary can eavesdrop on all messages exchanged during the distributed directory publication. For a producer, the adversary can be a network eavesdropper or a peer producer. Formally, this is the semi-honest model used in formulating a secure multi-party computation problem [21], where the adversary, being a participant in the computation, honestly follows the protocol execution but is curious about any data that flows through her during the execution. Multiple adversaries may collude; given a network of n producers, we consider collusion of up to n − 1 peer producers.
The security goal is to assure data security in the directory-publication process. Specifically, we aim to ensure perfect privacy (in an information-theoretic sense). Informally, this means an adversary's view depends only on her own input and the public output. In other words, the messages exchanged in the protocol execution when the inputs of other parties take one value are "indistinguishable" from those when the inputs take another value. A more formal treatment of MPC data security can be found in classic texts [26].
Our threat model and security goal fit the real-world requirement of policy compliance in data sharing. In many regulatory domains, a data producer has the responsibility of protecting the personal data it maintains and of complying with data-protection laws. For instance, HIPAA [6] states that identifiable information about a patient cannot be shared with any third party without the patient's consent.
Non-goals of this work include directory data authenticity, producer-site data protection, and key management. Encrypting the data on the directory is orthogonal, as the content of the directory is in any case disclosed to a network-eavesdropping adversary performing traffic analysis.

Applications: Healthcare Locator
One motivating application of P³I is the public locator service in healthcare information exchange (HIE) networks. HIE is a health data-sharing network where the data is patient electronic medical records (EMR), the data producers are hospitals (each patient visit generates new entries in an EMR), and the data consumers are clinical doctors. A typical application scenario is the effective sharing of a patient's EMRs during a clinical visit: the doctor diagnosing a patient needs to view the patient's relevant EMRs, which are produced and stored at remote hospitals. This scenario features privacy-sensitive data. Patient EMRs are personal, privacy-sensitive documents, the sharing of which must comply with HIPAA [6]. Each hospital has its local information-security infrastructure in place (e.g., access control and user authentication).
A directory service, called the HIE locator, can be used to facilitate EMR sharing between hospitals and to help discover a patient's previous hospitals. In the normal case, the list of hospitals is discovered by the clinical doctor asking for it. However, this is error-prone (e.g., the patient forgets) and is inapplicable in emergencies (e.g., the patient is sent to the hospital unconscious). The privacy-preserving directory can help automate the data discovery and complement the offline workflow to improve the quality of healthcare. Figure 2 illustrates the abstract workflow of sharing EMRs in HIE networks. In a clinical scenario, Alice, the patient, is seeing a physician (data consumer) who interacts with the HIE network (directory) to locate the hospitals Alice visited before (data producers). In real HIE applications, the locator service runs healthcare software (e.g., OpenEMPI [11]) and is hosted on public clouds such as Amazon AWS. The public clouds are not trustworthy, which entails the use of our privacy-preserving directory protocol for publishing the HIE locator. Concretely, the life cycle of an EMR, including the data-sharing flow, can be divided into three stages: 1) EMR production, where Alice's EMRs are generated or updated to reflect her clinical visit; here we assume Alice has given consent to delegating the EMRs to the "producer" hospitals. 2) Locator (periodic) publication, where the EMR updates are published to the public directory of the HIE locator in a privacy-preserving fashion; this is when our directory-publication protocol is invoked in the overall HIE workflow. 3) Locator service, where the locator serves the physician's request to locate Alice's producer hospitals (3.1) and find the EMRs of interest there (3.2). In particular, for stage 3.2, after the physician obtains the list of potential hospitals (including both true- and false-positive ones), he contacts each hospital and finds the EMRs by going through the local user authentication and access control there.

Attack Framework
Background knowledge: In this work, the background knowledge (B) includes patient demographic information (e.g., age, gender, home address) and hospital profiles (e.g., specialties and location). Specifically, we represent the profile of each hospital by two metrics: a specialty vector and its geographic location (e.g., latitude and longitude). The specialty vector is a vector of the hospital's ranking scores in all specialty categories. The background knowledge we consider in this work is realistic and can be obtained from public data sources; for instance, the hospital profile in terms of location and specialties is public information available on the USNEWS website [7], and patient demographic information is available from various online census datasets [10].
Defense by noising: P³I.Query(p) presents false-positive hospitals, serving as noises, to obscure the identities of the true-positive hospitals, which are privacy-sensitive to patient p. True/false positive and negative hospitals are defined as follows.
Definition: For patient p, a hospital that she has visited is defined to be a true-positive hospital, denoted by TP; the set of all true-positive hospitals is denoted by I₀. A hospital that the patient has never visited is defined to be a negative hospital, denoted by N.
Definition: In the P³I, a noise hospital is a hospital that the patient did not visit but that the P³I claims she visited. A noise hospital is a false positive, denoted by FP. A hospital that appears positive in the P³I can be a true positive or a false positive; the set of positive hospitals is denoted by I = {P} = {TP} ∪ {FP}.

ε-privacy goal: Given that the P³I maps a patient to a list of positive hospitals, one type of information leakage that is inevitable for achieving 100% search recall is that the adversary knows "all the true-positive hospitals are in the P³I.Query result." Beyond that, we assume there is no direct information leaked about a patient's visited hospitals; for instance, the adversary does not know the total number of hospitals visited by a patient. Our privacy-preservation goal is to achieve ε-privacy for all considered attacks.

Definition: Given an attack that makes a probabilistic claim, ε-privacy is defined to hold when the success rate of the attack is statistically upper-bounded by ε.
Background-knowledge attacks: The information flow of an attack is that the adversary uses the publicly available P³I to "reversely" infer the true-positive EMR locations I₀ (step a), and then from I₀ (or I in some cases) infers the sensitive disease information of the patient (step b).

Concrete Attacks
Under the above attack model, we consider three specific attacks, classified by attack information flow and background knowledge (as in Table 2). Attack I does not rely on any background knowledge and aims at recovering true-positive hospitals in step a. Attack II exploits background knowledge on hospital specialties and aims at inferring patient diseases in step b. Attack III exploits various background knowledge on hospital and patient profiles and aims at recovering true-positive hospitals in step a. For the different attacks, we present different top-k policies and analyze how the same ε-privacy assurance is achieved by P³I.

Table 3. Modeling P³I data and background knowledge: This table describes a scenario involving one patient and five hospitals that appear positive in the P³I, h1, ..., h5. We consider two cases: one where all five hospitals are true positives, and one where, among the five, h2 and h3 are the true positives. The table presents background knowledge about hospital specialties implying patient gender, and the geographic distance between each hospital and the patient's home. We also show the non-matching scores (m_e and m_f) on the different background knowledge.

Attack I. In Attack I, the adversary randomly picks one hospital from {h1, ..., h5} without any external knowledge and claims that the patient visited that hospital. The claim, if true, leaks the sensitive information (knowing a patient visits a rehabilitation center discloses her drug-addiction problem). As illustrated by case 1 in Table 3, when all five hospitals are true positives, the claim is always true and any type-I attack always succeeds.
Attack II. With background knowledge of hospital specialties, the adversary can infer the health condition of the patient. The attack follows the information flow from I and B to the patient's disease (step b), and is successful when all positive hospitals end up covering few specialties. Consider the extreme case in which all positive hospitals are of the same type, say rehabilitation centers: no matter which hospital is the true positive, the adversary can be certain that the patient must have an addiction-related problem. Note that this differs from Attack I, where the hospital in the claim must be a true positive for the attack to succeed.
Attack III. In Attack III, the adversary takes on step a to distinguish noise and true-positive hospitals by exploiting knowledge of patient and hospital profiles. The patient profile in consideration is her demographic information, such as home address, gender, and age group; the hospital profile includes the hospital's specialties, location, etc.
The attack works by common knowledge linking patients and hospitals. For instance, given a male patient, the adversary can easily determine that a women's health center showing up in a P³I search result must be a false positive. Likewise, for a teenage patient, a hospital specializing in geriatrics is unlikely to be a true positive, and a hospital in New York State is probably a false positive for a patient living in the State of Georgia. In general, such "non-matching" relationships derived from background knowledge can help reveal the identity of a noise hospital, thus improving the attack success rate. We formulate the relationship by a non-matching score m.
Definition: Given the background knowledge B on patient p and a negative hospital N, the non-matching score m_B(p, N) measures the unlikelihood that the patient has visited the hospital. The non-matching score can also be expressed between a true-positive hospital P and a negative hospital N, as m_B(P, N).
Depending on the application scenario, the non-matching score can take various forms.
• Exact-match: The non-matching score takes a binary value, indicating whether the hospital matches the patient. In the previous example involving patient gender and hospital specialty, the non-matching score is 1 when a male patient and a women's health center are considered, and 0 for a female patient. The implication of a non-zero score is that the women's health center should not be chosen as a noise for a male patient.
• Fuzzy-match: The non-matching score takes continuous values. In the previous example involving a New York hospital and a patient in Georgia, the non-matching score is measured by the geographic distance between the two. The intuition is that the more distant a hospital is from the patient's location, the less likely there is a match between the two. The implication is that a hospital too far away from a patient should not be chosen as a noise for the patient.
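The two score semantics can be sketched as follows (function names and attribute choices are ours, for illustration only; real scores would be derived from the actual background knowledge B).

```python
import math

def exact_nonmatch(patient_gender, hospital_specialty):
    # Binary score: 1 means the pairing is implausible and the hospital
    # should never be picked as a noise for this patient.
    return 1 if (patient_gender == "male"
                 and hospital_specialty == "womens_health") else 0

def fuzzy_nonmatch(patient_loc, hospital_loc):
    # Continuous score: Euclidean distance as a stand-in for the
    # geographic distance between patient home and hospital.
    return math.dist(patient_loc, hospital_loc)

s1 = exact_nonmatch("male", "womens_health")  # 1: never pick as noise
s2 = fuzzy_nonmatch((0.0, 0.0), (3.0, 4.0))   # 5.0: the farther, the worse
```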

Attack Mitigation with Centralized Directory Construction
We now describe P³I.Construct({∀H}), that is, how the P³I is constructed with the required level of noises. In this section, we consider centralized P³I construction by assuming a hypothetical authority that is trusted by all hospitals. We remove this assumption and present the secure, distributed P³I construction in the next section.

Asymmetric Deterministic Response: In the P³I construction, we allow a negative hospital to be published as a false positive. On the other hand, we do not allow false negatives; that is, a true-positive hospital is always published as positive. This rule, illustrated in Formula 1, leads to 100% search recall and prevents any legitimate hospital from escaping the search result.
We call this publication primitive the asymmetric deterministic response (or ADR), reminiscent of the classic randomized response [63]. Compared with randomized response, our ADR is asymmetric in that it treats the binary input (N or P) differently, or "asymmetrically", and is deterministic in that it flips an input N based on certain deterministic conditions (described by the top-k policies in § 4.1).
ADR(N) → { P, if chosen as noise by Algo. 1; N, otherwise }    (1)
ADR(P) → P
Top-k Algorithm: Given a patient whose location information over hospitals is I₀ (i.e., the list of visited hospitals), the P³I construction problem boils down to noise generation, that is, properly choosing a certain number of false-positive hospitals. For a specific patient, the selection favors negative hospitals that may or may not be similar to the set of true-positive hospitals (as discussed in § 4.1). We thus define a hospital-to-hospital-set distance, D(N, I₀), which measures the dissimilarity between a negative hospital N and the set I₀ of all true-positive hospitals of a patient. The selection stops when a certain condition is met. The top-k algorithm is illustrated in Algorithm 1.

Attack-I mitigation:
• Top-k stop condition: TP / (TP + FP) ≤ ε
• Distance definition: D_I(N, {P}) = 1

Proposition: P³I can mitigate Attack I with assured ε-privacy.
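The Attack-I instance of top-k noise selection can be sketched as follows. This is our reconstruction from the stop condition FP ≥ TP · (1/ε − 1) stated in the proof; the paper's Algorithm 1 is not reproduced in this excerpt. Under Attack I every negative hospital is equally good as a noise (D_I = 1), so the sketch simply takes the first k candidates.

```python
import math

def select_noises(true_positives, negatives, epsilon, distance=lambda n: 1):
    # Stop condition FP >= TP * (1/epsilon - 1) fixes the number of noises.
    k = math.ceil(len(true_positives) * (1.0 / epsilon - 1.0))
    # Rank negatives by the (dis-)similarity distance; Python's sort is
    # stable, so with a constant distance the input order is preserved.
    ranked = sorted(negatives, key=distance)
    return ranked[:k]

tps = ["h2", "h3"]
negs = ["h1", "h4", "h5", "h6", "h7", "h8", "h9", "h10"]
noises = select_noises(tps, negs, epsilon=0.25)  # need FP >= 2*(4-1) = 6
```

With 2 true positives and 6 noises, the random-guess success rate is 2/8 = 0.25, matching the ε bound.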
Proof. From the top-k stop condition, we have FP ≥ TP(ε⁻¹ − 1). Given that Attack I follows the information flow I → I₀ (step a), the success rate is Pr = TP / (TP + FP) ≤ ε. ε-privacy thus holds.

ε-Privacy under Attack II. Straw-man by l-diversity: Attack II might be mitigated by l-diversity [48], which in the P³I context works by making a patient's diseases "anonymous" among l alternative diseases. However, l-diversity does not automatically lead to ε-privacy: while the former restricts the number of different specialties, the latter restricts the number of negative specialties. (To see this, consider a counterexample against adopting l-diversity in the P³I: in Table 3 (case 1), 3-diversity already holds without any noises, because the true positives together cover three specialties, namely cancer, rehab, and women's center. Yet, given no noises, the success rate of Attack II can be as high as 100%, a situation that achieves 3-diversity but offers no protection in the sense of ε-privacy.)

ε-Privacy Assurance: The intuition of the protection is to choose enough false-positive hospitals such that the false-positive specialties suffice to bound the rate at which the adversary can successfully pick a true-positive specialty. Formally, the top-k policy that assures ε-privacy under Attack II is described below. Here, FP_s (TP_s) denotes the number of false (true) positive specialties. A false-positive specialty is a disease that the patient does not have but in which at least one positive hospital is specialized.
Attack-II mitigation:
• Top-k stop condition: TP_s / (TP_s + FP_s) ≤ ε    (3)
• Distance definition: The distance D_II between a negative hospital N and a set of positive hospitals {P_i} is defined in Equation 4, where S(·) denotes the specialty vector of a hospital and the Hamming distance captures the difference (\) between two specialty vectors. The distance definition favors noises whose specialties differ from the true-positive specialties; thus the number of false-positive hospitals needed can be kept minimal, resulting in better search precision and performance.
Proposition: P³I can mitigate Attack II with assured ε-privacy.
Proof. From the top-k stop condition in Equation 3, we have FP_s ≥ TP_s(ε⁻¹ − 1).
Attack II follows the information flow from I and B to the patient's disease (step b) and is about recovering the true-positive specialties TP_s from among the false-positive ones FP_s. The success rate is then Pr = TP_s / (TP_s + FP_s) ≤ ε. Thus, the success rate is bounded by ε, hence ε-privacy.
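The Attack-II policy can be sketched at specialty granularity as follows. This is our reconstruction from the proof's bound FP_s ≥ TP_s · (1/ε − 1); the candidate ordering produced by the actual distance of Equation 4 is assumed as input, since that equation is not reproduced in this excerpt.

```python
import math

def add_noise_until_specialty_bound(tp_specialties, candidates, epsilon):
    """candidates: list of (hospital, specialty_set), pre-ranked by the
    distance of Equation 4 (assumed given)."""
    tp_s = set(tp_specialties)
    fp_s, chosen = set(), []
    # Stop once FP_s >= TP_s * (1/epsilon - 1) false-positive specialties
    # have been accumulated.
    need = math.ceil(len(tp_s) * (1.0 / epsilon - 1.0))
    for hospital, specs in candidates:
        if len(fp_s) >= need:
            break
        chosen.append(hospital)
        fp_s |= (set(specs) - tp_s)  # only new, non-true specialties count
    return chosen, fp_s

tp_specs = {"rehab"}
cands = [("h4", {"cancer"}), ("h5", {"cardiology", "geriatrics"}),
         ("h6", {"rehab"})]
chosen, fp_s = add_noise_until_specialty_bound(tp_specs, cands, epsilon=0.25)
```

Here one true-positive specialty with ε = 0.25 requires three false-positive specialties, so the sketch keeps h4 and h5 and stops before h6 (whose only specialty is already a true positive and would add nothing).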
ε-Privacy under Attack III (Exact-match). We use the following top-k policy to mitigate Attack III with exact-match semantics.
Attack-III mitigation (exact-match):
• Top-k stop condition: FP₀ ≥ TP(1/ε − 1), counting only the matching false positives FP₀.
• Distance definition: given in Equation 7; any hospital that does not match the background knowledge is assigned an infinitely large distance.

Proof. Given that Attack III follows the information flow I, B →(a) I₀, the success rate is Pr(I₀ | I, B) = TP/(TP + FP₀). Here, we consider two types of false-positive hospitals: the matching ones with a zero-valued non-matching score (FP₀) and the non-matching ones with a one-valued score (FP₁). FP₁ hospitals can be distinguished by an adversary with background knowledge and thus must be discarded when accounting for the success rate.

Our distance in Equation 7 is defined such that any non-matching hospital has an infinitely large distance and thus will never be chosen by Algorithm 1. In other words, no non-matching hospital can be chosen as noise; that is, FP₁ = 0. Combining Equations 6, 8, and 9, we arrive at ε-privacy: Pr(I₀ | I, B) ≤ ε.

ε-Privacy under Attack III (fuzzy-match). We use the following top-k policy to mitigate Attack III with fuzzy-match semantics.
Attack-III mitigation (fuzzy-match):
• Top-k stop condition:
  (Σ_{TP ∈ {TP}} m_B(TP)) / (Σ_{P ∈ {FP} ∪ {TP}} m_B(P)) ≤ ε    (10)
  where m_B(P) denotes the weight the adversary assigns to hospital P, inversely proportional to P's non-matching score.
• Distance definition: Equation 11.

Under Attack III with fuzzy matching, a rational adversary would bias the attack towards positive hospitals with a small non-matching score. Specifically, we consider an adversary who picks a positive hospital with probability inversely proportional to its non-matching score. In the previous example about a Georgia patient, the adversary would avoid choosing the New York hospital due to its high non-matching score (i.e., long geographic distance).
Proposition: P³I can mitigate Attack III with assured ε-privacy in the sense of fuzzy-match semantics.
Proof. To the rational adversary, the success rate can be modeled by Equation 12. The intuition of Equation 12 is best illustrated by the Georgia-patient example. Assume there are five hospitals in the P³I search result, and their distances to the patient's home are 2.5, 2, 1, 0.25, and 0.2, as in Table 3. Considering case 2, where there are two true positives, h₂ and h₃, the success rate follows the calculation in Equation 12. Plugging Equation 10 into Equation 12, we arrive at ε-privacy: Pr(I₀ | I, B) ≤ ε.
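The rational-adversary success rate of Equation 12 can be sketched as below, under our assumption that the weight m_B is the reciprocal of a hospital's non-matching score (the exact form of m_B is not spelled out in this excerpt):

```python
def fuzzy_success_rate(scores, true_idx):
    """Success rate of a rational adversary who picks hospital i with
    probability proportional to 1/scores[i]: the fraction of total
    weight that lands on true positives."""
    weights = [1.0 / s for s in scores]
    return sum(weights[i] for i in true_idx) / sum(weights)

# Georgia-patient example: five hospitals with non-matching scores
# 2.5, 2, 1, 0.25, 0.2; h_2 and h_3 (indices 1 and 2) are true positives.
rate = fuzzy_success_rate([2.5, 2, 1, 0.25, 0.2], {1, 2})
```

Hospitals with small non-matching scores (here 0.25 and 0.2) attract most of the adversary's probability mass, which is exactly why the noise selection favors such hospitals.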

Secure Multi-Party Directory Construction with Optimization
In this section, we present the secure directory publication and its optimization techniques based on pre-computation. The general idea is to abstract the computation at different levels and to precompute it at a specific level. In this way, we obtain a series of precomputation techniques (in § 5.3, § 5.4) that vary in their aggressiveness. To start with, we present the naive approach based on multi-party computation (MPC) without precomputation. We first introduce the background on MPC.

Preliminary: Multi-Party Computation
In our protocol, we make use of existing multi-party computation (MPC) protocols, whose background is presented here. In general, the purpose of MPC is to evaluate a function whose inputs are provided by different parties; each input is private to the party that provides it. The MPC protocol ensures that no information about the private inputs is leaked, even as computation state is exchanged and shared. Different computational models exist in MPC, including circuits and RAM. After decades of study, a variety of MPC protocols realize different computation models, specialized for different network scales (two, three, or many parties). In particular, the GMW protocol [34] is a multi-party, Boolean-circuit-based MPC constructed from the primitives of secret sharing and oblivious transfer. The protocol of multi-server private information retrieval (ms-PIR) [36,41] is a RAM-based MPC in which multiple servers interact with a client to compute a simple selection operation (e.g., a database selection). MPC causes high overhead, mainly due to the "data-oblivious" representation of the computation and the cryptographic primitives used in the construction. For more-than-three-party computation, the use of secret sharing also causes high overhead, as the shares need to be broadcast across the entire network. This unscalability (in data and network sizes) makes it challenging to apply MPC to real-world distributed applications.
In practice, the common way MPC is used for many-party distributed applications is based on the "outsourcing" paradigm. That is, given multiple input parties, the GMW protocol distributes the input shares to a small number of computing parties (e.g., three parties, as in the Sharemind system [22]). Data security then relies heavily on the non-collusion assumption among the computing parties. In our work, we deem this outsourcing model unsuitable for the target application: under HIPAA, a hospital cannot share patient data with any third-party entity without patient consent. Therefore, our problem considers each input party to be a computing party, and the MPC protocol needs to run directly on a medium-sized or large network.

MPC-based Publication
Privacy-preserving directory publication is an MPC problem, as the input data are spread across multiple producers and are private to them. The naive way to realize directory publication is thus to place the computation of Listing 1 into the MPC; we denote this approach by M₀. Given the circuit representation of an MPC program, the algorithm in Listing 1 can be easily converted to a circuit: the algorithm is a nested loop with pair-wise distance computation, and the data/control flow is essentially oblivious. In particular, we represent each producer by a vector (e.g., the specialties of a hospital), and the similarity between producers can be realized by the hamming distance. More complex string-similarity computation is realized by dynamic-programming-based algorithms, which are also data oblivious. The security of this approach is inherited from that of MPC.
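As a plaintext reference for what M₀ places inside the circuit, the Listing 1 loop with hamming distance can be sketched as below. Treating the distance from a candidate to a set as the minimum member-wise distance is our simplification; the paper's D(h, {P}) may be defined differently:

```python
def hamming(u, v):
    """Number of positions where two specialty vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def topk_noise(true_pos, all_prod, profiles, stop):
    """Plaintext analogue of Listing 1: repeatedly add the negative
    producer closest to the current positive set {P} until the stop
    condition on the noise set {FP} holds; return all positives."""
    fp, pos = [], list(true_pos)
    while not stop(fp):
        chosen = set(true_pos) | set(fp)
        candidates = [h for h in all_prod if h not in chosen]
        h_min = min(candidates,
                    key=lambda c: min(hamming(profiles[c], profiles[p]) for p in pos))
        fp.append(h_min)
        pos.append(h_min)
    return fp + list(true_pos)
```

In MPC, this same control flow is expressed as a fixed circuit, which is possible precisely because the loop structure does not depend on the private values.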
This MPC approach is inefficient, especially in big-data sharing scenarios with a large number of personal records, due to the expensive cryptographic primitives (e.g., oblivious transfers) used in MPC protocols. Improving the performance therefore relies on reducing the use of MPC in the distributed directory publication.

Full Precomputation Scheme
To reduce the use of MPC, we propose application-level precomputation. Given the topK(T, S) algorithm in Listing 1, where only the input T (the true producers) is private, we pre-compute the algorithm on the public input S and all possible values of the private input T. The precomputation result is a table of results under the different values of T. We then use the actual value of T to privately look up this table and securely retrieve the result entry. This stage can be realized in MPC using protocols such as multi-server private information retrieval (ms-PIR) [36,41]. Formally, the full precomputation computes topK(2^S, S), where 2^S is the power set of S, which includes all possible values of the private T. This scheme is named M₁.
The precomputation is effective in our directory-publication problem, given the following characteristics. First, the topK algorithm invokes complex computation such as distance computation (i.e., Line 7 in Listing 1), which involves background knowledge about the producer profiles (e.g., hospital specialties and geographic locations); precomputation avoids placing these complex computations in MPC, which reduces overhead. Second, the precomputation only needs to be done once, and its results can be reused for publishing different people's entries. Third, given the independence between different values, one can leverage data parallelism to accelerate the computation. Note that the precomputation needs to be done for all possible values of T, that is, the power set of all producers; although the number of possible combinations grows exponentially with the number of producers, we only consider moderately sized data-producer networks. For instance, in healthcare, a regional or statewide HIE typically consists of fewer than a hundred hospitals in a consortium.
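A minimal sketch of M₁ on public data follows. The ms-PIR lookup is simulated here by a plain dictionary access; in the real protocol the key (the actual T) never appears in the clear:

```python
from itertools import chain, combinations

def precompute_full(S, topk_fn):
    """M1: evaluate topk for every possible value of the private input T
    (the power set of S) -- done entirely on public data."""
    subsets = chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))
    return {frozenset(t): topk_fn(set(t), S) for t in subsets}

def private_lookup(table, T):
    """Stand-in for the ms-PIR retrieval: in the protocol, the index
    (the actual T) stays private; here it is an ordinary dict access."""
    return table[frozenset(T)]
```

The table has 2^|S| entries, which makes the exponential growth noted above concrete: the scheme is practical exactly because the producer network is assumed to be moderately sized.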
The security of precomputation relies on the fact that no private value is involved in the precomputation; private data only enter the actual MPC computation.

Selective Precomputation Schemes
The full precomputation scheme treats the directory computation topK as a whole. In this section, we dive into topK() and selectively precompute certain computation-intensive parts of it. Concretely, our selective technique views topK as distance computations at different granularities. One option is to pre-compute the distance between T and S − T, considering all possible values of T; this yields the selective precomputation scheme M₂. The other is to pre-compute the distances between all pairs of data producers; this yields the selective precomputation scheme M₃.
In M₂, the precomputation considers all possible values of the true-producer set T. Given a value T*, it precomputes the set-wise distance between T* and S − T*, producing a distance table for the subsequent MPC. The MPC first follows the computation in Listing 1 until Line 6; Lines 6 to 9 are then replaced by a secure lookup into the precomputation table, realized by the ms-PIR protocol as in M₁. In M₃, the precomputation builds the pair-wise distance matrix: for any producers s₁, s₂ ∈ S, it precomputes their distance and stores it in a table. In the MPC stage, the algorithm in Listing 1 is followed, except that each call to dist(T[i], S[j]) is replaced by an ms-PIR lookup into the precomputation table.
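M₃'s public precomputation is just a pairwise distance table; a sketch is below (dist stands for any public-profile distance, e.g., the hamming distance used earlier):

```python
def precompute_pairwise(S, dist):
    """M3: distances between every producer pair, computed on public
    profiles only. The MPC stage replaces each dist(T[i], S[j]) call
    with an ms-PIR lookup into this table."""
    return {(a, b): dist(a, b) for a in S for b in S}
```

The table is quadratic in |S| (rather than exponential, as in M₁), which is the trade-off M₃ makes: cheaper precomputation, but more of the topK logic stays inside the MPC.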
The security of these precomputation schemes is straightforward, as all computations that touch private data are placed inside the MPC/ms-PIR protocols, whose security is proven; the precomputation only considers public data.
In summary, the topK computation for privacy-preserving directory publication can be modeled as a process that issues a series of calls to dist(T[i], S[j]). Our pre-computation schemes partition this process at different "break" points, selectively placing one partition in precomputation and the rest in MPC/ms-PIR. Table 4 illustrates the three pre-computation schemes from this computation-partitioning perspective.

Data-Parallel Pre-Computation
The pre-computation handles many independent input values, so there is innate data parallelism that can be exploited to speed it up. In our system, we realize this via data-parallel precomputation tasks, where each task handles a distinct input value and runs in a dedicated thread; different threads run concurrently and without synchronization. We implement this data-parallel pre-computation framework on both multi-core CPUs and general-purpose GPUs (GPGPU). Given the large number of possible input values (and the simplicity of each task), GPGPU lends itself to the parallel precomputation due to its scalable execution model.
The CPU implementation is based on the pthread library [13]; we pack multiple possible input values into one thread and set the number of threads to twice the number of hyper-threads in hardware. The GPGPU implementation is based on the CUDA library [2]; the underlying NVidia Tesla GPU has 5 GB of global memory, and threads run in one grid of 65,535 blocks of 1,024 GPU threads each. This architecture allows scaling the number of threads to 2^27 and can easily handle producer networks of more than 27 parties.
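The task structure of the data-parallel precomputation can be sketched as below. We use Python threads purely for illustration; the paper's implementation uses pthreads and CUDA, and CPU-bound Python code would need processes rather than threads to actually scale:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_precompute(inputs, task, workers=8):
    """Each candidate private-input value is an independent task, so the
    tasks run concurrently with no synchronization; pool.map preserves
    the input order of the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, inputs))
```

Because tasks never share state, the same structure maps directly onto a GPU grid where each thread evaluates one input value.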

Evaluation
In this section, we study the feasibility of our technique for HIE applications in a holistic manner. Lacking a benchmark dataset in the existing literature, we first present a real healthcare dataset used to populate the HIE data producers and locator; this sets up the target scenario for the performance study that follows. The purpose of the performance evaluation is to answer the following questions: What is the overhead of privacy-preserving directory publication? And how effective is the proposed precomputation technique in performance optimization?

USNEWS dataset
The USNEWS dataset [7] is used to model hospital profiles. The dataset considers 16 primary hospital-specialty categories, such as cardiology and rehabilitation (the full list of specialties is shown in Table 6). For each category, a hospital is associated with a rating of three grades: "Nationally ranked", "High-performing", and "Null". We map "Nationally ranked" to the value 2, "High-performing" to 1, and "Null" (i.e., the hospital does not have a department for this specialty) to 0. Each hospital is also associated with other profile information, such as its city and state. We select the dataset to include the 40 top-ranked hospitals (out of 180) in the New York metropolitan area.
Open-NY Health Dataset ("Sparcs")

To model patient-wise hospital visits, we use an Open-NY dataset called Sparcs [14]. This public dataset includes inpatient discharge records with identifiable information removed. At the finest granularity, it provides per-visit, per-patient information (e.g., patient age group, gender, race, ethnicity, and other de-identified attributes), facility information (e.g., zip code, name, service areas), and other per-visit information (e.g., admission type, length of stay). Given that identifiable patient information is removed, we model the per-patient visit history by aggregating the records on the available quasi-identifier information (i.e., age group, race, ethnicity, etc.).

Protection Effectiveness
Our security analysis considers a probabilistic attacker and the ε-privacy assurance, which come with two limitations: first, the analysis only considers attacks against one specific patient; second, ε-privacy provides assurance only in a statistical sense. To complement the security analysis, we measure the variation of the success rate in a broader sense, that is, over all patients.
Given the flexibility the attacker now has in choosing which patient to attack, we assume the attacker exhausts all options and targets the most vulnerable patient. The attacker can gauge the vulnerability of a patient by various metrics, such as the smallest number of specialties among her positive hospitals. Then, given a P³I configured with a user-defined ε, we measure the actual success rates (of attacks on vulnerable patients) and report those larger than ε. In the experiment, we consider ε = 0.4 and 0.5. Note that we deliberately avoid using the average success rate (over multiple patients), since it is the largest success rate that makes the system vulnerable.

Table 4. Partitioning the topK algorithm into the precomputation-MPC framework: For the notation in this table, T and S are the true and all producers, as in the topK() algorithm in Listing 1; Dᵢ for i = 1, 2, 3 are the tables storing precomputation results; MPC is the secure multi-party computation protocol; and ms-PIR is a special MPC protocol for multi-server private information retrieval.
Overall effectiveness. We compare P³I with alternatives, including no-protection and grouping-based PPI. "No-protection" is the baseline that publishes the raw location metadata (i.e., patient-to-hospital information) without any noise. Grouping PPIs [19,61] are based on the idea of K-anonymity (to avoid confusion, we use K for K-anonymity and k for top-k), which works by randomly grouping K hospitals together. We present the results in Table 8. Here, for P³I, we use the policy of adaptively choosing top-k to achieve a constant diversity l. To make the comparison fair, we use the same budget for injecting noise; that is, the total number of false positives in P³I is kept the same as in the grouping-based PPI. The table shows clearly that P³I achieves a significantly smaller attack success rate and fewer incidents.

Effectiveness of the top-k algorithm. We first report all incidents with success rates higher than the user-defined ε = 0.5. The results are shown in Figure 3a, where the x axis is the patient index (there are 280,000 patients in total in our processed health dataset). The no-protection approach results in much more densely distributed dots than P³I under various configurations of k. Furthermore, no-protection often results in a 100% success rate, implying that the real-world dataset is vulnerable to probabilistic attacks without protection. This result is consistent with Table 8 and explains the difference there.
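The incident measurement above (report only success rates exceeding the configured ε, and focus on the worst case rather than the average) can be sketched as:

```python
def incidents_above_eps(success_rates, eps):
    """Per-patient attack success rates exceeding eps, as (index, rate)
    pairs; the attacker targets the most vulnerable patient, so the
    maximum rate (not the mean) is what matters."""
    return [(i, r) for i, r in enumerate(success_rates) if r > eps]

def worst_case(success_rates):
    """Largest per-patient success rate; the figure of merit above."""
    return max(success_rates)
```

The function and variable names here are ours; the paper reports the same quantity graphically (Figure 3a) rather than as code.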
We then manually vary the value of k to measure its effect on the attack success rate. The results are presented in Figures 3b and 3c. Interestingly, a larger k does not always yield better protection; the rate of incidents exceeding the configured ε is minimized at k = 6. Our preliminary inspection suggests this is because the real-world dataset is noisy and does not fully match some of our assumptions (e.g., patients do not always go to the nearest hospital).

Performance of Directory Publication
We first conduct micro-benchmarks to test the performance of the data-parallel precomputation. Then, we test the overall performance of secure directory publication, both on a machine with a multi-core processor and in a geo-distributed setting. The precomputation is implemented with data parallelism (as described in § 5.5) and runs on a multi-core CPU and a GPGPU. We report the time to pre-compute on the GPGPU and on the CPU in Figure 4. The figure also includes a baseline: 5% of the execution time of running M₀ (i.e., without any precomputation).
The results in Figure 4a show that GPGPU-based pre-computation is effective in reducing the execution time, with overhead negligible compared with the baseline. Concretely, the execution time of the CPU-based precomputation quickly surpasses the baseline once the network grows beyond 15 parties, while the GPGPU-based precomputation has much lower overhead than the baseline for any network with fewer than 28 parties.
For more than 28 parties, all GPU threads are occupied, and multiple iterations are needed to transfer data between the GPU's global memory and host memory. As a result, the GPGPU precomputation time increases exponentially, as reported in Figure 4b (note that the y axis is log scale). With a single GPGPU card, the precomputation time surpasses the baseline when the network grows beyond about 40 parties. We stress that a typical healthcare consortium is medium-sized (e.g., tens of hospitals and clinical centers). For nationwide healthcare systems with thousands of hospitals, one can use more GPGPU cards to run the precomputation in parallel while retaining efficiency.
Overall performance with MPC. The MPC-based implementation of directory publication is built on the GMW software [24], an open-source MPC framework, and Percy++ [12], an open-source multi-server PIR library. We note that our precomputation protocol relies only on generic MPC and PIR interfaces, so other MPC "backend" software could be used. The GMW software exposes a circuit-based programming interface that requires the MPC programmer to write a generator for a Boolean circuit encapsulating the intended computation logic. At runtime, the GMW protocol runs on multiple parties, where each party generates and executes the circuit by iterating through all gates (in topologically sorted order); the evaluation of each gate is synchronized across all parties. The GMW protocol makes bit-wise use of two cryptographic primitives that provide its security: secret sharing [57] and oblivious transfer [54]. In particular, the per-gate evaluation in GMW broadcasts the shares of the input-wire bits to all parties in the network. In our application, we manually express the logic of the topK algorithm in a GMW Boolean circuit, and we tightly estimate the number of gates to pre-allocate so that unused parts of the GMW circuit can be optimized out. Our GMW-based implementation consists of about 1,500 lines of C++ code.
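To illustrate why only AND gates are counted in the evaluation below: in XOR-based secret sharing of the kind GMW builds on, an XOR gate is evaluated by each party locally on its own shares, with no communication, whereas an AND gate requires an interactive OT-based step. A toy sketch (ours, not the GMW implementation):

```python
import secrets

def share_bit(b, n):
    """Split bit b into n XOR shares: n-1 random bits plus one chosen
    so that all shares XOR back to b."""
    shares = [secrets.randbits(1) for _ in range(n - 1)]
    last = b
    for s in shares:
        last ^= s
    return shares + [last]

def eval_xor_gate(shares_x, shares_y):
    """'Free' XOR: each party XORs its two local shares; no messages."""
    return [x ^ y for x, y in zip(shares_x, shares_y)]

def reconstruct(shares):
    """XOR all shares together to recover the plaintext bit."""
    out = 0
    for s in shares:
        out ^= s
    return out
```

Because XOR costs nothing, circuit-size comparisons between the schemes are meaningful only in terms of AND gates, which is exactly the metric reported in Figure 5a.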

Multi-processing execution platform:
We first run our protocol on a single node with multi-processing. The machine specs are given in Table 7 (the New York server). In this setting, each process represents a data producer and runs one GMW party. During execution, each process holds a dedicated copy of the entire circuit in its virtual-memory space, without shared memory. The machine's memory (245 GB in total) is large enough to hold all circuit copies for the 39 parties without paging.
Results on multi-processing: To measure the performance of MPC, we use four metrics: (1) the number of AND gates, (2) end-to-end execution time, (3) memory consumption, and (4) communication costs. (1) We report the number of AND gates in the compiled GMW Boolean circuit. This metric evaluates performance in a hardware-independent fashion; we count only AND gates and ignore other gates (i.e., XOR gates), because evaluating XOR is free (the free-XOR technique [25]) and evaluating AND gates dominates the cost. (2) We report the wall-clock time from launching the first process to the completion of the last process. (3) We report the size of the heap memory in GMW that stores all circuit gates, measured with the Valgrind framework (specifically, the Massif memory profiler [56]). (4) We report the party-to-party communication overhead by monitoring all outbound messages through the socket port of each process using IPTraf (http://iptraf.seul.org/).
In the experiment, we vary the number of parties (data producers) and present the results in Figure 5. Figure 5a reports the number of AND gates and Figure 5b the wall-clock running time. Both show that the pre-computation-based schemes (M₁, M₂, M₃) outperform the baseline without pre-computation. Notably, the M₁ scheme achieves the best performance, with a speedup of 13 times over the baseline M₀ in the 39-party setting. This demonstrates the effectiveness of pre-computation in off-loading work from the expensive MPC. In terms of memory consumption (Figure 5c), M₁ and M₂ are close, reducing memory consumption by roughly an order of magnitude compared with M₀ and M₃. This shows that, although M₁ produces pre-computation results as additional data, its much smaller circuit (a simple lookup operation in ms-PIR) yields an overall saving in memory footprint compared with the baseline M₀. In Figure 5d, the communication overhead of M₁ remains the smallest among the four schemes, with a saving of more than two orders of magnitude over M₀; this is consistent with the result on the number of AND gates.
Geo-distributed execution platform: We conduct the experiment with two servers more than 3,000 miles apart (one in the State of New York, the other in the State of California), connected by a 100 Mbps link. The specifications of the two servers are given in Table 7. Each server runs half of the parties via multi-processing, and different parties communicate through sockets. The precomputation runs on only one server.
Results with geo-distributed execution: We report the execution time of the four schemes in the geo-distributed setting in Figure 6, including the single-node results for comparison. The execution time grows super-linearly with the number of parties. For M₀, M₂, and M₃, running on two geo-distributed nodes leads to longer execution times. Interestingly, for M₁ the geo-distributed execution is faster than the single-node one: the slowdown caused by the slower communication channel is offset by the performance gain from the extra hardware (e.g., CPUs) on multiple nodes. We suspect this is because the MPC is dominated more by local computation (on secret shares) than by network communication.

Privacy-Preserving Data Federation
Multi-party noise generation. Distributed differential privacy [30,53,63] has been proposed to support privacy-preserving aggregation. Randomized response [63] provides differential privacy, yet with uncontrollable noise and loss of utility. PrivaDA [32] achieves optimal utility and performance by adopting arithmetic-circuit-based MPC for noise generation. Existing multi-party noise generation takes a randomized approach and mainly targets statistical aggregation (e.g., distributed differential privacy). It is inapplicable to our problem, which requires deterministic noise generation for a rigorous privacy guarantee and needs to serve non-aggregation queries.
PPI. Privacy-Preserving Indexing (PPI) has been proposed to federate and index distributed access-controlled documents [18,19] and databases (e.g., patient medical records in the HIE locator service) [61] among autonomous providers. Being stored on an untrusted server, a PPI must preserve the content privacy of all participating providers or hospitals. Inspired by the privacy definition of K-anonymity [59], existing PPI work [18,19,61] follows a grouping-based approach: it organizes providers into disjoint privacy groups of size K, such that providers in the same group are indistinguishable. However, K-anonymity, while easy to construct, does not guarantee high-quality privacy preservation. In addition, early approaches to PPI construction [47,60] are based on randomized response [63], an iterative protocol that takes an indefinite number of rounds to converge and may produce incorrect results (with a certain probability). To avoid these drawbacks, ε-PPI combines randomized response with a minimal use of multi-party computation to construct the PPI correctly and efficiently.

MPC Frameworks and Optimization
In the last decade, practical MPC has attracted a large body of research with a focus on programming-language support and optimization [16, 20-22, 24, 37, 43, 50, 55]. Practical MPCs are built on top of cryptographic protocols, such as Yao's garbled circuits [64] or the GMW protocol [34], with protocol-level optimizations, such as oblivious-transfer (OT) extensions [38], or with stronger security, such as resilience against a dishonest majority [27]. These MPC protocols assume a circuit interface to express the computation, and practical programming support focuses on compiling a program written in a high-level language into a circuit. Existing MPC protocols and systems mainly target small-scale computation involving 2 or 3 parties. For the general MPC problem, a fundamental trade-off exists between performance and computational generality; for instance, randomized response [63] and other techniques for privacy-preserving data mining take ad-hoc, domain-specific approaches, which can be efficient at scale, whereas general-purpose MPC is rather expensive.

MPC optimization. High performance overhead remains one of the major hurdles to applying MPC in practice; it is partly caused by MPC's fine-grained use (e.g., per single bit) of expensive cryptographic primitives and by the need to transfer all possible computation results to keep the computation flow oblivious. Various optimization techniques have been proposed to exploit program semantics to reduce circuit size and depth (e.g., using hardware-synthesis tools [28,58]) and to optimize resource utilization (e.g., just-in-time compilation and pipelined execution [37,43]). Program analysis [42] is used to automatically infer privacy-sensitive data and constrain MPC to only the sensitive data. The work in [44] pre-processes the verification of MPC, yielding a general transformation from a passively secure protocol to an actively secure one. Our MPC optimization is currently specific to the
directory construction problem, while holding the potential to apply to more generic computations. Some programming frameworks support high-level programming languages with compilers (e.g., Fairplay(MP) for SFDL [21,50], Sharemind for SecreC [39], CBMC-GC for ANSI C [33], PCF for C [43], Wysteria for a high-level typed specification language [55], and PICCO for C with extensions [66]), while others expose a rather low-level circuit-based interface (e.g., GMW [24], JustGarble [20], OTExtension [16]); both Boolean circuits (e.g., GMW) and arithmetic circuits (e.g., SEPIA [23]) are considered. In addition, some advanced designs are based on a hybrid model that combines Boolean and arithmetic circuits (e.g., ABY [29], TASTY [35], Wysteria [55]).

Anonymization Definitions
Publishing public-use data about individuals without revealing sensitive information has received much research attention in the last decade. Various anonymization definitions have been proposed and gained popularity, including K-anonymity [59], l-diversity [48], t-closeness [45], and differential privacy [31]. In addition, prior work [51] formally studied information leakage under background-knowledge attacks by formulating the problem in a proposed declarative language. These anonymity notions are, however, generally inapplicable to the PPI problem: they are mainly designed for statistical analysis or aggregation-style computation, where the result is global, per-table data, while a PPI needs to serve queries specific to individual records. r-confidentiality [65] is a privacy notion specific to the PPI problem; it assumes a probabilistic attacker on the PPI and considers the increase in attack success rate with versus without background knowledge. By contrast, our proposed ε-privacy bounds the attack success rate itself (instead of the increase), which we believe provides better privacy control.

Conclusion
This work presents an MPC-precomputation framework tailored for privacy-preserving data publication in data-sharing applications. The pre-computation framework improves performance by minimizing private-data computation and by realizing the public-data-only pre-computation in a data-parallel fashion. Several pre-computation policies are proposed with varying degrees of aggressiveness, and the proposed scheme is shown to be applicable in real healthcare scenarios. Based on real datasets and an implementation on open-source MPC software, the performance study shows that the proposed pre-computation achieves a speedup of more than an order of magnitude without security loss.

Y. Tang et al., EAI Endorsed Transactions on Security and Safety, 01 2019, Volume 6, Issue 19, e5.

…can help the optimization technique adapt to concrete scenarios with different private-data sizes.

Figure 2. Data-sharing workflow in the HIE.

…the patient. The information flow is I [→(a) I₀] →(b) disease, where [·] means optional. The background knowledge can facilitate the attack at two places in this information flow:
1. Inferring the disease in step b: knowing hospital specialties can assist step b in inferring the patient's disease. The information flow of this attack is I/I₀, B →(b) disease, where B represents the background knowledge.
2. Identifying noises in step a: the background knowledge can be used in step a to distinguish the true-positive hospitals from the noises. The information flow of this attack is I, B →(a) I₀ (→(b) disease).
Listing 1: topk(I₀)
  Sensitive input I₀: true-positive hospitals visited by a patient
  Non-sensitive output I = {FP} ∪ I₀: all positive hospitals
  {FP} ← NULL
  {P} ← I₀
  WHILE (¬stop-condition({FP}))
      Find h_im s.t. D(h_im, {P}) = min_{h_i ∉ I₀ ∪ {FP}} D(h_i, {P})
      {FP}.add(h_im)
      {P}.add(h_im)
  RETURN {FP} ∪ I₀

²Search precision in P³I is sacrificed for better privacy preservation. The implication of low search precision is that there are extra hospitals the record-searcher needs to contact.

4.1. Mitigation and Security Analysis

ε-Privacy under Attack I. Recall that the topk function in Algorithm 1 exposes two configurable methods, the stop condition and the distance definition; we call these two the top-k policy. To mitigate Attack I, we use the following top-k policy. Here, we reuse the notation FP to denote the number of false-positive hospitals.

Attack-I mitigation:
• Top-k stop condition: FP ≥ TP(1/ε − 1) (Equation 1).
• Distance definition: the distance is simply set to the constant 1, which gives negative hospitals equal chances of being chosen as noise.

Proposition: P³I can mitigate Attack III with assured ε-privacy in the sense of exact-match semantics.


Figure 5. Performance of directory publication based on precomputation and MPC

Figure 6. Geo-distributed performance on the Internet.

Table 1. Notations: P: positive hospital; FP: false-positive hospital; N: negative hospital; TP: true-positive hospital; p: patient; I₀ = {TP}: the true-positive hospitals.

Table 2. Attacks: the considered attacks are classified by the type of background knowledge used and by the information flow through which the adversary reaches the privacy-disclosing fact, "the patient's disease."

Table 5. Specialty catalog in the USNEWS dataset.

Table 6. Specialty catalog in the USNEWS dataset.

Table 8. Effectiveness of P³I.