APSI
APSI is a C++ library for Asymmetric (unlabeled or labeled) Private Set Intersection.
Install / Use
/learn @microsoft/APSIREADME
APSI: C++ library for Asymmetric PSI
- Introduction
- How APSI Works
- Using APSI
- Building APSI
- Command-Line Interface (CLI)
- Acknowledgments
- Questions
Introduction
(Unlabeled) PSI and Labeled PSI
Private Set Intersection (PSI) refers to a functionality where two parties, each holding a private set of items, can check which items they have in common without revealing anything else to each other. Upper bounds on the sizes of the sets are assumed to be public information and are not protected.
The APSI (Asymmetric PSI) library provides a PSI functionality for asymmetric set sizes based on the protocol described in eprint.iacr.org/2021/1116. For example, in many cases one party may hold a large dataset of millions of records, and the other party wishes to find out whether a single particular record or a small number of records appear in the dataset. We refer to this as APSI in unlabeled mode.
In many cases, however, the querier wishes to also retrieve some information per each record that matched. This can be viewed as a key-value store with a privacy preserving batched query capability. We use the terminology item and label to refer to the key and the value in such a key-value store, and call this APSI in labeled mode.
Note: Unless labeled mode is actually needed, it will be much more efficient (in both communication and computation) to use the unlabeled mode.
Sender and Receiver
We use the terminology sender and receiver to denote the two parties in the APSI protocol: a sender sends the result to the receiver. For example, in a common use case where a server hosts a look-up table that multiple clients can query with encrypted records. In this case the server acts as the sender, and the clients act as (independent) receivers.
How APSI Works
Homomorphic Encryption
APSI uses a relatively new encryption technology called homomorphic encryption that allows computations to be performed directly on encrypted data. Results of such computations remain encrypted and can be only decrypted by the owner of the secret key. There are many homomorphic encryption schemes with different properties; APSI uses the BFV encryption scheme implemented in the Microsoft SEAL library.
Computation on Encrypted Data
Microsoft SEAL enables computation representable with arithmetic circuits (e.g., additions and multiplications modulo a prime number) with limited depths rather than arbitrary computation on encrypted data. These computations can be done in a batched manner, where a single Microsoft SEAL ciphertext encrypts a large vector of values, and computations are done simultaneously and independently on every value in the vector; batching is crucial for APSI to achieve good performance.
Noise Budget
The capacity of computation that can be done on encrypted data is tracked by noise budget that each ciphertext carries. A freshly encrypted ciphertext has a certain amount of noise budget which is then consumed by computations – particularly multiplications. A ciphertext can no longer be decrypted correctly once its noise budget is fully consumed. To support computations of larger multiplicative depths, it is necessary to start with a larger initial noise budget, which can be done through appropriate changes to the encryption parameters.
Encryption Parameters
Homomorphic encryption schemes, such as BFV, are difficult to configure for optimal performance. APSI requires the user to explicitly provide the Microsoft SEAL encryption parameters. So we need to describe them here briefly. For much more details, we refer the reader to the examples in the Microsoft SEAL repository. We describe three important encryption parameters that the user should be familiar with.
plain_modulus is the easiest to understand.
It must be a prime number congruent to 1 modulo 2 * poly_modulus_degree and defines the finite field datatype that the BFV scheme encrypts.
For example, if plain_modulus is 65537 – a 17-bit prime – then the scheme encrypts integers modulo 65537, and computations on encrypted data preserves integer arithmetic modulo 65537.
A larger plain_modulus leads to faster noise budget consumption.
It is recommended to design computation with as small a plain_modulus as possible.
poly_modulus_degree is a positive power-of-two integer that determines how many integers modulo plain_modulus can be encoded into a single Microsoft SEAL plaintext; typical values are 2048, 4096, 8192, and 16384.
It is now easy for the reader to appreciate the value of batching: computation of thousands of values can be done at the cost of one computation on encrypted data.
poly_modulus_degree also affects the security level of the encryption scheme: if other parameters remain the same, a bigger poly_modulus_degree is more secure.
coeff_modulus is a set of prime numbers that determine the noise budget of a freshly encrypted ciphertext.
In Microsoft SEAL the coeff_modulus primes are rarely given explicitly by values but instead by bit counts – the library can create them.
In APSI it is necessary to specify at least two primes in coeff_modulus, but it is beneficial to have as few of them as possible; using 2 – 8 primes is probably reasonable.
The individual primes can be up to 60 bits.
The noise budget depends linearly on the total bit count of the primes.
coeff_modulus also affects the security level of the encryption scheme: if other parameters remain the same, a bigger total bit count is less secure.
Thus, to obtain more computing capability, i.e., more noise budget, one needs to increase the total bit count of the coeff_modulus, and consequently may have to increase poly_modulus_degree for security.
This will subsequently have an impact on the batching capability, so the computation itself may now change.
Fortunately, Microsoft SEAL prevents the user from accidentally setting insecure parameters.
It checks that, for the given poly_modulus_degree, the total coeff_modulus bit count does not exceed the following bounds:
| poly_modulus_degree | max coeff_modulus bit count | |---------------------|-----------------------------| | 1024 | 27 | | 2048 | 54 | | 4096 | 109 | | 8192 | 218 | | 16384 | 438 | | 32768 | 881 |
In APSI, the user will need to explicitly provide the coeff_modulus prime bit counts, so the table above will be of great help in avoiding unnecessary exceptions being thrown by Microsoft SEAL.
Theory
Naive Idea
The basic idea of APSI is as follows.
Suppose the sender holds a set {Y_i} of items – each an integer modulo plain_modulus – and the receiver holds a single item X – also an integer modulo plain_modulus.
The receiver can choose a secret key, encrypts X to obtain a ciphertext Q = Enc(X), and sends it over to the sender.
The sender can now evaluate the matching polynomial M(x) = (x - Y_0)(x - Y_1)...(x - Y_n) at x = Q.
Here the values Y_i are unencrypted data held by the sender.
Due to the capabilities of homomorphic encryption, M(Q) will hold an encryption of (X - Y_0)(X-Y_1)...(X-Y_n) which is zero if X matches one of the sender's items and non-zero otherwise.
The sender who performs computation on X – encrypted data – will not be able to know this result due to the secret key being held only by the receiver.
One problem with the above is that the computation has an enormously high multiplicative depth. It is not uncommon for the sender to have millions or even hundreds of millions of items. This would require a very high initial noise budget and subsequently very large encryption parameters with an impossibly large computational overhead.
Lowering the Depth
The first step towards making this naive idea practical is to figure out ways of lowering the multiplicative depth of the computation.
First, notice that the sender can split up its set into S equally sized parts and evaluate the matching polynomial independently on each of the parts, producing S results {M_i(Q)}.
All of these results must be sent back to the receiver, so the sender-to-receiver communication has increased by a factor of S.
Nevertheless, this turns out to be a really valuable trick in helping reduce the size of the encryption parameters.
The second step is to use batching in Microsoft SEAL.
Per each of the S parts described above, the sender can further split its set into poly_modulus_degree many equally sized parts, and the receiver can batch-encrypt its item into a single batched query ciphertext Q = Enc([ X, X, ..., X ]).
Now, the sender can evaluate vectorized versions of the matching polynomials on Q, improving the computational complexity by a factor of poly_modulus_degree and significantly reducing the
