This is an implementation of an algorithm to search over encrypted data.
Specifically, given two tables A and B, some join attributes A.a and B.b,
and a filter clause A.x = c_a and B.y = c_b, we would like to perform the following query
SELECT * FROM A
INNER JOIN B ON A.x = B.x
WHERE A.a IN (s_1) AND B.b IN (t_1)(for constants s_1, t_1), with minimal leakage. That is, using a (master) secret key,
we would like to encrypt the tables (i.e. their rows),
execute an encrypted query on them, resulting in the retrieval of the correct rows.
This implementation uses a function-hiding inner product encryption scheme whose code is contained in fhipe. Section 2.3 of the paper is the API specification for the functionality implemented in fhipe/ipe.py. See the fhipe project's readme for more details.
Make sure you have the following installed:
$ git clone --recursive https://github.com/EquiJoins/EquiJoinsOverEncryptedData.git
$ cd encryped-joins
$ sudo make install # ( or use `make install-mac` if running on MAC OS X)See the following list for common build errors and how to fix them.
-
Error:
charm/core/math/integer/integermodule.c:129:19: error: dereferencing pointer to incomplete type ‘BIGNUM {aka struct bignum_st}’Resolution: this is a known issue with an older version of
charm. Using the latest dev branch of thecharmrepo should resolve this (i.e.cd charmandgit checkout dev).
In hash_based_join.py, we run a hash based join
on TPC-H data (schema found here).
In the hash_based_impl/data/ folder, one can find the datasets separated by the scale factor used to generate them.
We only use the orders table and customer table. For these tables, we have appended a column selectivity
to have granular control over which rows are selected in a WHERE clause. For example, for selectivity 1/100,
there will be num_rows / 100 rows that have value 100 for the selectivity attribute.
The organization of the py submodules are as follows
- fhipe/ipe.py contains the code required to generate the vectors, ciphertext and tag
- fhipe/join.py contains the code to run the actual join query - currently it is a nested inner loop join
- fhipe/encrypt_functions.py contains the code required to encrypt a specific row or table - typically done as a preprocessing step
Typical workflow
- Load in a table from a csv file to a 2d array
- Determine the number of attributes that are in the query, the vector length is equal to the number of attributes plus 4
- Generate a secret key by calling ipe.setup(int) and passing in the vector length