BigQuery Client-side Encryption using AEAD

2022-01-13

Sample that encrypts data client side and uses BigQuery streaming insert such that the data is end-to-end encrypted and is compatible with the built in BQ AEAD functions.

This sample encrypts some data using a user-provided Tink AEAD AES-GCM keyset locally before inserting the data. The data will remain in bigquery in encrypted form but can be queried directly using these BQ built in functions given a direct parameter reference or a lookup table reference

In this context, a direct parameter has the distinct advantage of not appearing in AuditLogs…(meaning the raw encryption key is never persisted within GCP).

This sample also shows how users can create an AEAD Tink Keyset from your own raw AES key. In other words, if you already have a raw aes gcm encryption key, you can convert it into a Tink KeySet and then use that keyset to encrypt data.


Note, this article is an extension of one i wrote earlier here:

Importing and extracting external keys for BigQuery AEAD Tink KeySets

Setup

The following creates a raw AES GCM Tink key from scratch, uses that to encrypt some data and finally insert into BQ via streaming API

BQ cli is then used to query that data and provide the TINK RAW key as a query parameter (there’s a reason thats useful as described in the appendix)

Create Key

You can create a new TINK key as shown here if you already have and existing raw aes GCM key or just ask BQ to create a new offline using KEYS.NEW_KEYSET

The following snippet to create a new key is from my repo here TINK Samples

$ git clone https://github.com/salrashid123/tink_samples.git
$ cd tink_samples
$ go run external_aes_gcm/main.go 

2022/01/13 10:00:28 Raw Key: 6368616e676520746869732070617373776f726420746f206120736563726574
2022/01/13 10:00:28 crypto.cipher.AEAD.Seal() HoQG3YzqNJIsdw9x4txt7aSWfTLwqpYDBlZsYXbfSP1Z
2022/01/13 10:00:28 crypto.cipher.AEAD.Unseal() Greed
2022/01/13 10:00:28 Tink subtle decrypted Greed
2022/01/13 10:00:28 Recreated Tink RawKey: 6368616e676520746869732070617373776f726420746f206120736563726574
2022/01/13 10:00:28 Tink Keyset Encoded:  CMKIrNYJEmQKWAowdHlwZS5nb29nbGVhcGlzLmNvbS9nb29nbGUuY3J5cHRvLnRpbmsuQWVzR2NtS2V5EiIaIGNoYW5nZSB0aGlzIHBhc3N3b3JkIHRvIGEgc2VjcmV0GAEQARjCiKzWCSAB
2022/01/13 10:00:28 Tink Keyset:
 {
	"primaryKeyId": 2596996162,
	"key": [
		{
			"keyData": {
				"typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
				"value": "GiBjaGFuZ2UgdGhpcyBwYXNzd29yZCB0byBhIHNlY3JldA==",
				"keyMaterialType": "SYMMETRIC"
			},
			"status": "ENABLED",
			"keyId": 2596996162,
			"outputPrefixType": "TINK"
		}
	]
}

The raw aes key in the example above is just bytes("change this password to a secret") (yeah, thats the actual key).

That key gets converted to a TINK keyset thats shown above in raw and JSON formats. We will use this key to encrypt and decrypt the data

Create Table

Just create a simple table that has the following columns

$ bq query --nouse_legacy_sql 'SELECT ddl FROM aead.INFORMATION_SCHEMA.TABLES WHERE table_name="encrypteddata"'

+-------------------------------------------------------+
|                          ddl                          |
+-------------------------------------------------------+
| CREATE TABLE `your_project_id.aead.encrypteddata` |
| (                                                     |
|   customer_id INT64 NOT NULL,                         |
|   encrypted_animal BYTES                              |
| );                                                    |
+-------------------------------------------------------+

Encrypt data and upload

Now run the program that will tink encrypt the data client side and upload it to BQ.

Note that main.go has the raw key from above statically defined. If you created your own keyset, specify that in through the tutorial

$ go run main.go

2022/01/13 10:23:05 Tink Keyset: {
	"primaryKeyId": 2596996162,
	"key": [
		{
			"keyData": {
				"typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
				"value": "GiBjaGFuZ2UgdGhpcyBwYXNzd29yZCB0byBhIHNlY3JldA==",
				"keyMaterialType": "SYMMETRIC"
			},
			"status": "ENABLED",
			"keyId": 2596996162,
			"outputPrefixType": "TINK"
		}
	]
}
2022/01/13 10:23:05 Tink Encrypted: AZrLBEJQeJR6zkYrQYBTKfVN7zVNKekyeXgUYiqCO9T7bfHdx5w=

Notice the data is encrypted (and yeah, it matches the encrypted control data from above)

$ bq  query --use_legacy_sql=false  'SELECT
  *
FROM your_project_id.aead.encrypteddata AS ecd1;'

+-------------+------------------------------------------------------+
| customer_id |                   encrypted_animal                   |
+-------------+------------------------------------------------------+
|          32 | AZrLBEJQeJR6zkYrQYBTKfVN7zVNKekyeXgUYiqCO9T7bfHdx5w= |
+-------------+------------------------------------------------------+

Query using Paramter

You can now query this data as a parameter or specify the raw keyset into an ACL controlled BQ table (see Conditionally unmask columns in BigQuery using Authorized UDFs)

In the following, we are specifying the encryption key directly

bq  query \
--parameter=keyset1::CMKIrNYJEmQKWAowdHlwZS5nb29nbGVhcGlzLmNvbS9nb29nbGUuY3J5cHRvLnRpbmsuQWVzR2NtS2V5EiIaIGNoYW5nZSB0aGlzIHBhc3N3b3JkIHRvIGEgc2VjcmV0GAEQARjCiKzWCSAB \
--use_legacy_sql=false  'SELECT
  ecd1.customer_id as ecd1_cid,
  AEAD.DECRYPT_STRING(FROM_BASE64(@keyset1),
   ecd1.encrypted_animal,
   "somedata"
 )
FROM your_project_id.aead.encrypteddata AS ecd1;'

+----------+-------+
| ecd1_cid |  f0_  |
+----------+-------+
|       32 | Greed |
+----------+-------+

There…we have the raw data back…

To note, if we issue a BQ query and specify parameters, the encryption key is NEVER shown in BQ auditlogs and is not known by anyone. This has the advantage of enabling some untrusted multiparty compute features described below.


Untrusted Multiparty compute

BQ can provide a form of Untrusted Multiparty (and end-to-end-ish encryption). What is meant by that is suppose you have two parties Alice and Bob that have each some sensitive data they own in BQ using their AEAD key.

Carol runs an a VM which ONLY runs code that both Alice and Bob trust (i.,e if all that vm does is run that code and nothing else, both a parties are ok with it processing their sensitive data and aggregating results (eg, bloomfilters, etc)).

If Carol can demonstrate to both Alice and Bob at runtime that such a VM exists and she herself cannot access the runtime environment (eg. cant’ memory dump the vm, see its logs, ssh to the VM), then both parties are willing to share their AEAD key with that VM only.

Once these keys are shared, the VM Carol runs can issue a query like the following:

bq  query \
--parameter=keyset1::CNXd6toHEmQKWAowdHlwZS5nb29nbGVhcGlzLmNvbS9nb29nbGUuY3J5cHRvLnRpbmsuQWVzR2NtS2V5EiIaIDnvckhhXod0bLVuBGIrCHiCl14aUkBeN8MaBiimapLZGAEQARjV3eraByAB \
--parameter=keyset2::CMGf8+oBElQKSAowdHlwZS5nb29nbGVhcGlzLmNvbS9nb29nbGUuY3J5cHRvLnRpbmsuQWVzR2NtS2V5EhIaEEsyGx9asPsX8lEVldMl7QQYARABGMGf8+oBIAE \
--use_legacy_sql=false  'SELECT
  ecd1.customer_id as ecd1_cid, ecd2.customer_id as ecd2_cid,
FROM mineral-minutia-820.aead.EncryptedCustomerData AS ecd1,
     fabled-ray-104117.aead.EncryptedCustomerData AS ecd2
WHERE AEAD.DECRYPT_STRING(FROM_BASE64(@keyset1),
  ecd1.encrypted_animal,
  "somedata"
) = AEAD.DECRYPT_STRING(FROM_BASE64(@keyset2),
  ecd2.encrypted_animal,
  "somedata"
);'

Notice tht the query joins both Alice and Bob data at runtime to find common fields. The output can be a number that Carol outputs that says “ok, alice and bob’s tables have N matching fields”….At no time any party know about each others decryption keys (since the parameter keys are never saved or logged).

Users can extend this model with BQ AEAD further and incorporate GCS or PubSub into this untrusted model where the keys are released after the fact ….use the samples below but note that these are not supported by google!

Source

This is the source code described above.

  • main.go

This site supports webmentions. Send me a mention via this form.



comments powered by Disqus