Basic Math with BigQuery Remote Functions using Homomorphic Encryption

2022-06-08

Several BigQuery Remote functions that’ll let you perform some very basic math though homomorphic encryption

From wikipedia:

Homomorphic encryption is a form of encryption that permits users to perform computations on its encrypted data without first decrypting it. These resulting computations are left in an encrypted form which, when decrypted, result in an identical output to that produced had the operations been performed on the unencrypted data. Homomorphic encryption can be used for privacy-preserving outsourced storage and computation. This allows data to be encrypted and out-sourced to commercial cloud environments for processing, all while encrypted.

meaning Alice can encrypt some numbers, give it to Bob, he can perform some basic math on it while its still under encryption and give Alice the result back which she can decrypt to see the result….

i can’t emphasize how cool that is..the math is done data that is still encrypted: Bob has no idea what those numbers are but yet still did some operations against it…

If you’re looking for an explanation on how FHE works…thats way beyond me..i’d encourage you to read the excellent blog here: A Homomorphic Encryption Illustrated Primer…which is also where i got the colorful picture.

In my case case, the FHE functions are implemented using lattigo as BigQuery Remote Functions and can do some basic stuff:

  • encrypt(integer) --> bytes Encrypt a number with a static public key encoded into the function

  • decrypt(bytes) --> bytes Decrypts some FHE encrypted bytes using a secret key encoded into the function

  • add(x bytes, y bytes) --> bytes Adds two FHE encrypted values

  • sub(x bytes, y bytes) --> bytes Subtracts two FHE encrypted values

  • mul(x bytes, y bytes) --> bytes Multiplies two FHE encrypted values

  • neg(x bytes) --> bytes Negates an FHE encrypted value

Now..you maybe asking..thats it? yeah pretty much…this has very little current practical value but I expect as this technology is developed, it’ll be shattering..

is it efficient to use BQ like this? nope, not in its current form…

This is just for amusement..this has little practical value as far as i can tell…just enjoy

If you want to jump to the good stuff, go here

where's waldo


You can find the source here


Setup

The instructions below will create the BQ Remote functions you can mix and match to do some path…

You can ofcourse create your own combined function (eg one function that does several operations eg, do N primitive operations in one remote functions)..but i’ve left that out for simplicity.

For examples about that, see

export GCLOUD_USER=`gcloud config get-value core/account`
export PROJECT_ID=`gcloud config get-value core/project`
export PROJECT_NUMBER=`gcloud projects describe $PROJECT_ID --format='value(projectNumber)'`

Now lets create a bigquery table, encrypt two numbers using an FHE public key and then insert those encrypted numbers in:

I’ve provided the public and secret keys here already for you. If you want to generate your own, the code is provided in app/main.go.

	pub, sec, err := genKey("pub.bin", "sec.bin")
	if err != nil {
		fmt.Printf("Err %v\n", err)
		return
	}

Just uncomment it to generate your own keypair. You’ll need to add your new key into the code for the functions you’ll use. For now, just use my keys

const (
	pubKeyURL = "https://raw.githubusercontent.com/salrashid123/bq_fhe/main/app/pub.b64"
	secretKeyURL = "https://raw.githubusercontent.com/salrashid123/bq_fhe/main/app/sec.b64"
)
## crate the dataset
bq --location=US mk -d \
--description "FHE dataset." fhe

bq mk \
  -t \
  --description "FHE xy" \
  --label organization:fhexy \
  fhe.xy \
  uid:STRING,x:BYTES,y:BYTES

## insert the encrypted values into the dataset

cd app/
go run main.go --projectID $PROJECT_ID

# get the connection
bq mk --connection --display_name='myconn' --connection_type=CLOUD_RESOURCE \
        --project_id=$PROJECT_ID --location=US my-connection

bq show --location=US --connection  my-connection

export BQ_CONN_SVC_ACCOUNT=`bq show --format=json --location=US --connection  my-connection  | jq -r '.cloudResource.serviceAccountId'`
echo $BQ_CONN_SVC_ACCOUNT

Load Keys as Secrets?

Normally, you would save the Public and Secret keys as environment variable based Secrets (well, not so much for the public key: that doesn’t need to be a secret-secret)

Unfortunately, the maximum size of a Secret is 65536 bytes and our keys are too large…you can find ways around this but since this is just a demo, i’m gonna stuff it into the code directly..

Encrypt

cd encrypt/

gcloud beta functions deploy fhe-encrypt  \
   --gen2   --runtime go116  --entry-point FHE_ENCRYPT \
   --region=us-central1   --trigger-http

# gcloud run deploy fhe-encrypt --source .  --no-allow-unauthenticated

export CLOUD_RUN_URL=`gcloud run services describe fhe-encrypt --format="value(status.address.url)"`
echo $CLOUD_RUN_URL

gcloud run services add-iam-policy-binding fhe-encrypt \
  --member="user:$GCLOUD_USER" \
  --role="roles/run.invoker"

gcloud run services add-iam-policy-binding fhe-encrypt \
  --member="serviceAccount:$BQ_CONN_SVC_ACCOUNT" \
  --role="roles/run.invoker"

## Remember to replace CLOUD_RUN_URL below 
bq --format=json query --dataset_id=$PROJECT_ID:fhe --location=US --nouse_legacy_sql  "
  CREATE OR REPLACE FUNCTION fhe_encrypt(x NUMERIC) RETURNS BYTES 
    REMOTE WITH CONNECTION \`$PROJECT_ID.us.my-connection\`
    OPTIONS (endpoint = '$CLOUD_RUN_URL',  user_defined_context = [('mode', 'encrypt')] )"

## test directly
curl -s  -X POST  \
  -H "Authorization: Bearer `gcloud auth print-identity-token`" \
  -H "Content-Type: application/json"  \
  -d @req_encrypt.json \
  "$CLOUD_RUN_URL" | jq '.'

The output of the curl command is just FHE encrypted bytes…

Decrypt

cd decrypt/

gcloud beta functions deploy fhe-decrypt  \
   --gen2   --runtime go116  --entry-point FHE_DECRYPT \
   --region=us-central1   --trigger-http

# gcloud run deploy fhe-decrypt --source .   --no-allow-unauthenticated


export CLOUD_RUN_URL=`gcloud run services describe fhe-decrypt --format="value(status.address.url)"`
echo $CLOUD_RUN_URL

gcloud run services add-iam-policy-binding fhe-decrypt \
  --member="user:$GCLOUD_USER" \
  --role="roles/run.invoker"

gcloud run services add-iam-policy-binding fhe-decrypt \
  --member="serviceAccount:$BQ_CONN_SVC_ACCOUNT" \
  --role="roles/run.invoker"


 bq --format=json query --dataset_id=$PROJECT_ID:fhe --location=US --nouse_legacy_sql  "
  CREATE OR REPLACE FUNCTION fhe_decrypt(x BYTES) RETURNS BYTES 
    REMOTE WITH CONNECTION \`$PROJECT_ID.us.my-connection\`
    OPTIONS (endpoint = '$CLOUD_RUN_URL',  user_defined_context = [('mode', 'decrypt')] )"

Add

cd add/

gcloud beta functions deploy fhe-add  \
   --gen2   --runtime go116  --entry-point FHE_ADD \
   --region=us-central1   --trigger-http

# gcloud run deploy fhe-add --source .  --no-allow-unauthenticated

export CLOUD_RUN_URL=`gcloud run services describe fhe-add --format="value(status.address.url)"`
echo $CLOUD_RUN_URL

gcloud run services add-iam-policy-binding fhe-add \
  --member="user:$GCLOUD_USER" \
  --role="roles/run.invoker"

gcloud run services add-iam-policy-binding fhe-add \
  --member="serviceAccount:$BQ_CONN_SVC_ACCOUNT" \
  --role="roles/run.invoker"

curl -s  -X POST  \
  -H "Authorization: Bearer `gcloud auth print-identity-token`" \
  -H "Content-Type: application/json"  \
  -d @req_add.json \
  "$CLOUD_RUN_URL" | jq '.'


 bq --format=json query --dataset_id=$PROJECT_ID:fhe --location=US --nouse_legacy_sql  "
  CREATE OR REPLACE FUNCTION fhe_add(x BYTES, y BYTES) RETURNS BYTES 
    REMOTE WITH CONNECTION \`$PROJECT_ID.us.my-connection\`
    OPTIONS (endpoint = '$CLOUD_RUN_URL',  user_defined_context = [('mode', 'add')] )"

Sub

cd sub/

gcloud beta functions deploy fhe-sub  \
   --gen2   --runtime go116  --entry-point FHE_SUB \
   --region=us-central1   --trigger-http

# gcloud run deploy fhe-sub --source .  --no-allow-unauthenticated

export CLOUD_RUN_URL=`gcloud run services describe fhe-sub --format="value(status.address.url)"`
echo $CLOUD_RUN_URL

gcloud run services add-iam-policy-binding fhe-sub \
  --member="user:$GCLOUD_USER" \
  --role="roles/run.invoker"

gcloud run services add-iam-policy-binding fhe-sub \
  --member="serviceAccount:$BQ_CONN_SVC_ACCOUNT" \
  --role="roles/run.invoker"


bq --format=json query --dataset_id=$PROJECT_ID:fhe --location=US --nouse_legacy_sql  "
  CREATE OR REPLACE FUNCTION fhe_sub(x BYTES, y BYTES) RETURNS BYTES 
    REMOTE WITH CONNECTION \`$PROJECT_ID.us.my-connection\`
    OPTIONS (endpoint = '$CLOUD_RUN_URL',  user_defined_context = [('mode', 'sub')] )"

Mul

cd mul/

gcloud beta functions deploy fhe-mul  \
   --gen2   --runtime go116  --entry-point FHE_MUL \
   --region=us-central1   --trigger-http

# gcloud run deploy fhe-mul --source .  --no-allow-unauthenticated

export CLOUD_RUN_URL=`gcloud run services describe fhe-mul --format="value(status.address.url)"`
echo $CLOUD_RUN_URL

gcloud run services add-iam-policy-binding fhe-mul \
  --member="user:$GCLOUD_USER" \
  --role="roles/run.invoker"

gcloud run services add-iam-policy-binding fhe-mul \
  --member="serviceAccount:$BQ_CONN_SVC_ACCOUNT" \
  --role="roles/run.invoker"


bq --format=json query --dataset_id=$PROJECT_ID:fhe --location=US --nouse_legacy_sql  "
  CREATE OR REPLACE FUNCTION fhe_mul(x BYTES, y BYTES) RETURNS BYTES 
    REMOTE WITH CONNECTION \`$PROJECT_ID.us.my-connection\`
    OPTIONS (endpoint = '$CLOUD_RUN_URL',  user_defined_context = [('mode', 'mul')] )"

Neg

cd neg/

gcloud beta functions deploy fhe-neg  \
   --gen2   --runtime go116  --entry-point FHE_NEG \
   --region=us-central1   --trigger-http

# gcloud run deploy fhe-neg --source .  --no-allow-unauthenticated

export CLOUD_RUN_URL=`gcloud run services describe fhe-neg --format="value(status.address.url)"`
echo $CLOUD_RUN_URL

gcloud run services add-iam-policy-binding fhe-neg \
  --member="user:$GCLOUD_USER" \
  --role="roles/run.invoker"

gcloud run services add-iam-policy-binding fhe-neg \
  --member="serviceAccount:$BQ_CONN_SVC_ACCOUNT" \
  --role="roles/run.invoker"

bq --format=json query --dataset_id=$PROJECT_ID:fhe --location=US --nouse_legacy_sql  "
  CREATE OR REPLACE FUNCTION fhe_neg(x BYTES) RETURNS BYTES 
    REMOTE WITH CONNECTION \`$PROJECT_ID.us.my-connection\`
    OPTIONS (endpoint = '$CLOUD_RUN_URL',  user_defined_context = [('mode', 'neg')] )"

Now that all the functions are deployed, lets test some stuff

View Encrypted columns

Remember we inserted encrypted data? What to see a bit of it:

bq  query --use_legacy_sql=false  "SELECT 
 LEFT(ecd1.x,20) as x, 
 LEFT(ecd1.y,20) as y
FROM $PROJECT_ID.fhe.xy  AS ecd1"

+------------------------------+------------------------------+
|              x               |              y               |
+------------------------------+------------------------------+
| AgEMAgAAAELpoiEjAAAACHt5zGI= | AgEMAgAAAHvMYbbSAAAAOXwnE1w= |
+------------------------------+------------------------------+

thats right, x and y columns are encrypted forms of the integers we inserted (which was x=3, y=2)

note, the FHE encrypted stuff is huuuuge…i’m only showing the first 20

Encrypt

Encrypt two numbers…the same two numbers as in the code we used (the encrypted bytes will be different)

bq  query --use_legacy_sql=false  "SELECT 
 LEFT(fhe.fhe_encrypt(3),20) as x, 
 LEFT(fhe.fhe_encrypt(2),20) as y"

+------------------------------+------------------------------+
|              x               |              y               |
+------------------------------+------------------------------+
| AgEMAgAAAFhwaC0JAAAAMHTHHAk= | AgEMAgAAADBFjOi9AAAAawFWgyg= |
+------------------------------+------------------------------+

Encrypt/Decrypt

Run a cycle to encrypt and then decrypt some numbers

bq  query --use_legacy_sql=false  "SELECT 
 SAFE_CONVERT_BYTES_TO_STRING(
   fhe.fhe_decrypt(
     fhe.fhe_encrypt(3)
   )
  ) as x, 
  SAFE_CONVERT_BYTES_TO_STRING(
    fhe.fhe_decrypt(
      fhe.fhe_encrypt(2)
    )
  ) as y"

+---+---+
| x | y |
+---+---+
| 3 | 2 |
+---+---+

x+y

Let add those two encrypted values..in this case its the two encrypted columns in bigquery.

bq  query \
--use_legacy_sql=false  "SELECT 
  LEFT(fhe.fhe_add(ecd1.x, ecd1.y), 20) as add
FROM fhe.xy  AS ecd1"

+------------------------------+
|             add              |
+------------------------------+
| AgEMAgAAAD62BRf0AAAAQfeg374= |
+------------------------------+

the output is also encrypted!…at no time did the function fhe_add ever know what the numbers it added!!!

decrypt(x+y)

Now lets use a function to decrypt the encrypted values above:

bq  query \
--use_legacy_sql=false  "SELECT 
  SAFE_CONVERT_BYTES_TO_STRING(
    fhe.fhe_decrypt(
      fhe.fhe_add( ecd1.x, ecd1.y )
    )
  ) as decrypt
FROM fhe.xy  AS ecd1"

+---------+
| decrypt |
+---------+
| 5       |
+---------+

we got 5…that happens to be the sum of the two integers we inserted into BigQuery

	x := flag.Uint64("x", 3, "x")
	y := flag.Uint64("y", 2, "y")

(um yeah, if it wasn’t then the point of all this was for you to watch this)

You can of course choose not to have a fhe_decrypt() and simply output the values of fhe_add() into some file or dataset and then do decrypt it offline..

decrypt( encrypt(4) + encrypt(2) )

Now encrypt two values on the fly, add them while under encryption and then decrypt it

bq  query --use_legacy_sql=false  "SELECT 
  SAFE_CONVERT_BYTES_TO_STRING(
    fhe.fhe_decrypt(
      fhe.fhe_add(
        fhe.fhe_encrypt(4),
        fhe.fhe_encrypt(2)
      )
    )
  ) as add"

+-----+
| add |
+-----+
| 6   |
+-----+

decrypt( -1 * encrypt(4) )

Pick a number, encrypt it, negate it and then decrypt it

bq  query --use_legacy_sql=false  "SELECT 
  SAFE_CONVERT_BYTES_TO_STRING(
    fhe.fhe_decrypt(
      fhe.fhe_neg(
        fhe.fhe_encrypt(4)
      )
    )
  ) as neg"

+-----+
| neg |
+-----+
| -4  |
+-----+

decrypt( encrypt(4) * y )

Multiply two encrypted number and then decrypt it. Note y is the entry that is already in BQ (y=2)

bq  query  --use_legacy_sql=false  "SELECT 
  SAFE_CONVERT_BYTES_TO_STRING(
    fhe.fhe_decrypt(
      fhe.fhe_mul(
        fhe.fhe_encrypt(4),
        ecd1.y
      )
    )
  ) as mul
  FROM fhe.xy  AS ecd1"

+-----+
| mul |
+-----+
| 8   |
+-----+

We can only mix and match here since the same public key was used in the Remote function as was used when inserting the rows into BQ

decrypt( ( encrypt(4) + x ) * y )

Add and Multiply where (x,y)->(3,2)

bq  query  --use_legacy_sql=false  "SELECT 
  SAFE_CONVERT_BYTES_TO_STRING(
    fhe.fhe_decrypt(
      fhe.fhe_mul(
        fhe.fhe_add(
          fhe.fhe_encrypt(4),
          ecd1.x
        ),
        ecd1.y
      )
    )
  ) as compound
  FROM fhe.xy  AS ecd1"

+----------+
| compound |
+----------+
| 14       |
+----------+

There it is..basic math..as for division..see this or this…the math and understanding it way, way beyond me.

The whole intent of this article is academic and for some amusement…

This site supports webmentions. Send me a mention via this form.