A BQ Remote Function that just does something simple:
for each batch of rows, a Cloud Function will calculate the HMAC-SHA256 using a passphrase provided by the function caller.
While its simple, the important thing to note is you can run pretty much any transformation on the source BigQuery data (decrypt, encrypt, encode, convert to XML, etc) using code that runs in cloud functions.
Just be careful that at BQ scale that the functions are relatively simple and quick (i,e minimize external dependencies and services inside the function!).
this feature is in private preview (as of 2/28/22
); running this tutorial requires allowlist/enablement.
You can find the source here
First create the cloud function and test
export GCLOUD_USER=`gcloud config get-value core/account`
export PROJECT_ID=`gcloud config get-value core/project`
export PROJECT_NUMBER=`gcloud projects describe $PROJECT_ID --format='value(projectNumber)'`
gcloud functions deploy hmac_sha256 \
--runtime go116 \
--entry-point HMAC_SHA256 \
--region=us-central1 \
--trigger-http
# Allow unauthenticated invocations of new function [remote_concat]? (y/N)? N
gcloud functions add-iam-policy-binding hmac_sha256 \
--member="user:$GCLOUD_USER" \
--role="roles/cloudfunctions.invoker"
Now test directly using curl just to validate the function works.
Specifically, the expected HMAC
base64 encoded value with the following plaintext and key will be
export key="key"
export plaintext="The quick brown fox jumps over the lazy dog"
echo -n "$plaintext" | openssl dgst -sha256 -hmac "$key" -hex -r | xxd -r -p | base64
97yD9DBThCSxMpjmqm+xQ+9NWaFJRhdZl0edvC0aPNg=
so, if we POST the following payload that mimics what BQ would send, we’ll see the appropriate rows hashed to what we expect:
{
"requestId": "124ab1c",
"caller": "//bigquery.googleapis.com/projects/myproject/jobs/myproject:US.bquxjob_5b4c112c_17961fafeaf",
"sessionUser": "test-user@test-company.com",
"userDefinedContext": {
"key1": "value1",
"key2": "v2"
},
"calls": [
["The quick brown fox jumps over the lazy dog", "key"],
["The quick brown fox jumps over the lazy dog", "bar"],
["foo", "bar"],
["bar", "bar"],
["foo", "bar"],
["The quick brown fox jumps over the lazy dog", "key"],
["The quick brown fox jumps over the lazy dog", "bar"]
]
}
the output will be in order of the rows sent in (i.e., we expect the hashes of (row1, row6)
to match as well as (row2, row7)
and finally (row3, row5)
curl -s -X POST \
-H "Authorization: Bearer `gcloud auth print-identity-token`" \
-H "Content-Type: application/json" \
-d @req.json \
"https://us-central1-$PROJECT_ID.cloudfunctions.net/hmac_sha256" | jq '.'
{
"replies": [
"97yD9DBThCSxMpjmqm+xQ+9NWaFJRhdZl0edvC0aPNg=",
"4Gwg9rSBeSWMLz7yltK5ujS3fXz6C1eSottgyOMajjY=",
"FHkzIYqqvAuLEKKzpcNGhMjZQ0G88QpHNtxycPd0GFE=",
"ltlEFLFUq3pBoeLIS9b6WIdM1N6LFrFrFNsuMZDVMww=",
"FHkzIYqqvAuLEKKzpcNGhMjZQ0G88QpHNtxycPd0GFE=",
"97yD9DBThCSxMpjmqm+xQ+9NWaFJRhdZl0edvC0aPNg=",
"4Gwg9rSBeSWMLz7yltK5ujS3fXz6C1eSottgyOMajjY="
]
}
…so now back to bigquery..
Insert a bigquery table with the key and plaintext
printf '%s\n' 'The quick brown fox jumps over the lazy dog' > /tmp/file.csv
gsutil mb -l us-central1 gs://$PROJECT_ID-bq
gsutil cp /tmp/file.csv gs://$PROJECT_ID-bq/
bq mk -d --data_location=US mydataset1
bq load --autodetect --source_format=CSV mydataset1.mytable gs://$PROJECT_ID-bq/file.csv plaintext:STRING
bq query --nouse_legacy_sql "
SELECT
plaintext
FROM
$PROJECT_ID.mydataset1.mytable
"
bq mk --connection --display_name='myconn' --connection_type=CLOUD_RESOURCE \
--project_id=$PROJECT_ID --location=US my-connection
bq show --location=US --connection my-connection
export BQ_CONN_SVC_ACCOUNT=`bq show --format=json --location=US --connection my-connection | jq -r '.cloudResource.serviceAccountId'`
echo $BQ_CONN_SVC_ACCOUNT
# now allow the service account to call the function
gcloud functions add-iam-policy-binding hmac_sha256 \
--member="serviceAccount:$BQ_CONN_SVC_ACCOUNT" \
--role="roles/cloudfunctions.invoker"
# create the connection
bq --format=json query --dataset_id=$PROJECT_ID:mydataset1 --location=US --nouse_legacy_sql "
CREATE FUNCTION hmac_sha256(plaintext STRING, key STRING) RETURNS STRING
REMOTE WITH CONNECTION \`$PROJECT_ID.us.my-connection\`
OPTIONS (endpoint = 'https://us-central1-$PROJECT_ID.cloudfunctions.net/hmac_sha256', user_defined_context = [('mode', 'hash')] )
"
# now use the function
bq query --nouse_legacy_sql "
SELECT
plaintext, mydataset1.hmac_sha256(plaintext,'key') as hmac_sha256
FROM
$PROJECT_ID.mydataset1.mytable
"
+---------------------------------------------+----------------------------------------------+
| plaintext | hmac_sha256 |
+---------------------------------------------+----------------------------------------------+
| The quick brown fox jumps over the lazy dog | 97yD9DBThCSxMpjmqm+xQ+9NWaFJRhdZl0edvC0aPNg= |
+---------------------------------------------+----------------------------------------------+
You can see the invocation in cloud functions logs directly:
For reference on javascript UDF, see BigQuery HMAC-SHA256 User-defined Function
Also see
This site supports webmentions. Send me a mention via this form.