This repo demonstrates generating a BQ table using a .proto
as the schema and then inserting data into that table using the BQ Storage Write API.
Data from the client will be in protobuf format and we will use the AppendRowsRequest.ProtoData function marshall and insert the source data.
This repo heavily lifts code from the two repos shown below but the variation described here is that we will define the schema directly through a protoc
compiler.
References:
You can find the source here
The first step is to convert our sample .proto
to BQ shema format.
given the following proto,
syntax = "proto3";
package echo;
option go_package = "github.com/salrashid123/grpc_wireformat/grpc_services/src/echo";
message EchoRequest {
string first_name = 1;
string last_name = 2;
Middle middle_name = 3;
}
message Middle {
string name = 1;
}
message EchoReply {
string message = 1;
}
We will first use the protoc plugin protoc-gen-bq-schema to help convert it.
so first install the plutgin (you’ll ofcourse need golang
and protoc
setup on your system)
go install github.com/GoogleCloudPlatform/protoc-gen-bq-schema@latest
git clone https://github.com/GoogleCloudPlatform/protoc-gen-bq-schema.git
This specific compiler uses annotations overrides so we will add in the imports and annotations
syntax = "proto3";
package echo;
option go_package = "github.com/salrashid123/grpc_wireformat/grpc_services/src/echo";
import "bq_table.proto";
import "bq_field.proto";
message EchoRequest {
option (gen_bq_schema.bigquery_opts).table_name = "echorequest";
string first_name = 1;
string last_name = 2;
Middle middle_name = 3;
}
message Middle {
string name = 1;
}
message EchoReply {
string message = 1;
}
Now generate the schema:
cp protoc-gen-bq-schema/bq_field.proto .
cp protoc-gen-bq-schema/bq_table.proto .
protoc --go_out=. --bq-schema_out=. --go_opt=paths=source_relative --descriptor_set_out=echo/echo.pb echo/echo.proto
You should see the format BQ expects
$ cat echo/echorequest.schema
[
{
"name": "first_name",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "last_name",
"type": "STRING",
"mode": "NULLABLE"
},
{
"name": "middle_name",
"type": "RECORD",
"mode": "NULLABLE",
"fields": [
{
"name": "name",
"type": "STRING",
"mode": "NULLABLE"
}
]
}
]
Now create a BQ table with that schema
export GCLOUD_USER=`gcloud config get-value core/account`
export PROJECT_ID=`gcloud config get-value core/project`
export DATASET_ID=echo_dataset
export TABLE_ID=echorequest
export PROJECT_NUMBER=`gcloud projects describe $PROJECT_ID --format='value(projectNumber)'`
bq mk -d --data_location=US $DATASET_ID
bq mk -t --description "prototable Keys" $DATASET_ID.$TABLE_ID echo/echorequest.schema
Use the writeAPI to insert a row of data where the source data is actually a full go protobuf Message
go run bq_client.go --projectID $PROJECT_ID --datasetID $DATASET_ID --tableID $TABLE_ID
If you query the table now
bq query \
--use_legacy_sql=false "SELECT *
FROM $PROJECT_ID.$DATASET_ID.$TABLE_ID
LIMIT 5;"
+------------+-----------+--------------+
| first_name | last_name | middle_name |
+------------+-----------+--------------+
| sal | mander | {"name":"a"} |
+------------+-----------+--------------+
and just for amusement,
This site supports webmentions. Send me a mention via this form.