
Google Cloud Storage SignedURL + Resumable upload with cURL

2017-09-12

A couple of days ago a colleague asked if it's possible to use Google Cloud Storage Signed URLs with Resumable Uploads.

SignedURLs are pretty useful in that they allow an application to issue a time-limited URL that a customer can use to upload or download a file in Cloud Storage (GCS) without needing to log in.

That is, your application can simply give a self-contained URL to a user, and they can use that URL alone, without logging in, to upload or download a given object.

However, what if the file/object to upload is large or your network connection is flaky? Well, in GCS you can use Resumable Uploads, as described here for the GCS XML and JSON APIs and their corresponding libraries.

One problem though: GCS SignedURLs only work with the XML endpoint, and the libraries that perform resumable uploads only speak to the JSON endpoint.

What to do? You can of course mint a signed URL and replay the protocol as shown here. It's certainly very tricky to do this, but this article simply shows the mechanism. Hopefully, there will be library support for this within the GCS library set, as there is now for the JSON API.

I would advise against implementing the protocol yourself: there are many cases you need to consider, like parallel downloads and handling all the appropriate retry logic. For reference, I've provided a link to the Gmail attachment download page.

Anyway, here is the raw protocol using curl:

  1. Create a file of 10⁸ bytes
base64 /dev/urandom | head -c 100000000 > file.txt
  1. Generate the signed URL and exchange it for the upload location URL.

The Java and Go sources for the samples here are shown at the end of the article. For Java, you'll need to create a service account JSON file, while for Go you'll need a .p12 file, which you will have to convert to PEM. You will also need to grant the service account access to the bucket and object in question.

java
mvn -q clean install exec:java
golang
go run main.go
gsutil
$ gsutil signurl -c 'text/plain' \
   -m RESUMABLE /path/to/your/json_cert_file.json \
   gs://your_bucket/file.txt
  1. POST initial request. Submit an empty POST request with the added header (x-goog-resumable: start) to get the location URL. Use the signed URL from the previous step.
$ curl -v -X 'POST' \
   -H 'content-type: text/plain' \
   -H 'x-goog-resumable:start'  \
   -d '' '<signedURL>'
> POST <signedURL> HTTP/1.1
> User-Agent: curl/7.35.0
> Host: storage.googleapis.com
> Accept: */*
> content-type: text/plain
> x-goog-resumable:start
> Content-Length: 0
> 
< HTTP/1.1 201 Created
< X-GUploader-UploadID: <redacted>
< Location: <Location_URL>
< Content-Length: 0
< Date: Mon, 11 Sep 2017 02:07:42 GMT
* Server UploadServer is not blacklisted
< Server: UploadServer
< Content-Type: text/html; charset=UTF-8
< Alt-Svc: quic=":443"; ma=2592000; v="39,38,37,35"
< 
  1. Save the Location value. Set the Location value in an environment variable for later use (enclose the value in quotes).
export LOCATION_URL='<location-url-from-response>'
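If you'd rather not copy the Location value by hand, something like the following sketch could pull it out of the response headers. The function name `extract_location` and the `SIGNED_URL` variable are my own placeholders, not part of any Google tooling, and the helper expects plain headers on stdin (for example from `curl -D -`), not the `< `-prefixed lines that `curl -v` prints.

```shell
# Hypothetical helper: print the value of the Location header found in the
# HTTP response headers supplied on stdin.
extract_location() {
  # Header names are case-insensitive; also strip the CR that ends each line.
  tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Possible usage (assumes SIGNED_URL holds the signed URL from the previous step):
#   LOCATION_URL=$(curl -s -o /dev/null -D - -X POST \
#       -H 'content-type: text/plain' \
#       -H 'x-goog-resumable:start' \
#       -d '' "$SIGNED_URL" | extract_location)
#   export LOCATION_URL
```

Here `-D -` asks curl to dump the response headers to stdout while `-o /dev/null` discards the (empty) body.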
  1. Start upload. Now start the upload and interrupt it after maybe 5 or 10 seconds to simulate a network failure (i.e., press ^C to interrupt curl). If you do not interrupt it, the upload should complete as normal, but we want to show how to resume.
$ curl -v -X PUT --upload-file file.txt "$LOCATION_URL"
> PUT <location_url> HTTP/1.1
> User-Agent: curl/7.35.0
> Host: storage.googleapis.com
> Accept: */*
> Content-Length: 100000000
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
^C
  1. Find progress. Now find out how much was transferred.

Remember to set the content-range header:

curl -v -X PUT -d '' \
    -H "Content-Range: bytes */100000000"  \
    "$LOCATION_URL"
> PUT <location_url> HTTP/1.1
> User-Agent: curl/7.35.0
> Host: storage.googleapis.com
> Accept: */*
> Content-Range: bytes */100000000
> Content-Length: 0
> Content-Type: application/x-www-form-urlencoded
> 
< HTTP/1.1 308 Resume Incomplete
< X-GUploader-UploadID: <redacted>
< Range: bytes=0-9699327
< X-Range-MD5: b5c37d023a9d8d111d8848c06a06a070
< Content-Length: 0
< Date: Mon, 11 Sep 2017 02:10:32 GMT
* Server UploadServer is not blacklisted
< Server: UploadServer
< Content-Type: text/html; charset=UTF-8
< Alt-Svc: quic=":443"; ma=2592000; v="39,38,37,35"
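To turn that Range header into the offset to resume from, a small helper along these lines might do. The name `next_offset` is hypothetical, and it reads plain response headers on stdin (as produced by `curl -D -`). I'm assuming here that a 308 response with no Range header means the server has stored nothing yet, so the upload restarts from byte 0.

```shell
# Hypothetical helper: read response headers on stdin and print the zero-based
# offset of the first byte the server has NOT yet received.
next_offset() {
  tr -d '\r' | awk '
    tolower($1) == "range:" {
      # $2 looks like "bytes=0-9699327"; take the number after the last "-".
      n = split($2, parts, "-")
      print parts[n] + 1
      found = 1
      exit
    }
    # Assumption: no Range header means no bytes have been stored yet.
    END { if (!found) print 0 }
  '
}

# Possible usage:
#   OFFSET=$(curl -s -o /dev/null -D - -X PUT -d '' \
#       -H "Content-Range: bytes */100000000" "$LOCATION_URL" | next_offset)
```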
  1. Create difference

Create a temp file to transfer with the remaining bytes.

Range: bytes=0-9699327

The response header shows that we transmitted 9699328 bytes (offsets 0 through 9699327), so we have to transmit the remaining bytes. Let's create a file that starts with the next byte in the file: 9699327 + 1 = 9699328.

dd skip=9699328 if=file.txt of=remainder.txt ibs=1
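The dd command above reads one byte at a time (ibs=1), which can be slow on a file this size. As an alternative sketch, `tail -c +N` starts output at the N-th byte, counting from 1, so a zero-based offset needs one added to it. The function name `make_remainder` is just a placeholder.

```shell
# Hypothetical helper: copy the bytes of file $1, starting at zero-based
# offset $2, into file $3.
make_remainder() {
  src=$1
  offset=$2
  dest=$3
  # tail -c +N counts bytes from 1, so zero-based offset K becomes +(K + 1).
  tail -c "+$((offset + 1))" "$src" > "$dest"
}

# Possible usage, equivalent to the dd command in this step:
#   make_remainder file.txt 9699328 remainder.txt
```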
  1. Upload the remaining bytes
curl -v -X PUT \
 --upload-file remainder.txt \
 -H "Content-Range: bytes 9699328-99999999/100000000" \
 "$LOCATION_URL"
> PUT <location_url> HTTP/1.1
> User-Agent: curl/7.35.0
> Host: storage.googleapis.com
> Accept: */*
> Content-Range: bytes 9699328-99999999/100000000
> Content-Length: 90300672
> Expect: 100-continue
> 
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< X-GUploader-UploadID: <redacted>
< ETag: "4477044b09dd2e9a6f710b9001d09028"
< x-goog-generation: 1505096052347868
< x-goog-metageneration: 1
< x-goog-hash: crc32c=y8RBkA==
< x-goog-hash: md5=RHcESwndLppvcQuQAdCQKA==
< x-goog-stored-content-length: 100000000
< x-goog-stored-content-encoding: identity
< Content-Length: 0
< Date: Mon, 11 Sep 2017 02:14:12 GMT
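The Content-Range value in that request is easy to get wrong by one, so here is a rough sketch of how it could be computed from the resume offset and the total file size. Both function names are hypothetical. The last byte position is the total size minus one, and the number of bytes left to send is the total minus the offset.

```shell
# Hypothetical helper: build the Content-Range header for resuming at
# zero-based offset $1 of an object whose total size is $2 bytes.
content_range() {
  offset=$1
  total=$2
  printf 'Content-Range: bytes %s-%s/%s' "$offset" "$((total - 1))" "$total"
}

# Hypothetical helper: number of bytes still to send (the Content-Length
# curl should report for the remainder).
remaining_length() {
  echo "$(($2 - $1))"
}

# Possible usage (assumes OFFSET holds the resume offset):
#   TOTAL=$(wc -c < file.txt)
#   curl -v -X PUT --upload-file remainder.txt \
#       -H "$(content_range "$OFFSET" "$TOTAL")" "$LOCATION_URL"
```

With the numbers from this walkthrough, `content_range 9699328 100000000` yields the same header used above, and `remaining_length 9699328 100000000` gives 90300672, matching the Content-Length in the request.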
  1. Verify the transfer:
$ gsutil cp gs://mineral-minutia-820/file.txt downloaded.txt

$ sha256sum file.txt
b0a630e52a198c7ce1dfdc5cb0987cec0c9aaac7c3de27ae700c961069778a7c  file.txt
$ sha256sum downloaded.txt
b0a630e52a198c7ce1dfdc5cb0987cec0c9aaac7c3de27ae700c961069778a7c  downloaded.txt
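You could also check the upload without downloading the object again: the `x-goog-hash: md5=...` value in the final response is the base64 encoding of the object's MD5 digest. The sketch below converts it to hex so it can be compared with the output of `md5sum file.txt`. The function name `b64_to_hex` is a placeholder, and it assumes a `base64` that accepts `-d` (as GNU coreutils does).

```shell
# Hypothetical helper: convert a base64-encoded digest (as found in the
# x-goog-hash header) into lowercase hex.
b64_to_hex() {
  # Decode to raw bytes, dump one hex pair per byte, then strip the spacing.
  printf '%s' "$1" | base64 -d | od -An -tx1 | tr -d ' \n'
}

# Possible usage with the md5 value from the response above:
#   b64_to_hex 'RHcESwndLppvcQuQAdCQKA=='
#   md5sum file.txt
```

For the response shown in this walkthrough, the decoded value is 4477044b09dd2e9a6f710b9001d09028, the same string that appears in the ETag header.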

That's it! We've uploaded the file completely by hand.

The rest is for extra credit, if you want to generate a signed URL with canonical headers.

Appendix

The following code samples in Java and Go issue a SignedURL with the resumable headers already baked into it.

  • Main.java

Note: SignedURLs issued by google-cloud Java currently do not support setting canonical headers (see issue#2000), which means you have to create the signed URL manually.


  • main.go

google-cloud golang does allow setting the canonical headers in the request.
