Update 11/17/22: its probably easier to define a BQ Remote Function to do this than a BQ UDF in javascript. in this mode, you do the marshalling/unmarshalling in a cloud function similar to BigQuery Remote Functions in Go, AEAD Encryption with BQ Remote Functions
Simple BigQuery User Defined Function (UDF
) that wil parse a well-formed XML bigquery STRING
column into JSON.
The specific library used in this example to parse XML to JSON is xml-js
This is an extension of a StackOverfow post I came across recently. The specific extension is to further parse the output using standard BigQuer JSON functions to extract specific fields.
You can find the source here
this is not officially supported by google
You can either build the UDF or use the one in this repo xml_udf.js
To use the provided one, simply make a copy of it and place it into a GCS bucket in your project
To build, you’ll need nodejs
cd webpack-demo/
npm init -y
npm install webpack webpack-cli --save-dev
npm install --save xml-js
The actual code that does the UDF is at webpack-demo/src/index.ts
var convert = require('xml-js');
module.exports.XML2JSON = function (xml) {
return convert.xml2json(xml, {compact: true, spaces: 4});
}
module.exports.JSON2XML = function (json) {
var options = {compact: true, ignoreComment: true, spaces: 4};
return convert.json2xml(json, options);
}
Then build the webpack
npx webpack --config webpack.config.js
Edit the file xml_udf.js
and add the contents of the webpack
---> contents of webpack-demo/dist/main.js <---
function frmXML(d) {
c = xmllib.XML2JSON(d);
return c;
}
Upload the file and finally, use the UDF
CREATE TEMP FUNCTION myFunc(a STRING)
RETURNS STRING
LANGUAGE js AS
"""
return frmXML(a);
"""
OPTIONS (
library=["gs://mineral-minutia-820/xml_udf.js"]
);
SELECT JSON_EXTRACT(myFunc(a),"$.html.body.li._text") FROM UNNEST(["<html><title>Title of Page</title><body><li>hi sal </li></body></html>"]) as a;
Note the output of the UDF is JSON which i’ve parse out to extract just one field (hi sal
):
You can also use the following XML Parsers in your own UDF:
This sample does not utilize XLST. Trust me, you don’t want to go there…but if you do, see fiduswriter/xslt-processor
This site supports webmentions. Send me a mention via this form.