Persistent UDFs + BQ GIS = ♥
BigQuery Persistent UDFs are now beta, you can find great set of examples in Felipe Hoffa post: https://medium.com/@hoffa/new-in-bigquery-persistent-udfs-c9ea4100fd83
Let’s use UDF with GIS too!
E.g. feature request 119665707 asks to provide a way for ST_AsGeoJSON to always produce multipolygon geometry. It is sometimes done with ST_Multi function, but BQ GIS single Geography type does not differentiate collections and single geometries, so there is no ST_Multi. Anyway, it is easy to manipulate JSON with JavaScript, we can do this using UDF:
CREATE FUNCTION demo.STx_MakeMulti(geojson STRING) RETURNS STRING
LANGUAGE js AS '''
var obj = JSON.parse(geojson);
if (obj.type == "Polygon") {
obj.type = "MultiPolygon";
obj.coordinates = [obj.coordinates];
}
return JSON.stringify(obj);
''';
Remember that GeoJson geometry has two properties, type
and coordinates
. If type is ‘Polygon’, we replace it with ‘MultiPolygon’, and nest coordinates in another array (we now have array of polygons, consisting of our single original polygon). Let’s test this function, and that its result is equivalent to original geometry:
SELECT
demo.STx_MakeMulti(ST_AsGeoJson(geo)),
ST_Equals(
ST_GeogFromGeoJson(demo.STx_MakeMulti(ST_AsGeoJson(geo))),
geo)
FROM UNNEST(
[ST_GeogFromText('polygon((1 1, 1 2, 2 2, 1 1))'),
ST_GeogFromText('multipolygon(((1 1, 1 2, 2 2, 1 1)),((10 1, 10 2, 11 2, 10 1)))')]) geo;
Similarly, BQ GIS does not have ST_Dump series of functions yet, so let’s create one.
[Update] BigQuery now has ST_Dump function.
What follows is not a true replacement for ST_Dump as it goes through GeoJSON conversion, but it might be useful way to get data if you interact with systems that don’t support collections:
CREATE OR REPLACE FUNCTION demo.STx_DumpCollection(geojson STRING) RETURNS ARRAY<STRING>
LANGUAGE js AS '''
var obj = JSON.parse(geojson);
if (obj.type == "GeometryCollection") {
return obj.geometries.map(g => JSON.stringify(g));
}
return [geojson];
''';
It might be used to split rows containing geometry collection into multiple rows. Say we have a table sample_table (id INT64, geom GEOGRAPHY)
, we can flatten it using:
SELECT id, geo
FROM sample_table d,
UNNEST(demo.STx_DumpCollection(ST_AsGeoJson(d.geom))) geo
Another common way to split shapes is with WKT and regexp parsing rather than JSON manipulation. Let’s extract all the points:
CREATE OR REPLACE FUNCTION demo.STx_DumpPoints(geog geography)
RETURNS ARRAY<Geography> AS ((
select array (
select ST_GeogPoint(cast(s[offset(0)] as float64),
cast(s[offset(1)] as float64))
from (
select split(p, " ") s
from unnest((
select REGEXP_EXTRACT_ALL(ST_Astext(geog),
r"[-\.0-9e]+ [-\.0-9e]+") pair)) p
))
));
We can now use this to find all the vertices in a shape:
with data as (
select * from unnest(
[st_geogfromtext('multipoint((-1.1 2.2), (3.3 -4.4))'),
st_geogfromtext('linestring(5.5e-5 6, 7 8, 9 10)')]) g
)
select g, p from data, unnest(demo.STx_DumpPoints(g)) p