New options for BigQuery GIS geospatial data ingestion

Last time I showed how to use a PostgreSQL instance with PostGIS to fix invalid geometries, now I’m glad to introduce new BigQuery native options that simplify loading geospatial data, including invalid geometries.

The new signatures of ST_GeogFromText and ST_GeogFromGeoJson support two common requests

Planar WKT

But what if you have a file with WKT geometries that came from a planar system? You now have an option to correctly ingest this one too:

ST_GeogFromText(wkt, planar => TRUE)

The planar is named parameter, you should pass TRUE value using the named argument syntax above.

Named parameters allow you to specify only the arguments you need. E.g. in ST_GeogFromText(wkt, oriented, planar, make_valid) the first argument is regular required parameter, and the following three are optional named parameters. You only need to pass wkt, and any combination of other arguments. The order of named arguments does not matter either, but regular arguments should be listed first.

Make Valid

BigQuery now supports a slightly different way of fixing the invalid geometries. Rather than using ST_MakeValid function, you need to do it at ingestion time, when converting WKT or GeoJson to a Geography object. ST_GeogFromText and ST_GeogFromGeoJson added new parameter make_valid. Similar to planar, it is a named argument and should be passed as

ST_GeogFromText(wkt, make_valid => TRUE)
ST_GeogFromGeoJson(geojson, make_valid => TRUE)

You can also combine make_valid and planar, e.g.

ST_GeogFromText(wkt, make_valid => TRUE, planar => TRUE)

Note that when faced with invalid geometry, make_valid has to make a guess about what the invalid input was supposed to mean. That’s because a valid geometry has a single interpretation determined by OGC / ISO standard, but invalid one is often ambiguous. The heuristics used to make this guess and fix the geometry may also change in future.

Thus it is always a good idea to manually inspect the result of the conversion and visually examine if BigQuery’s interpretation of the input matches the expectations. A simple workflow for this is to run a query like below to see geometries that cannot be ingested directly and compare with the result of fixing them using make_valid option:

SELECT
geojson AS original_geojson,
ST_AsGeoJson(ST_GeogFromGeoJson(geojson, make_valid => TRUE))
AS make_valid_geojson
FROM my_dataset.my_table
WHERE geojson is NOT NULL AND
SAFE.ST_GeogFromGeoJson(geojson) IS NULL

Here, we are filtering rows where “regular” ST_GeogFromGeoJson fails — WHERE SAFE.ST_GeogFromGeoJson(geojson) IS NULL, then fix the geometry using make_valid => TRUE option. This gives us pairs of original and valid geojson, which we can then examine row by row and verify whether the conversion is OK.

One final word: the make_valid option is currently in beta stage, and we like feedback about anything that worked / did not work. On StackOverflow, BQGIS tags are [google-bigquery] and [gis] (link to ask question), and we have public bug trackers as well (link to open BQGIS bug).
Update: the feature has reached General Availability.

Let’s use it!

First, let’s inspect invalid polygons, and validate fixes:

SELECT 
rand() color, -- to randomly color polygons in GeoViz
the_geom original,
st_geogfromtext(the_geom, planar => TRUE, make_valid => TRUE) geom
FROM `tmp.sfzoning`
where safe.st_geogfromtext(the_geom, planar => TRUE) is null

Note I also used planar => TRUE option. It probably does not matter for San Francisco, as the difference between planar and spherical edges is small at such scale. But since data most likely comes from planar system, it is a good habit to use this option too. I’ve also added rand() column to randomly color the resulting polygons, and be able to distinguish them.

BigQuery GeoViz cannot show invalid geometry, so we’ll use third party tools. Wicket seems to work great here. Let’s examine original data, which looks like

original image

I was also able to find the “spike” that made BigQuery unhappy:

the spike that makes the polygon invalid

The converted geography looks correct in GeoViz:

converted geography

You cannot see it here, but the problematic “spike” was converted to a hole, to make this geography valid. Similarly, you should inspect other converted polygons. Looks good, let’s convert our temporary table to the final one:

CREATE TABLE my_dataset.sfzoning AS
SELECT
* EXCEPT(the_geom),
st_geogfromtext(the_geom, planar => TRUE, make_valid => TRUE) geom
FROM `tmp.sfzoning`

Hi, I'm TL of BigQuery GIS project. Posting small recipes and various notes for BQ GIS users.