New options for BigQuery GIS geospatial data ingestion

Michael Entin
4 min readOct 15, 2020

--

Last time I showed how to use a PostgreSQL instance with PostGIS to fix invalid geometries, now I’m glad to introduce new BigQuery native options that simplify loading geospatial data, including invalid geometries.

The new signatures of ST_GeogFromText and ST_GeogFromGeoJson support two common requests

Planar WKT

If you worked with BigQuery GIS, you know it treats WKT as describing an object on spherical Earth, with geodesic edges, matching semantics of internal GEOGRAPHY type. And if you had planar data, loading it using WKT format sometimes is OK, but could lead to problems if the geometries are spatially large. For reliable translation you had to ingest planar data usingGeoJson format, which BigQuery converts from planar model to spherical one.

But what if you have a file with WKT geometries that came from a planar system? You now have an option to correctly ingest this one too:

ST_GeogFromText(wkt, planar => TRUE)

The planar is named parameter, you should pass TRUE value using the named argument syntax above.

Named parameters allow you to specify only the arguments you need. E.g. in ST_GeogFromText(wkt, oriented, planar, make_valid) the first argument is regular required parameter, and the following three are optional named parameters. You only need to pass wkt, and any combination of other arguments. The order of named arguments does not matter either, but regular arguments should be listed first.

Make Valid

BigQuery does not have ST_MakeValid function, because we’ve decided to only support valid geometry in internal representation. This avoids many problems and unexpected query results after accidentally loading bad data and then trying to use invalid values. If you have an invalid value, BigQuery prefers to tell you about it upfront.

BigQuery now supports a slightly different way of fixing the invalid geometries. Rather than using ST_MakeValid function, you need to do it at ingestion time, when converting WKT or GeoJson to a Geography object. ST_GeogFromText and ST_GeogFromGeoJson added new parameter make_valid. Similar to planar, it is a named argument and should be passed as

ST_GeogFromText(wkt, make_valid => TRUE)
ST_GeogFromGeoJson(geojson, make_valid => TRUE)

You can also combine make_valid and planar, e.g.

ST_GeogFromText(wkt, make_valid => TRUE, planar => TRUE)

Note that when faced with invalid geometry, make_valid has to make a guess about what the invalid input was supposed to mean. That’s because a valid geometry has a single interpretation determined by OGC / ISO standard, but invalid one is often ambiguous. The heuristics used to make this guess and fix the geometry may also change in future.

Thus it is always a good idea to manually inspect the result of the conversion and visually examine if BigQuery’s interpretation of the input matches the expectations. A simple workflow for this is to run a query like below to see geometries that cannot be ingested directly and compare with the result of fixing them using make_valid option:

SELECT
geojson AS original_geojson,
ST_AsGeoJson(ST_GeogFromGeoJson(geojson, make_valid => TRUE))
AS make_valid_geojson
FROM my_dataset.my_table
WHERE geojson is NOT NULL AND
SAFE.ST_GeogFromGeoJson(geojson) IS NULL

Here, we are filtering rows where “regular” ST_GeogFromGeoJson fails — WHERE SAFE.ST_GeogFromGeoJson(geojson) IS NULL, then fix the geometry using make_valid => TRUE option. This gives us pairs of original and valid geojson, which we can then examine row by row and verify whether the conversion is OK.

One final word: the make_valid option is currently in beta stage, and we like feedback about anything that worked / did not work. On StackOverflow, BQGIS tags are [google-bigquery] and [gis] (link to ask question), and we have public bug trackers as well (link to open BQGIS bug).
Update: the feature has reached General Availability.

Let’s use it!

Let’s repeat our last exercise of uploading San Francisco zoning dataset. Again, we download it as CSV file, upload to BigQuery with automatic schema detection (I used a table named tmp.sfzoning). This gives us a table with column the_geom of type STRING. We could not ingest it last time, as a few geometries are not valid there. But now it is easy.

First, let’s inspect invalid polygons, and validate fixes:

SELECT 
rand() color, -- to randomly color polygons in GeoViz
the_geom original,
st_geogfromtext(the_geom, planar => TRUE, make_valid => TRUE) geom
FROM `tmp.sfzoning`
where safe.st_geogfromtext(the_geom, planar => TRUE) is null

Note I also used planar => TRUE option. It probably does not matter for San Francisco, as the difference between planar and spherical edges is small at such scale. But since data most likely comes from planar system, it is a good habit to use this option too. I’ve also added rand() column to randomly color the resulting polygons, and be able to distinguish them.

BigQuery GeoViz cannot show invalid geometry, so we’ll use third party tools. Wicket seems to work great here. Let’s examine original data, which looks like

original image

I was also able to find the “spike” that made BigQuery unhappy:

the spike that makes the polygon invalid

The converted geography looks correct in GeoViz:

converted geography

You cannot see it here, but the problematic “spike” was converted to a hole, to make this geography valid. Similarly, you should inspect other converted polygons. Looks good, let’s convert our temporary table to the final one:

CREATE TABLE my_dataset.sfzoning AS
SELECT
* EXCEPT(the_geom),
st_geogfromtext(the_geom, planar => TRUE, make_valid => TRUE) geom
FROM `tmp.sfzoning`

--

--

Michael Entin

Hi, I'm TL of BigQuery Geospatial project. Posting small recipes and various notes for BQ Geospatial users.