New options for BigQuery GIS geospatial data ingestion
Last time I showed how to use a PostgreSQL instance with PostGIS to fix invalid geometries, now I’m glad to introduce new BigQuery native options that simplify loading geospatial data, including invalid geometries.
The new signatures of ST_GeogFromText
and ST_GeogFromGeoJson
support two common requests
Planar WKT
If you worked with BigQuery GIS, you know it treats WKT
as describing an object on spherical Earth, with geodesic edges, matching semantics of internal GEOGRAPHY
type. And if you had planar data, loading it using WKT
format sometimes is OK, but could lead to problems if the geometries are spatially large. For reliable translation you had to ingest planar data usingGeoJson
format, which BigQuery converts from planar model to spherical one.
But what if you have a file with WKT geometries that came from a planar system? You now have an option to correctly ingest this one too:
ST_GeogFromText(wkt, planar => TRUE)
The planar
is named parameter, you should pass TRUE
value using the named argument syntax above.
Named parameters allow you to specify only the arguments you need. E.g. in
ST_GeogFromText(wkt, oriented, planar, make_valid)
the first argument is regular required parameter, and the following three are optional named parameters. You only need to passwkt
, and any combination of other arguments. The order of named arguments does not matter either, but regular arguments should be listed first.
Make Valid
BigQuery does not have ST_MakeValid
function, because we’ve decided to only support valid geometry in internal representation. This avoids many problems and unexpected query results after accidentally loading bad data and then trying to use invalid values. If you have an invalid value, BigQuery prefers to tell you about it upfront.
BigQuery now supports a slightly different way of fixing the invalid geometries. Rather than using ST_MakeValid
function, you need to do it at ingestion time, when converting WKT
or GeoJson
to a Geography
object. ST_GeogFromText
and ST_GeogFromGeoJson
added new parameter make_valid
. Similar to planar
, it is a named argument and should be passed as
ST_GeogFromText(wkt, make_valid => TRUE)
ST_GeogFromGeoJson(geojson, make_valid => TRUE)
You can also combine make_valid
and planar
, e.g.
ST_GeogFromText(wkt, make_valid => TRUE, planar => TRUE)
Note that when faced with invalid geometry, make_valid has to make a guess about what the invalid input was supposed to mean. That’s because a valid geometry has a single interpretation determined by OGC / ISO standard, but invalid one is often ambiguous. The heuristics used to make this guess and fix the geometry may also change in future.
Thus it is always a good idea to manually inspect the result of the conversion and visually examine if BigQuery’s interpretation of the input matches the expectations. A simple workflow for this is to run a query like below to see geometries that cannot be ingested directly and compare with the result of fixing them using make_valid
option:
SELECT
geojson AS original_geojson,
ST_AsGeoJson(ST_GeogFromGeoJson(geojson, make_valid => TRUE))
AS make_valid_geojson
FROM my_dataset.my_table
WHERE geojson is NOT NULL AND
SAFE.ST_GeogFromGeoJson(geojson) IS NULL
Here, we are filtering rows where “regular” ST_GeogFromGeoJson
fails — WHERE SAFE.ST_GeogFromGeoJson(geojson) IS NULL
, then fix the geometry using make_valid => TRUE
option. This gives us pairs of original and valid geojson, which we can then examine row by row and verify whether the conversion is OK.
One final word: the make_valid
option is currently in beta stage, and we like feedback about anything that worked / did not work. On StackOverflow, BQGIS tags are [google-bigquery]
and [gis]
(link to ask question), and we have public bug trackers as well (link to open BQGIS bug).
Update: the feature has reached General Availability.
Let’s use it!
Let’s repeat our last exercise of uploading San Francisco zoning dataset. Again, we download it as CSV file, upload to BigQuery with automatic schema detection (I used a table named tmp.sfzoning
). This gives us a table with column the_geom
of type STRING.
We could not ingest it last time, as a few geometries are not valid there. But now it is easy.
First, let’s inspect invalid polygons, and validate fixes:
SELECT
rand() color, -- to randomly color polygons in GeoViz
the_geom original,
st_geogfromtext(the_geom, planar => TRUE, make_valid => TRUE) geom
FROM `tmp.sfzoning`
where safe.st_geogfromtext(the_geom, planar => TRUE) is null
Note I also used planar => TRUE
option. It probably does not matter for San Francisco, as the difference between planar and spherical edges is small at such scale. But since data most likely comes from planar system, it is a good habit to use this option too. I’ve also added rand()
column to randomly color the resulting polygons, and be able to distinguish them.
BigQuery GeoViz cannot show invalid geometry, so we’ll use third party tools. Wicket seems to work great here. Let’s examine original data, which looks like
I was also able to find the “spike” that made BigQuery unhappy:
The converted geography looks correct in GeoViz:
You cannot see it here, but the problematic “spike” was converted to a hole, to make this geography valid. Similarly, you should inspect other converted polygons. Looks good, let’s convert our temporary table to the final one:
CREATE TABLE my_dataset.sfzoning AS
SELECT
* EXCEPT(the_geom),
st_geogfromtext(the_geom, planar => TRUE, make_valid => TRUE) geom
FROM `tmp.sfzoning`