Today’s story is inspired by a feature request opened for BigQuery GIS. The request, as I understand it, is to provide aggregate version of ST_Intersect or ST_Intersection. BigQuery GIS has ST_Union_Agg, but no ST_Intersection_Agg. Frankly, I still wonder about a use case when this would be needed, but let’s create such a function anyway.

The reason this is interesting is we’ll use complementary polygons here. Usually they cause troubles when a polygon with incorrect orientation is loaded to BigQuery, but here we’ll use them to our advantage.

Short refresher: on a sphere, every polygon has a complementary polygon. For example…

Today’s note is not specific for BQ GIS, but is handy when doing geospatial or other types of visualization and the dimension you are interested in has big outliers. Such extreme values might make charts ugly, or squeeze the choropleth map’s effective color range.

As an example, you might have most values in 0 to 100 range, and a few values reach 1000. Sometimes these outliers are interesting, but in my case they were noise, and I wanted to ignore them. I used to adjust the chart’s range manually or add another color interval, but decided to automate it. This…

Last time I showed how to use a PostgreSQL instance with PostGIS to fix invalid geometries, now I’m glad to introduce new BigQuery native options that simplify loading geospatial data, including invalid geometries.

The new signatures of ST_GeogFromText and ST_GeogFromGeoJson support two common requests

Planar WKT

If you worked with BigQuery GIS, you know it treats WKT as describing an object on spherical Earth, with geodesic edges, matching semantics of internal GEOGRAPHY type. And if you had planar data, loading it using WKT format sometimes is OK, but could lead to problems if the geometries are spatially large. …

Today we’ll have a complex script that ties together new features of BigQuery to transparently talk from BigQuery Script to a PostgreSQL instance. With Dynamic SQL we can build BigQuery SQL queries dynamically, and here we’ll build and execute PostgreSQL queries dynamically in BigQuery.

A practical motivating example here is being able to fix invalid polygons that don’t conform to OGC spec and cannot be loaded to BigQuery. Unfortunately, such bad data can be found in various datasets one has to work with. You can often fix it using ST_MakeValid function in PostgreSQL, but there is no such facility yet…

You might have already seen many ways to convert GeoJson files to something BigQuery can understand. Let’s invent one more wheel!

E.g. this python script from Lak Lakshmanan:

Or using org2org tools to produce CSV file:

I also have a few node.js scripts to convert both ways between GeoJson or new-line-delimited GeoJson to New-line-delimited Json (ndjson) that BigQuery uses at

But not everyone knows how to run Python and installing Node.js might be a mess, so here is a simpler way to do it, using jq command line tool. Jq ( is single-executable command line tool for processing json…

Today’s recipe discusses the following problem: you have some geospatial statistics collected for one set of spatial hierarchy, but you need to do some data analysis using another hierarchy. Say we have very detailed US census statistics, collected for census tracts, but we need statistics for zip codes, and we don’t have any.

Now a huge disclaimer: you should NOT use zip codes for your geospatial analysis, they are bad for that purpose. See this excellent article in Carto blog that demonstrates well why this is bad:

But well, sometimes you need to do something even if you should…

Today’s short notice is not about GIS, just about BigQuery. Let’s fix a reliability bug!

I’ve just noticed the following pattern used in various posts, to convert any column value to a small integer number in a fixed range. It is used for partitioning, random sampling, etc:


The idea is to compute hash, and then bring it to a fixed range we need, in this case [0, 10).

The FARM_FINGERPRINT returns a 64-bit signed integer, we compute it’s absolute value to avoid issues with negative values, and then take modulo 10 to bring it to the range we…

We’ve already discussed Nearest Neighbors problem:

The general idea is to start with small search radius, and increase it if we fail to find anything. Since that article, we got BigQuery Scripting, so let’s use it to do this. This time we’ll solve single neighbor search — find nearest neighbor for a specific point, rather than finding neighbors for a set of points.

For demo purposes, we’ll search nearest road in US Tiger database. Note it is not a very good road dataset for actual use, but fine for demo. The search with a specific radius will look like this:

Let’s try to repeat the following analysis in BigQuery and see how easy it is to do geospatial analysis in BigQuery GIS and BigQuery public datasets.

We’ll follow the general path of this article, with modifications as needed.

First, we’ll skip all the installation steps, nothing needs to be installed! OK, I’m cheating a bit here — if you want to use bq command line to upload data, you need to install Google Cloud SDK. If you use UI — you can skip it too.

Second, the article describes loading shape file. We can do this — see but…

Today we’ll have a small puzzle, why does a simple albeit long GeoJson line between two points looks curved in BigQuery GeoViz? Here is the line —

{"type": "LineString", "coordinates": [[-80,0], [80,80]]}

Here is how it appears in GeoViz:

Line in BigQuery GeoViz

If you draw the same line in other tools, you typically get a line that looks straight, and more importantly covers different area, crossing UK instead of Poland. E.g. in

Michael Entin

Hi, I'm TL of BigQuery GIS project. Posting small recipes and various notes for BQ GIS users.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store