BigQuery geospatial visualization news
If you looked at BigQuery native visualization options some time ago, you might have tried BigQuery GeoViz and found it limited. But a lot of things have changed over the time, and I want to summarize major things that might help you to visualize Geospatial data in BigQuery.
More data
First, BigQuery GeoViz can handle much more data now. It is still a client-side app and is limited by the client memory, but can handle up to about a quarter gigabyte of data. This is not huge, but it allows it to show e.g. choropleth map of all US counties. Let’s use NOAA lightning strikes dataset to show it, coloring counties by the number of strikes per square km.
SELECT
geo_id,
sum(number_of_strikes) as number_of_strikes,
any_value(county_name) as county_name,
any_value(county_geom) as county_geom,
sum(number_of_strikes) * 1e6 /
any_value(area_land_meters+area_water_meters) as strikes_per_km2
FROM `bigquery-public-data.noaa_lightning.lightning_2018` l
CROSS JOIN `bigquery-public-data.geo_us_boundaries.counties` c
WHERE ST_INTERSECTS(l.center_point, c.county_geom)
GROUP BY geo_id
This performs geospatial join of ~4mln records of lightning strikes with ~3k USA county polygons, finishes in ~20 seconds, let’s draw it. Here is the styling I used:
to render this picture:
ST_Simplify
This is not a GeoViz update, but a new BigQuery function that is very useful here. By reducing geography shape details, it allows GeoViz to show even more data and improves its responsiveness. E.g. while GeoViz can render all US counties in the query above, it might get sluggish on a not so powerful machine. We don’t need full spatial resolution to render this image though — the simplified shapes should be enough. Let’s measure how much “savings” can we get from this simplification:
SELECT
sum(length(st_asgeojson(county_geom))) orig,
sum(length(st_asgeojson(st_simplify(county_geom, 10)))) s10,
sum(length(st_asgeojson(st_simplify(county_geom, 100)))) s100,
sum(length(st_asgeojson(st_simplify(county_geom, 1000)))) s1k,
sum(length(st_asgeojson(st_simplify(county_geom, 10000)))) s10k
FROM `bigquery-public-data.geo_us_boundaries.counties`
This produces following values
orig = 198,832,836 -- almost 200 MB of original data
s10 = 36,180,791 -- polygons simplified to 10 meters tolerance
s100 = 9,215,490 -- 100 meters tolerance
s1k = 2,316,429 -- 1km tolerance
s10k = 1,218,814 -- 10km tolerance
Using 10 km is probably not worth it, but 100 meters looks like a sweet spot, reducing data by factor 20x. With less data to read from BigQuery, GeoViz loads data faster and feels more responsive when you explore the map.
Hint: ST_Simplify
is a relatively computationally expensive function, so don’t put it in a query that you run often, or when you care how fast it is. Rather, create a separate table containing simplified version of the Geography, with a query like the one below, and use it whenever you don’t need high precision of the original data:
CREATE OR REPLACE TABLE views.counties_simplified AS
SELECT *,
ST_Simplify(county_geom, 100) AS county_geom_100m,
ST_Simplify(county_geom, 1000) AS county_geom_1km
FROM `bigquery-public-data.geo_us_boundaries.counties`
Data Studio
DataStudio has become much better at visualizing geospatial data — it can draw BigQuery results, here is a write up about showing points
And this article talks about showing arbitrary polygons:
Note that the amount of data the DataStudio can show is about the same as GeoViz or slightly less, so ST_Simplify
can be useful here as well.
Carto tiler
Carto BigQuery tiler is a great third-party solution for visualizing massive spatial datasets when the BigQuery GeoViz limits are not enough.
dekart.xyz
This is another great open-source 3rd party tool that supports visualizing BigQuery GEOGRAPHY data.