If you looked at BigQuery native visualization options some time ago, you might have tried BigQuery GeoViz and found it limited. But a lot of things have changed over the time, and I want to summarize major things that might help you to visualize Geospatial data in BigQuery.
First, BigQuery GeoViz can handle much more data now. It is still a client-side app and is limited by the client memory, but can handle up to about a quarter gigabyte of data. This is not huge, but it allows it to show e.g. choropleth map of all US counties. Let’s use NOAA lightning strikes dataset to show it, coloring counties by the number of strikes per square km.
sum(number_of_strikes) as number_of_strikes,
any_value(county_name) as county_name,
any_value(county_geom) as county_geom,
sum(number_of_strikes) * 1e6 /
any_value(area_land_meters+area_water_meters) as strikes_per_km2
FROM `bigquery-public-data.noaa_lightning.lightning_2018` l
CROSS JOIN `bigquery-public-data.geo_us_boundaries.counties` c
WHERE ST_INTERSECTS(l.center_point, c.county_geom)
GROUP BY geo_id
This performs geospatial join of ~4mln records of lightning strikes with ~3k USA county polygons, finishes in ~20 seconds, let’s draw it. Here is the styling I used:
to render this picture:
This is not a GeoViz update, but a new BigQuery function that is very useful here. By reducing geography shape details, it allows GeoViz to show even more data and improves its responsiveness. E.g. while GeoViz can render all US counties in the query above, it might get sluggish on a not so powerful machine. We don’t need full spatial resolution to render this image though — the simplified shapes should be enough. Let’s measure how much “savings” can we get from this simplification:
sum(length(st_asgeojson(st_simplify(county_geom, 10)))) s10,
sum(length(st_asgeojson(st_simplify(county_geom, 100)))) s100,
sum(length(st_asgeojson(st_simplify(county_geom, 1000)))) s1k,
sum(length(st_asgeojson(st_simplify(county_geom, 10000)))) s10k
This produces following values
orig = 198,832,836 -- almost 200 MB of original data
s10 = 36,180,791 -- polygons simplified to 10 meters tolerance
s100 = 9,215,490 -- 100 meters tolerance
s1k = 2,316,429 -- 1km tolerance
s10k = 1,218,814 -- 10km tolerance
Using 10 km is probably not worth it, but 100 meters looks like a sweet spot, reducing data by factor 20x. With less data to read from BigQuery, GeoViz loads data faster and feels more responsive when you explore the map.
ST_Simplify is a relatively computationally expensive function, so don’t put it in a query that you run often, or when you care how fast it is. Rather, create a separate table containing simplified version of the Geography, with a query like the one below, and use it whenever you don’t need high precision of the original data:
CREATE OR REPLACE TABLE views.counties_simplified AS
ST_Simplify(county_geom, 100) AS county_geom_100m,
ST_Simplify(county_geom, 1000) AS county_geom_1km
DataStudio has become much better at visualizing geospatial data — it can draw BigQuery results, here is a write up about showing points
Displaying BigQuery results on Google Maps using Data Studio
Data Studio has a Google Maps layer
And this article talks about showing arbitrary polygons:
Creating choropleth maps in Data Studio and BigQuery GIS | Google Cloud Blog
we’re excited to announce new enhancements to DataStudio, including support for choropleth maps of BigQuery GEOGRAPHY polygons
Note that the amount of data the DataStudio can show is about the same as GeoViz or slightly less, so
ST_Simplify can be useful here as well.
Carto BigQuery tiler is a great third-party solution for visualizing massive spatial datasets when the BigQuery GeoViz limits are not enough.
BigQuery Tiler for Big Data Visualization & Maps - CARTO
Discover BigQuery Tiler, CARTO's solution to visualize Big Data straight out of Google BigQuery.
This is another great open-source 3rd party tool that supports visualizing BigQuery GEOGRAPHY data.