BigQuery GIS performance tips

1. Store Geography objects in the table

We sometimes see users who store spatial objects in the table, but keep them as lat/lon pairs or as WKT / GeoJSON text and construct Geography objects “on the fly” in a query. This prevents many BQGIS optimizations. Instead, add a Geography-typed column to the table (or replace existing columns completely). It has several benefits:

  • it avoids the cost of constructing Geography objects at runtime,
  • it allows BigQuery to use spatial index when querying this table.
SELECT * FROM T WHERE
ST_DWITHIN(geo, ST_GeogPoint(122.3, 47.6), 100)
SELECT * FROM T WHERE 
ST_DWITHIN(ST_GeogPoint(lon, lat), ST_GeogPoint(122.3, 47.6), 100)
SELECT * EXCEPT(lat, lon), ST_GeogPoint(lon, lat) AS geog FROM T
SELECT * EXCEPT(wkt) ST_GeogFromText(wkt) AS geog FROM T

2. Spatially cluster tables

See more about performance and cost benefits of storage clustering in BigQuery: https://cloud.google.com/bigquery/docs/clustered-tables

3. Maybe combine spatial predicates

This one is not a hard rule. Depending on specific data distribution it might help performance or degrade it. Experiment and test both approaches.

SELECT ... FROM T 
WHERE ST_DWITHIN(location, ST_GeogFromText('POINT(123 45)'), 100)
OR ST_DWITHIN(location, ST_GeogFromText('POINT(145 23)'), 100)
OR ...
SELECT ... FROM T 
WHERE ST_DWITHIN(location,
ST_GeogFromText('MULTIPOINT((123 45), (145 23), ...)'),
100)

4. Avoid mixing equality and spatial join conditions

This one is also not a hard rule, but in all occasions I’ve observed it improved query performance.

SELECT ... FROM T1 a JOIN T2 b 
ON CAST(a.lat*100 AS INT64) = CAST(b.lat*100 AS INT64)
AND CAST(a.lon*100 AS INT64) = CAST(b.lon*100 AS INT64)
WHERE ST_DWITHIN(a.geo, b.geo, 10)
  • the lat/lon checks would incorrectly filter out some close neighbors,
  • the lat/lon checks prevents BigQuery from using optimized geospatial join. When both equality and geospatial conditions are present, BigQuery executes query using equality join, with post-join evaluation of spatial condition. Geospatial join is usually faster.
SELECT ... FROM T1 a JOIN T2 b
WHERE ST_DWITHIN(a.geo, b.geo, 10)

--

--

Hi, I'm TL of BigQuery Geospatial project. Posting small recipes and various notes for BQ Geospatial users.

Love podcasts or audiobooks? Learn on the go with our new app.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Michael Entin

Michael Entin

151 Followers

Hi, I'm TL of BigQuery Geospatial project. Posting small recipes and various notes for BQ Geospatial users.