Did my query get optimized spatial join?

In BQ GIS performance tips I mentioned sometimes you migth get optimized spatial join and sometimes a regular one. Spatial join is implemented for a limited set of query shapes right now, e.g. we don’t have LEFT / RIGHT / FULL OUTER support yet. If you have both equality condition and spatial condition, we chose to run equality JOIN and then check spatial condition — not benefitting from optimized spatial JOIN. Another case: spatial join on ST_DWithin condition is only implemented for a constant distance parameter.

How would you check if specific query used spatial join or not? The most reliable answer is to check ‘Execution details’ tab after runing the query.

I’ll use some nonsense query as an example, as the specific query does not matter: find all geolite records within 1 km of each other:

select a.geoname_id
from `bigquery-public-data.geolite2.ipv4_city_blocks` a
join `bigquery-public-data.geolite2.ipv4_city_blocks` b
on st_distance(a.center_point, b.center_point) < 1000
and a.geoname_id <> b.geoname_id;

Once we run it, let’s expand ‘Execution details’ tab, find the Join stage and expand it. You’ll see a single line with CROSS JOIN and spatial condition. This shows this query did indeed use optimized spatial join. You might also notice BigQuery rewrote condition st_distance(g1, g2) < 1000 into more efficient version st_dwithin(g1, g2, 1000).

Optimized spatial join

Let’s change something to prevent BQ from using optimized join, e.g. use variable distance parameter:

select a.geoname_id
from `bigquery-public-data.geolite2.ipv4_city_blocks` a
join `bigquery-public-data.geolite2.ipv4_city_blocks` b
on st_distance(a.center_point, b.center_point)
< if (a.is_anonymous_proxy, 50, 100)
and a.geoname_id <> b.geoname_id;

Now run the query again. I actually stopped the query few minutes later, it was just taking too long, so all times are 0 ms. The plan is now different, we have plain CROSS JOIN without any predicate and a separate FILTER operator checking spatial condition. This shows we got unoptimized join this time.

Unoptimized spatial join

Hi, I'm TL of BigQuery GIS project. Posting small recipes and various notes for BQ GIS users.