Nearest neighbor in BigQuery GIS

Let’s discuss how to find an item from some dataset nearest to each item in another dataset. Say we have set of people locations and set of restaurants, and we want to find nearest restaurant for each person.

BigQuery does not have a dedicated Nearest Neighbor TVF, so well use regular spatial join. We cannot use intersection for this problem, so the only option is to use ST_DWithin condition. This means we’ll have to define some bounding search radius, in the example above this might mean we only search for restaurants closer than 10 km. The smaller the limit, the faster the algorithm runs, but returns more points with no neighbor. Once we found all neighbors within search radius, we need to sort them by distance and take the nearest one.

ARRAY_AGG( ORDER BY ST_Distance(a.geog, b.geog) LIMIT 1)
[ORDINAL(1)] as neighbor_id
FROM people_table a JOIN restaurant_table b
ON ST_DWithin(a.geog, b.geom, 100) -- 100 is search radius

Note this is regular JOIN, so it will drop all points in a for which it does not find any close-enough neighbor. Don’t try to make it LEFT JOIN, BQ GIS doesn’t have spatial optimizations for outer join yet.

Some more hints:

  • If you need more fields from table b, replace expression inside ARRAY_AGG with STRUCT(, b.extra_field, b.one_more_field)
  • If you need more fields from table a, especially in case of GEOGRAPHY that you cannot use as aggregation key, use ANY_VALUE aggregation function:
    SELECT ... , ANY_VALUE(a.geog) ... GROUP BY

If needed, we can then repeat this query with points missed by original search, now using larger search radius, and append it to original result.

WITH missed_people_table AS (
SELECT a.* FROM people_table a
FROM people_with_neighbors n -- result of first SELECT
... -- the same query as above only
-- 1) using missed_people_table
-- 2) larger search radius

Hi, I'm TL of BigQuery GIS project. Posting small recipes and various notes for BQ GIS users.