Nearest neighbor using BQ Scripting

Michael Entin
2 min readDec 15, 2019

We’ve already discussed Nearest Neighbors problem:

The general idea is to start with small search radius, and increase it if we fail to find anything. Since that article, we got BigQuery Scripting, so let’s use it to do this. This time we’ll solve single neighbor search — find nearest neighbor for a specific point, rather than finding neighbors for a set of points.

For demo purposes, we’ll search nearest road in US Tiger database. Note it is not a very good road dataset for actual use, but fine for demo. The search with a specific radius will look like this:

SELECT *
FROM `bigquery-public-data.geo_us_roads.us_national_roads`
WHERE ST_DWithin(center, road_geom, search_r)
ORDER BY ST_Distance(center, road_geom) DESC LIMIT 1

Where search_r is search radius, and center is the point for which we are trying to find a neighbor. We’ll start with search radius of 100 m and increase it 10x each time if we don’t find anything with current radius. The procedure will look like this:

CREATE OR REPLACE PROCEDURE demo.NearestRoad(
center GEOGRAPHY,
OUT road_id STRING,
OUT road_geom GEOGRAPHY)
BEGIN
DECLARE search_r FLOAT64; -- current search radius
DECLARE result STRUCT<road_id STRING, road_geom GEOGRAPHY>;
SET search_r = 100;
LOOP
SET result = (
SELECT STRUCT(road_id, road_geom)
FROM `bigquery-public-data.geo_us_roads.us_national_roads`
WHERE ST_DWithin(center, road_geom, search_r)
ORDER BY ST_Distance(center, road_geom) DESC LIMIT 1
);
IF result IS NOT NULL THEN
SET road_id = result.road_id;
SET road_geom = result.road_geom;
LEAVE;
END IF;
SET search_r = search_r * 10;
END LOOP;
END;

You can call it like this:

DECLARE id STRING;
DECLARE geom GEOGRAPHY;
CALL demo.NearestNeighbor(ST_GeogPoint(-122.191667, 47.685833),
id, geom);

--

--

Michael Entin

Hi, I'm TL of BigQuery Geospatial project. Posting small recipes and various notes for BQ Geospatial users.