Given a zip code, find zip codes immediately bordering it
With BigQuery GIS, this is easy.
First, we need to define what exactly does “bordering” mean? Technically, zip codes are not polygons, but rather mail delivery points and routes. Of course, everybody reduces them to much more convenient polygons. Still, they don’t have to «touch». Let’s start with guesstimate and consider all zip codes within 100 meters of each other as neighbors.
Next, let’s build connection table, by finding all zip codes within 100 meters of each other. Run this query and save the result to a table data.zip_n.
select zip.zipcode zip, neighbors.zipcode neighbor
from `bigquery-public-data.utility_us.zipcode_area` zip,
`bigquery-public-data.utility_us.zipcode_area` neighbors
where st_dwithin(st_geogfromtext(neighbors.zipcode_geom),
st_geogfromtext(zip.zipcode_geom), 100) and
zip.zipcode <> neighbors.zipcode;
With this table we can quickly find all neighbor zip codes with query like
select neighbor from `data.zip_n` where zip = '98033' order by 1
Finally, let’s validate if the 100 meters was a good choice for threshold. Let’s find all pairs that are within 100 meters, but further away from each other than 1 meter:
select zip.zipcode zip, neighbors.zipcode neighbor,
st_geogfromtext(zip.zipcode_geom) my_geo,
st_geogfromtext(neighbors.zipcode_geom) nb_geo
from `bigquery-public-data.utility_us.zipcode_area` zip,
`bigquery-public-data.utility_us.zipcode_area` neighbors
where st_dwithin(st_geogfromtext(neighbors.zipcode_geom),
st_geogfromtext(zip.zipcode_geom), 100) and
zip.zipcode <> neighbors.zipcode and
not st_dwithin(st_geogfromtext(neighbors.zipcode_geom),
st_geogfromtext(zip.zipcode_geom), 1)
Run this query in BigQuery Geo Viz, select ‘my_geo’ as geography visualization column, and examine zip code pairs. You’ll find many pairs like below. It is up to your business logic whether to consider these neighbors and how to choose specific threshold.
Update: there is a new zip codes table now, bigquery-public-data.geo_us_boundaries.zip_codes
which contains zip code polygons as GEOGRAPHY
column (rather than as string). So you don’t need ST_GeogFromText
anymore, and the JOIN query is much faster.