Creating spatial hierarchy from imperfect data

Michael Entin
3 min readAug 25, 2019

Last time we observed how planar-to-spherical conversion can lead to a slightly inconsistent conversion of related shapes, and how this can lead to unexpected result of spatial predicates. E.g. starting with flat map geometries a and b satisfying ST_Covers(a, b), when we translate them to spherical geographies a' and b', the expression ST_Covers(a', b') might be false.

Another common case when ST_Covers or ST_Contains gives unexpected result is inconsistent data preparation. E.g. the datasets for states and counties could have been independently prepared by different organizations, or with different precision, and then ST_Covers(state, county) might be false when we expected true, regardless of planar or spherical version.

Let’s discuss some ways to restore spacial hierarchy from fuzzy data.

First, let’s prepare demo data: like in the previous post we’ll use perfect squares (on flat map), but they are distorted when translated to the globe.

We cannot use ST_MakePolygon, since it operates on sphere, let’s create GeoJson version using BigQuery GeoJson translator:

create function demo.geojson_polygon(
points ARRAY<STRUCT<x int64, y int64>>) AS (
ST_GeogFromGeoJson(CONCAT(
'{"type": "Polygon", "coordinates": [[',
ARRAY_TO_STRING(
ARRAY(SELECT FORMAT('[%d, %d]', x, y) FROM UNNEST(points)),
','),
']]}')));

We can now create outer shapes:

create or replace table demo.a AS
SELECT
x + 100 * y AS id,
demo.geojson_polygon([STRUCT(x,y), STRUCT(x+30,y),
STRUCT(x+30,y+30), STRUCT(x,y+30), STRUCT(x,y)]) AS geog
FROM unnest(generate_array(0, 50, 30)) x,
unnest(generate_array(0, 50, 30)) y

This creates four polygons, with an approximate width of 30 degrees in both lon and lat, sitting between 0 and 60 degrees.

demo.b was created by replacing 30 with 10 in the script above to get 36 polygons of width ~10 degrees in the same space.

Let’s check if determining the hierarchy using ST_Covers works:

select count(*) from demo.a a, demo.b b 
where st_covers(a.geog, b.geog) -- 18

Ouch, this missed half of our rows in table b. Bonus points if you can guess which 18 we missed :).

Can we use ST_Intersects instead?

select count(*) from demo.a a, demo.b b
where st_intersects(a.geog, b.geog) -- 64

This returns too much. But some of this is due to nested polygon just touching outer polygon. Let’s exclude these:

select count(*) from demo.a a, demo.b b
where st_intersects(a.geog, b.geog) and
not st_touches(a.geog, b.geog) -- 48

Closer, but we want 36 rows, when each shape in b joins only once.

One idea is to sort potential parents by intersection area and take the largest. ARRAY_AGG with LIMIT clause does the trick:

select 
b.id,
array_agg(a.id
order by st_area(st_intersection(a.geog, b.geog)) desc
limit 1)[offset(0)] as parent_id
,
any_value(b.geog) geog
from demo.a a, demo.b b
where st_intersects(a.geog, b.geog)
group by b.id

After the JOIN, we aggregate by id of b shape, take all the shapes in a intersecting our b, ordering them by intersection area, and take largest one.

If the query above looks too complex, I agree. For simple cases like this, we can take the potential parent if the intersection area is more than half of the child:

select a.id as parent_id, b.id, b.geog from demo.a a, demo.b b
where st_intersects(a.geog, b.geog) and
st_area(st_intersection(a.geog, b.geog)) > 0.5 * st_area(b.geog)

This returns 36 rows. Let’s check them in BQ GeoViz. I’ll draw all b.geog and use parent_id to color them. Perfect now:

One more way to do it is to replace shape b with its centroid, i.e. use condition ST_Intersects(a.geog, ST_Centroid(b.geog)). This works reliably if at least one of the shapes a or b is convex. It might give wrong results if both a and b are concave, but that’s rather rare.

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Michael Entin
Michael Entin

Written by Michael Entin

Hi, I'm TL of BigQuery Geospatial project. Posting small recipes and various notes for BQ Geospatial users.

No responses yet