Contains/Covers/Intersects/Within?
--
TL;DR: use ST_Intersects in most cases.
When checking whether something is within something else, which ST_* function should you use? Say you have countries and locations (points), or countries and buildings (described as polygons), and want to find out which country each point or building belongs to. You might write a BigQuery GIS query like:
SELECT * FROM countries c, points p
WHERE st_contains(c.geog, p.geog)
But which predicate should you use here: ST_Contains, ST_Covers, or ST_Intersects? Let's look at how they differ. Note that the precise semantics do not always match what the names suggest; the functions were defined by various standards committees, and BigQuery GIS tries to follow those standards:
ST_Intersects(a, b) returns TRUE if the input geographies have at least one point in common.
ST_Covers(a, b) returns TRUE if no points of b are outside of a. When b is a point, the result is identical to ST_Intersects. When b is a polygon, it differs from ST_Intersects in that ST_Covers returns FALSE if b is partially outside of a.
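As a minimal sketch of how the two predicates differ on polygons, the query below compares them on three toy cases. The square `a` and the three small rectangles are made-up geographies chosen for illustration; they are not data from this article.

```sql
-- Toy geographies, assumed for illustration only.
WITH cases AS (
  SELECT 'fully inside' AS label,
         'POLYGON((2 2, 4 2, 4 4, 2 4, 2 2))' AS wkt
  UNION ALL
  SELECT 'crosses the border',
         'POLYGON((8 2, 12 2, 12 4, 8 4, 8 2))'
  UNION ALL
  SELECT 'fully outside',
         'POLYGON((20 2, 22 2, 22 4, 20 4, 20 2))'
)
SELECT
  label,
  ST_INTERSECTS(a, ST_GEOGFROMTEXT(wkt)) AS intersects_ab,
  ST_COVERS(a, ST_GEOGFROMTEXT(wkt)) AS covers_ab
FROM cases
CROSS JOIN (
  -- The first argument: a 10 x 10 degree square.
  SELECT ST_GEOGFROMTEXT('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))') AS a
);
```

Following the definitions above, the two columns should agree for the rectangle that is fully inside (both TRUE) and the one that is fully outside (both FALSE), and differ only for the rectangle that crosses the border, where ST_Intersects is TRUE and ST_Covers is FALSE.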
ST_Contains(a, b) returns TRUE if no points of b are outside of a, and at least one point of b is inside a.
- When b is a point, it returns FALSE for points exactly at the border. Note that "exactly" here means within the snapping distance used to make predicates consistent, which for BigQuery GIS is roughly 1 micron.
- When b is a polygon, ST_Contains is identical to ST_Covers (checking that this follows from the definition above is left as an exercise for the reader). Note that polygon b might have points on the boundary of a, but as long as no points are outside, ST_Contains returns TRUE.
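The border-point behavior is easiest to see on a vertex, which is guaranteed to lie on the boundary. The sketch below uses an arbitrary triangle and three test points (outside, on a vertex, inside); all of the coordinates are assumptions picked for illustration.

```sql
-- Arbitrary triangle with a vertex at (1, 1); illustration only.
WITH poly AS (
  SELECT ST_GEOGFROMTEXT('POLYGON((1 1, 20 1, 10 20, 1 1))') AS a
)
SELECT
  ST_GEOGPOINT(i, i) AS p,
  ST_INTERSECTS(a, ST_GEOGPOINT(i, i)) AS intersects_ab,
  ST_COVERS(a, ST_GEOGPOINT(i, i)) AS covers_ab,
  ST_CONTAINS(a, ST_GEOGPOINT(i, i)) AS contains_ab
FROM poly, UNNEST([0, 1, 10]) AS i  -- (0,0) outside, (1,1) vertex, (10,10) inside
ORDER BY i;
```

The point (0, 0) should be FALSE in every column and (10, 10) TRUE in every column. Only the vertex (1, 1) separates the predicates: ST_Intersects and ST_Covers are TRUE, while ST_Contains is FALSE.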
If your expectation of ST_Contains behavior did not match this description, or you were surprised by the difference between points and polygons, you are not alone. PostGIS has ST_ContainsProperly, with more usable semantics, but this function is not available in BigQuery GIS.
ST_Within is equivalent to ST_Contains with swapped arguments, so we will not discuss it in detail.
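A quick way to convince yourself of the argument swap, using a made-up square and point (both are assumptions for illustration):

```sql
-- Made-up square and interior point; illustration only.
WITH g AS (
  SELECT
    ST_GEOGFROMTEXT('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))') AS a,
    ST_GEOGPOINT(5, 5) AS b
)
SELECT
  ST_CONTAINS(a, b) AS a_contains_b,  -- polygon first, point second
  ST_WITHIN(b, a) AS b_within_a       -- same question, arguments swapped
FROM g;
```

Both columns should return the same value (TRUE for this interior point).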
Surprisingly, we also see usage of ST_DWithin(a, b, 0) (note this is DWithin, not Within), an expression equivalent to ST_Distance(a, b) <= 0, i.e. a convoluted way to check that geographies a and b have a common point. It is thus equivalent to ST_Intersects(a, b) but typically slower.
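The equivalence can be checked side by side. This sketch evaluates the three expressions for one point inside a made-up square and one point far outside it; the geographies are assumptions for illustration, and the query says nothing about relative speed on real data.

```sql
-- Made-up square; test points at longitude 5 (inside) and 50 (outside).
WITH poly AS (
  SELECT ST_GEOGFROMTEXT('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))') AS a
)
SELECT
  lon,
  ST_INTERSECTS(a, ST_GEOGPOINT(lon, 5)) AS intersects_ab,
  ST_DWITHIN(a, ST_GEOGPOINT(lon, 5), 0) AS dwithin_zero,
  ST_DISTANCE(a, ST_GEOGPOINT(lon, 5)) <= 0 AS distance_le_zero
FROM poly, UNNEST([5, 50]) AS lon
ORDER BY lon;
```

All three columns should be TRUE for the inside point and FALSE for the outside point, so ST_Intersects expresses the same condition more directly.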
For our case of points within countries, all three functions return the same result, except for the extremely rare case of border points, where ST_Contains disagrees with the other two. For buildings within countries, the results are again the same, except for buildings crossing the border, where ST_Intersects disagrees with the others. If you do care about borders, ST_Intersects is probably the most useful: in case of a conflict it returns all countries the point or the building (at least partially, in the latter case) belongs to, rather than dropping all partial intersections and border points.
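To make the border case concrete, here is a self-contained sketch with two hypothetical neighboring "countries" (West and East, sharing the meridian at longitude 10) and two hypothetical buildings, one well inside West and one straddling the shared border. The table names, column names, and coordinates are all invented for illustration.

```sql
-- Hypothetical countries and buildings; illustration only.
WITH countries AS (
  SELECT 'West' AS name,
         ST_GEOGFROMTEXT('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))') AS geog
  UNION ALL
  SELECT 'East',
         ST_GEOGFROMTEXT('POLYGON((10 0, 20 0, 20 10, 10 10, 10 0))')
),
buildings AS (
  SELECT 'interior' AS name,
         ST_GEOGFROMTEXT('POLYGON((2 2, 3 2, 3 3, 2 3, 2 2))') AS geog
  UNION ALL
  SELECT 'straddling',
         ST_GEOGFROMTEXT('POLYGON((9 2, 11 2, 11 3, 9 3, 9 2))')
)
SELECT
  b.name AS building,
  c.name AS country,
  ST_INTERSECTS(c.geog, b.geog) AS intersects_cb,
  ST_COVERS(c.geog, b.geog) AS covers_cb,
  ST_CONTAINS(c.geog, b.geog) AS contains_cb
FROM buildings AS b
CROSS JOIN countries AS c
ORDER BY building, country;
```

The interior building should match only West under all three predicates. The straddling building should match both West and East under ST_Intersects, and neither country under ST_Covers or ST_Contains, so a join on either of those two predicates would silently drop it.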
There is one critical distinction though: detecting whether a point is exactly at the border of a complex polygon, or how two polygons relate near their boundaries, is computationally very expensive, especially if the polygons are large and complex. ST_Intersects might be orders of magnitude faster, as it just needs to find some common point or prove there is none. So use ST_Intersects by default, unless you really need the semantics of ST_Covers or ST_Contains.
After ST_Intersects, ST_Covers is the next one in terms of performance, and offers clear and useful semantics. ST_Contains is the slowest, and also has somewhat surprising and rarely useful semantics.
Finally, a small cheat sheet for common cases. Here the red polygon represents the first argument, and the black point or polygon is the second argument.
- ST_Intersects = true, ST_Covers = true, ST_Contains = true.
- ST_Intersects = true, ST_Covers = true, ST_Contains = true.
- ST_Intersects = true, ST_Covers = true, ST_Contains = false.
- ST_Intersects = true, ST_Covers = true, ST_Contains = true.
- ST_Intersects = false, ST_Covers = false, ST_Contains = false.
- ST_Intersects = true, ST_Covers = false, ST_Contains = false.