Contains/Covers/Intersects/Within?
TL;DR: use ST_Intersects in most cases.
When checking whether something is within something else, which ST_* function do I use? Say you have countries and locations (points), or countries and buildings (described as polygons) and want to find out which country each point or building belongs to. You might write a BigQuery GIS query like
SELECT * FROM countries c, points p
WHERE st_contains(c.geog, p.geog)
But what predicate should you use here? ST_Contains, or ST_Covers, or ST_Intersects? Let's see how they differ. Note that the precise semantics are somewhat detached from the names, but the functions were defined by various standards committees, and BQ GIS tries to follow the standards.
To define the precise semantics, we need to know that, relative to a shape, every point can be classified as being outside the shape, on its border, or inside it. The semantics of the border vary among databases. To avoid floating-point errors, BigQuery snaps with a tolerance of about 1 micron: all points within the snapping distance of the border are treated as being on the border. Without snapping, you might get inconsistent results due to unavoidable floating-point errors, e.g. you could get
# select st_intersects(st_geomfromtext('linestring(1 0, 0 1)'),
st_point(1./3, 2./3));
st_intersects
---------------
f
(1 row)
We are now ready to define the functions.
ST_Intersects(a, b) returns TRUE if there is at least one common point between input geographies (does not matter if it is inside or on the border).
ST_Covers(a, b) returns TRUE if no points of b are outside of a. When b is a point, the result is identical to ST_Intersects. When b is a polygon, it differs from ST_Intersects in that ST_Covers returns FALSE if b is partially outside of a.
ST_Contains(a, b) returns TRUE if no points of b are outside of a, and at least one point of b is inside.
- When b is a point, it differs from ST_Covers in that it returns FALSE for points on the border.
- When b is a polygon, ST_Contains is identical to ST_Covers (an exercise for the reader is to check that this follows from the definition above). Note that polygon b might have points on the boundary of a, but if no points are outside, ST_Contains returns TRUE.
If your expectation of ST_Contains behavior did not match this description, or you were surprised by the difference between points and polygons, you are not alone. PostGIS has ST_ContainsProperly, with more usable semantics, but this function is not available in BigQuery GIS.
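To see the three definitions side by side, here is a small sketch that evaluates them on literal geographies. The square and the test shapes are made up for illustration. The border case uses a vertex of the square on purpose: polygon edges in BigQuery are geodesics, so a point picked by eye on an edge may not actually lie on it.

```sql
-- Illustrative literals only; none of these shapes come from real data.
WITH a AS (
  SELECT ST_GEOGFROMTEXT('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))') AS geog
),
b AS (
  SELECT 'point inside' AS label, ST_GEOGPOINT(1, 1) AS geog UNION ALL
  SELECT 'point on border (a vertex of a)', ST_GEOGPOINT(0, 0) UNION ALL
  SELECT 'point outside', ST_GEOGPOINT(5, 5) UNION ALL
  SELECT 'polygon inside',
         ST_GEOGFROMTEXT('POLYGON((0.5 0.5, 1.5 0.5, 1.5 1.5, 0.5 1.5, 0.5 0.5))') UNION ALL
  SELECT 'polygon partially outside',
         ST_GEOGFROMTEXT('POLYGON((1 1, 3 1, 3 3, 1 3, 1 1))')
)
SELECT
  b.label,
  -- Aliases are prefixed because CONTAINS is a reserved keyword in BigQuery.
  ST_INTERSECTS(a.geog, b.geog) AS r_intersects,
  ST_COVERS(a.geog, b.geog)     AS r_covers,
  ST_CONTAINS(a.geog, b.geog)   AS r_contains
FROM a
CROSS JOIN b
ORDER BY b.label;
```

Going by the definitions above, the border point should be the only row where ST_Covers and ST_Contains disagree, and the partially outside polygon the only row where ST_Intersects disagrees with the other two.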
ST_Within(a, b) is equivalent to ST_Contains(b, a), i.e. the same predicate with swapped arguments, so we will not discuss it in detail.
Surprisingly, we also see ST_DWithin(a, b, 0) used in practice (note this is DWithin, not Within). This expression is equivalent to ST_Distance(a, b) <= 0, i.e. a convoluted way to check that geographies a and b have a common point. It is thus equivalent to ST_Intersects(a, b) but typically slower.
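A quick way to convince yourself of these equivalences is to evaluate the expressions next to each other. This is a minimal sketch on made-up literals (a square and a point inside it), not a benchmark; it says nothing about the speed difference.

```sql
-- Illustrative literals only: a is a small square, b is a point inside it.
WITH g AS (
  SELECT
    ST_GEOGFROMTEXT('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))') AS a,
    ST_GEOGPOINT(1, 1) AS b
)
SELECT
  -- The same predicate with swapped arguments.
  ST_WITHIN(b, a)        AS within_b_a,
  ST_CONTAINS(a, b)      AS contains_a_b,
  -- Three ways of asking "do a and b share a point?".
  ST_DWITHIN(a, b, 0)    AS dwithin_zero,
  ST_DISTANCE(a, b) <= 0 AS distance_le_zero,
  ST_INTERSECTS(a, b)    AS intersects_a_b
FROM g;
```

The first two columns should always agree with each other, and so should the last three, whatever geographies you substitute for a and b.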
For our case of points within countries, all three functions return the same result, except for the extremely rare case of border points, where ST_Contains disagrees with the other two. For buildings within countries, the results are again the same, except for buildings crossing the border, where ST_Intersects disagrees with the others. If you do care about borders, ST_Intersects is probably the most useful: in case of ambiguity it returns all countries the building (at least partially) belongs to, rather than dropping all partial intersections.
There is one critical distinction though: detecting whether a point is exactly on the border of a complex polygon, or how two polygons relate near their boundaries, is computationally very expensive, especially if the polygons are large and complex. ST_Intersects might be orders of magnitude faster, as it just needs to find some common point or prove there is none. So use ST_Intersects by default, unless you really need the semantics of ST_Covers or ST_Contains.
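For the buildings case, the default could look like the sketch below. The countries table and its geog column come from the query at the top of this post; the buildings table and the id and name columns are hypothetical, so adjust them to your schema.

```sql
-- Hypothetical schema: countries(name, geog), buildings(id, geog).
SELECT
  b.id AS building_id,
  -- A building crossing a border matches several countries; keep them all.
  ARRAY_AGG(c.name ORDER BY c.name) AS country_names
FROM buildings AS b
JOIN countries AS c
  ON ST_INTERSECTS(c.geog, b.geog)
GROUP BY b.id;
```

Buildings whose country_names array has more than one element are the ambiguous ones, and you can then decide how to resolve them instead of silently losing them.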
After ST_Intersects, ST_Covers is the next best in terms of performance, and it offers clear and useful semantics. ST_Contains is the slowest, and it also has somewhat surprising and rarely useful semantics.
Finally, a small cheat sheet for common cases. Here the red polygon represents the first argument, and the black point or polygon is the second argument.
- ST_Intersects = true, ST_Covers = true, ST_Contains = true.
- ST_Intersects = true, ST_Covers = true, ST_Contains = true.
- ST_Intersects = true, ST_Covers = true, ST_Contains = false.
- ST_Intersects = true, ST_Covers = true, ST_Contains = true.
- ST_Intersects = false, ST_Covers = false, ST_Contains = false.
- ST_Intersects = true, ST_Covers = false, ST_Contains = false.