Contains/Covers/Intersects/Within?

Michael Entin
4 min read · Jun 26, 2019

TL;DR: use ST_Intersects in most cases.

When checking whether something is within something else, which ST_* function should you use? Say you have countries and locations (points), or countries and buildings (described as polygons), and you want to find out which country each point or building belongs to. You might write a BigQuery GIS query like

SELECT * FROM countries c, points p 
WHERE st_contains(c.geog, p.geog)

But which predicate should you use here: ST_Contains, ST_Covers, or ST_Intersects? Let's look at the differences between them. Note that the precise semantics are somewhat detached from the names: the functions were defined by various standards committees, and BQ GIS tries to follow the standards.

To define the precise semantics, note that with respect to a shape, every point belongs to exactly one of three sets: the outside of the shape, its border, or its inside. The semantics of the border varies among databases. BigQuery uses snapping of about 1 micron: all points within the snapping distance of the border are treated as being on the border. Without snapping, you might get inconsistent results due to unavoidable floating point errors. For example, a system that does not snap can report that a point lying on a line does not intersect it:

# select st_intersects(st_geomfromtext('linestring(1 0, 0 1)'),
st_point(1./3, 2./3));
st_intersects
---------------
f
(1 row)

We are now ready to define the functions.

ST_Intersects(a, b) returns TRUE if the input geographies have at least one point in common (it does not matter whether that point is inside or on the border).

ST_Covers(a, b) returns TRUE if no points of b are outside of a. When b is a point, the result is identical to ST_Intersects. When b is a polygon, it differs from ST_Intersects in that ST_Covers returns FALSE if b is partially outside of a.

ST_Contains(a, b) returns TRUE if no points of b are outside of a, and at least one point of b is inside.

  • When b is a point, it differs from ST_Covers in that it returns FALSE for points on the border.
  • When b is a polygon, ST_Contains is identical to ST_Covers (checking that this follows from the definition above is an exercise for the reader). Note that polygon b might have points on the boundary of a, but as long as no points are outside, ST_Contains returns TRUE.
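To see these definitions side by side, here is a minimal sketch that evaluates all three predicates for a point against a polygon. The square and the three points are made-up test data, not taken from any real dataset. The second point is a vertex of the square, so it lies on the border.

```sql
-- Hypothetical test data: a small square and three points, given as WKT.
WITH square AS (
  SELECT ST_GEOGFROMTEXT('POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))') AS a
),
candidates AS (
  SELECT 'inside' AS label, 'POINT(1 1)' AS wkt UNION ALL
  SELECT 'on the border (a vertex)', 'POINT(0 0)' UNION ALL
  SELECT 'outside', 'POINT(5 5)'
)
SELECT
  label,
  -- Aliases avoid CONTAINS, which is a reserved keyword in BigQuery.
  ST_INTERSECTS(a, ST_GEOGFROMTEXT(wkt)) AS is_intersecting,
  ST_COVERS(a, ST_GEOGFROMTEXT(wkt)) AS is_covered,
  ST_CONTAINS(a, ST_GEOGFROMTEXT(wkt)) AS is_contained
FROM square CROSS JOIN candidates
ORDER BY label;
```

Going by the definitions above, the border row is the only one where ST_Contains should disagree with ST_Covers and ST_Intersects.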

If your expectation of ST_Contains behavior did not match this description, or you were surprised by the difference between points and polygons, you are not alone. PostGIS has ST_ContainsProperly, with more usable semantics, but this function is not available in BigQuery GIS.

ST_Within(a, b) is equivalent to ST_Contains(b, a), that is, the same function with swapped arguments, so we will not discuss it in detail.

Surprisingly, we also see ST_DWithin(a, b, 0) used for this purpose (note this is DWithin, not Within). This expression is equivalent to ST_Distance(a, b) <= 0, i.e. a convoluted way to check that geographies a and b have a common point. It is thus equivalent to ST_Intersects(a, b) but typically slower.
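As a sketch of that rewrite, the two queries below should select the same pairs. The countries and points tables are the ones from the query at the top of the post; the name and id columns are assumptions made for illustration.

```sql
-- Convoluted form: keeps pairs whose distance is zero.
-- Assumed columns: countries(name, geog), points(id, geog).
SELECT c.name AS country, p.id AS point_id
FROM countries c, points p
WHERE ST_DWITHIN(c.geog, p.geog, 0);

-- Direct form: keeps pairs that share at least one point.
SELECT c.name AS country, p.id AS point_id
FROM countries c, points p
WHERE ST_INTERSECTS(c.geog, p.geog);
```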

For our case of points within countries, all three functions return the same result, except in the extremely rare case of border points, where ST_Contains disagrees with the other two. For buildings within countries, the results are again the same, except for buildings crossing the border, where ST_Intersects disagrees with the others. If you do care about borders, ST_Intersects is probably the most useful: in case of ambiguity it returns all countries the building (at least partially) belongs to, rather than dropping all partial intersections.
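One way to make that ambiguity visible is to join on ST_Intersects and collect every matching country per building. This is only a sketch: the buildings table and the id and name columns are hypothetical names chosen for illustration.

```sql
-- Assumed tables: countries(name, geog), buildings(id, geog).
SELECT
  b.id AS building_id,
  -- Every country the building at least partially belongs to.
  ARRAY_AGG(c.name ORDER BY c.name) AS country_names,
  -- TRUE when the building intersects more than one country.
  COUNT(*) > 1 AS is_ambiguous
FROM buildings b
JOIN countries c
  ON ST_INTERSECTS(c.geog, b.geog)
GROUP BY b.id;
```

With ST_Covers or ST_Contains in the join condition, a building that straddles a border would not match either country, so it would be missing from the result.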

There is one critical distinction though: detecting whether a point is exactly on the border of a complex polygon, or how two polygons relate near their boundaries, is computationally very expensive, especially if the polygons are large and complex. ST_Intersects might be orders of magnitude faster, as it just needs to find some common point or prove there is none. So use ST_Intersects by default, unless you really need the semantics of ST_Covers or ST_Contains.

After ST_Intersects, ST_Covers is the next one in terms of performance, and it offers clear and useful semantics. ST_Contains is the slowest, and it has somewhat surprising and rarely useful semantics.

Finally, a small cheat sheet for common cases. Here the red polygon represents the first argument, and the black point or polygon is the second argument.

Six arrangements of points and polygons demonstrating results of ST_* functions.
  1. ST_Intersects = true, ST_Covers = true, ST_Contains = true.
  2. ST_Intersects = true, ST_Covers = true, ST_Contains = true.
  3. ST_Intersects = true, ST_Covers = true, ST_Contains = false.
  4. ST_Intersects = true, ST_Covers = true, ST_Contains = true.
  5. ST_Intersects = false, ST_Covers = false, ST_Contains = false.
  6. ST_Intersects = true, ST_Covers = false, ST_Contains = false.

Michael Entin

Hi, I'm the TL of the BigQuery Geospatial project. I post small recipes and various notes for BQ Geospatial users.