Extra geography simplification in BigQuery

Michael Entin
2 min readApr 19, 2022

We had a discussion with a customer about BigQuery’s ST_Simplify that turned into a helper function I want to share. BigQuery has ST_Simplify method, but it was not enough — there was a lot of noise remaining after ST_Simplify, which made rendering of the result too slow.

This is caused by the difference in ST_Simplify between BigQuery and some other systems like PostGIS, which the customer previously worked with. PostGIS version of ST_Simplify performs two distinct functions in one call, that both result in simpler shapes:

  • it smoothes the edges of each shape using Douglas-Peucker algorithm, creating simpler edges with error up to the given tolerance
  • it drops any resulting shapes smaller than tolerance in diameter

In PostGIS the second behavior is controlled by preserveCollapsed option, which defaults to false, meaning the function simply drops any resulting individual parts created by ST_Simplify call that are smaller than tolerance. BigQuery does not have such option, and always keeps all the shapes. One of the reasons for lack of this option is that BigQuery works on the whole geography, rather than on an object-by-object basis, which makes the semantics of this flag somewhat murky.

OK, lacking this functionality, let’s create it ourselves? Our function will drop any geometry object with diameter smaller than some threshold

  • First, we need to split geometry into parts, we’ll use ST_Dump for that.
  • Diameter is the max distance between two points of a shape, so we can use ST_MaxDistance(g, g) function, passing same object as first and second argument.
  • A tiny optimization: since all points (0-dimension objects) have zero diameter, we can simply drop all of them, without even calling this function.
  • Finally, we need to merge all the remaining parts together, using ST_Union_Agg.

Here is the code, you can use it as temporary function like here, or create a permanent one:

CREATE TEMP FUNCTION ST_DropSmallObjects(
geo GEOGRAPHY,
tolerance FLOAT64) AS
((
WITH parts AS (
-- split into parts
SELECT ST_Dump(geo) g
),
filtered AS (
-- select large enough parts
SELECT g2 FROM parts p, UNNEST(p.g) g2
WHERE ST_Dimension(g2) > 0
AND ST_MaxDistance(g2, g2) > tolerance
)
-- merge them together
SELECT ST_Union_Agg(g2)
FROM filtered
));

You might also decide to adapt this function to your needs. E.g. instead of (or in addition to) using diameter to filter out tiny shapes, you can use shape’s area.

--

--

Michael Entin

Hi, I'm TL of BigQuery Geospatial project. Posting small recipes and various notes for BQ Geospatial users.