Extra geography simplification in BigQuery
We had a discussion with a customer about BigQuery's ST_Simplify that turned into a helper function I want to share. BigQuery has an ST_Simplify function, but it was not enough: a lot of noise remained after ST_Simplify, which made rendering the result too slow.
This is caused by a difference in ST_Simplify between BigQuery and some other systems like PostGIS, which the customer had previously worked with. The PostGIS version of ST_Simplify performs two distinct functions in one call, both of which result in simpler shapes:
- it smooths the edges of each shape using the Douglas-Peucker algorithm, creating simpler edges with error up to the given tolerance
- it drops any resulting shapes smaller than tolerance in diameter
In PostGIS the second behavior is controlled by the preserveCollapsed option, which defaults to false, meaning the function simply drops any individual parts produced by the ST_Simplify call that are smaller than tolerance. BigQuery does not have such an option and always keeps all the shapes. One of the reasons for the lack of this option is that BigQuery works on the whole geography, rather than on an object-by-object basis, which makes the semantics of this flag somewhat murky.
OK, lacking this functionality, let's create it ourselves. Our function will drop any geometry object with a diameter smaller than some threshold:
- First, we need to split the geography into parts; we'll use ST_Dump for that.
- Diameter is the maximum distance between two points of a shape, so we can use the ST_MaxDistance(g, g) function, passing the same object as the first and second argument.
- A tiny optimization: since all points (0-dimension objects) have zero diameter, we can simply drop all of them without even calling this function.
- Finally, we need to merge all the remaining parts together, using ST_Union_Agg.
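To see what these building blocks return before wiring them together, here is a small sketch that dumps a sample collection and reports the dimension and diameter of each part. The WKT literal is made up for illustration; it is not from the customer's data.

```sql
-- Inspect each part of a sample geography (the literal is illustrative only).
SELECT
  ST_AsText(part)            AS part_wkt,
  ST_Dimension(part)         AS dimension,   -- 0 = point, 1 = line, 2 = polygon
  ST_MaxDistance(part, part) AS diameter_m   -- max distance between two points of the part, in meters
FROM UNNEST(ST_Dump(ST_GeogFromText(
  'GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 1, 1 2), POLYGON((10 10, 11 10, 11 11, 10 11, 10 10)))')
)) AS part;
```

The point row should report dimension 0, which is why the function can skip it without computing a distance.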
Here is the code. You can use it as a temporary function, as shown here, or create a permanent one:
-- tolerance is in meters, the unit returned by ST_MaxDistance
CREATE TEMP FUNCTION ST_DropSmallObjects(
  geo GEOGRAPHY,
  tolerance FLOAT64) AS
((
  WITH parts AS (
    -- split into parts
    SELECT ST_Dump(geo) g
  ),
  filtered AS (
    -- select large enough parts; points (dimension 0) are dropped outright
    SELECT g2 FROM parts p, UNNEST(p.g) g2
    WHERE ST_Dimension(g2) > 0
      AND ST_MaxDistance(g2, g2) > tolerance
  )
  -- merge them together
  SELECT ST_Union_Agg(g2)
  FROM filtered
));
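A minimal usage sketch, assuming it runs in the same script as the CREATE TEMP FUNCTION statement above. The shapes CTE and its single row are hypothetical stand-ins for a real table, and the 1000-meter tolerance is an arbitrary choice:

```sql
-- Requires ST_DropSmallObjects to be defined earlier in the same script.
-- The sample row is illustrative; replace the CTE with your own table.
WITH shapes AS (
  SELECT 1 AS id, ST_GeogFromText(
    'GEOMETRYCOLLECTION(POINT(0 0), LINESTRING(1 1, 1 1.00001), POLYGON((10 10, 11 10, 11 11, 10 11, 10 10)))'
  ) AS geog
)
SELECT
  id,
  -- simplify first, then drop whatever is still smaller than the tolerance
  ST_AsText(ST_DropSmallObjects(ST_Simplify(geog, 1000), 1000)) AS cleaned
FROM shapes;
```

With this input, only the large polygon should survive, since the point and the roughly one-meter line are far below the threshold. Note that if every part is filtered out, ST_Union_Agg has no rows to aggregate and the function returns NULL.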
You might also decide to adapt this function to your needs. For example, instead of (or in addition to) using diameter to filter out tiny shapes, you can use a shape's area.
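As a sketch of the area-based variant, the function below keeps only parts whose area exceeds a threshold in square meters. The name ST_DropSmallPolygons and the min_area parameter are my own for illustration. Since ST_Area is zero for points and lines, this version drops them too; add a condition such as ST_Dimension(part) < 2 if you want to keep lines.

```sql
-- Hypothetical area-based variant: min_area is in square meters.
CREATE TEMP FUNCTION ST_DropSmallPolygons(
  geo GEOGRAPHY,
  min_area FLOAT64) AS
((
  SELECT ST_Union_Agg(part)
  FROM UNNEST(ST_Dump(geo)) AS part
  -- points and lines have zero area, so they are filtered out as well
  WHERE ST_Area(part) > min_area
));

-- Illustrative input: a 1-degree square and a tiny square about 11 m across.
SELECT ST_AsText(ST_DropSmallPolygons(
  ST_GeogFromText(
    'MULTIPOLYGON(((10 10, 11 10, 11 11, 10 11, 10 10)), ((0 0, 0.0001 0, 0.0001 0.0001, 0 0.0001, 0 0)))'),
  1e6)) AS cleaned;
```

With a threshold of one square kilometer, the tiny square should be dropped and the large one kept.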