Extra geography simplification in BigQuery
We had a discussion with a customer about BigQuery’s
ST_Simplify that turned into a helper function I want to share. BigQuery has
ST_Simplify method, but it was not enough — there was a lot of noise remaining after
ST_Simplify, which made rendering of the result too slow.
This is caused by the difference in
ST_Simplify between BigQuery and some other systems like PostGIS, which the customer previously worked with. PostGIS version of
ST_Simplify performs two distinct functions in one call, that both result in simpler shapes:
- it smoothes the edges of each shape using Douglas-Peucker algorithm, creating simpler edges with error up to the given
- it drops any resulting shapes smaller than
In PostGIS the second behavior is controlled by
preserveCollapsed option, which defaults to
false, meaning the function simply drops any resulting individual parts created by
ST_Simplify call that are smaller than
tolerance. BigQuery does not have such option, and always keeps all the shapes. One of the reasons for lack of this option is that BigQuery works on the whole geography, rather than on an object-by-object basis, which makes the semantics of this flag somewhat murky.
OK, lacking this functionality, let’s create it ourselves? Our function will drop any geometry object with diameter smaller than some threshold
- First, we need to split geometry into parts, we’ll use
- Diameter is the max distance between two points of a shape, so we can use
ST_MaxDistance(g, g)function, passing same object as first and second argument.
- A tiny optimization: since all points (0-dimension objects) have zero diameter, we can simply drop all of them, without even calling this function.
- Finally, we need to merge all the remaining parts together, using
Here is the code, you can use it as temporary function like here, or create a permanent one:
CREATE TEMP FUNCTION ST_DropSmallObjects(
tolerance FLOAT64) AS
WITH parts AS (
-- split into parts
SELECT ST_Dump(geo) g
filtered AS (
-- select large enough parts
SELECT g2 FROM parts p, UNNEST(p.g) g2
WHERE ST_Dimension(g2) > 0
AND ST_MaxDistance(g2, g2) > tolerance
-- merge them together
You might also decide to adapt this function to your needs. E.g. instead of (or in addition to) using diameter to filter out tiny shapes, you can use shape’s area.