Limits of planar -> spherical edge conversion
When loading planar geometries to the BigQuery, BigQuery converts them to spherical geography objects. One of the major differences between the two is semantics of edges between two points. In a planar geometry, the edge is straight line (in whatever coordinate system the geometry uses) between the two points. In spherical geography, the edge is spherical geodesic line between the two points. Let’s consider one of the effects of this conversion that might not be obvious, it is about relationship between two spatial objects.
We’ll use two objects, described in planar coordinates, one nested inside the other. I’ll use these simple squares (on flat map, but not on Earth surface!), intentionally huge to demonstrate the issue, A:
{
"type": "Polygon",
"coordinates": [[[5,5], [50,5], [50,50], [5,50], [5,5]]]
}
and B:
{
"type": "Polygon",
"coordinates": [[[30,30], [50,30], [50,50], [30,50], [30,30]]]
}
The second one is clearly nested inside the first one:
Will this relationship be true once we load it in BigQuery?
Let’s test:
with test_data as (
select
st_geogfromgeojson('{"type": "Polygon", "coordinates": [[[5,5], [50,5], [50,50], [5,50], [5,5]]]}') a,
st_geogfromgeojson('{"type": "Polygon", "coordinates": [[[30,30], [50,30], [50,50], [30,50], [30,30]]]}') b
)
select st_covers(a, b) from test_data
The result is false! If this surprised you, keep reading, if not — you might already know the rest of this post.
The reason is tessellation, which BigQuery performs when converting planar geometry shape to BigQuery GIS spherical geography. Consider edge [5,50] to [50,50] - straight line on a flat map. But a geodesic edge between these two points would look like a curve on this flat map, and pass far closer to the north pole. How far? Almost 250km away!
select st_distance(
st_makeline(st_geogpoint(5, 50), st_geogpoint(50, 50)),
st_geogpoint(27.5, 50)) -- result is 246,430
But if we use GeoJson, we get much closer, although not always exact 0:
select
st_distance(g, st_geogpoint(27.5, 50)) a,
st_distance(g, st_geogpoint(27, 50)) b
from (select st_geogfromgeojson(
'{"type": "LineString", "coordinates": [[5,50], [50,50]]}') g)
Result with geojson is a = 0.0, b = 1.94.
To do this, BigQuery inserts new vertices, so that sequence of geodesic lines is close enough (within 10 meters) to the original line on flat map. If you run this query
select st_geogfromgeojson('{"type": "LineString", "coordinates": [[5,50], [50,50]]}')
You get a lot of new points:
LINESTRING(5 50, 5.17578125 50, 5.3515625 50, 5.52734375 50, 5.703125 50, 5.87890625 50, 6.0546875 50, 6.23046875 50, 6.40625 50, 6.58203125 50, 6.7578125 50, 6.93359375 50, 7.109375 50, 7.28515625 50, 7.4609375 50, 7.63671875 50, 7.8125 50, .....
They all have latitude 50, and longitude varying from 5 to 50.
Unfortunately, tessellation of shapes A and B happens independently, and typically the points added to A and to B aren’t the same, but slightly different. This causes the two tessellated lines to intersect, creating some overlap in both directions. Here is an exaggerated image demonstrating what is going on:
How big is this overlap? It should not be too large, as BigQuery ensures tessellated line is within 10 meters of “flat” line.
In this case st_area(st_difference(b, a))
is 503058 m². Given distance from [30,50] to [50,50] of about 1500 km, the average discrepancy is ~ 0.3 m.
The main effect of all this is that due to such imprecision, boolean relationships between geographical objects, like ST_Contains, ST_Covers, etc might change after converting.
Finally, is there a way to preserve the relationship exactly? Yes, but it requires careful preparation of the geometries, and including all common vertices in each shape. For our two shapes, this calls for insertion of vertex [30,50] into the edge [50,50] — [5,50] of shape A. This does not change the “planar version” of shape A, but makes sure its “spherical version” is tessellated exactly the same way as shape B that has vertices [30,50] and [50,50]:
with test_data as (
select
st_geogfromgeojson('{"type": "Polygon", "coordinates": [[[5,5], [50,5], [50,50], [30, 50], [5,50], [5,5]]]}') a,
st_geogfromgeojson('{"type": "Polygon", "coordinates": [[[30,30], [50,30], [50,50], [30,50], [30,30]]]}') b
)
select st_covers(a, b) from test_data
Now the result is true.