TL;DR: to load shape files and other geospatial file formats to BigQuery, use the following recipe to create CSV file with GeoJSON-encoded WSG84 geometries:
ogr2ogr -f csv -dialect sqlite -sql
"select AsGeoJSON(ST_Transform(geometry, 4326)) geom, * from shp"
- ogr2ogr is open source GDAL project utility,
- shp — name of the shape file or geojson file (without extension),
- ./input/ — directory containing shape file and related files. For geojson input file, replace it with full name of that file,
- out.csv — output file for loading to BigQuery.
Longer version: if you have GIS data in common GIS formats, like shapefiles or GeoJSON files and want to load it to BigQuery, Lak Lakshmanan has a very good walkthrough:
How to load geographic data like shapefiles into BigQuery
Now that BigQuery has support for GIS queries, let’s look at how we could load in geographic data into BigQuery. I’ll…
[Update] the walkthrough was now updated to use GeoJson field format, I’ll discuss original plan:
- Use ogr2ogr to convert .shp files to CSV with WKT
- Load this CSV file to BigQuery.
But there is a problem with loading geometries as WKT strings, when your data includes large polygons or lines. Here by “large” I mean spatially large, with long edges.
The issue here is the difference between planar edges and geodesic edges. Both shapefiles and GeoJSON files contain planar data with edges that are straight on the flat map. What happens with this data when we ingest it to BigQuery?
- org2ogr converts them to WKT format, it changes the syntax, but does not change the data points,
- BigQuery treats WKT as spherical geography, assuming geodesic edges.
- BigQuery also treats WKT polygons as oriented, which may cause problems with incorrectly oriented polygons.
Thus due to ogr2ogr and BigQuery using different interpretation of WKT format, planar edges are reinterpreted as geodesic ones, and polygons might get inverted. The first issue does not matter for small features, but could be quite significant for large features. The second issue might affect even small features.
We can avoid these issues when ingesting planar geometries to BigQuery by using GeoJson geometries. GeoJSON is defined on planar map, it uses planar edges, and when ingesting GeoJson-encoded geometry fields, BigQuery approximates planar edges with geodesic ones, adding intermediate vertices as needed, providing ~10 meters approximation precision. There is also no orientation ambiguity when loading planar data — so BigQuery can fix orientation if needed, even if the GeoJson field does not follow the standard. BigQuery does not support GeoJson files, but it understands GeoJson field format, which can appear in various file formats, e.g. in CSV files.
How do we make org2ogr produce CSV with GeoJSON geometries? org2ogr has GEOMETRY = AS_WKT option, but does not have GEOMETRY = AS_GEOJSON option. The solution is to use sqlite query within ogr2ogr:
ogr2ogr -f csv -dialect sqlite -sql "select AsGeoJSON(geometry) AS geom, * from test" test.csv test.shp
This produces CSV file with GeoJSON-encoded geometry column. Only this change in org2ogr command is needed. When importing to BigQuery, BigQuery can automatically distinguish between WKT and GeoJSON formats, and when reading GeoJSON it knows the edges are planar and approximates them by geodesic edges.
Finally, if the input shape file uses coordinate system other than WSG84, use sqlite also to transform to WSG84 coordinates used by GeoJSON:
ogr2ogr -f csv -dialect sqlite -sql
"select AsGeoJSON(ST_Transform(geometry, 4326))) AS geom, * from test"
ST_Transform(<geom>, 4326) call takes place of
-t_srs EPSG:4326 option used when converting to CSV + WKT with
-lco geometry=AS_WKT command. If the source file is already in WSG84 coordinate, it is better to omit
ST_Transform to improve performance.
Note that ST_Transform only translates the coordinates of the vertices, and with some projections it might have the same issue that we had with planar versus geodesic edges — the planar edge in one projection might differ from planar edge in another projection, so the translated geometry might not exactly match original one.