Michael Entin
1 min read · Feb 17, 2020


The clustering mentioned in this article is storage clustering. That is, you can create a table clustered by a GEOGRAPHY column, and the internal storage will be split into segments using location-based clustering:
CREATE TABLE Foo PARTITION BY … CLUSTER BY geog …

Such tables can give better performance and lower query cost when the query has a spatial filter, such as WHERE ST_DWithin(geog, ...) or similar.
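As a rough sketch of what this can look like in practice: the dataset, table, and column names below (mydataset.points, mydataset.points_clustered, id) are made up for illustration, and the partitioning clause is left out for brevity, so adapt it to your own schema.

```sql
-- Hypothetical source table: mydataset.points(id INT64, geog GEOGRAPHY).
-- Create a copy whose storage is clustered by the GEOGRAPHY column.
CREATE TABLE mydataset.points_clustered
CLUSTER BY geog
AS
SELECT id, geog
FROM mydataset.points;

-- A query with a spatial filter, the kind that can benefit from clustering.
-- ST_GeogPoint takes (longitude, latitude); ST_DWithin's distance is in meters.
SELECT id
FROM mydataset.points_clustered
WHERE ST_DWithin(geog, ST_GeogPoint(-122.33, 47.61), 1000);
```

How much the clustered table actually saves depends on the data and on how selective the filter is, so it is worth comparing the bytes processed against the unclustered table.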

If you mean clustering in the machine-learning sense, that is a different thing. For that you can use a BigQuery ML k-means model, which supports location-based clustering on a GEOGRAPHY column, or mixed clustering based on both location and other attributes: https://cloud.google.com/bigquery-ml/docs/kmeans-tutorial
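For the machine-learning kind of clustering, a minimal sketch might look like the following. The model and table names (mydataset.location_clusters, mydataset.points) and the choice of 5 clusters are assumptions for illustration only, and it relies on the k-means model accepting a GEOGRAPHY column as a feature, as described above.

```sql
-- Hypothetical input table: mydataset.points(id INT64, geog GEOGRAPHY).
-- Train a k-means model using only the location as a feature.
CREATE OR REPLACE MODEL mydataset.location_clusters
OPTIONS (model_type = 'kmeans', num_clusters = 5) AS
SELECT geog
FROM mydataset.points;

-- Assign each row to its nearest centroid.
-- The id column is not a model feature; it is passed through to the output.
SELECT centroid_id, id
FROM ML.PREDICT(
  MODEL mydataset.location_clusters,
  (SELECT id, geog FROM mydataset.points));
```

For mixed clustering, you would add the other attributes to the SELECT list of the training query, next to geog.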



Written by Michael Entin

Hi, I'm the TL of the BigQuery Geospatial project. I post small recipes and various notes for BQ Geospatial users.
