Clustering is more about filtering data.

May 30, 2023

Clustering is more about filtering data. It distributes a large table (larger than roughly 0.1 to 1 GB) into shards that hold similar values of the clustering column. BigQuery also computes the spatial extent of each shard, so when you query such a table with a spatial filter, distant shards can be eliminated based on this metadata alone, without reading their data. This reduces query cost and improves performance. It currently does not help much with point-in-polygon queries unless the query also filters out some of the data.
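To make the pruning idea concrete, here is a toy Python sketch. It is not how BigQuery implements clustering internally; the names `Shard`, `build_shards`, and `query` are made up for illustration, and sorting by raw coordinates is only a stand-in for the real clustering order. It just shows how a per-shard extent lets a spatial filter skip shards without touching their rows.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]              # (x, y), e.g. (longitude, latitude)
Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class Shard:
    points: List[Point]  # the actual data
    extent: Box          # metadata: bounding box of the points in this shard


def bounding_box(points: List[Point]) -> Box:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def build_shards(points: List[Point], shard_size: int) -> List[Shard]:
    # Assumption: sorting by coordinates stands in for clustering,
    # so that nearby points tend to land in the same shard.
    ordered = sorted(points)
    shards = []
    for start in range(0, len(ordered), shard_size):
        chunk = ordered[start:start + shard_size]
        shards.append(Shard(points=chunk, extent=bounding_box(chunk)))
    return shards


def boxes_intersect(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def in_box(p: Point, box: Box) -> bool:
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]


def query(shards: List[Shard], window: Box) -> Tuple[List[Point], int]:
    """Return matching points and the number of shards actually read."""
    hits: List[Point] = []
    shards_read = 0
    for shard in shards:
        # Pruning step: decided from the extent alone, no rows are scanned.
        if not boxes_intersect(shard.extent, window):
            continue
        shards_read += 1
        hits.extend(p for p in shard.points if in_box(p, window))
    return hits, shards_read
```

The second value returned by `query` is the interesting one: with a selective window it stays far below the total shard count, which is the source of the cost saving. If the filter covers most of the table, almost every shard intersects it and little is saved.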

How big is your point cloud, and how do you aggregate it?

Written by Michael Entin
