# Be careful with ABS function

Today’s short notice is not about GIS, just about BigQuery. Let’s fix a reliability bug!

I’ve just noticed the following pattern used in various posts, to convert any column value to a small integer number in a fixed range. It is used for partitioning, random sampling, etc:

`MOD(ABS(FARM_FINGERPRINT(field)), 10)`

The idea is to compute hash, and then bring it to a fixed range we need, in this case [0, 10).

The `FARM_FINGERPRINT`

returns a 64-bit signed integer, we compute it’s absolute value to avoid issues with negative values, and then take modulo 10 to bring it to the range we need.

Is there an issue here? Well, yes. If `FARM_FINGERPRINT`

result happens to be exactly minimum int64 value, `ABS`

function fails because there is no corresponding positive value, with error `int64 overflow: --9223372036854775808`

. (This is also why BigQuery has `SAFE_NEGATE`

function.) It is not likely to happen, but with big data, even extremely unlikely events do happen some day.

To make this safe, take `MOD`

first and then `ABS`

:

`ABS(MOD(FARM_FINGERPRINT(field), 10))`

You might be concerned about equal probabilities — `MOD`

returns values from -9 to 9; when we take `ABS`

of this, only 0 is converted to 0, while both 1 and -1 are converted to 1, etc. Does this make 0 half as probable? Not really, since when we compute`MOD`

, 0 is twice as probable as other values, because both positive and negative numbers divisible by 10 produce `MOD`

== 0. We can check quality of distribution with the following test:

`with random_data as (`

select cast(RAND() as string) v

from unnest(GENERATE_ARRAY(0, 1000000)) x

)

select

abs(mod(farm_fingerprint(v), 10)) k, count(*) c

from random_data

group by k order by k;

In other cases where you need `ABS`

but don’t have `MOD`

, you can use e.g. expression `IFNULL(SAFE.ABS(…), 0)`

.