Analyze your Fitbit Exercise in BigQuery

Michael Entin
3 min readJan 8, 2023

--

This is more of personal notes how to upload and analyze Fitbit activity’s track data in BigQuery. Fitbit offers reasonable basic analysis of activities, but SQL and details are always better!

Another motivation was an error I’ve made when tracking exercise: Fitbit does not have skiing exercise type, and one has to choose another activity type. I chose Hiking, which was a mistake — in Hiking mode Fitbit ignores parts of the track where it did not sense walking steps, and the stats were all wrongs. I guess I should’ve chosen generic Outdoor Workout. But let’s use this mistake to explore how to analyze track in SQL.

Loading data to BigQuery

Fitbit exports TCX format which is a dialect of XML. BigQuery prefers CSV or JSON. What I’ve found to work best for me is xq utility which is an XML-enabled version of a wonderful jq tool. Xq uses jq internally, and can output JSON-Lines format.

First, I installed yq (xq is part of it):

pip3 install yq

Then I extracted track points using

cat exercise_tcx_file.tcx \
| xq -c .TrainingCenterDatabase.Activities.Activity.Lap.Track.Trackpoint[] \
> exercise_tcx.jsonl

Here .TrainingCenterDatabase.Activities.Activity.Lap.Track.Trackpoint is the path to the track point objects we need, and jq -c json-path[] is jq way of outputting one element per string, without top level array brackets — i.e. producing BQ-compatible new-line-delimited JSON. I’m not sure it would work if the track had multiple laps, I’ve only tried it with specific exercise, and it might be more complicated in other cases.

Now let’s upload it to BigQuery. You can use UI (select options: Upload, format: JSONL, enter table name, and check Auto detect box). I used the following command line:

bq load -source_format=NEWLINE_DELIMITED_JSON --autodetect \
tmp.exercise exercise_tcx.jsonl

That’s all, we can now query it.

Analysis

To find the total path, and the duration of the activity:

SELECT 
-- sum up distances from previous location
SUM(ST_DISTANCE(
st_geogpoint(prev_long, prev_lat),
st_geogpoint(longitude, latitude))) AS distance,
-- Difference between max and min timestamp, as interval type
MAKE_INTERVAL(second => TIMESTAMP_DIFF(MAX(`Time`), MIN(`Time`), SECOND))
AS duration
FROM (
-- add previous location to each row
SELECT
`Time`,
position.LongitudeDegrees AS longitude,
position.LatitudeDegrees AS latitude,
LAG(position.LongitudeDegrees) OVER sorting AS prev_long,
LAG(position.LatitudeDegrees) OVER sorting AS prev_lat
FROM `tmp.exercise`
-- ordering by timestamp of the track point
WINDOW sorting AS (order by time)
)

Visualization

To visualize the track in BigQuery GeoViz, we can construct the segment lines in the query. Because GeoViz does not work with nested data, to style the track using heart rate data, we also need to bring heart rate “up” to the top level:

SELECT
* except(HeartRateBpm),
HeartRateBpm.Value AS HeartRateBpm, -- extract HR value up
ST_MakeLine(ST_GeogPoint(prev_long, prev_lat),
ST_GeogPoint(longitude, latitude)) AS segment
FROM
(
SELECT
* except (position),
position.LongitudeDegrees AS longitude,
position.LatitudeDegrees AS latitude,
LAG(position.LongitudeDegrees) OVER sorting AS prev_long,
LAG(position.LatitudeDegrees) OVER sorting AS prev_lat
FROM `tmp.exercise`
WINDOW sorting AS (order by time)
)
GeoViz visualization of the track
GeoViz visualization of my skiing

--

--

Michael Entin
Michael Entin

Written by Michael Entin

Hi, I'm TL of BigQuery Geospatial project. Posting small recipes and various notes for BQ Geospatial users.

No responses yet