Analyze your Fitbit Exercise in BigQuery
This is more of personal notes how to upload and analyze Fitbit activity’s track data in BigQuery. Fitbit offers reasonable basic analysis of activities, but SQL and details are always better!
Another motivation was an error I’ve made when tracking exercise: Fitbit does not have skiing exercise type, and one has to choose another activity type. I chose Hiking, which was a mistake — in Hiking mode Fitbit ignores parts of the track where it did not sense walking steps, and the stats were all wrongs. I guess I should’ve chosen generic Outdoor Workout. But let’s use this mistake to explore how to analyze track in SQL.
Loading data to BigQuery
Fitbit exports TCX format which is a dialect of XML. BigQuery prefers CSV or JSON. What I’ve found to work best for me is xq
utility which is an XML-enabled version of a wonderful jq
tool. Xq
uses jq
internally, and can output JSON-Lines format.
First, I installed yq
(xq
is part of it):
pip3 install yq
Then I extracted track points using
cat exercise_tcx_file.tcx \
| xq -c .TrainingCenterDatabase.Activities.Activity.Lap.Track.Trackpoint[] \
> exercise_tcx.jsonl
Here .TrainingCenterDatabase.Activities.Activity.Lap.Track.Trackpoint
is the path to the track point objects we need, and jq -c json-path[]
is jq
way of outputting one element per string, without top level array brackets — i.e. producing BQ-compatible new-line-delimited JSON. I’m not sure it would work if the track had multiple laps, I’ve only tried it with specific exercise, and it might be more complicated in other cases.
Now let’s upload it to BigQuery. You can use UI (select options: Upload, format: JSONL, enter table name, and check Auto detect box). I used the following command line:
bq load -source_format=NEWLINE_DELIMITED_JSON --autodetect \
tmp.exercise exercise_tcx.jsonl
That’s all, we can now query it.
Analysis
To find the total path, and the duration of the activity:
SELECT
-- sum up distances from previous location
SUM(ST_DISTANCE(
st_geogpoint(prev_long, prev_lat),
st_geogpoint(longitude, latitude))) AS distance,
-- Difference between max and min timestamp, as interval type
MAKE_INTERVAL(second => TIMESTAMP_DIFF(MAX(`Time`), MIN(`Time`), SECOND))
AS duration
FROM (
-- add previous location to each row
SELECT
`Time`,
position.LongitudeDegrees AS longitude,
position.LatitudeDegrees AS latitude,
LAG(position.LongitudeDegrees) OVER sorting AS prev_long,
LAG(position.LatitudeDegrees) OVER sorting AS prev_lat
FROM `tmp.exercise`
-- ordering by timestamp of the track point
WINDOW sorting AS (order by time)
)
Visualization
To visualize the track in BigQuery GeoViz, we can construct the segment lines in the query. Because GeoViz does not work with nested data, to style the track using heart rate data, we also need to bring heart rate “up” to the top level:
SELECT
* except(HeartRateBpm),
HeartRateBpm.Value AS HeartRateBpm, -- extract HR value up
ST_MakeLine(ST_GeogPoint(prev_long, prev_lat),
ST_GeogPoint(longitude, latitude)) AS segment
FROM
(
SELECT
* except (position),
position.LongitudeDegrees AS longitude,
position.LatitudeDegrees AS latitude,
LAG(position.LongitudeDegrees) OVER sorting AS prev_long,
LAG(position.LatitudeDegrees) OVER sorting AS prev_lat
FROM `tmp.exercise`
WINDOW sorting AS (order by time)
)