DataofMe
Take Control of the Data of You
Over the last four years I have been collecting my personal health data from a variety of wearable devices. I'm going to demonstrate my current setup and share the code that I have written with Azure Functions, Azure Cosmos DB, Azure ML Studio, the Microsoft Bot Framework, LUIS.ai and Power BI. I normally work in C#, but I chose to use Node.js for this project as a learning opportunity. There are many opportunities to improve the quality of my JavaScript code, but I have included it here for the purpose of demonstration, to accompany a recent talk on this subject that I gave at NDC Oslo. If you are a Garmin user with a device that captures active heart rate data, you should be able to follow these instructions and build out this solution for yourself using your own data. If you do, I'm really interested in getting your feedback on how it works.<br><br>
Background
There is a lot of value in the health data that you collect from wearable devices. With enough data you can gain insights, such as predicting when you might be getting sick, and be warned beforehand so that you can make changes to avoid it. You can also ask questions of your data, like what your resting heart rate was last month and how much exercise you have been getting. The data that you are collecting from wearables is your data, and I'm going to show you how I collect it and make it work for me.
<img src="./images/architecture.PNG" alt="Screenshot" style="width: 1066px;"/>
Extract your Data to Train a Machine Learning Model - Garmin
The first step here assumes that you are using a Garmin wearable device and have collected a reasonable amount of data. In my experience, having daily resting heart rate data available really makes a difference here. If you have data in another ecosystem you will need to research how to extract it. To get the data out of Garmin, perform the following steps.
- Log in to https://connect.garmin.com/modern with your Garmin account, then navigate to https://connect.garmin.com/modern/proxy/userstats-service/wellness/daily/[username]?fromDate=yyyy-mm-dd&untilDate=yyyy-mm-dd. Replace [username] with your Garmin username and set the date range to cover all the data you have been collecting.
- Load the .json file into Excel
<img src="./images/getDataJson.PNG" alt="Screenshot" style="width: 400px; padding-left: 40px;"/><br>
Click allMetrics Record to expand the record
<img src="./images/record.PNG" alt="Screenshot" style="width: 400px; padding-left: 40px;"/><br>
Click metricsMap Record to expand the record
<img src="./images/record2.PNG" alt="Screenshot" style="width: 400px; padding-left: 40px;"/><br>
Click Into Table
<img src="./images/record3.PNG" alt="Screenshot" style="width: 400px; padding-left: 40px;"/><br>
Click the expand button next to Value and choose Expand to New Rows
<img src="./images/record4.PNG" alt="Screenshot" style="width: 400px; padding-left: 40px;"/><br>
Click OK
<img src="./images/record5.PNG" alt="Screenshot" style="width: 400px; padding-left: 40px;"/><br>
Close and Load
<img src="./images/record6.PNG" alt="Screenshot" style="width: 400px; padding-left: 40px;"/><br>
Create a Pivot Table to transpose the columns
<img src="./images/pivotTable.PNG" alt="Screenshot" style="width: 1000px; padding-left: 40px;"/><br>
Extract the columns with values into a table
<img src="./images/table.PNG" alt="Screenshot" style="width: 1000px; padding-left: 40px;"/><br>
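If you would rather script this step than click through Excel, a minimal Node.js sketch of the same pivot could look like the following. It assumes the response has the shape `{ allMetrics: { metricsMap: { METRIC_NAME: [...] } } }` suggested by the Excel steps above, and that each entry carries `calendarDate` and `value` fields. Those two field names are assumptions, so check them against your own export. The sample data is made up.

```javascript
// Pivot the metricsMap into one row per calendar date.
// Assumed shape (entry field names are assumptions):
// { allMetrics: { metricsMap: { METRIC: [{ calendarDate, value }, ...] } } }
function pivotMetrics(response) {
  const metricsMap = response.allMetrics.metricsMap;
  const rowsByDate = new Map();

  for (const [metric, entries] of Object.entries(metricsMap)) {
    for (const { calendarDate, value } of entries) {
      if (!rowsByDate.has(calendarDate)) {
        rowsByDate.set(calendarDate, { calendarDate });
      }
      // Keep missing values as null so they can be cleaned in a later step.
      rowsByDate.get(calendarDate)[metric] = value ?? null;
    }
  }

  // ISO dates sort correctly as strings, giving chronological order.
  return [...rowsByDate.values()].sort((a, b) =>
    a.calendarDate.localeCompare(b.calendarDate)
  );
}

// Tiny hand-made sample, not real data.
const sampleResponse = {
  allMetrics: {
    metricsMap: {
      WELLNESS_TOTAL_STEPS: [
        { calendarDate: "2017-01-02", value: 9000 },
        { calendarDate: "2017-01-01", value: 12000 },
      ],
      WELLNESS_RESTING_HEART_RATE: [
        { calendarDate: "2017-01-01", value: 48 },
        { calendarDate: "2017-01-02", value: null },
      ],
    },
  },
};

const pivoted = pivotMetrics(sampleResponse);
console.log(pivoted);
```

Each row ends up with one property per metric, which matches the layout of the pivot table built in Excel.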
Ideally you have logged the days that you were sick so that you can flag them in the dataset and train the model successfully. Since I have done this and you may not have, I have made two years of my data available in CSV format for you to train your model from. Because this is my data and not yours, my average resting heart rate probably does not match yours, so the model will be less effective for you. Alternatively, look through your data for the days with the highest resting heart rate. If any of these days are consecutive, there is a good chance that you were sick on them, and you can flag them as sick days accordingly. In my case I have found that high intensity exercise, alcohol and caffeine raise my resting heart rate. I no longer have caffeine, and I log my alcohol consumption using the http://untappd.com app, which is also an input into my data and forms part of my model.
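The "highest resting heart rate on consecutive days" heuristic can be sketched in a few lines of Node.js. This is only an illustration: the `topN` cut-off, the rule that a run needs at least two adjacent days, and the sample values are all assumptions, and the rows are expected to use the `dateLogged` and `restHeartRate` column names from the CSV.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Return the dates that are among the topN highest resting heart rates
// AND sit next to another such day. topN is an arbitrary, assumed cut-off.
function findCandidateSickDays(rows, topN = 3) {
  const rates = rows
    .map((r) => r.restHeartRate)
    .filter((v) => v > 0) // ignore missing (null) or zero-filled values
    .sort((a, b) => b - a);
  if (rates.length === 0) return [];

  // Lowest rate that still counts as one of the topN highest.
  const threshold = rates[Math.min(topN, rates.length) - 1];
  const high = rows.filter((r) => r.restHeartRate >= threshold);

  // Date-only ISO strings parse as UTC, so neighbours differ by exactly DAY_MS.
  const highTimes = new Set(high.map((r) => Date.parse(r.dateLogged)));
  return high
    .filter((r) => {
      const t = Date.parse(r.dateLogged);
      return highTimes.has(t - DAY_MS) || highTimes.has(t + DAY_MS);
    })
    .map((r) => r.dateLogged);
}

// Made-up sample: two adjacent high days and one isolated high day.
const heartRateRows = [
  { dateLogged: "2017-03-01", restHeartRate: 48 },
  { dateLogged: "2017-03-02", restHeartRate: 49 },
  { dateLogged: "2017-03-03", restHeartRate: 47 },
  { dateLogged: "2017-03-04", restHeartRate: 48 },
  { dateLogged: "2017-03-05", restHeartRate: 55 },
  { dateLogged: "2017-03-06", restHeartRate: 56 },
  { dateLogged: "2017-03-07", restHeartRate: 48 },
  { dateLogged: "2017-03-08", restHeartRate: 47 },
  { dateLogged: "2017-03-09", restHeartRate: 54 },
  { dateLogged: "2017-03-10", restHeartRate: 48 },
];

const candidates = findCandidateSickDays(heartRateRows, 3);
console.log(candidates); // [ '2017-03-05', '2017-03-06' ]
```

Treat the output as a list of days to review by hand before setting the sick-day flag, since exercise, alcohol and caffeine can raise resting heart rate too.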
- Clean the data in Excel: replace all missing values with 0, with the exception of missing values for SLEEP_SLEEP_DURATION. For those, either remove that line from your data or replace the value with an average based on the surrounding data.
<img src="./images/cleanData.PNG" alt="Screenshot" style="width: 1000px; padding-left: 40px;"/><br>
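The same cleaning rule can be expressed in Node.js. The sketch below assumes rows keyed by the raw Garmin metric names, treats `null`, `undefined` and empty strings as missing, and takes "surrounding data" to mean the nearest earlier and later days that have a sleep value. That interpretation is an assumption, as is the made-up sample.

```javascript
const SLEEP = "SLEEP_SLEEP_DURATION";
const isMissing = (v) => v === null || v === undefined || v === "";

function cleanRows(rows) {
  // Every column seen on any row, so absent keys are filled as well.
  const allKeys = new Set(rows.flatMap((r) => Object.keys(r)));

  // Nearest non-missing sleep value, scanning from row i in direction step.
  const nearestSleep = (i, step) => {
    for (let j = i + step; j >= 0 && j < rows.length; j += step) {
      if (!isMissing(rows[j][SLEEP])) return rows[j][SLEEP];
    }
    return null;
  };

  const cleaned = [];
  rows.forEach((row, i) => {
    const out = {};
    for (const key of allKeys) {
      // Everything except sleep duration falls back to 0 when missing.
      out[key] = key !== SLEEP && isMissing(row[key]) ? 0 : row[key];
    }
    if (isMissing(row[SLEEP])) {
      const neighbours = [nearestSleep(i, -1), nearestSleep(i, 1)].filter(
        (v) => v !== null
      );
      if (neighbours.length === 0) return; // nothing to average: drop the row
      out[SLEEP] = neighbours.reduce((a, b) => a + b, 0) / neighbours.length;
    }
    cleaned.push(out);
  });
  return cleaned;
}

// Made-up sample with one day of missing sleep and step data.
const dirtyRows = [
  { calendarDate: "2017-01-01", SLEEP_SLEEP_DURATION: 28800, WELLNESS_TOTAL_STEPS: 10000 },
  { calendarDate: "2017-01-02", SLEEP_SLEEP_DURATION: null, WELLNESS_TOTAL_STEPS: null },
  { calendarDate: "2017-01-03", SLEEP_SLEEP_DURATION: 25200, WELLNESS_TOTAL_STEPS: 8000 },
];

const cleanedRows = cleanRows(dirtyRows);
console.log(cleanedRows[1]);
```

For the middle day the steps become 0 and the sleep duration becomes 27000 seconds, the average of the days on either side.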
Here I update the column names so that when I load them into DocumentDB they form object names that make sense to me. I also bring in other data sources. I get VO2MAX from my activity feed at https://connect.garmin.com/modern/proxy/userstats-service/activities/all/[username]?fromDate=yyyy-mm-dd&untilDate=yyyy-mm-dd and import it in a similar way to what I describe above. I also bring my weight data from the Fitbit Aria scales into Garmin via My Fitness Pal.
The column names that I have created in my CSV, translated from the ones in the pivot table, are as follows.
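The wellness and activity URLs follow the same pattern, so a small helper can build either one. This is a sketch only: the function name, the `feed` keys and the `myuser` username are made up for illustration, and the helper just builds the string. You still need to be logged in to Garmin Connect in the browser for the URL to return data.

```javascript
const GARMIN_BASE = "https://connect.garmin.com/modern/proxy/userstats-service";

// Paths taken from the two URLs in this README; the keys are illustrative.
const FEED_PATHS = new Map([
  ["wellness", "wellness/daily"],
  ["activities", "activities/all"],
]);

function garminStatsUrl(feed, username, fromDate, untilDate) {
  if (!FEED_PATHS.has(feed)) {
    throw new Error(`Unknown feed: ${feed}`);
  }
  const isoDate = /^\d{4}-\d{2}-\d{2}$/;
  if (!isoDate.test(fromDate) || !isoDate.test(untilDate)) {
    throw new Error("Dates must be formatted as yyyy-mm-dd");
  }
  const query = new URLSearchParams({ fromDate, untilDate });
  const user = encodeURIComponent(username);
  return `${GARMIN_BASE}/${FEED_PATHS.get(feed)}/${user}?${query}`;
}

const activityUrl = garminStatsUrl("activities", "myuser", "2015-01-01", "2017-06-01");
console.log(activityUrl);
```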
dateLogged
day // calculated as day of week from dateLogged
sleepDuration // calculated from secs to hours by dividing SLEEP_SLEEP_DURATION by (60*60)
activeCalories // WELLNESS_ACTIVE_CALORIES
floors // WELLNESS_FLOORS_ASCENDED
maxHeartRate // WELLNESS_MAX_HEART_RATE
minHeartRate // WELLNESS_MIN_HEART_RATE
moderateIntensityMins // WELLNESS_MODERATE_INTENSITY_MINUTES
restHeartRate // WELLNESS_RESTING_HEART_RATE
calories // WELLNESS_TOTAL_CALORIES
distance // calculated in KM by dividing WELLNESS_TOTAL_DISTANCE metres by 1000
steps // WELLNESS_TOTAL_STEPS
vigourousIntensityMinutes // WELLNESS_VIGOROUS_INTENSITY_MINUTES
VO2MAX // From my Garmin activity data - assumed previous day value until a new value is logged
weight // From Garmin, linked through My Fitness Pal from my Fitbit scales
beers // This comes from untappd and is a tally of my daily beer consumption
virus // Added by me as a label identifying which days I was sick
predictVirus // Added by me as a blank column, filled in later from the response of the ML prediction service
score // Added by me as a blank column, filled in later from the response of the ML prediction service
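The calculated columns in this list can be sketched in Node.js as follows. Assumptions to note: the raw row uses a `calendarDate` field for the date, `day` is stored as a weekday name (the list above does not say whether it is a name or a number), and the sample values are invented. Only the Garmin-derived columns are mapped; weight, beers and the label columns come from other sources.

```javascript
const DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];

// Map one pivoted Garmin row to the CSV column names.
function toNamedRow(raw) {
  // Parse as UTC so the weekday does not shift with the local time zone.
  const date = new Date(`${raw.calendarDate}T00:00:00Z`);
  return {
    dateLogged: raw.calendarDate,
    day: DAY_NAMES[date.getUTCDay()], // weekday name is an assumed format
    sleepDuration: raw.SLEEP_SLEEP_DURATION / (60 * 60), // seconds to hours
    activeCalories: raw.WELLNESS_ACTIVE_CALORIES,
    floors: raw.WELLNESS_FLOORS_ASCENDED,
    maxHeartRate: raw.WELLNESS_MAX_HEART_RATE,
    minHeartRate: raw.WELLNESS_MIN_HEART_RATE,
    moderateIntensityMins: raw.WELLNESS_MODERATE_INTENSITY_MINUTES,
    restHeartRate: raw.WELLNESS_RESTING_HEART_RATE,
    calories: raw.WELLNESS_TOTAL_CALORIES,
    distance: raw.WELLNESS_TOTAL_DISTANCE / 1000, // metres to km
    steps: raw.WELLNESS_TOTAL_STEPS,
    vigourousIntensityMinutes: raw.WELLNESS_VIGOROUS_INTENSITY_MINUTES,
  };
}

// Fill gaps in a column with the last logged value, as described for VO2MAX.
function carryForward(rows, key) {
  let last = null;
  return rows.map((row) => {
    if (row[key] !== null && row[key] !== undefined) last = row[key];
    return { ...row, [key]: last };
  });
}

// Invented sample values.
const namedRow = toNamedRow({
  calendarDate: "2017-01-01",
  SLEEP_SLEEP_DURATION: 27000,
  WELLNESS_ACTIVE_CALORIES: 650,
  WELLNESS_FLOORS_ASCENDED: 12,
  WELLNESS_MAX_HEART_RATE: 150,
  WELLNESS_MIN_HEART_RATE: 44,
  WELLNESS_MODERATE_INTENSITY_MINUTES: 20,
  WELLNESS_RESTING_HEART_RATE: 48,
  WELLNESS_TOTAL_CALORIES: 2600,
  WELLNESS_TOTAL_DISTANCE: 8500,
  WELLNESS_TOTAL_STEPS: 11000,
  WELLNESS_VIGOROUS_INTENSITY_MINUTES: 30,
});

const filled = carryForward(
  [{ VO2MAX: 50 }, { VO2MAX: null }, { VO2MAX: 51 }, {}],
  "VO2MAX"
);
console.log(namedRow, filled.map((r) => r.VO2MAX));
```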
Training the Machine Learning Model from the Data
Now that you have extracted a workable dataset from your data (or you have my CSV file), you are ready to train the machine learning model. The model gets better the more data you have. I generally retrain my model every couple of months, or after periods where I am sick. To understand which algorithms to select, I recommend that you check out the Microsoft Azure Machine Learning: Algorithm Cheat Sheet (aka.ms/MLCheatSheet). I have published my trained model and my training model to the gallery so you can use those. One thing to be aware of is that if you publish my trained model and use it against your data, you will likely get false positive warnings about viruses unless your resting heart rate is very similar to my own. I highly recommend retraining this model with an export of your data.
<img src="./images/MLmodel.PNG" alt="Screenshot" style="width: 1000px; padding-left: 40px;"/><br>
- Select Columns in Dataset - I chose a subset of my columns. I got rid of distance as it correlated perfectly with steps. I got rid of floorsDown as it correlated mostly with floors. I got rid of VO2MAX as it is linked to exercise and is slow to go up and down. I also got rid of maxHeartRate as it is a point event based on exercise, and I figured that vigourousIntensityMinutes is a better read on this value. I also didn't include weight, as I found that daily weight data isn't very useful due to small fluctuations affected by the time of day and whether I was measuring before or after exercise. I questioned whether I should remove the date column or not. Often sickness happens over consecutive days, and the build-up of resting heart rate, exercise and/or beer consumption from previous days can contribute to a prediction on future days. My rationale for removing date came after I tried a regression model including the date and also tried anomaly detection. Neither technique yielded results as strong as when I set virus as a binary (sick or not) two-class classifier. In practice I still get the benefit of an early warning system: when I'm rundown or my body starts fighting a virus, my model will predict virus with a low percentage of certainty, or not virus with a high level of uncertainty, before I get sick. At this point I get notified and can make choices to slow down and avoid getting sick. You'll see this in some of the data that I share below. I have included day of the week in the model as I think it is relevant. I am generally more rundown at certain stages of the week, and I know I generally drink more beer and exercise for longer periods on a Saturday, so I thought that may play into the model.
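One way to check which columns are redundant before selecting them in ML Studio is to compute the Pearson correlation between pairs of columns. The Node.js sketch below is an illustration done outside ML Studio, not part of the published experiment. The 0.95 threshold and the sample rows are assumptions.

```javascript
// Pearson correlation coefficient of two equal-length numeric arrays.
// Returns NaN if either array is constant (zero variance).
function pearson(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("Need two arrays of the same length with at least 2 values");
  }
  const n = xs.length;
  const mean = (a) => a.reduce((s, v) => s + v, 0) / n;
  const mx = mean(xs);
  const my = mean(ys);
  let cov = 0;
  let vx = 0;
  let vy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - mx;
    const dy = ys[i] - my;
    cov += dx * dy;
    vx += dx * dx;
    vy += dy * dy;
  }
  return cov / Math.sqrt(vx * vy);
}

// List column pairs whose absolute correlation reaches the threshold.
function redundantPairs(rows, columns, threshold = 0.95) {
  const pairs = [];
  for (let i = 0; i < columns.length; i++) {
    for (let j = i + 1; j < columns.length; j++) {
      const a = columns[i];
      const b = columns[j];
      const r = pearson(rows.map((row) => row[a]), rows.map((row) => row[b]));
      if (Math.abs(r) >= threshold) pairs.push({ a, b, r });
    }
  }
  return pairs;
}

// Made-up rows where distance is an exact multiple of steps.
const modelRows = [
  { steps: 4000, distance: 3.2, beers: 2 },
  { steps: 8000, distance: 6.4, beers: 0 },
  { steps: 12000, distance: 9.6, beers: 1 },
  { steps: 10000, distance: 8.0, beers: 3 },
];

const redundant = redundantPairs(modelRows, ["steps", "distance", "beers"]);
console.log(redundant);
```

With this sample only the steps and distance pair is reported, which mirrors the reason given above for dropping distance.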
<img src="./images/columns.PNG" alt="Scr
