
GLOBEM

Install / Use

/learn @UW-EXP/GLOBEM

README

GLOBEM Logo

This is the official codebase of the dataset paper GLOBEM Dataset: Multi-Year Datasets for Longitudinal Human Behavior Modeling Generalization, accepted by NeurIPS 2022 Dataset and Benchmark Track (Link)

This is the official codebase of the platform paper GLOBEM: Cross-Dataset Generalization of Longitudinal Human Behavior Modeling, accepted by IMWUT 2023 (Link) 🏆 Our paper has won the Distinguished Paper Award @ UbiComp 2023!

Introduction

GLOBEM is a platform to accelerate cross-dataset generalization research in the longitudinal behavior modeling domain. The name GLOBEM is short for Generalization of LOngitudinal BEhavior Modeling.

GLOBEM not only supports flexible and rapid evaluation of the existing behavior modeling methods, but also provides easy-to-extend templates for researchers to develop and prototype their own behavior modeling algorithms.

Overview of the platform

GLOBEM currently supports depression detection and closely reimplements a range of prior behavior modeling algorithms[^compatibility].

Highlight

Multi-year Dataset (README.md)

Along with GLOBEM, we released the first multi-year mobile and wearable sensing datasets, containing four data collection studies from 2018 to 2021 and covering over 700 person-years from 497 unique users. To download the dataset, please visit our PhysioNet page, and see data_raw/README.md for more details about the data format.

Overview of the data collection studies

The datasets capture various aspects of participants' life experience:

Timeseries data examples

Benchmark

The table shows the balanced accuracy performance of different algorithms in multiple evaluation setups:

The overview of the results
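
The benchmark metric is balanced accuracy, i.e. the mean of per-class recall, which is more informative than plain accuracy when depression-positive labels are the minority class. A minimal sketch of the computation (pure Python, not taken from the GLOBEM codebase):

```python
# Balanced accuracy = mean of per-class recall; unlike plain accuracy,
# a majority-class-only predictor cannot score well on imbalanced labels.
def balanced_accuracy(y_true, y_pred):
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# Imbalanced toy labels: 8 negatives, 2 positives.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [1, 0]   # catches 1 of the 2 positives
print(balanced_accuracy(y_true, y_pred))  # (8/8 + 1/2) / 2 = 0.75
```

Note that plain accuracy on this toy example would be 0.9, hiding the missed positive class; balanced accuracy exposes it.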

Tutorial

Below is a brief description of the platform and a simple tutorial on how to use the platform. This tutorial uses depression detection as the longitudinal behavior modeling example.

TL;DR

Our platform is tested with Python 3.7 under macOS 11.6 (Intel) and CentOS 7. Try the platform with one line of command, assuming Anaconda/Miniconda is already installed on the machine. Details of the setup and examples are explained in the rest of the tutorial.

/bin/bash run.sh

Setup

Environment

GLOBEM is a Python-based platform, chosen for Python's flexibility and its large number of open-source libraries. A Java JDK (>= 11) is needed for ml_xu_interpretable, ml_xu_personalized, and ml_chikersal. Here is an example of using Anaconda or Miniconda for environment setup:

conda create -n globem python=3.7
conda activate globem
pip install -r requirements.txt

<a name="dataset_preparation"></a> Dataset Preparation

Example raw data are provided in data_raw folder[^data]. Each dataset contains ParticipantsInfoData, FeatureData, and SurveyData. Please refer to data_raw/README.md for more details about the raw data format.

A simple script is prepared to process these raw data and save a list of files into data folder. As different subjects may have a different amount of data, the main purpose of the processing is to slice the data into standard <feature matrix, label> pairs (see Input for the definitions). Every dataset contains a list of <feature matrix, label> pairs, which are saved in a DatasetDict object.
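To make the slicing step concrete, here is a hypothetical sketch of turning one participant's daily feature stream into fixed-length <feature matrix, label> pairs; the function name, window length, and arguments are illustrative only and do not reflect GLOBEM's actual DatasetDict API (see data/data_prep.py for the real implementation):

```python
import numpy as np

# Illustrative only: slice a (num_days, num_features) daily feature stream
# into fixed-length <feature matrix, label> pairs, one per labeled day.
WINDOW = 28  # hypothetical number of days of features preceding each label

def slice_pairs(daily_features, label_days, labels):
    """Return a list of (feature_matrix, label) pairs, skipping labels
    that do not have a full WINDOW days of history behind them."""
    pairs = []
    for day, label in zip(label_days, labels):
        if day + 1 >= WINDOW:
            X = daily_features[day + 1 - WINDOW : day + 1]  # (WINDOW, F)
            pairs.append((X, label))
    return pairs

features = np.random.rand(70, 5)  # 70 days, 5 daily features
pairs = slice_pairs(features, label_days=[27, 34, 41], labels=[0, 1, 0])
print(len(pairs), pairs[0][0].shape)  # 3 pairs, each (28, 5)
```

Because different subjects have different amounts of data, labels without a full window of history are simply dropped, which is what standardizes the pairs across subjects.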

The preparation can be done by running the command below (ETA ~45 mins with the complete version of the four-year dataset, ~3 mins with the sample data):

python data/data_prep.py

How to run an existing algorithm

The template command:

python evaluation/model_train_eval.py
  --config_name=[a model config file name]
  --pred_target=[prediction target]
  --eval_task=[evaluation task]

Currently, pred_target supports the depression detection task:

  • dep_weekly: weekly depression status prediction

eval_task supports a few single or multiple evaluation setups, such as:

  • single_within_user: within-user training/testing on a single dataset
  • allbutone: leave-one-dataset-out setup
  • crosscovid: pre/post COVID setup, only supports certain datasets
  • two_overlap: train/test on overlapping users between two datasets
  • all: do all evaluation setups
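
To sweep several setups for one model config, the same command can be driven from a short Python loop. This is only a convenience sketch (eval_task=all is the built-in way to cover every setup in one run); the flag names come from the template command above:

```python
import subprocess

def build_cmd(config, task, target="dep_weekly"):
    # Assemble one model_train_eval.py invocation, flags as documented above.
    return ["python", "evaluation/model_train_eval.py",
            f"--config_name={config}",
            f"--pred_target={target}",
            f"--eval_task={task}"]

# Run one model config against several evaluation setups in sequence.
for task in ["single_within_user", "allbutone", "two_overlap"]:
    cmd = build_cmd("ml_chikersal", task)
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually run
```

Each invocation writes its own .pkl under evaluation_output, so the runs are independent and can be resumed individually if one fails.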

An example of running Chikersal et al.'s algorithm on each dataset to predict weekly depression:

python evaluation/model_train_eval.py \
  --config_name=ml_chikersal \
  --pred_target=dep_weekly \
  --eval_task=single_within_user

An example of running the Reorder algorithm with a leave-one-dataset-out setup to predict weekly depression (note that deep learning algorithms are not compatible with the end-of-term prediction task due to the limited data size):

python evaluation/model_train_eval.py \
  --config_name=dl_reorder \
  --pred_target=dep_weekly \
  --eval_task=allbutone

The model training and evaluation results will be saved in the evaluation_output folder under the corresponding path.

In these two examples, they will be saved at evaluation_output/evaluation_single_dataset_within_user/dep_weekly/ml_chikersal.pkl and evaluation_output/evaluation_allbutone_datasets/dep_weekly/dl_reorder.pkl, respectively.

Reading the results is straightforward:

import pickle
import pandas, numpy

with open("evaluation_output/evaluation_single_dataset_within_user/dep_weekly/ml_chikersal.pkl", "rb") as f:
    evaluation_results = pickle.load(f)
    df = pandas.DataFrame(evaluation_results["results_repo"]["dep_weekly"]).T
    print(df[["test_balanced_acc", "test_roc_auc"]])

with open("evaluation_output/evaluation_allbutone_datasets/dep_weekly/dl_reorder.pkl", "rb") as f:
    evaluation_results = pickle.load(f)
    df = pandas.DataFrame(evaluation_results["results_repo"]["dep_weekly"]).T
    print(df[["test_balanced_acc", "test_roc_auc"]])
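
Once loaded, the per-dataset metrics can be aggregated with ordinary pandas operations. The stand-in dictionary below only mirrors the structure read above (results_repo → prediction target → per-dataset metrics); the dataset keys and numbers are made up purely to illustrate the aggregation:

```python
import pandas

# Synthetic stand-in for a loaded .pkl, mirroring the structure accessed
# above; the metric values are fabricated for illustration only.
evaluation_results = {"results_repo": {"dep_weekly": {
    "INS-W_1": {"test_balanced_acc": 0.61, "test_roc_auc": 0.65},
    "INS-W_2": {"test_balanced_acc": 0.58, "test_roc_auc": 0.60},
    "INS-W_3": {"test_balanced_acc": 0.63, "test_roc_auc": 0.66},
}}}

df = pandas.DataFrame(evaluation_results["results_repo"]["dep_weekly"]).T
summary = df[["test_balanced_acc", "test_roc_auc"]].mean()
print(summary)  # mean of each metric across datasets
```

The transpose puts datasets on the rows and metrics on the columns, so column-wise mean() gives the average of each metric across datasets.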

Please refer to analysis/prediction_results_analysis.ipynb for more examples of results processing.

Code Breakdown

The two examples above are equivalent to the following code blocks:

Chikersal et al.'s algorithm doing the single dataset evaluation task:

import pandas, numpy
from data_loader import data_loader_ml
from utils import train_eval_pipeline
from algorithm import algorithm_factory

ds_keys = ["INS-W_1", "INS-W_2", "INS-W_3", "INS-W_4"] # list of datasets to be included
pred_targets = ["dep_weekly"] # list of prediction targets
config_name = "ml_chikersal" # model config

dataset_dict = data_loader_ml.data_loader(ds_keys_dict={pt: ds_keys for pt in pred_targets})
algorithm = algorithm_factory.load_algorithm(config_name=config_name)
evaluation_results = train_eval_pipeline.single_dataset_within_user_driver(
    dataset_dict, pred_targets, ds_keys, algorithm)