Openhexa
An open-source data integration platform for public health
OpenHEXA
OpenHEXA is an open-source data integration and data analysis platform developed by Bluesquare.
Its goal is to facilitate data integration and analysis workflows, in particular in the context of public health projects.
OpenHEXA allows you to:
- Create workspaces to group code, data and users
- Upload and read files from a shared filesystem
- Write and read to a PostgreSQL database
- Use Jupyter notebooks to explore and analyze data
- Run and schedule complex data workflows using data pipelines
- Manage your team members
Please note that this repository does not contain any code: it is a starting point for OpenHEXA users and implementers. Please refer to the technical architecture page of our wiki for more information about the different OpenHEXA components, including the links to the relevant GitHub repositories.
Documentation
The OpenHEXA documentation lives in our wiki.
To get started, you might be interested in the following pages:
Roadmap, issues and discussions
Feel free to reach out in the discussions section if you have questions or suggestions!
Quick Start
Requirements:
- at least Docker 26.1
- Debian bookworm
- Debian packages: gettext-base, postgresql (14+), postgresql-<postgresql version>-postgis-3, duplicity (optional, to manage backup and restore)
- yq
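The requirements above can be checked by hand before running the setup script. The following is a minimal sketch, not the logic of ./script/setup.sh check: the helper names version_ge and check_requirements are hypothetical, and it assumes that envsubst (from gettext-base) and psql are the commands to look for, and that sort supports -V (GNU coreutils).

```shell
#!/bin/sh
# version_ge A B: succeed when version A is greater than or equal to B.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report missing commands and a Docker client older than 26.1.
# Returns 0 when everything looks fine, 1 otherwise.
check_requirements() {
  missing=0
  for cmd in docker yq envsubst psql; do
    command -v "$cmd" >/dev/null 2>&1 || { echo "missing: $cmd"; missing=1; }
  done
  if command -v docker >/dev/null 2>&1; then
    docker_version="$(docker version --format '{{.Client.Version}}' 2>/dev/null)"
    version_ge "$docker_version" 26.1 || {
      echo "Docker '$docker_version' is older than 26.1"
      missing=1
    }
  fi
  return "$missing"
}
```

Calling check_requirements prints one line per problem found and returns a non-zero status if anything is missing.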
After cloning this repo and changing your current directory to it, you can first check your installation by running
./script/setup.sh check
It will report that the .env file is missing; this is expected, since creating it is the next step.
Then, you need to set up the environment and the database. To do so, execute the following command
./script/setup.sh all
This will generate a file in the working directory: .env (see below to
know more about the configuration properties).
Then you can prepare the database and environment with
./script/openhexa.sh prepare
[!IMPORTANT] The prepare command will create an initial superuser for your installation. If you are setting up a real server, make sure you choose a secure password.
Finally, you can run openhexa with
./script/openhexa.sh start
To stop, execute
./script/openhexa.sh stop
If you need to purge the configuration and the database after having stopped it, you can do it by executing the following command
./script/openhexa.sh purge
Once installed, you may want to make sure you are running the latest version. You can update OpenHEXA with
./script/openhexa.sh update
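The first-time sequence described above (check, setup, prepare, start) can be chained in one small script. This is a sketch under assumptions: the run and first_install helpers and the DRY_RUN variable are hypothetical, and it assumes the check step may exit with a non-zero status while .env is still missing. It prints the commands by default instead of running them.

```shell
#!/bin/sh
# Dry-run by default: each step is printed instead of executed.
# Set DRY_RUN=0 to really run the commands from the repository root.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

first_install() {
  # Assumption: `check` may fail while .env is missing, so its status is ignored.
  run ./script/setup.sh check || true
  run ./script/setup.sh all || return 1
  run ./script/openhexa.sh prepare || return 1
  run ./script/openhexa.sh start
}
```

Run first_install once to review the printed steps, then again with DRY_RUN=0 to execute them. The prepare step creates the initial superuser, so it is worth running the real sequence interactively the first time.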
Debian Package
Requirements
To release and build the Debian package, you need to work on a Debian-like Linux distribution
with the following packages installed: devscripts, debhelper, and
build-essential. To install them, run the following command:
sudo apt install devscripts debhelper build-essential
Note that this requires superuser rights (which is what sudo gives you).
If you are not on a Debian-based distribution, you can use the Dockerfile.build to build a Debian container that will do the job for you.
docker build --platform linux/amd64 -t openhexa-build -f Dockerfile.build .
docker run -it -v $(pwd):/work openhexa-build
You can then follow the instructions below to build the package as usual.
Release, changelog, and versions
The versions are described in the changelog file. The latest
entry is unreleased and is the one that is published. To manage versions and
the changelog, we use dch, a tool from the devscripts package.
To add a new change, do:
EMAIL="firstname lastname <email@address.org>" dch -a
This will open your favorite editor so you can edit the changelog. Save, commit, push, and GitHub Actions will do the rest.
To release a version, do:
EMAIL="firstname lastname <email@address.org>" dch -rD stable
To add a new unreleased version, do:
EMAIL="firstname lastname <email@address.org>" dch -i -D UNRELEASED -U
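After any of the dch commands above, it can help to confirm which version and distribution the top changelog entry now carries. The sketch below parses the first line of a Debian changelog, which has the form "package (version) distribution; urgency=...". The helper name changelog_top is hypothetical, and debian/changelog is assumed to be the file location, as is usual for Debian packaging.

```shell
#!/bin/sh
# Print "<version> <distribution>" for the top entry of a Debian changelog.
# $1: path to the changelog file (assumed here to be debian/changelog).
changelog_top() {
  sed -n '1s/^[^ ]* (\([^)]*\)) \([^;]*\);.*$/\1 \2/p' "$1"
}
```

For example, changelog_top debian/changelog should show UNRELEASED as the distribution after dch -i -D UNRELEASED -U, and stable after dch -rD stable.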
Build
When all the requirements are met, run the following script to build the package:
./script/build.sh
The script will check the requirements. Note that it works with your Git working copy, which needs to be clean. So, if you have any changes, commit or stash them before running the script.
The resulting package is available in the parent directory:
../openhexa_1.0-1_amd64.deb.
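Because the build needs a clean working copy, a quick pre-check avoids a failed run. This is a minimal sketch and not what ./script/build.sh does internally. The helper name tree_is_clean is hypothetical, and it treats untracked files as changes, which may be stricter than the build script.

```shell
#!/bin/sh
# Succeed when the Git working copy in $1 (default: current directory)
# has no staged, unstaged, or untracked changes.
# Returns 2 when the directory is not a Git repository.
tree_is_clean() {
  changes="$(git -C "${1:-.}" status --porcelain 2>/dev/null)" || return 2
  [ -z "$changes" ]
}
```

A typical use would be tree_is_clean && ./script/build.sh, run from the repository root.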
Install
Requirements:
- at least Docker 26.1
- Debian bookworm
- Systemd
- yq
First of all, you need to add our APT repository and GPG public key:
curl -fsSL https://raw.githubusercontent.com/blsq/openhexa/refs/heads/main/pubkey.gpg | sudo gpg --yes --dearmor --output /usr/share/keyrings/openhexa.gpg
echo "deb [signed-by=/usr/share/keyrings/openhexa.gpg] https://viz.bluesquare.org/openhexa/ bookworm main" | sudo tee /etc/apt/sources.list.d/openhexa.list
Make sure your locales are correctly set with locale. A common setup is
# Set locale
sudo tee -a /etc/default/locale > /dev/null <<EOF
LC_ALL=C.UTF-8
LC_CTYPE=C.UTF-8
LC_MESSAGES=C.UTF-8
LC_COLLATE=C.UTF-8
EOF
source /etc/default/locale
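A misspelled category name in a locale file is just an unused variable, so the mistake is easy to miss. The sketch below lists variable names in such a file that are not among the standard glibc locale variables. The helper name unknown_locale_vars is hypothetical.

```shell
#!/bin/sh
# Print variable names set in a locale file ($1) that are not
# standard locale variables. Prints nothing when all names are valid.
unknown_locale_vars() {
  grep -E '^[A-Z_]+=' "$1" | cut -d= -f1 |
    grep -Ev '^(LANG|LANGUAGE|LC_ALL|LC_CTYPE|LC_NUMERIC|LC_TIME|LC_COLLATE|LC_MONETARY|LC_MESSAGES|LC_PAPER|LC_NAME|LC_ADDRESS|LC_TELEPHONE|LC_MEASUREMENT|LC_IDENTIFICATION)$'
}
```

Running unknown_locale_vars /etc/default/locale after editing the file should print nothing.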
Then, you can update your APT database and install openhexa
sudo apt update
sudo apt install openhexa
If you want to manage backup and restore through our script, you can install it
with the recommended packages: sudo apt install --install-recommends openhexa.
If you have Systemd, OpenHexa runs as a Systemd service named openhexa (which you
can then manage with systemctl). If you don't use Systemd, you can still run
the service by running /usr/share/openhexa/openhexa.sh -g start.
Usage
When installed, the Systemd service OpenHexa is started. If you need to get its
status, stop it, restart it, or start it, you can do it with systemctl.
A command is also installed to ease the interaction with OpenHexa:
/usr/share/openhexa/openhexa.sh. To get its usage documentation, run:
/usr/share/openhexa/openhexa.sh help
If you want to interact with an OpenHexa installed globally on the system,
you'll have to use the option -g, or it'll try to interact with the version
in your current directory. For instance, to get its status, you can execute:
/usr/share/openhexa/openhexa.sh -g status
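Since the -g option is easy to forget, a small shell function can add it every time. This is only a convenience sketch: the function name ohx and the OPENHEXA_CMD variable are hypothetical and not part of the package.

```shell
#!/bin/sh
# Path to the installed command; can be overridden for testing.
OPENHEXA_CMD="${OPENHEXA_CMD:-/usr/share/openhexa/openhexa.sh}"

# Always target the globally installed OpenHexa by passing -g first.
ohx() {
  "$OPENHEXA_CMD" -g "$@"
}
```

With this in your shell profile, ohx status is equivalent to /usr/share/openhexa/openhexa.sh -g status.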
Configuration
The installation also sets up the environment, especially the PostgreSQL
database. The configuration is stored in the file /etc/openhexa/env.conf
(see below for more information about the configuration properties). If you
need to change or add a property, you can edit this file directly, then restart
OpenHexa with sudo systemctl restart openhexa.
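To inspect a single property without opening the file, a small reader is enough. The sketch below assumes that env.conf holds one unquoted KEY=VALUE pair per line, which is not confirmed here. The helper name get_prop is hypothetical.

```shell
#!/bin/sh
# Print the value of property $2 from config file $1.
# Assumes plain KEY=VALUE lines; if a key appears twice, the last one wins.
get_prop() {
  sed -n "s/^$2=//p" "$1" | tail -n 1
}
```

For example, get_prop /etc/openhexa/env.conf WORKSPACE_STORAGE_ENGINE_AWS_ACCESS_KEY_ID prints the configured key ID. Quoted values would be printed with their quotes.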
If you need to set it up again, check the installation, or purge the environment
(database and configuration), you can use the tool
/usr/share/openhexa/setup.sh. To get its usage documentation, run:
/usr/share/openhexa/setup.sh help
PostgreSQL
During the setup, the following is done on the PostgreSQL side:
- create 2 databases, hexa-app and hexa-hub. The first one is used by the OpenHexa app, the second to manage the notebooks.
- create 1 superuser hexa-app, owner of hexa-app.
- create 1 superuser hexa-hub, owner of hexa-hub.
- make PostgreSQL listen on the Docker gateway IP address.
- authorize all users to connect to hexa-app from the entire Docker subnetwork with encrypted password authentication.
- authorize hexa-hub to connect to hexa-hub from the entire Docker subnetwork with encrypted password authentication.
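The two authorization rules above correspond to host entries in PostgreSQL's pg_hba.conf, whose columns are type, database, user, address, and method. The sketch below prints what such entries could look like. It illustrates the file format and is not the output of setup.sh: the helper name hba_rules is hypothetical, the subnet is an argument you must supply, and scram-sha-256 is an assumed default (md5 is the other encrypted password method).

```shell
#!/bin/sh
# Print illustrative pg_hba.conf entries for the two rules.
# $1: Docker subnetwork in CIDR notation, $2: auth method (assumed default).
hba_rules() {
  subnet="$1"
  method="${2:-scram-sha-256}"
  # Any user may connect to hexa-app from the Docker subnetwork.
  printf 'host  hexa-app  all       %s  %s\n' "$subnet" "$method"
  # Only hexa-hub may connect to hexa-hub from the Docker subnetwork.
  printf 'host  hexa-hub  hexa-hub  %s  %s\n' "$subnet" "$method"
}
```

For instance, hba_rules 172.17.0.0/16 prints the two entries for that example subnet. Compare them with the lines actually present in your pg_hba.conf rather than appending them blindly.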
Backup
You can manage your backup and restore directly with OpenHexa. It will back up
all the workspace data and all databases. This relies on the tool duplicity.
Make sure that it is installed (if you install OpenHexa with apt, do it with
the recommended packages).
First, you need to set it up:
/usr/share/openhexa/setup.sh backup /mylocaldirectory/where/to/do/thebackup/ encryption_passkey
Then you can back up the data with:
/usr/share/openhexa/openhexa.sh backup
Depending on the user activities, it might be a good idea to stop the service or simply redirect the website to a maintenance HTML page.
To restore the data, execute the following:
/usr/share/openhexa/openhexa.sh restore
In this case, we advise you to stop the service before performing a full restore.
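If you choose to stop the service around a backup, the steps can be wrapped so that the service is started again even when the backup fails. This is a sketch under assumptions: the run and backup_with_downtime helpers and the DRY_RUN variable are hypothetical, the Systemd service is assumed to be in use, and the commands are only printed by default.

```shell
#!/bin/sh
# Dry-run by default: each step is printed instead of executed.
# Set DRY_RUN=0 to really run the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

backup_with_downtime() {
  run sudo systemctl stop openhexa || return 1
  run /usr/share/openhexa/openhexa.sh backup
  status=$?
  # Start the service again even if the backup failed.
  run sudo systemctl start openhexa
  return "$status"
}
```

The function returns the exit status of the backup step, so it can be used in a scheduled job that alerts on failure.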
Configuration properties
The storage engine
Locally, we use MinIO to manage the storage. It provides an AWS S3-compatible
API. To access it, you need to provide a key ID and a secret:
WORKSPACE_STORAGE_ENGINE_AWS_ACCESS_KEY_ID and
WORKSPACE_STORAGE_ENGINE_AWS_SECRET_ACCESS_KEY.
Finally, we need the port number where the loca
