PipalHub - JupyterHub setup for Pipal Academy
PipalHub is a JupyterHub setup optimized for remote workshops of Pipal Academy.
It contains a JupyterHub server that provides one Jupyter instance for each participant, plus a set of scripts to summarize changes and export notebooks as HTML so that the instructor can quickly glance through the participants' notebooks.
Quick setup
There are some easy-install scripts that can be used to set up the node with all the dependencies needed to run the server.
After setting up the node, see the section on adding users and packages for next steps.
Easy install
create-node.py script (recommended)
This one script can be used from your local machine to create a new node on DigitalOcean and get PipalHub up and running on it.
Description:
This script will create a DigitalOcean droplet with the given size and name, assign a DNS entry to it, and set it up with the setup-node.sh script.
Some defaults are hardcoded as constants at the beginning of this file. These can be changed as needed.
Prerequisites:
- DIGITALOCEAN_TOKEN to create a node and set a DNS entry on it.
- One of your SSH keys should be saved on DigitalOcean. This is to run the setup script on a new node over SSH.
- DNS for the base domain (default: "pipal.in") should be set in DigitalOcean.
Usage:
$ git clone https://github.com/pipalacademy/pipalhub
$ cd pipalhub
$ DIGITALOCEAN_TOKEN="token_goes_here" python3 create-node.py --name alpha --size small --hostname alpha-lab.pipal.in
- --name can be a string: this will be the name of your droplet.
- --size can be one of small, medium, large. vCPUs / memory for each size are configured in the SIZES dict defined in create-node.py.
- --hostname is the subdomain that this node will be assigned. For example, if BASE_DOMAIN is configured to be pipal.in in create-node.py, the node will become accessible at {hostname}.pipal.in
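As a rough illustration of how these options could map onto a droplet, the sketch below builds the JSON body for a droplet-creation request to the DigitalOcean API (POST /v2/droplets). The size slugs, the region, the image slug, and the helper name build_droplet_request are assumptions made for this example; the actual constants are the ones defined at the top of create-node.py.

```python
# Hypothetical sketch; not the actual contents of create-node.py.

# Assumed mapping from the --size option to DigitalOcean size slugs.
SIZES = {
    "small": "s-1vcpu-2gb",
    "medium": "s-2vcpu-4gb",
    "large": "s-4vcpu-8gb",
}


def build_droplet_request(name, size, ssh_key_ids, region="blr1"):
    """Return a JSON body for POST https://api.digitalocean.com/v2/droplets."""
    if size not in SIZES:
        raise ValueError(f"size must be one of {sorted(SIZES)}, got {size!r}")
    return {
        "name": name,                    # value of --name
        "region": region,                # assumed default region
        "size": SIZES[size],             # value of --size, translated to a slug
        "image": "ubuntu-22-04-x64",     # setup-node.sh expects Ubuntu 22.04
        "ssh_keys": list(ssh_key_ids),   # keys already saved on DigitalOcean
    }


request_body = build_droplet_request("alpha", "small", [12345])
```

The body would be sent with an `Authorization: Bearer $DIGITALOCEAN_TOKEN` header. Keeping the payload construction in a separate function makes it easy to check without touching the network.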
Manual installation on a node
If you have a node ready, you can use install-docker.sh and setup-node.sh to complete the manual installation.
Please note that create-node.py executes exactly this over SSH, so you don't have to do it separately.
install-docker.sh script
This script should be run as the user that will run docker for deployment. Besides installing docker on the system, this script will also run the post install steps as the current user.
If you don't have a non-root user, you can create one with create-non-root-user.sh.
# in case you need to create a new non root user
$ ./create-non-root-user.sh pipal
$ su pipal
# install docker and refresh group assignments
$ ./install-docker.sh
Enter password:
...
$ newgrp docker
$ ... setup node script
setup-node.sh script
Description:
setup-node.sh should be run as a sudoer. It sets up everything needed to start running JupyterHub.
It assumes a fresh Ubuntu install, but it is idempotent: if a step fails for some
external reason, you can run the script again after fixing the problem.
Implementation details:
It does these things:
- install nginx
- install docker
- clone this (pipalacademy/pipalhub) repository
- symlink path to this directory to /var/www/pipalhub (its tmp/ subdirectory will be used to serve from nginx)
- save the correct nginx configuration (corrected for hostname from sample one) to etc/nginx/conf.d/lab.conf
- symlink this to /etc/nginx/conf.d
- install certbot with nginx plugin
- use certbot to create SSL certificate (does not setup renewal)
- docker compose up
- reload nginx
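What makes a script like this safe to re-run is that each step checks the current state before acting. setup-node.sh itself is a shell script; the sketch below only illustrates the idea in Python for the symlink step. The helper name ensure_symlink and the temporary paths are hypothetical.

```python
import os
import tempfile


def ensure_symlink(target, link_path):
    """Create link_path -> target; return False if it already points there."""
    if os.path.islink(link_path):
        if os.readlink(link_path) == target:
            return False          # already correct, nothing to do
        os.remove(link_path)      # stale link, replace it
    elif os.path.exists(link_path):
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    os.symlink(target, link_path)
    return True


# Demo in a temporary directory instead of /var/www.
with tempfile.TemporaryDirectory() as root:
    repo = os.path.join(root, "pipalhub")
    os.mkdir(repo)
    link = os.path.join(root, "www-pipalhub")
    first_run = ensure_symlink(repo, link)    # creates the link
    second_run = ensure_symlink(repo, link)   # no-op when run again
```

Running the step twice leaves the system in the same state as running it once, which is the property the script relies on when it is restarted after a failure.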
Prerequisites:
- An Ubuntu 22.04 server with root access.
- Working directory should be $HOME
Usage:
$ setup-node.sh hostname.pipal.in
Adding users and packages
TODO: This functionality can be added to the dashboard service
To add users to the JupyterHub server:
- SSH into the machine
- Append usernames in the format username:password to ~/pipalhub/etc/jupyterhub/users.txt. There should be one user on each line. Refer to the sample file (etc/jupyterhub/users.txt.sample) for an example.
- Restart the containers with docker compose restart (from the ~/pipalhub directory)
dev@home:~$ ssh root@hostname.pipal.in # 1. ssh into the host machine
$ cd pipalhub
$ echo 'bob:bobs_password' >> etc/jupyterhub/users.txt
$ docker compose restart
Adding packages is similar, except that this time you edit the etc/jupyterhub/requirements.txt file and then repeat the same steps.
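When adding many participants, appending lines by hand is error-prone. The following is a minimal sketch of a helper that appends a username:password line only if the user is not already present. The function name add_user is hypothetical and is not part of the repository; the demo writes to a temporary file rather than the real users.txt.

```python
import os
import tempfile


def add_user(users_file, username, password):
    """Append 'username:password' to users_file unless the user already exists."""
    if not username or ":" in username or "\n" in username + password:
        raise ValueError("invalid username or password")

    existing = set()
    needs_newline = False
    if os.path.exists(users_file):
        with open(users_file) as f:
            content = f.read()
        existing = {
            line.split(":", 1)[0] for line in content.splitlines() if line.strip()
        }
        # Avoid gluing the new entry onto an unterminated last line.
        needs_newline = bool(content) and not content.endswith("\n")

    if username in existing:
        return False
    with open(users_file, "a") as f:
        if needs_newline:
            f.write("\n")
        f.write(f"{username}:{password}\n")
    return True


# Demo against a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "users.txt")
    added = add_user(path, "bob", "bobs_password")
    added_again = add_user(path, "bob", "another_password")
    with open(path) as f:
        contents = f.read()
```

The containers still need a docker compose restart afterwards for the new users to take effect.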
Implementation
Directory structure
This is the directory structure, after ignoring some directories / files that aren't relevant:
├── Readme.md
├── create-node.py
├── docker-compose.yml
├── etc
│   ├── jupyter
│   │   ├── jupyter_notebook_config.py
│   │   └── lab
│   │       └── docmanager.jupyterlab-settings
│   ├── jupyterhub
│   │   ├── jupyterhub_config.py
│   │   └── users.txt.sample
│   └── nginx
│       ├── conf.d
│       │   └── lab.conf.sample
│       └── default.conf
├── home
│   └── Readme.txt
├── services
│   └── dashboard_service
│       ├── dashboard_service.py
│       ├── javascripts
│       │   └── poll.js
│       ├── launch.sh
│       ├── requirements.txt
│       └── scripts
│           ├── build.py
│           ├── build.sh
│           └── ipytail.py
├── setup-node.sh
└── tmp
    └── Readme.txt
There is a single docker container that contains JupyterHub and the student servers. It is set up with docker compose. A web server (an nginx configuration is provided) should be set up on the host to expose this container over a domain.
The container also runs a dashboard service, a Flask app that
performs an action whenever a notebook is saved. For now, that action is the build script, which
updates the notebook summaries each time a student saves. The scripts/ directory stores these scripts.
The javascripts/ directory has a polling script that uses the endpoint exposed by dashboard-service
to allow a developer to perform some action on frontend when a particular event (such as save of a notebook)
is logged. For example, this can be used to notify the user on the frontend that an update is available
to the summary page.
Configuration files are kept in etc/. Of these jupyterhub/ and jupyter/ are for inside the
docker container and nginx/ is for the host.
Container
The jupyterhub container is configured using docker compose. docker-compose.yml lists several volume mounts and an exposed port.
The volume mounts are either for sharing configuration with the container or for ease of visibility for the trainer.
Dashboard service
Related issue: https://github.com/pipalacademy/pipalhub/issues/6
This is a JupyterHub-managed service, i.e. the related process is started and stopped by JupyterHub.
We only need to configure it in jupyterhub_config.py,
with the command that needs to run.
Currently this is a Flask app that is started on port 10101 using environment variables
configured in jupyterhub_config.py. JupyterHub will also create a reverse proxy endpoint
on its server to this service. So, this dashboard service will be accessible at https://hostname.pipal.in/services/dashboard,
and we don't need any separate nginx configuration for it.
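For orientation, a JupyterHub-managed service is declared as a dictionary with keys such as name, url, command, and environment. The sketch below shows roughly what such an entry could look like for this service. The launch.sh path inside the container and the environment variable name are assumptions, not values taken from the repository's jupyterhub_config.py.

```python
# Hypothetical sketch of a JupyterHub-managed service entry.
DASHBOARD_PORT = 10101  # port the Flask app listens on, as described above

dashboard_service = {
    # The name determines the proxied URL: /services/dashboard
    "name": "dashboard",
    # JupyterHub proxies requests for the service to this address.
    "url": f"http://127.0.0.1:{DASHBOARD_PORT}",
    # Command that JupyterHub starts and stops (assumed path inside the container).
    "command": ["/bin/sh", "/srv/dashboard_service/launch.sh"],
    # Environment passed to the service process (assumed variable name).
    "environment": {"FLASK_RUN_PORT": str(DASHBOARD_PORT)},
}

services = [dashboard_service]

# In jupyterhub_config.py this list would be assigned with:
#   c.JupyterHub.services = services
```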
The launch.sh script for this service uses pip to install the dependencies it needs (flask, pydantic)
before starting it. This may change in the future. One possible solution would be to have a single
requirements.txt for the instance that comes with defaults but can be changed by the trainer.
/events endpoint
/events supports GET and POST methods.
GET /events can also be combined with filters as query params.
These are example requests/responses:
Create event:
POST /events
{
"type": "test-event",
"user": "alice",
"filename": "module1-day1.ipynb",
"path": "/home/alice/module1-day1.ipynb",
"timestamp": "2022-11-14T16:00:00.511Z"
}
--- response:
201 CREATED
{
"id": 1,
"type": "test-event",
"user": "alice",
"filename": "module1-day1.ipynb",
"path": "/home/alice/module1-day1.ipynb",
"timestamp": "2022-11-14T16:00:00.511000+00:00"
}
Note that if the client doesn't send a timestamp, the server won't raise an error but rather default to using the current timestamp.
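The timestamp handling seen in the example (a trailing "Z" in the request, an explicit "+00:00" offset in the response, and a default of "now") can be sketched with the standard library alone. This is only an illustration of the behavior; the function name normalize_timestamp is hypothetical, and the real service may rely on pydantic for this.

```python
from datetime import datetime, timezone


def normalize_timestamp(value=None):
    """Return an ISO 8601 UTC timestamp, defaulting to the current time."""
    if value is None:
        return datetime.now(timezone.utc).isoformat()
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assume naive input is UTC
    return parsed.astimezone(timezone.utc).isoformat()


stored = normalize_timestamp("2022-11-14T16:00:00.511Z")
# stored == "2022-11-14T16:00:00.511000+00:00"
defaulted = normalize_timestamp()  # current time in UTC
```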
List events:
GET /events
--- response:
200 OK
[
{
"id": 1,
"type": "test-event",
"user": "alice",
"filename": "module1-day1.ipynb",
"path": "/home/alice/module1-day1.ipynb",
"timestamp": "2022-11-14T16:00:00.511000+00:00"
},
{
"id": 2,
"type": "test-event",
"user": "bob",
"filename": "module1-day1.ipynb",
"path": "/home/bob/module1-day1.ipynb",
"timestamp": "2022-11-15T16:00:00.511000+00:00"
}
]
Listing can also have filters. Filtering can be done on any field in the returned JSON. Example:
GET /events?user=alice
--- response:
[
{
"id": 1,
"type": "test-event",
"user": "alice",
"filename": "module1-day1.ipynb",
"path": "/home/alice/module1-day1.ipynb",
"timestamp": "2022-11-14T16:00:00.511000+00:00"
}
]
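Filtering on arbitrary fields amounts to comparing each query parameter with the matching field of every event. The sketch below shows one way this could work. The function name filter_events is hypothetical, and comparing values as strings is an assumption that follows from query parameters always arriving as strings.

```python
def filter_events(events, params):
    """Keep events whose fields match every query parameter (compared as strings)."""
    return [
        event
        for event in events
        if all(
            key in event and str(event[key]) == value
            for key, value in params.items()
        )
    ]


events = [
    {"id": 1, "type": "test-event", "user": "alice", "filename": "module1-day1.ipynb"},
    {"id": 2, "type": "test-event", "user": "bob", "filename": "module1-day1.ipynb"},
]

alice_events = filter_events(events, {"user": "alice"})                # GET /events?user=alice
first_event = filter_events(events, {"id": "1", "type": "test-event"})
no_match = filter_events(events, {"user": "carol"})
```

With no query parameters the function returns every event, which matches the unfiltered GET /events listing.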
Configuration
There is a bunch of configuration in etc/ that is ne
