PipalHub - JupyterHub setup for Pipal Academy

PipalHub is a JupyterHub setup optimized for remote workshops of Pipal Academy.

It contains a JupyterHub server that provides one Jupyter instance for each participant, and a set of scripts that summarize changes and export notebooks as HTML so that the instructor can quickly glance through the participants' notebooks.

Quick setup

There are easy-install scripts that set up the node with all the dependencies needed to run the server.

After setting up the node, see the section on adding users and packages for next steps.

Easy install

create-node.py script (recommended)

This one script can be used from your local machine to create a new node on DigitalOcean and get PipalHub up and running on it.

Description:

This script will create a DigitalOcean droplet with the given size and name, assign a DNS entry to it, and set it up with the setup-node.sh script.

Some defaults are hardcoded as constants at the beginning of this file. These can be changed as needed.

Prerequisites:

  • DIGITALOCEAN_TOKEN to create a node and set DNS entry on it.
  • One of your SSH keys should be saved on DigitalOcean. This is to run the setup script on a new node over SSH.
  • DNS for the base domain (default: "pipal.in") should be set up in DigitalOcean.

Usage:

$ git clone https://github.com/pipalacademy/pipalhub
$ cd pipalhub
$ DIGITALOCEAN_TOKEN="token_goes_here" python3 create-node.py --name alpha --size small --hostname alpha-lab.pipal.in
  • --name can be a string: this will be the name of your droplet.
  • --size can be one of small, medium, large. vCPUs / memory for each size are configured in the SIZES dict defined in create-node.py.
  • --hostname is the subdomain that this node will be assigned. For example, if BASE_DOMAIN is configured to be pipal.in in create-node.py, the node will become accessible at {hostname}.pipal.in
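As a rough illustration of the three options described above, here is a minimal sketch of how such a command line could be parsed. It is not the code from create-node.py: the contents of SIZES are hypothetical placeholder values, and parse_args is a name made up for this example.

```python
import argparse

# Hypothetical values for illustration only. The real SIZES dict and
# BASE_DOMAIN constant are defined in create-node.py and may differ.
SIZES = {
    "small": {"vcpus": 1, "memory_gb": 2},
    "medium": {"vcpus": 2, "memory_gb": 4},
    "large": {"vcpus": 4, "memory_gb": 8},
}
BASE_DOMAIN = "pipal.in"


def parse_args(argv=None):
    """Parse --name, --size and --hostname; --size must be a key of SIZES."""
    parser = argparse.ArgumentParser(description="Create a PipalHub node (sketch)")
    parser.add_argument("--name", required=True, help="name of the droplet")
    parser.add_argument("--size", required=True, choices=sorted(SIZES),
                        help="one of: " + ", ".join(sorted(SIZES)))
    parser.add_argument("--hostname", required=True,
                        help="hostname to assign to the node")
    return parser.parse_args(argv)


# Example: the same arguments as in the usage line above.
args = parse_args(["--name", "alpha", "--size", "small",
                   "--hostname", "alpha-lab.pipal.in"])
print(args.name, SIZES[args.size], args.hostname)
```

Restricting --size with choices means an unknown size is rejected by argparse before any DigitalOcean API call would be made.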

Manual installation on a node

If you have a node ready, you can use install-docker.sh and setup-node.sh to complete the manual installation.

Please note that create-node.py executes exactly this over SSH, so you don't have to do it separately.

install-docker.sh script

This script should be run as the user that will run docker for deployment. Besides installing docker on the system, this script will also run the post install steps as the current user.

If you don't have a non-root user, you can create one with create-non-root-user.sh.

# in case you need to create a new non root user
$ ./create-non-root-user.sh pipal
$ su pipal

# install docker and refresh group assignments
$ ./install-docker.sh
Enter password: 

...
$ newgrp docker
$ ... setup node script

setup-node.sh script

Description:

Run setup-node.sh as a sudoer to set up everything needed to start running JupyterHub. It assumes a fresh Ubuntu install, but it is idempotent: if a step fails for some external reason, you can fix the problem and run the script again.

Implementation details:

It does these things:

  • install nginx
  • install docker
  • clone this (pipalacademy/pipalhub) repository
  • symlink the path to this directory to /var/www/pipalhub (its tmp/ subdirectory will be served by nginx)
  • generate the nginx configuration from the sample one (with the hostname filled in) and save it to etc/nginx/conf.d/lab.conf
  • symlink this to /etc/nginx/conf.d
  • install certbot with nginx plugin
  • use certbot to create an SSL certificate (does not set up renewal)
  • docker compose up
  • reload nginx

Prerequisites:

  • An Ubuntu 22.04 server with root access.
  • Working directory should be $HOME

Usage:

$ setup-node.sh hostname.pipal.in

Adding users and packages

TODO: This functionality can be added to the dashboard service

To add users to the JupyterHub server:

  1. SSH into the machine
  2. Append usernames in the format username:password to ~/pipalhub/etc/jupyterhub/users.txt. There should be one user on each line. Refer to the sample file (etc/jupyterhub/users.txt.sample) for an example.
  3. Restart the containers with docker compose restart (from the ~/pipalhub directory)
dev@home:~$ ssh root@hostname.pipal.in  # 1. ssh into the host machine
$ cd pipalhub
$ echo 'bob:bobs_password' >> etc/jupyterhub/users.txt
$ docker compose restart
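
If you add users often, the append step can be scripted. The following is a minimal sketch, not part of the repository: add_user is a hypothetical helper, and the duplicate check is a choice made for this example rather than something PipalHub is known to enforce. It only writes the username:password line; the containers still need to be restarted afterwards.

```python
from pathlib import Path


def add_user(users_file, username, password):
    """Append one 'username:password' line to users_file.

    Hypothetical helper: rejects names that would break the one-user-per-line
    format and names that are already present in the file.
    """
    if not username or ":" in username or "\n" in username or "\n" in password:
        raise ValueError("username must be non-empty; no ':' or newlines allowed")

    path = Path(users_file)
    text = path.read_text() if path.exists() else ""

    # The part before the first ':' on each line is the username.
    existing = {line.split(":", 1)[0] for line in text.splitlines() if line}
    if username in existing:
        raise ValueError(f"user {username!r} already exists")

    # Make sure the previous entry ends with a newline before appending.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as f:
        f.write(f"{prefix}{username}:{password}\n")
```

For example, add_user("etc/jupyterhub/users.txt", "bob", "bobs_password") run from the ~/pipalhub directory has the same effect as the echo command above.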

Adding packages is similar, except that you edit the etc/jupyterhub/requirements.txt file and then repeat the same steps.

Implementation

Directory structure

This is the directory structure, after ignoring some directories / files that aren't relevant:

├── Readme.md
├── create-node.py
├── docker-compose.yml
├── etc
│   ├── jupyter
│   │   ├── jupyter_notebook_config.py
│   │   └── lab
│   │       └── docmanager.jupyterlab-settings
│   ├── jupyterhub
│   │   ├── jupyterhub_config.py
│   │   └── users.txt.sample
│   └── nginx
│       ├── conf.d
│       │   └── lab.conf.sample
│       └── default.conf
├── home
│   └── Readme.txt
├── services
│   └── dashboard_service
│       ├── dashboard_service.py
│       ├── javascripts
│       │   └── poll.js
│       ├── launch.sh
│       ├── requirements.txt
│       └── scripts
│           ├── build.py
│           ├── build.sh
│           └── ipytail.py
├── setup-node.sh
└── tmp
    └── Readme.txt

There is a single docker container that contains JupyterHub and the student servers. It is set up with docker compose. A web server (an nginx configuration is provided) should be set up on the host to expose this container over a domain.

The container also runs a dashboard service, a Flask app that runs an action whenever a notebook is saved. For now, that action is the build script, which updates the notebook summaries each time a student saves. These scripts are stored in the scripts/ directory. The javascripts/ directory has a polling script that uses the endpoint exposed by the dashboard service, so that a developer can perform some action on the frontend when a particular event (such as a notebook save) is logged. For example, this can be used to notify the user on the frontend that an update to the summary page is available.

Configuration files are kept in etc/. Of these, jupyterhub/ and jupyter/ are used inside the docker container and nginx/ is used on the host.

Container

The jupyterhub container is configured using docker compose. docker-compose.yml lists several volume mounts and an exposed port. The volume mounts either share configuration with the container or make files easily visible to the trainer.

Dashboard service

Related issue: https://github.com/pipalacademy/pipalhub/issues/6

This is a JupyterHub-managed service, i.e. the related process is started and stopped by JupyterHub. We only need to configure it in jupyterhub_config.py, with the command that needs to run.

Currently this is a Flask app that is started on port 10101 using environment variables configured in jupyterhub_config.py. JupyterHub also creates a reverse proxy endpoint on its server for this service, so the dashboard service is accessible at https://hostname.pipal.in/services/dashboard and needs no separate nginx configuration.

The launch.sh script for this service installs the dependencies it needs (flask, pydantic) with pip before starting it. This may change in the future; one possible solution is a single requirements.txt for the instance that comes with defaults but can be changed by the trainer.
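
To make the idea of a JupyterHub-managed service more concrete, here is a minimal sketch of what such an entry could look like. It is not copied from this repository's jupyterhub_config.py. The keys name, url, command and environment are standard fields of a JupyterHub service definition; the helper dashboard_service_spec, the launch script path and the DASHBOARD_PORT variable name are assumptions made for illustration.

```python
def dashboard_service_spec(port=10101,
                           launch_script="services/dashboard_service/launch.sh"):
    """Build a service entry in the shape JupyterHub expects.

    Assumptions: the launch script path and the DASHBOARD_PORT variable name
    are placeholders; the real values are set in jupyterhub_config.py.
    """
    return {
        # The name decides the proxied path: /services/dashboard
        "name": "dashboard",
        # Where the Flask app listens; JupyterHub proxies requests to it.
        "url": f"http://127.0.0.1:{port}",
        # Having a command is what makes the service hub-managed:
        # JupyterHub starts and stops this process itself.
        "command": ["/bin/sh", launch_script],
        # Extra environment variables passed to the service process.
        "environment": {"DASHBOARD_PORT": str(port)},
    }


# In jupyterhub_config.py this would be used roughly as:
#     c.JupyterHub.services = [dashboard_service_spec()]
```

Because the entry has a url, JupyterHub adds a proxy route for it, which is why no extra nginx configuration is needed for the dashboard.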

/events endpoint

/events supports GET and POST methods.

GET /events can also be combined with filters as query params.

These are example requests/responses:

Create event:
POST /events
{
    "type": "test-event",
    "user": "alice",
    "filename": "module1-day1.ipynb",
    "path": "/home/alice/module1-day1.ipynb",
    "timestamp": "2022-11-14T16:00:00.511Z"
}
--- response:
201 CREATED
{
    "id": 1,
    "type": "test-event",
    "user": "alice",
    "filename": "module1-day1.ipynb",
    "path": "/home/alice/module1-day1.ipynb",
    "timestamp": "2022-11-14T16:00:00.511000+00:00"
}

Note that if the client doesn't send a timestamp, the server won't raise an error but rather default to using the current timestamp.
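
A client for the create call could be sketched as follows. This is an illustration under assumptions, not code from the repository: build_event_request is a hypothetical helper, the base URL is taken from the service address mentioned earlier, and any authentication the service may require is not covered. The function only builds the request, so nothing is sent until it is passed to urllib.request.urlopen.

```python
import json
import urllib.request
from datetime import timezone

# Assumed base URL, based on the service address given in this README.
BASE_URL = "https://hostname.pipal.in/services/dashboard"


def build_event_request(event_type, user, path, timestamp=None, base_url=BASE_URL):
    """Build (but do not send) a POST /events request.

    timestamp is an optional timezone-aware datetime. When it is omitted,
    the field is left out and the server falls back to the current time.
    """
    payload = {
        "type": event_type,
        "user": user,
        "filename": path.rsplit("/", 1)[-1],  # last path component
        "path": path,
    }
    if timestamp is not None:
        payload["timestamp"] = timestamp.astimezone(timezone.utc).isoformat()

    return urllib.request.Request(
        f"{base_url}/events",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_event_request("test-event", "alice", "/home/alice/module1-day1.ipynb")
# To actually send it: urllib.request.urlopen(req)
```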

List events:
GET /events
--- response:
200 OK
[
    {
        "id": 1,
        "type": "test-event",
        "user": "alice",
        "filename": "module1-day1.ipynb",
        "path": "/home/alice/module1-day1.ipynb",
        "timestamp": "2022-11-14T16:00:00.511000+00:00"
    },
    {
        "id": 2,
        "type": "test-event",
        "user": "bob",
        "filename": "module1-day1.ipynb",
        "path": "/home/bob/module1-day1.ipynb",
        "timestamp": "2022-11-15T16:00:00.511000+00:00"
    }
]

Listing can also have filters. Filtering can be done on any field in the returned JSON. Example:

GET /events?user=alice
--- response:
[
    {
        "id": 1,
        "type": "test-event",
        "user": "alice",
        "filename": "module1-day1.ipynb",
        "path": "/home/alice/module1-day1.ipynb",
        "timestamp": "2022-11-14T16:00:00.511000+00:00"
    }
]
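
The filtering behaviour can be pictured with a small in-memory sketch. This is not the dashboard service's implementation: filter_events is a hypothetical function, and comparing values as strings is an assumption based on the fact that query parameters always arrive as text.

```python
def filter_events(events, **filters):
    """Return the events whose fields match every given filter.

    Values are compared as strings, since query parameters such as
    ?id=1 arrive as text even when the stored field is a number.
    """
    return [
        event for event in events
        if all(
            key in event and str(event[key]) == str(value)
            for key, value in filters.items()
        )
    ]


events = [
    {"id": 1, "type": "test-event", "user": "alice",
     "filename": "module1-day1.ipynb", "path": "/home/alice/module1-day1.ipynb"},
    {"id": 2, "type": "test-event", "user": "bob",
     "filename": "module1-day1.ipynb", "path": "/home/bob/module1-day1.ipynb"},
]

# Equivalent of GET /events?user=alice
alice_events = filter_events(events, user="alice")
```

Several filters combine with AND in this sketch, so filter_events(events, user="bob", filename="module1-day1.ipynb") returns only bob's event for that notebook.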

Configuration

There is a bunch of configuration in etc/ that is ne
