Hillview

This project was archived in August 2022. The code should still run, but no further updates are planned.

Hillview: a big data spreadsheet. Hillview is a cloud-based service for interactively visualizing large datasets. The Hillview user interface runs in a browser.

Contents:

1. Documentation

2. Local installation

3. Cluster installation

4. Developing Hillview

1. Documentation

There is a Hillview user manual.

A short video shows the system in action in real-time.

You can try a demo of the system running on 15 small Amazon machines (the demo will eventually stop working).

A paper describes the system in some detail. It is an extended version of the following publication: Mihai Budiu, Parikshit Gopalan, Lalith Suresh, Udi Wieder, Han Kruiger, and Marcos K. Aguilera, "Hillview: A trillion-cell spreadsheet for big data", PVLDB 2019, 12(11).

Documentation for the internal APIs.

Experimental use of Hillview with differential privacy.

2. Installing and running Hillview on a local machine

2.1 Linux or MacOS

2.1.1 Installing on Linux or MacOS

This will install pre-built binaries.

  • Install Java 8; newer versions of Java currently do not work.
  • Clone this GitHub repository.
  • Run the script bin/install.sh.
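
Because only Java 8 works, it can help to check the installed version before running bin/install.sh. The Python sketch below is not part of Hillview and its function names are hypothetical; it assumes a `java` executable on the PATH and relies on the two version formats Java uses: "1.8.0_xxx" for Java 8, and strings such as "11.0.2" or "17" for Java 9 and later.

```python
import re
import subprocess


def java_major_version(version_text: str) -> int:
    """Extract the major version from `java -version` output.

    Java 8 reports itself as "1.8.0_101"; Java 9 and later use "11.0.2", "17", etc.
    """
    match = re.search(r'version "([^"]+)"', version_text)
    if match is None:
        raise ValueError("no version string found in: " + version_text)
    first, *rest = match.group(1).split(".")
    if first == "1" and rest:
        return int(rest[0])  # legacy scheme: 1.8.0_101 -> 8
    digits = re.match(r"\d+", first)  # also handles early-access strings such as "9-ea"
    if digits is None:
        raise ValueError("unrecognized version: " + match.group(1))
    return int(digits.group(0))


def installed_java_major_version() -> int:
    # `java -version` prints to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True, check=True)
    return java_major_version(result.stderr)
```

If `installed_java_major_version()` returns anything other than 8, install Java 8 before continuing.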

2.1.2 Running on Linux or MacOS

All the following scripts are in the bin folder.

$ cd bin
  • Start the back-end service, which performs all the data processing:
$ ./backend-start.sh &
  • Start the web server. Its default port is 8080; to change it, edit the setting in apache-tomcat-9.0.4/conf/server.xml.
$ ./frontend-start.sh
  • Open http://localhost:8080 in a web browser.

  • When you are done, stop the two services by killing the frontend-start.sh and backend-start.sh jobs.
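
Changing the web server port means editing the HTTP `<Connector>` element in Tomcat's conf/server.xml. Below is a minimal Python sketch (the function name is hypothetical), assuming the connector still uses the default port 8080 and Python 3.8 or newer, which is needed for `insert_comments` so that the file's many comments survive the rewrite. Stop the web server before editing the file.

```python
import xml.etree.ElementTree as ET


def set_tomcat_http_port(server_xml: str, old_port: int, new_port: int) -> int:
    """Change every <Connector> listening on old_port to new_port.

    Returns the number of connectors that were changed.
    """
    # Keep XML comments when parsing (TreeBuilder(insert_comments=...) needs Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(server_xml, parser=parser)
    changed = 0
    for connector in tree.getroot().iter("Connector"):
        if connector.get("port") == str(old_port):
            connector.set("port", str(new_port))
            changed += 1
    if changed:
        tree.write(server_xml, encoding="UTF-8", xml_declaration=True)
    return changed
```

For example, `set_tomcat_http_port("apache-tomcat-9.0.4/conf/server.xml", 8080, 9090)` moves the web server to port 9090, after which the browser URL becomes http://localhost:9090. Other connectors (such as an AJP connector on a different port) are left untouched; ElementTree may normalize quoting and whitespace in the rewritten file.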

2.2 Windows

2.2.1 Installing on Windows

  • Download and install Java 8.
  • Choose a directory for installing Hillview.
  • Enable execution of PowerShell scripts; for example, run the following command in PowerShell as an administrator: Set-ExecutionPolicy unrestricted
  • Download the installation script, install-hillview.ps1, into the chosen directory.
  • Run the installation script from Windows PowerShell:
> .\install-hillview.ps1

2.2.2 Running on Windows

All Windows scripts are in the bin folder:

> cd bin
  • Start the Hillview processes:
> hillview-start.bat
  • If needed, give the application permission to communicate through the Windows firewall.
  • To stop Hillview:
> hillview-stop.bat

3. Deploying the Hillview service on a cluster

Hillview uses ssh to deploy code on the cluster. Before deployment, you must set up password-less ssh access to the cluster machines, as described here: https://www.ssh.com/ssh/copy-id. You must also install Java on all machines in the cluster, and each machine must accept connections on the network ports listed in the configuration file.

Note that Hillview gives the client application arbitrary access to files on the worker nodes, with the privileges of the user specified in the configuration file.
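
Before running the deployment scripts, you can confirm that password-less access works for every machine. In this Python sketch (not part of Hillview's scripts; the helper names are hypothetical), ssh's `BatchMode=yes` option makes ssh fail instead of prompting for a password, so a zero exit status means key-based login succeeded. The `run` parameter exists only so the check can be exercised without a real cluster.

```python
import subprocess
from typing import List, Optional


def ssh_check_command(host: str, user: Optional[str] = None) -> List[str]:
    target = f"{user}@{host}" if user else host
    # BatchMode=yes: never prompt for a password; fail instead.
    # ConnectTimeout=5: give up quickly on unreachable machines.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", target, "true"]


def has_passwordless_ssh(host: str, user: Optional[str] = None, run=subprocess.run) -> bool:
    # Run the no-op command `true` remotely; exit status 0 means the login worked.
    result = run(ssh_check_command(host, user),
                 stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

For example, `[h for h in ["worker1.name", "worker2.name"] if not has_passwordless_ssh(h, "hillview")]` lists the machines that still need their keys copied.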

3.1 Service configuration

The configuration of the Hillview service is described in a JSON file (extended with comments); two sample files are bin/config.json and bin/config-local.json. The file config-local.json treats the local machine as a one-machine cluster.

// This file is a Json file that defines the configuration for a
// Hillview deployment.

{
  // Name of machine hosting the web server
  "webserver": "web.server.name",
  // Names of the machines hosting the workers; the web
  // server machine can also act as a worker
  "aggregators": [
    // The "aggregators" level is optional; if it is
    // missing, the configuration should contain just an array of workers
    {
      "name": "aggregator1.name",
      "workers": [
        "worker1.name",
        "worker2.name"
      ]
    }, {
      "name": "aggregator2.name",
      "workers": [
        "worker3.name",
        "worker4.name"
      ]
    }
  ],
  // Network port where the workers listen for requests
  "worker_port": 3569,
  // Network port where aggregators listen for requests
  "aggregator_port": 3570,
  // Java heap size for Hillview workers
  "default_heap_size": "25G",
  // User account for running the Hillview service, default is current user
  "user": "hillview",
  // Folder where the hillview service is installed on remote machines
  "service_folder": "/home/hillview",
  // Version of Apache Tomcat to deploy
  "tomcat_version": "9.0.4",
  // Tomcat installation folder name
  "tomcat": "apache-tomcat-9.0.4",
  // If true delete old log files, default is false
  "cleanup": false,
  // This can be used to override the default_heap_size for specific machines.
  "workers_heapsize": {
    "worker1.name": "20G"
  }
}
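
Because of the `//` comments, the file is not plain JSON. To illustrate how the fields fit together, the Python sketch below strips the comments, collects the workers from either layout, and applies the `workers_heapsize` overrides. It is not Hillview's own deployment code, which may parse the file differently. It assumes that comments occupy whole lines, as in the sample, and that a configuration without aggregators lists its machines under a top-level `"workers"` key; that key name is a guess.

```python
import json
from typing import Dict


def load_config(text: str) -> dict:
    """Parse a Hillview-style config: JSON plus whole-line // comments."""
    # Assumption: every comment occupies an entire line, as in the sample file.
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("//")]
    return json.loads("\n".join(lines))


def worker_heap_sizes(config: dict) -> Dict[str, str]:
    """Map every worker host to the Java heap size it should get."""
    if "aggregators" in config:
        workers = [w for agg in config["aggregators"] for w in agg["workers"]]
    else:
        # Assumption: without aggregators, workers are listed under "workers".
        workers = config["workers"]
    overrides = config.get("workers_heapsize", {})
    return {w: overrides.get(w, config["default_heap_size"]) for w in workers}
```

Applied to the sample above, `worker_heap_sizes` gives worker1.name a 20G heap and the other three workers the 25G default.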

3.2 Deployment scripts

First install Hillview locally:

$ bin/install.sh

Edit the config.json file as described above.

All deployment scripts are written in Python, and are in the bin folder.

$ cd bin

Install the software on all cluster machines:

$ ./deploy.py config.json

Start the Hillview services:

$ ./start.py config.json

To connect to the service, open http://<webserver>:8080 in your web browser.

Stop the services:

$ ./stop.py config.json

Query the status of the services:

$ ./status.py config.json
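
Once the services are started, the web front-end should answer on port 8080 of the machine named by `webserver`. A small Python sketch using only the standard library that polls that address; the function names are hypothetical, and the port must be adjusted if you changed it in Tomcat's server.xml.

```python
import time
import urllib.request


def frontend_url(webserver: str, port: int = 8080) -> str:
    return f"http://{webserver}:{port}/"


def is_frontend_up(url: str, timeout: float = 5.0) -> bool:
    """True if the server answers the URL without an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:  # URLError, HTTPError, refused connections and timeouts
        return False


def wait_for_frontend(url: str, attempts: int = 30, delay: float = 2.0) -> bool:
    # The web server may need a few seconds after start.py returns before it answers.
    for attempt in range(attempts):
        if is_frontend_up(url):
            return True
        if attempt + 1 < attempts:
            time.sleep(delay)
    return False
```

For example, `wait_for_frontend(frontend_url("web.server.name"))` returns False if the front-end has not answered after about a minute, which is a good moment to run status.py.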

3.3. Data management

We provide some crude data management scripts and tools for clusters. They are described here.

4. Developing Hillview

We only provide development instructions for Linux or MacOS, but there is no reason Hillview could not be developed on Windows.

4.1. Software Dependencies

  • Back-end: Ubuntu Linux > 16 or MacOS. On MacOS you first need to install Homebrew. One way to do that is to run:
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  • Java 8, the Maven build system, and various Java libraries (Maven manages the libraries)
  • Front-end: TypeScript, webpack, the Tomcat app server, node.js, and some JavaScript libraries: d3, pako, and rx-js
  • Cloud service management: Python 3
  • Once you have Java, everything else is installed by scripts.

4.1.1 Installing Java

We use Java 8; newer versions will not work.

First, download a JDK (for Linux x64 or MacOS) from here: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

Note: it is not enough to have a Java VM installed; you need a JDK.

Make sure to download the tarball version of the JDK.

For Linux: unpack the JDK and set your JAVA_HOME environment variable to point to the unpacked folder (e.g., <fully qualified path to>/jdk/jdk1.8.0_101). To set JAVA_HOME, add the following line to your ~/.bashrc or ~/.zshrc:

$ export JAVA_HOME="<path-to-jdk-folder>"

(For MacOS you do not need to set up JAVA_HOME.)
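
Because a JRE alone is not enough, a quick sanity check is to look for the compiler, bin/javac, under JAVA_HOME. A Python sketch, assuming a Linux-style JDK layout; the helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def jdk_problem(java_home: Optional[str]) -> Optional[str]:
    """Describe what is wrong with java_home, or return None if it looks like a JDK."""
    if not java_home:
        return "JAVA_HOME is not set"
    home = Path(java_home)
    if not (home / "bin" / "java").is_file():
        return f"{home} has no bin/java"
    # A JRE ships bin/java but not the compiler, which the Maven build needs.
    if not (home / "bin" / "javac").is_file():
        return f"{home} looks like a JRE, not a JDK (no bin/javac)"
    return None


def check_java_home_env() -> Optional[str]:
    return jdk_problem(os.environ.get("JAVA_HOME"))
```

After reloading your shell configuration, `check_java_home_env()` should return None.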

4.1.2. Installing other required software

The following shell script will install the other required dependencies for building and testing.

$ cd bin
$ ./install-dependencies.sh

On older versions of Ubuntu this script may fail; in that case, install the required dependencies manually.

4.1.2.1 Optional Impala Java libraries

If you want to access the Impala database, you need to download the Impala JDBC connector libraries from Cloudera and install them yourself. (These are not free software, so they are not available in public Maven repositories.) Install them in your local Maven repository, e.g. in the ~/.m2/repository/com/cloudera/impala folder. You may also need to adjust the version of the libraries in the file platform/pom.xml.
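
Maven's default local repository is ~/.m2/repository, and Maven expects each JAR at a path derived from its coordinates: the groupId with dots turned into directories, then the artifactId, the version, and a file named artifactId-version.jar. Maven's `install:install-file` goal copies a JAR to that location for you. The Python sketch below only computes the path; the artifactId and version in the example are placeholders, so take the real values from platform/pom.xml.

```python
from pathlib import Path


def local_repo_jar(group_id: str, artifact_id: str, version: str,
                   repo: Path = Path.home() / ".m2" / "repository") -> Path:
    """Path of a JAR in Maven's default local-repository layout."""
    return repo.joinpath(*group_id.split("."), artifact_id, version,
                         f"{artifact_id}-{version}.jar")


# Placeholder coordinates: read the real artifactId and version from platform/pom.xml.
print(local_repo_jar("com.cloudera.impala", "impala-jdbc", "0.0.0"))
```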

4.2. Building Hillview

  • Build the software for the first time:
$ cd bin
$ ./rebuild.sh -a
$ ./demo-data-cleaner.sh

Subsequent builds only need to run:

$ bin/rebuild.sh

Hillview is currently split into two separate Maven projects. One can also build the two projects separately, as follows:

  • platform: pure Java, includes the entire back-end. This produces a JAR file platform/target/hillview-jar-with-dependencies.jar. This part can be built with:
$ cd platform
$ mvn clean install
$ cd ..
  • web: the web server, web client and web services; this project links to the result produced by the platform project. This produces a WAR (web archive) file web/target/web-1.0-SNAPSHOT.war. This part can be built with:
$ cd web
$ mvn package
$ cd ..
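
When building the two projects separately, the order matters: platform comes first, since web links to what it produces (presumably through the local Maven repository, which would explain why platform is built with `install` rather than `package`). A Python sketch of the two steps, with a dry-run default so that it only reports the commands; the function is hypothetical and not one of Hillview's scripts.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

# Order matters: web depends on the artifact built by platform.
BUILD_STEPS: List[Tuple[str, List[str]]] = [
    ("platform", ["mvn", "clean", "install"]),
    ("web", ["mvn", "package"]),
]


def build_hillview(root: Path, dry_run: bool = True) -> List[str]:
    """Run (or, with dry_run, only describe) the two Maven builds in order."""
    log = []
    for subdir, cmd in BUILD_STEPS:
        log.append(f"(cd {subdir} && {' '.join(cmd)})")
        if not dry_run:
            # check=True stops at the first failing build.
            subprocess.run(cmd, cwd=root / subdir, check=True)
    return log
```

Calling `build_hillview(Path("."), dry_run=False)` from the repository root performs the same steps as the commands above.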

4.3. Contributing code

You will need to sign a CLA (Contributor License Agreement) to contribute code to Hillview under an Apache-2 license; this is standard practice.

4.4. Setup IntelliJ IDEA

Download and install IntelliJ IDEA: https://www.jetbrains.com/idea/. Editing the TypeScript code of the web project requires the (paid) Ultimate edition of IntelliJ.

First, run Maven to generate the Java code for gRPC:

$ cd platform
$ mvn install

Create an empty project in the hillview folder, and then import three modules: from File/Project Structure/Modules, add web/pom.xml, platform/pom.xml, and the root folder hillview itself.

4.5. Setup VS Code

Download and install Visual Studio Code: https://code.visualstudio.com/download. Here is a step-by-step guide.
