Guess.js (alpha)

Libraries and tools for enabling data-driven user-experiences on the web.

Quickstart

For Webpack users:

:black_circle: Data-driven bundling

Install and configure GuessPlugin - the Guess.js webpack plugin which automates as much of the setup process for you as possible.

Should you wish to try out the modules we offer individually, the packages directory contains three packages:

- ga - a module for fetching structured data from the Google Analytics API to learn about user navigation patterns.
- parser - a module providing JavaScript framework parsing. This powers the route-parsing capabilities implemented in the Guess webpack plugin.
- webpack - a webpack plugin for setting up predictive fetching in your application. It consumes the ga and parser modules and offers a large number of options for configuring how predictive fetching should work in your application.

For non-Webpack users:

:black_circle: Data-driven loading

Our predictive-fetching for sites workflow provides a set of steps you can follow to integrate predictive fetching, based on the Google Analytics API, into your site.

This repo uses Google Analytics data to determine which page a user is most likely to visit next from a given page. A client-side script (which you'll add to your application) sends a request to the server to get the URL of the page it should fetch, then prefetches this resource.
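The client-side step described above could look roughly like the following sketch. It is not the script shipped with this repo: the `/api/next-page` endpoint, its `{ url }` JSON response, and both function names are hypothetical. The `doc` and `fetchImpl` parameters exist only so the sketch can be exercised outside a browser.

```javascript
// Hypothetical sketch: ask a server which page to prefetch, then prefetch it.
// The endpoint path and the response shape ({ url: string }) are assumptions.

// URLs that already received a prefetch hint during this page view.
const prefetched = new Set();

// Append <link rel="prefetch" href="..."> to the document head.
// Returns true if a hint was added, false if the URL was already hinted.
function addPrefetchHint(url, doc = document) {
  if (prefetched.has(url)) {
    return false;
  }
  const link = doc.createElement('link');
  link.rel = 'prefetch';
  link.href = url;
  doc.head.appendChild(link);
  prefetched.add(url);
  return true;
}

// Ask the (hypothetical) prediction endpoint for the most likely next page
// and add a prefetch hint for it. Resolves to true if a hint was added.
async function prefetchNextPage(currentPath, fetchImpl = fetch, doc = document) {
  const response = await fetchImpl(
    '/api/next-page?from=' + encodeURIComponent(currentPath)
  );
  if (!response.ok) {
    return false;
  }
  const { url } = await response.json();
  return typeof url === 'string' && addPrefetchHint(url, doc);
}
```

In a browser, calling `prefetchNextPage(location.pathname)` once the current page has finished loading would keep the prefetch from competing with the page's own resources.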

Learn More

What is Guess.js?

Guess.js provides libraries and tools to simplify predictive, data-analytics-driven approaches to improving user experiences on the web. The underlying data can come from any number of sources, including analytics or machine learning models. Guess.js aims to lower the friction of consuming and applying this thinking to all modern sites and apps, including by building libraries and tools for popular workflows.

Predictive data-analytics thinking could be applied to sites in a number of contexts:

- Predict the next page (or pages) a user is likely to visit and prefetch these pages, improving perceived page load performance and user happiness.
  - Page-level: Prerender/prefetch the page which is most likely to be visited next.
  - Bundle-level: Prefetch the bundles associated with the top N pages. On each page navigation, look at all the neighbors of the current page, sorted in descending order by their probability of being visited. Fetch assets (JavaScript chunks) for the top N pages, depending on the current effective connection type.
- Predict the next piece of content (article, product, video) a user is likely to want to view and adjust or filter the user experience to account for this.
- Predict the types of widgets an individual user is likely to interact with more (e.g. games) and use this data to tailor a more custom experience.
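The bundle-level idea in the list above can be sketched as follows. This is an illustration rather than the Guess.js implementation: the shape of the navigation graph, the page-to-chunk map, the function name, and the per-connection budgets are all assumptions. The connection labels match the values reported by `navigator.connection.effectiveType`.

```javascript
// Hypothetical sketch of bundle-level prefetching.
// The data shapes and the budget values below are illustrative assumptions.

// How many neighboring pages to prefetch for each effective connection type.
const PREFETCH_BUDGET = { 'slow-2g': 0, '2g': 0, '3g': 1, '4g': 3 };

// graph:  { [page]: { [nextPage]: probability } }
// chunks: { [page]: string[] }  (JavaScript chunks needed by each page)
function chunksToPrefetch(graph, chunks, currentPage, effectiveType) {
  // Unknown connection types get a budget of zero, so nothing is prefetched.
  const budget = PREFETCH_BUDGET[effectiveType] ?? 0;
  const neighbors = Object.entries(graph[currentPage] ?? {});

  const topPages = neighbors
    .sort(([, a], [, b]) => b - a) // descending by probability
    .slice(0, budget)              // keep only the top N pages
    .map(([page]) => page);

  // Collect the chunks of those pages, dropping duplicates such as shared chunks.
  return [...new Set(topPages.flatMap((page) => chunks[page] ?? []))];
}
```

Tying the budget to the connection type is one way to limit wasted bandwidth: on slow connections the sketch prefetches nothing, and on fast ones it prefetches chunks for at most three likely pages.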

By collaborating across different touch-points in the ecosystem where data-driven approaches could be easily applied, we hope to generalize common pieces of infrastructure to maximize their applicability in different tech stacks.

Problems we're looking to solve

- Developers using <link rel=prefetch> for future navigations heavily rely on manually reading descriptive analytics to inform their decisions for what to prefetch. These decisions are often made at a point in time and...
  - (1) are often not revisited as data trends change
  - (2) are very limited in how they are used. Implementations will often only prefetch content from a homepage or very small set of hero pages, but otherwise not do this for all of the possible entry points on a site. This can leave performance opportunities on the table.
  - (3) require some amount of confidence in the data being used to drive prefetching decisions. Without that confidence, developers may not adopt prefetching out of worry that they will waste bandwidth. <link rel=prefetch> is currently used on 5% of total Chrome pageloads, but this could be higher.
- Implementing predictive analytics is too complex for the average web developer.
  - Most developers are unfamiliar with how to leverage the Google Analytics API to determine the probability a page will be visited next. We lack:
    - (1) Page-level solution: a drop-in client-side solution for prefetching pages a user will likely visit
    - (2) Bundling-level solution: a set of plugins/tools that work with today’s JavaScript bundlers (e.g. webpack) to cluster and generate the bundles/chunks that a particular set of navigation paths would load more quickly if they were prefetched ahead of time.
  - Most developers are not yet familiar with how Machine Learning works. They are generally:
    - (1) Unsure how (and why) ML could be integrated into their existing (web) tech stacks
    - (2) Unsure what the value proposition of TensorFlow is or where solutions like the CloudML engine fit in. We have an opportunity to simplify the overhead associated with leveraging some of these solutions.
- Best-in-class / low-friction approaches in this space are still slowly emerging and are not yet as accessible to web developers without ML or data-science backgrounds.
  - Machine Learning meets Cloud: Intelligent Prefetching by IIH Nordic
    - Tag Managers like Google Tag Manager can be used to decouple page content from the code tracking how the content is used. This allows web analysts to upgrade the tracking code in real-time with no site downtime. Tag managers allow a general solution for code injection and can be used to deploy intelligent prefetching. The advantages: analytics used to build the model comes from the tag manager. We can also send data live to the predictor without additional tracker overhead. After adding a few (of IIH Nordic’s) tags to a GTM install, a site can start to prefetch resources of the next pages and track load time saving opportunities.
    - IIH Nordic moved the predictive prefetching model to a web service the browser queries when a user visits a new page. The service responds to each request and takes advantage of Google Cloud, App Engine and Cloud ML. Their solution chooses the most accurate model. Choices include a Markov model or, most often, a deep neural net in TensorFlow.
    - With user behavior changing over time, predictive models require updating (training) from time to time. Training a model involves collecting and transforming data and fitting the parameters of the model accordingly. IIH Nordic use Google Cloud to pull data from a customer’s analytics service into a private data bucket in BigQuery. They process this data, train and test predictive models, updating the prediction service seamlessly.
    - IIH Nordic suggest small/slow sites update their models monthly. Larger sites may need to retrain daily, or even hourly in the case of news websites.
    - The benefit of training ML models in the cloud is ease of scale as additional machines, GPUs and processors can be added as needed.
  - Machine Learning-Driven Bundling. The Future of JavaScript Tooling by Minko

Initial priority: Improved Performance through Data-driven Prefetching

The first large priority for Guess.js will be improving web performance through predictive prefetching of content.

By building a model of pages a user is likely to visit, given an arbitrary entry-page, a solution could calculate the likelihood a user will visit a given next page or set of pages and prefetch resources for them while the user is still viewing their current page. This has the possibility of improving page-load performance for subsequent page visits as there's a strong chance a page will already be in the user's cache.

Possible approaches to predictive fetching

In order to predict the next page a user is likely to visit, solutions could use the Google Analytics API. Google Analytics session data can be used to create a model to predict the most likely page a user is going to visit next on a site. The benefit of this session data is that it can evolve over time, so that if particular navigation paths change, the predictions can stay up to date too.
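One simple way to turn session data into such a model is a first-order Markov chain: count how often each page is followed by each other page, then normalize the counts into probabilities. The sketch below assumes the sessions have already been extracted (for example from Google Analytics) as ordered lists of page paths; the API calls themselves are omitted, and the function names are hypothetical.

```javascript
// Hypothetical sketch: build a first-order Markov model from session data.
// sessions: array of sessions, each an array of page paths in visit order.
function buildTransitionModel(sessions) {
  // counts[from][to] = number of times `to` was visited right after `from`.
  const counts = {};
  for (const pages of sessions) {
    for (let i = 0; i + 1 < pages.length; i++) {
      const from = pages[i];
      const to = pages[i + 1];
      counts[from] = counts[from] || {};
      counts[from][to] = (counts[from][to] || 0) + 1;
    }
  }

  // Normalize each row of counts so the probabilities sum to 1.
  const model = {};
  for (const [from, targets] of Object.entries(counts)) {
    const total = Object.values(targets).reduce((sum, n) => sum + n, 0);
    model[from] = {};
    for (const [to, n] of Object.entries(targets)) {
      model[from][to] = n / total;
    }
  }
  return model;
}

// Return the most probable next page, or null if the page has no known successors.
function mostLikelyNextPage(model, page) {
  const entries = Object.entries(model[page] || {});
  if (entries.length === 0) {
    return null;
  }
  return entries.reduce((best, entry) => (entry[1] > best[1] ? entry : best))[0];
}
```

Rebuilding the model from fresh sessions at regular intervals is what lets the predictions follow changes in navigation paths over time.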

With the availability of this data, an engine could insert <link rel="[prerender/prefetch/preload]"> tags to speed up the load time for the next page request. In some tests, such as Mark Edmondson's [Supercharging Page-Loa
