Screen2Vec

Screen2Vec is a new self-supervised technique for generating more comprehensive semantic embeddings of GUI screens and components using their textual content, visual design and layout patterns, and app meta-data. Learn more about Screen2Vec in our CHI 2021 paper.

This repository houses the files necessary to implement the Screen2Vec vector embedding process on screens from the RICO dataset.

Steps to run full pipeline

1. Extract the vocabulary from the full RICO traces dataset and pre-write its Sentence-BERT embeddings (UI_embedding/extract_vocab.py).
2. Train the UI embedding model (UI_embedding/main.py) on the RICO traces dataset.
3. Use UI_embedding/modeltester.py to test its performance.
4. Use UI_embedding/precompute_embeddings.py to write files of the UI components and app descriptions with their respective embeddings; do this for both the train and test portions of the dataset.
5. Train the layout autoencoder on the RICO screens dataset with layout.py, then precompute the layout embeddings for your dataset with write_layout.py.
6. Train the Screen2Vec embedding model using main_preloaded.py.
7. Use modeltester_screen.py to test it.
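
The training and precomputation steps above can also be written down as an ordered list of command lines, using the flags documented in the In-depth section. The following is a minimal sketch, not part of the repository: the `pipeline_commands` helper, its argument names, and the working-directory labels are hypothetical, the hyperparameters are simply the example values shown in this README, and the evaluation steps are left out. The precompute step is listed once, although it needs to be run for both the train and test portions of the data.

```python
from typing import List, Tuple

# (working directory, argv) pairs. The directory labels are assumptions:
# "UI_embedding" for the GUI model scripts, "." for the repository root.
Command = Tuple[str, List[str]]


def pipeline_commands(traces: str, screens: str, dataset: str, ui_out: str,
                      ui_model: str, layout_model: str, prefix: str) -> List[Command]:
    """Return the training and precompute steps in order (hypothetical helper)."""
    ui_dir, root = "UI_embedding", "."
    return [
        # Extract the vocabulary and pre-write its Sentence-BERT embeddings.
        (ui_dir, ["python", "extract_vocab.py", "-d", traces]),
        # Train the UI embedding model on the traces dataset.
        (ui_dir, ["python", "main.py", "-d", traces, "-o", ui_out,
                  "-b", "256", "-e", "100", "-v", "vocab.json",
                  "-m", "vocab_emb.npy", "-n", "16", "-r", "0.001", "-l", "cel"]),
        # Precompute UI component and app description embeddings.
        (ui_dir, ["python", "precompute_embeddings.py",
                  "-d", dataset, "-m", ui_model, "-p", prefix]),
        # Train the layout autoencoder on the screens dataset.
        (root, ["python", "layout.py", "-d", screens,
                "-b", "256", "-e", "400", "-r", "0.001"]),
        # Precompute layout embeddings with the same prefix as above.
        (root, ["python", "write_layout.py",
                "-d", dataset, "-m", layout_model, "-p", prefix]),
        # Train the screen-level model on the precomputed files.
        (root, ["python", "main_preloaded.py", "-d", prefix,
                "-s", "128", "-b", "256", "-t", ui_out]),
    ]
```

Each pair could then be passed to `subprocess.run(argv, cwd=cwd, check=True)`, after moving the precomputed UI files as described under Training.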

In-depth

Setup

The code was developed in the following environment:

Python 3.6.1
PyTorch 1.5.0

To install dependencies:

Python dependencies
```
pip install -r requirements.txt
```

Data

Because of its size, the data is hosted outside GitHub, at http://interactionmining.org/rico, in the "Interaction Traces" and "UI Screenshots and View Hierarchies" datasets. To download it, run:

./download_data.sh

The pretrained models are stored at http://basalt.amulet.cs.cmu.edu/screen2vec/. Download them by running:

./download_models.sh

The model labelled "UI2Vec" is the GUI element embedding model, "Screen2Vec" is the screen embedding model, "layout_encoder" is the screen layout autoencoder, and "visual_encoder" is our visual autoencoder baseline.

Quick start

If you just want to embed a screen using our pretrained GUI element, layout, and screen embedding models, run:

python get_embedding.py -s <path-to-screen> -u "UI2Vec_model.ep120" -m "Screen2Vec_model_v4.ep120" -l "layout_encoder.ep800"

This generates the vector embedding using our pretrained models. The parameters are:

-s/--screen, the path to the screen to encode
-u/--ui_model, the path to the ui embedding model to use
-m/--screen_model, the path to the screen embedding model to use
-l/--layout_model, the path to the layout embedding model to use
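
For scripted use, the same call can be assembled and launched from Python. This is a minimal sketch under stated assumptions: `embed_command` and `embed_screen` are hypothetical helpers, the default model file names are the ones from the command above and are assumed to sit in the current directory, and nothing is assumed here about what `get_embedding.py` prints or writes.

```python
import subprocess
import sys
from typing import List


def embed_command(screen: str,
                  ui_model: str = "UI2Vec_model.ep120",
                  screen_model: str = "Screen2Vec_model_v4.ep120",
                  layout_model: str = "layout_encoder.ep800") -> List[str]:
    """Build the argv for get_embedding.py using the long flag names listed above."""
    return [
        sys.executable, "get_embedding.py",
        "--screen", screen,
        "--ui_model", ui_model,
        "--screen_model", screen_model,
        "--layout_model", layout_model,
    ]


def embed_screen(screen: str) -> None:
    """Run the script from the current directory (assumed to be the repository root).

    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    subprocess.run(embed_command(screen), check=True)
```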

Training

Training your own screen embedding takes two or three steps: first train the GUI element model, optionally train the screen layout autoencoder, and then train the screen-level model.

GUI model

Files for the GUI model are contained within the UI_embedding directory. To train a model, run from within that directory:

python extract_vocab.py -d <location-of-dataset>

python main.py -d <location-of-dataset> -o <desired-output-path> -b 256 -e 100 -v "vocab.json" -m "vocab_emb.npy" -n 16 -r 0.001 -l "cel"

The parameters here are:

-d/--dataset, the path to the RICO dataset traces
-o/--output, the path prefix for where the output models should be stored
-b/--batch, number of traces in a batch
-e/--epochs, desired number of epochs
-v/--vocab_path, the path to where the vocab was precomputed
-m/--embedding_path, path to where the vocab BERT embeddings were precomputed
-n/--num_predictors, the number of UI elements used to predict the unknown element
-r/--rate, the learning rate
-hi/--hierarchy, flag that selects the hierarchy distance metric instead of Euclidean distance
-l/--loss, the desired loss metric; "cel" for cross-entropy loss, or "cossim" for cosine similarity
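
To make the two `-l/--loss` options concrete, the sketch below computes a cross-entropy loss and a cosine similarity in plain Python. It only illustrates the two quantities and is not the repository's training code: the function names are hypothetical, and how `main.py` turns cosine similarity into a loss is not specified in this README.

```python
import math
from typing import Sequence


def cross_entropy(scores: Sequence[float], target_index: int) -> float:
    """Cross-entropy of softmax(scores) against a single correct class."""
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(scores)
    log_sum = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_sum - scores[target_index]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Cross-entropy treats prediction as classification over a set of candidates, while cosine similarity compares the predicted vector with the true embedding directly.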

Then, to pre-generate these embeddings for your dataset to then use in screen training, run:

python precompute_embeddings.py -d <location-of-dataset> -m <desired-ui-model> -p <desired-prefix>

Then, move the files generated here from the UI_embedding directory into the Screen2Vec directory.
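
The move can be scripted. The sketch below is a hypothetical helper, not part of the repository, and it rests on one loudly stated assumption: that every file written by `precompute_embeddings.py` has a name beginning with the prefix passed to `-p`. Check the actual file names before relying on it.

```python
import shutil
from pathlib import Path
from typing import List


def move_precomputed(prefix: str, src: str = "UI_embedding", dst: str = ".") -> List[Path]:
    """Move every file in src whose name starts with prefix into dst.

    ASSUMPTION: the precomputed files are named "<prefix>...". This is not
    confirmed by the README.
    """
    moved = []
    for path in sorted(Path(src).iterdir()):
        if path.is_file() and path.name.startswith(prefix):
            target = Path(dst) / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved
```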

Layout autoencoder

The autoencoder is trained from within the Screen2Vec directory, by running:

python layout.py -d <location-of-screen-dataset> -b 256 -e 400 -r 0.001

where -b flags the batch size, -e the number of epochs, and -r the learning rate. Here, use the screen dataset ("combined") rather than the trace dataset ("filtered_traces").

Then, to pre-generate the layout embeddings for your dataset for use in screen training, run:

python write_layout.py -d <location-of-dataset> -m <layout-model> -p <same-desired-prefix>

Here, make sure you use the same desired prefix as from precomputing the UI embeddings.

Screen model

To train the screen-level model, run:

python main_preloaded.py -d <previously-chosen-prefix> -s 128 -b 256 -t <ui-output-prefix>

The parameters here are:

-d/--data, the prefix selected on the precompute embeddings stage
-s/--neg_samp, the number of screens to use as a negative sample during training
-b/--batch, number of traces in a batch
-e/--epochs, desired number of epochs
-n/--num_predictors, the number of screens used to predict the next screen in the trace (we used 4)
-r/--rate, the learning rate
-t/--test_train_split, the output path prefix from the UI embedding model, which was used to store the data split information as well
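
The roles of `-n/--num_predictors` and `-s/--neg_samp` can be illustrated with a small sketch. It is not the code in `main_preloaded.py`: the function names are hypothetical, screens are stood in for by strings, and the pool from which negatives are drawn (the batch, the app, or the whole dataset) is an assumption left to the caller.

```python
import random
from typing import List, Sequence, Tuple


def context_target_pairs(trace: Sequence[str], n: int = 4) -> List[Tuple[List[str], str]]:
    """Pair each screen with the n screens that precede it in the trace."""
    return [(list(trace[i - n:i]), trace[i]) for i in range(n, len(trace))]


def sample_negatives(pool: Sequence[str], target: str, k: int,
                     rng: random.Random) -> List[str]:
    """Draw up to k distinct screens from pool, never the true next screen."""
    candidates = [screen for screen in pool if screen != target]
    return rng.sample(candidates, min(k, len(candidates)))


# A six-screen trace with n=4 yields two training examples.
pairs = context_target_pairs(["a", "b", "c", "d", "e", "f"], n=4)
```

A trace shorter than n + 1 screens yields no examples under this scheme.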

Evaluation

There are files to evaluate the performance of both the GUI and screen embeddings.

To test the prediction accuracy of the GUI embedding, run (from within the UI_embedding directory):

python modeltester.py -m "UI2Vec_model.ep120" -v "vocab.json" -ve "vocab_emb.npy" -d <path-to-dataset>

-m/--model, path to the pretrained UI embedding model
-d/--dataset, the path to the RICO dataset traces
-v/--vocab_path, the path to where the vocab was precomputed
-ve/--vocab_embedding_path, path to where the vocab BERT embeddings were precomputed

To test the prediction accuracy of the screen embedding, run (from within the main directory):

python modeltester_screen.py -m "Screen2Vec_model_v4.ep120" -v 4 -n 4

where -m flags the model to test, -v the model version (4 is standard), and -n the number of predictors used in predictions.
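
This README does not say how the tester scripts define prediction accuracy. One common choice for embedding models is top-k accuracy: a prediction counts as correct when the true item is among the k candidates nearest to the predicted vector. The sketch below shows that idea under the assumption of Euclidean distance; the function names are hypothetical and the scripts may measure something different.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance; enough for ranking, so no square root."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_hit(predicted: Vector, candidates: Sequence[Vector],
              correct_index: int, k: int) -> bool:
    """True if the correct candidate is among the k nearest to the prediction."""
    order = sorted(range(len(candidates)),
                   key=lambda i: squared_distance(predicted, candidates[i]))
    return correct_index in order[:k]


def top_k_accuracy(examples: List[Tuple[Vector, int]],
                   candidates: Sequence[Vector], k: int) -> float:
    """Fraction of (prediction, correct index) examples that are top-k hits."""
    hits = sum(top_k_hit(pred, candidates, idx, k) for pred, idx in examples)
    return hits / len(examples)
```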

Other files

The following files were used in the testing infrastructure for our baseline models. They are not needed for general use of the Screen2Vec pipeline, so they are stored in the baseline sub-directory to avoid clutter. To run these scripts, move them to the main directory first:

for_baselines.py
modeltester_baseline.py
write_baseline_models.py
write_baseline_models_for_prediction.py

Reference

Toby Jia-Jun Li*, Lindsay Popowski*, Tom M. Mitchell, and Brad A. Myers. Screen2Vec: Semantic Embedding of GUI Screens and GUI Components. Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2021).
