Webgenerator

Open-source software for generating synthetic datasets of web-based user interfaces and content.

WebGenerator

Easily generate probabilistic datasets of web interfaces and content. The tool lets you generate HTML files, their corresponding screenshots, and a JSON file with the labeled HTML elements, so you can train supervised and unsupervised models. You can also set the probabilities and generation options of each batch to suit your needs.

This development is kindly supported by the awesome SDAS Group.

Some selected examples

A full dataset of 1000 elements at 800x600 resolution, generated with the tool, can be viewed here and downloaded here. The dataset contains CSS, js, and html folders, image folders, and JSON files. The html directory holds the HTML files, named with the rw prefix (rw_0.html, row_1.html,.., row_n.html). The CSS folder contains the Bootstrap distribution file with the web page's color palette, plus another file with the CSS rules for the sidebar and other required styling. The js folder contains the required jQuery and Bootstrap JavaScript files.
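
To get a quick overview of a downloaded dataset like the one described above, you could count its files by extension. This is a minimal sketch using only the standard library; the function name `summarize_dataset` and the `dataset` path are placeholders, not part of WebGenerator.

```python
from collections import Counter
from pathlib import Path


def summarize_dataset(root):
    """Count files per extension under a dataset folder (recursively)."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return dict(counts)


if __name__ == "__main__":
    # "dataset" is a placeholder; point it at the extracted download.
    for ext, n in sorted(summarize_dataset("dataset").items()):
        print(f"{ext}: {n}")
```

Comparing the counts for the HTML files, screenshots, and JSON files is a cheap way to spot an incomplete download or an interrupted generation run.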

Requirements

Python >= 3.7 (download here)
Pip >= 20.0.2 (installation instructions here)
Chrome / Chromium browser
Chrome driver

Browser and driver

The Chrome driver allows WebGenerator to manage browser instances in order to take the screenshots and create tag annotations of the inner HTML elements.

If you already have a Chrome or Chromium browser installed, you can skip this step. Otherwise, you can download either an installer or a zip file with the software. In this case we recommend downloading Chromium from this builds website. You should select "Archive" (zip folder) or "Installer".
Next, download the Chrome driver from here. Make sure the driver and the browser have the SAME VERSION. Once the driver is downloaded, extract it and put the file in your browser's executable folder. If you installed Chrome, the path could be C:/Program Files/Google/Chrome/Application.
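
The version check above can be automated by comparing the text printed by the browser's and the driver's `--version` commands. The sketch below is illustrative only: the helper names are hypothetical, and comparing just the leading version component by default is an assumption (raise `parts` to require a stricter match).

```python
import re


def version_parts(text):
    """Extract a dotted version such as 114.0.5735.90 as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def versions_match(browser_output, driver_output, parts=1):
    """Compare the first `parts` version components of browser and driver."""
    browser = version_parts(browser_output)[:parts]
    driver = version_parts(driver_output)[:parts]
    return browser == driver


if __name__ == "__main__":
    # Example strings in the usual "<name> <version>" shape.
    print(versions_match("Chromium 114.0.5735.90", "ChromeDriver 114.0.5735.16"))
```

In practice the two strings could come from running the browser and driver executables with `--version`, for instance via `subprocess.run`.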

You can always check the official Selenium documentation.

Installation

Simply git clone this repository or download the zip folder:

git clone https://github.com/agsoto/webgenerator.git
cd webgenerator

Then install the dependencies:

pip install -r requirements.txt

Since the screen capturing feature depends on the Selenium driver, you should add the driver's path to the system's environment variables. See how to set your environment variables on Windows and Mac. If you're using Linux, you can create a symbolic link instead: ln -s path-to-executable-driver chromedriver.

However, if you don't want to add an environment variable, you can set the path to the driver when using the ScreenShutter class:

ScreenShutter(driver_path="path-to-executable-driver")

This optional parameter can be set as shown on line 18 of the Main.py file.
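
Before running the generator, it can help to confirm that the driver is actually reachable, either through an explicit path or through the PATH environment variable. The following is a minimal sketch with a hypothetical helper, `resolve_driver_path`; it is not part of WebGenerator and only uses the standard library.

```python
import shutil
from pathlib import Path


def resolve_driver_path(explicit_path=None, name="chromedriver"):
    """Return a usable driver path, or None when nothing is found.

    An explicit path takes precedence; otherwise PATH is searched.
    """
    if explicit_path is not None:
        candidate = Path(explicit_path)
        return str(candidate) if candidate.is_file() else None
    # shutil.which returns None when the executable is not on PATH.
    return shutil.which(name)


if __name__ == "__main__":
    path = resolve_driver_path()
    print(path if path else "chromedriver not found on PATH")
```

A non-None result could then be passed along as `ScreenShutter(driver_path=...)`, as described above.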

Execution

The Main.py file contains an example of how to use the generator. Once you're all set, just run:

python ./Main.py

Potential Applications

The generated datasets have potential applications in web GUI generation. Here you will find three example deep learning models.

GAN: To generate GUI web images from WebGenerator images.
Fast RCNN: To detect components in web page images.
Pix2Pix: To generate GUI web images from image edges (Canny mask).

GAN

Faster RCNN

Pix2Pix

Generation Probabilities

The parameters of the WebLayoutProbabilities object, which is used for the generation, are described below.

| Param # | Name | Type | Description |
|---------|--------------------|---------|-------------|
| 1 | with_sidebar_p | float | Probability that the Sidebar is present |
| 2 | with_header_p | float | Probability that the Header is present |
| 3 | with_navbar_p | float | Probability that the Navbar is present |
| 4 | with_footer_p | float | Probability that the Footer is present |
| 5 | layouts_p | list[4] | List with the probabilities for each possible layout. The sum of the probabilities should be 1 |
| 6 | boxed_body_p | float | Probability that the page's Body is boxed inside a container |
| 7 | big_header_p | float | Probability of having a big header (a big header is considered 50% or more of the screen height) |
| 8 | sidebar_first_p | float | Probability of the Sidebar being at the left side of the Body |
| 9 | navbar_first_p | float | Probability of the Navbar being above the Header |
| 10 | bg_color_classes_p | list[3] | List with the probabilities for the combination of Bootstrap's CSS background color classes. The sum of the probabilities should be 1 |
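
To illustrate how probabilities like these can drive generation, the sketch below validates a probability list (such as layouts_p) and draws a random choice from it. It does not reproduce WebGenerator's internals: the helper names are hypothetical and only the standard library is used.

```python
import math
import random


def validate_probabilities(p_list, expected_len):
    """Check that a list has the expected length, valid entries, and sums to 1."""
    if len(p_list) != expected_len:
        raise ValueError(f"expected {expected_len} entries, got {len(p_list)}")
    if any(p < 0 or p > 1 for p in p_list):
        raise ValueError("each probability must lie in [0, 1]")
    if not math.isclose(sum(p_list), 1.0, abs_tol=1e-9):
        raise ValueError(f"probabilities sum to {sum(p_list)}, expected 1")


def sample_index(p_list, rng=random):
    """Pick an index with the given probabilities (e.g. one of the layouts)."""
    return rng.choices(range(len(p_list)), weights=p_list, k=1)[0]


def sample_flag(p, rng=random):
    """Return True with probability p (e.g. for a value like with_sidebar_p)."""
    return rng.random() < p


if __name__ == "__main__":
    layouts_p = [0.4, 0.3, 0.2, 0.1]  # illustrative values only
    validate_probabilities(layouts_p, expected_len=4)
    rng = random.Random(0)  # seeded for reproducible draws
    print("layout index:", sample_index(layouts_p, rng))
    print("sidebar present:", sample_flag(0.5, rng))
```

Validating the lists up front gives a clear error message when a list such as layouts_p or bg_color_classes_p does not sum to 1, instead of silently skewing the generated batch.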
