Webgenerator
Open-source software for generating synthetic datasets of web-based user interfaces and content.
WebGenerator
Easily generate probabilistic datasets of web interfaces and content. The datasetter lets you generate HTML files, their corresponding screenshots, and a JSON file with the labeled HTML elements, so you can train both supervised and unsupervised models. You can also set the probabilities and generation options of each batch to suit your needs.
<img src="https://imgur.com/a6o62Da.png" alt="Example 3" width="250">
This development is kindly supported by the awesome SDAS Group.
Some selected examples
<img src="https://i.imgur.com/rlsanuU.png" alt="Example 1" width="300"> <img src="https://i.imgur.com/GnxOmgp.png" alt="Example 2" width="300"> <img src="https://i.imgur.com/vELUSQZ.png" alt="Example 3" width="300">
A full dataset of 1000 samples at 800x600 resolution, generated with the tool, can be viewed here and downloaded here. The dataset contains folders for the CSS, JS, and HTML files, image folders, and JSON files. The html directory holds the HTML files, named with the rw prefix (rw_0.html, rw_1.html, ..., rw_n.html). The CSS folder contains the Bootstrap distribution file with the web page's color palette, plus another file with the CSS rules for the sidebar and any extra styling required. The js folder contains the required jQuery and Bootstrap JavaScript files.
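Because the pages are numbered sequentially, a plain alphabetical sort would put rw_10.html before rw_2.html. The sketch below is a hypothetical helper (not part of WebGenerator) that assumes only the rw_<n>.html naming described above, and lists the pages of a downloaded dataset in numeric order:

```python
import re
from pathlib import Path

# Matches file names such as rw_0.html, rw_1.html, ... and captures the index.
PAGE_PATTERN = re.compile(r"^rw_(\d+)\.html$")


def list_pages(html_dir):
    """Return the rw_<n>.html files in html_dir sorted by numeric index.

    Hypothetical helper: it relies only on the rw_<n>.html naming convention.
    """
    pages = []
    for path in Path(html_dir).iterdir():
        match = PAGE_PATTERN.match(path.name)
        if match:  # skip files that do not follow the naming scheme
            pages.append((int(match.group(1)), path))
    # Sort on the integer index, not the string, so rw_10 comes after rw_9.
    return [path for _, path in sorted(pages)]
```

For example, `list_pages("html")` run from the dataset root would return the page paths in generation order.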
Requirements
- Python >= 3.7 (download here)
- Pip >= 20.0.2 (installation instructions here)
- Chrome / Chromium browser
- Chrome driver
Browser and driver
ChromeDriver allows WebGenerator to manage browser instances in order to take the screenshots and create tag annotations of the inner HTML elements.
- If you already have Chrome or Chromium installed, you can skip this step. Otherwise, download either an installer or a zip archive of the browser. We recommend downloading Chromium from this builds website and selecting either "Archive" (zip folder) or "Installer".
- Next, download ChromeDriver from here. Make sure the driver and the browser have the SAME VERSION. Once the driver is downloaded, extract it and put the executable in your browser's executable folder. If you installed Chrome, the path could be C:/Program Files/Google/Chrome/Application.
You can always check the official Selenium documentation.
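Since a browser/driver version mismatch is a common cause of Selenium start-up errors, it can help to compare the two versions before generating a batch. The sketch below is a hypothetical helper (not part of WebGenerator); it assumes version strings like those printed by `chromedriver --version` and `chromium --version` (for example "ChromeDriver 114.0.5735.90 (...)" and "Chromium 114.0.5735.90"), and compares only the major version, which is the number ChromeDriver releases are matched against.

```python
import re

# First dotted version number in a string, e.g. "114.0.5735.90".
VERSION_PATTERN = re.compile(r"(\d+)(?:\.\d+)+")


def major_version(version_output):
    """Extract the major version from a `--version` output string."""
    match = VERSION_PATTERN.search(version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def versions_match(browser_output, driver_output):
    """Return True when the browser and the driver share a major version."""
    return major_version(browser_output) == major_version(driver_output)
```

The strings themselves can be obtained with `subprocess.run([...,"--version"], capture_output=True, text=True).stdout` for each executable.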
Installation
Simply git clone this repository or download the zip folder:
git clone https://github.com/agsoto/webgenerator.git
cd webgenerator
Then install the dependencies:
pip install -r requirements.txt
Since the screen-capturing feature depends on the Selenium driver, you should add the driver's path to the system's environment variables. See how to set environment variables on Windows and Mac. If you're using Linux, you can create a symbolic link instead:
ln -s path-to-executable-driver /usr/local/bin/chromedriver
If you don't want to add an environment variable, you can instead pass the path to the driver when creating a ScreenShutter instance:
ScreenShutter(driver_path="path-to-executable-driver")
This optional parameter can be set as shown in line 18 of the Main.py file.
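Both options come down to the same lookup: use an explicit path if one is given, otherwise look for `chromedriver` on the system PATH. The sketch below illustrates that logic with the standard library's `shutil.which`; `resolve_driver_path` is a hypothetical helper, not part of WebGenerator or Selenium, and its result is the kind of value you would pass as `driver_path`.

```python
import os
import shutil


def resolve_driver_path(explicit_path=None, search_path=None):
    """Return a usable ChromeDriver path.

    Hypothetical helper: an explicit path (like ScreenShutter's driver_path)
    takes precedence; otherwise the PATH is searched for "chromedriver".
    `search_path` overrides os.environ["PATH"] and is mainly for testing.
    """
    if explicit_path is not None:
        if not os.path.isfile(explicit_path):
            raise FileNotFoundError(f"driver not found at {explicit_path}")
        return explicit_path
    # shutil.which returns None when no executable named chromedriver exists.
    found = shutil.which("chromedriver", path=search_path)
    if found is None:
        raise FileNotFoundError(
            "chromedriver is not on PATH; add it or pass driver_path explicitly"
        )
    return found
```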
Execution
The Main.py file contains an example of how to use the generator. Once you're all set, just run:
python Main.py
Potential Applications
Datasets generated with this tool have potential applications in generating and analyzing web GUIs. Here you will find examples of three deep learning models:
- GAN: to generate web GUI images from WebGenerator images.
- Faster RCNN: to detect components in web page images.
- Pix2Pix: to generate web GUI images from image edges (Canny masks).
GAN
<img src='https://raw.githubusercontent.com/agsoto/webgenerator/master/PotentialApplications/Images/gan.png' />
Faster RCNN
<img src='https://raw.githubusercontent.com/agsoto/webgenerator/master/PotentialApplications/Images/frcnn.png' width=60%/>
Pix2Pix
<img src='https://raw.githubusercontent.com/agsoto/webgenerator/master/PotentialApplications/Images/p2p.png' width=60%/>
Generation Probabilities
The parameters of the WebLayoutProbabilities object, which is used for the generation, are described below.

| Param # | Name | Type | Description |
|---------|--------------------|---------|-------------|
| 1 | with_sidebar_p | float | Probability that the Sidebar is present |
| 2 | with_header_p | float | Probability that the Header is present |
| 3 | with_navbar_p | float | Probability that the Navbar is present |
| 4 | with_footer_p | float | Probability that the Footer is present |
| 5 | layouts_p | list[4] | Probabilities for each of the possible layouts. They should sum to 1 |
| 6 | boxed_body_p | float | Probability that the page's Body is boxed inside a container |
| 7 | big_header_p | float | Probability of having a big header (one that takes up 50% or more of the screen height) |
| 8 | sidebar_first_p | float | Probability of the Sidebar being on the left side of the Body |
| 9 | navbar_first_p | float | Probability of the Navbar being above the Header |
| 10 | bg_color_classes_p | list[3] | Probabilities for each combination of Bootstrap background color classes. They should sum to 1 |
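To make the semantics of these parameters concrete, here is a sketch of how such probabilities could be validated and sampled for a single page. `LayoutProbabilities` and `sample_layout` are hypothetical stand-ins written for illustration: they mirror the parameter names in the table, but they are not the WebGenerator classes, and the real generator may sample differently (in particular, the conditional handling of header, sidebar, and navbar options is an assumption).

```python
import math
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class LayoutProbabilities:
    """Hypothetical stand-in mirroring the parameters in the table."""
    with_sidebar_p: float = 0.5
    with_header_p: float = 0.5
    with_navbar_p: float = 0.5
    with_footer_p: float = 0.5
    layouts_p: List[float] = field(default_factory=lambda: [0.25] * 4)
    boxed_body_p: float = 0.5
    big_header_p: float = 0.5
    sidebar_first_p: float = 0.5
    navbar_first_p: float = 0.5
    bg_color_classes_p: List[float] = field(default_factory=lambda: [1 / 3] * 3)

    def validate(self):
        # Each list must have the documented length and sum to 1.
        for name, expected_len in (("layouts_p", 4), ("bg_color_classes_p", 3)):
            probs = getattr(self, name)
            if len(probs) != expected_len:
                raise ValueError(f"{name} must have {expected_len} entries")
            if not math.isclose(sum(probs), 1.0):
                raise ValueError(f"{name} must sum to 1, got {sum(probs)}")


def sample_layout(p, rng):
    """Draw one page configuration from the probabilities in p."""
    p.validate()
    has_sidebar = rng.random() < p.with_sidebar_p
    has_header = rng.random() < p.with_header_p
    has_navbar = rng.random() < p.with_navbar_p
    return {
        "sidebar": has_sidebar,
        "header": has_header,
        "navbar": has_navbar,
        "footer": rng.random() < p.with_footer_p,
        # Index of one of the four layouts, weighted by layouts_p.
        "layout": rng.choices(range(4), weights=p.layouts_p)[0],
        "boxed_body": rng.random() < p.boxed_body_p,
        # Assumption: size/placement options only apply when the element exists.
        "big_header": has_header and rng.random() < p.big_header_p,
        "sidebar_first": has_sidebar and rng.random() < p.sidebar_first_p,
        "navbar_first": has_navbar and rng.random() < p.navbar_first_p,
        # Index of one of the three background color class combinations.
        "bg_color_classes": rng.choices(range(3), weights=p.bg_color_classes_p)[0],
    }
```

For example, `sample_layout(LayoutProbabilities(with_sidebar_p=0.8), random.Random(42))` returns a dict describing one page; seeding the `random.Random` instance makes a batch reproducible.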
