Dataset

Web Fuzzing Dataset (WFD): a set of web/enterprise applications for experimentation in automated system testing

WFD

Web Fuzzing Dataset (WFD): a set of web/enterprise applications for scientific research in Software Engineering.

We collected several different systems running on the JVM, written in different programming languages such as Java and Kotlin. In this documentation, we refer to these projects as Systems Under Test (SUTs). Currently, the SUTs are either REST, GraphQL or RPC APIs.

This dataset was previously known as EMB. It was rebranded as WFD starting with version 4.0.0.

This collection of SUTs was originally assembled to ease experimentation with the fuzzer EvoMaster. However, finding this type of application among open-source projects is not trivial. Furthermore, it is not simple to sort out all the technical details of how to set these applications up and start them in a simple, uniform way. Therefore, this repository contributes all the necessary scripts for researchers who need this kind of case study.

Black-box Testing. For each SUT, we provide Docker Compose scripts (under the dockerfiles folder) to start the APIs with all their needed dependencies (e.g., databases). APIs are configured with mitmproxy and JaCoCo to collect information on the fuzzing results.

White-box Testing. For each SUT, we implemented driver classes for EvoMaster (currently the only existing white-box fuzzer for the JVM), which can programmatically start, stop and reset the state of the SUT (e.g., data in SQL databases). The drivers also allow different properties to be set up in a uniform way, such as choosing TCP port numbers for the HTTP servers. If a SUT uses any external services (e.g., a SQL database), these are automatically started via Docker in these driver classes.
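The start/stop/reset lifecycle described above can be illustrated with a minimal, self-contained sketch. It does not use the EvoMaster driver API: the `SutDriver` interface, the `ToySutDriver` class and the `/items` endpoint are hypothetical names invented for this illustration, and the "database" is just an in-memory list. The real drivers in this repository extend EvoMaster's own controller classes and start external services via Docker.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical driver contract for illustration only (NOT the EvoMaster API).
interface SutDriver {
    String start();      // starts the SUT and returns its base URL
    void stop();         // shuts the SUT down
    void resetState();   // brings the SUT's data back to a known initial state
    boolean isRunning();
}

// Toy SUT: a JDK HTTP server whose "database" is an in-memory list.
class ToySutDriver implements SutDriver {
    private final int requestedPort;  // 0 lets the OS pick a free TCP port
    final List<String> items = new CopyOnWriteArrayList<>();
    private HttpServer server;

    ToySutDriver(int requestedPort) {
        this.requestedPort = requestedPort;
    }

    @Override
    public String start() {
        try {
            server = HttpServer.create(new InetSocketAddress("localhost", requestedPort), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // GET /items returns the number of stored items as plain text.
        server.createContext("/items", exchange -> {
            byte[] body = String.valueOf(items.size()).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        // Report the port actually bound, which matters when 0 was requested.
        return "http://localhost:" + server.getAddress().getPort();
    }

    @Override
    public void stop() {
        if (server != null) {
            server.stop(0);
            server = null;
        }
    }

    @Override
    public void resetState() {
        items.clear();  // a real driver would reset database tables instead
    }

    @Override
    public boolean isRunning() {
        return server != null;
    }
}

class DriverDemo {
    public static void main(String[] args) {
        ToySutDriver driver = new ToySutDriver(0);
        String baseUrl = driver.start();
        System.out.println("SUT listening at " + baseUrl);
        driver.items.add("left over from a previous test");
        driver.resetState();  // each generated test starts from a clean state
        System.out.println("Items after reset: " + driver.items.size());
        driver.stop();
    }
}
```

Passing `0` as the port is one way to let several SUT instances run side by side without port clashes; a fixed port can be passed instead when a uniform, predictable address is preferred.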

NOTE: version 1.6.1 was the last one in which we still updated drivers for JavaScript and C#. Those SUTs are no longer built by default, and the latest versions of EvoMaster will not work with those old drivers. Updating drivers for different programming languages (and re-implementing white-box heuristics) is a massive amount of work, which unfortunately has little to no value for the scientific community (based on our experience). Those SUTs are still here in WFD so that old experiments can be replicated, but unfortunately they cannot be used for white-box testing with the latest versions of EvoMaster.

An old video (2023) providing a high-level overview of EMB can be found here.

EMB YouTube Video

License

All the code that is new for this repository (e.g., the driver classes) is released under the Apache 2.0 license. However, this repository also contains sources from different open-source projects, each one with its own license, as clarified in more detail below.

Example

To see an example of using these drivers with EvoMaster to generate test cases, you can look at this short video (5 minutes).

Citation

If you are using WFD in an academic work, you can cite the following:

O. Sahin, M. Zhang, A. Arcuri. WFC/WFD: Web Fuzzing Commons, Dataset and Guidelines to Support Experimentation in REST API Fuzzing. arXiv 2509.01612

For the old version still called EMB, you can refer to:

A. Arcuri, M. Zhang, A. Golmohammadi, A. Belhadi, J. P. Galeotti, B. Marculescu, S. Seran. EMB: A Curated Corpus of Web/Enterprise Applications And Library Support for Software Testing Research. In IEEE International Conference on Software Testing, Validation and Verification (ICST), 2023.

Current Case Studies

The projects were selected through keyword searches on GitHub APIs, using convenience sampling. Several SUTs were examined; we discarded those that would not compile, crashed at startup, used obscure or unpopular libraries with no documentation on how to get them started, were too trivial, were student projects, etc. Where possible, we prioritized candidates by their number of stars on GitHub. When authors of other fuzzers used other open-source JVM APIs in their studies, we included those in WFD as well.

Note that some of these open-source projects might no longer be supported, whereas others are still developed and updated. Once a system is added to WFD, we neither modify it nor keep it updated with its current version under development. The reason is that we want to keep an easy-to-use, constant set of case studies for experimentation that can be reliably used throughout the years.

The SUTs called NCS (Numerical Case Study) and SCS (String Case Study) are artificial, developed by us. They are based on numerical and string-based functions previously used in the literature on unit test generation. We simply re-implemented them in different languages and put them behind a web service.

For the RESTful APIs, each API has an endpoint from which its OpenAPI/Swagger schema can be downloaded. For simplicity, all schemas are also available as JSON/YML files under the folder openapi-swagger.

IMPORTANT: More details (e.g., #LOCs and used databases) on these APIs can be found in this table.

Real-world APIs require authentication. How to set up authentication information, based on the current content of the initialized databases, is expressed in Web Fuzzing Commons (WFC) format. Auth configuration files can be found in the auth folder.

REST: Java/Kotlin (36)
