RunWright

If you want to skip reading the context and go directly to the getting started section, click here.

🚀 Core features

Time-based completion: Finish thousands of tests in your target timeframe (2-5 minutes).
Dynamic auto scaling: Auto-adjust runners based on test load.
Smart distribution: Balance workload by execution time, not test count.

In the example below, more than 3,000 tests finish in just 1.5 minutes against a target run time of 2 minutes.

Scope

This action covers both execution modes in Playwright:

When fullyParallel=true - Parallel run of all individual test cases on runners
When fullyParallel=false - Parallel run of all individual test files on runners

Why this action? What's wrong with Playwright Sharding?

We will explain this by looking into the details of how Playwright Sharding works and what problems it brings with its implementation.

With Playwright Sharding

Playwright Sharding is an out-of-the-box solution from Playwright to allow distributed runs on any machine. Its inner workings and GitHub usage examples have two main flaws.

1. Playwright sharding results in uneven test distribution on runners.

Playwright sharding distributes tests by test count, aiming for shards of equal size, and not by how long each test takes to complete. Because the distribution is not aware of per-test run times, shards can end up with very different total durations, as shown below.

uneven distribution
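
To make the imbalance concrete, here is a small sketch with made-up durations. The numbers and the `splitByCount` helper are illustrative assumptions, not Playwright internals. It splits six tests into two equal-count shards in listing order and sums the run time of each shard.

```javascript
// Hypothetical test durations in seconds, in listing order.
const durations = [120, 90, 60, 5, 5, 5];

// Count-based split: every shard gets the same number of tests,
// regardless of how long each test runs.
function splitByCount(items, shardCount) {
  const size = Math.ceil(items.length / shardCount);
  const shards = [];
  for (let i = 0; i < shardCount; i++) {
    shards.push(items.slice(i * size, (i + 1) * size));
  }
  return shards;
}

const sum = (xs) => xs.reduce((a, b) => a + b, 0);

const shardTimes = splitByCount(durations, 2).map(sum);
console.log(shardTimes); // shard totals: 270 s and 15 s
```

Both shards hold three tests, yet the whole run takes as long as the slowest shard (270 seconds) while the other runner sits idle after 15 seconds.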

2. Fixed runners that do not scale up or down based on test load.

Playwright's documentation includes a GitHub Actions example that uses a GitHub matrix strategy to distribute tests across a fixed number of runners (4 in the given example). This results in inefficiencies, as shown below.

fixed runners

With RunWright

1. Even test load distribution, based on your pre-decided total run time, to finish tests.

2. Dynamic runners that scale up or down based on test load.
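
One common way to get time-aware balancing is a greedy "longest test first" assignment. The sketch below illustrates that idea only; it is not necessarily the algorithm RunWright uses, and the `distributeByTime` function and the sample durations are hypothetical.

```javascript
// Assign each test to the currently least-loaded runner, longest tests first.
// `tests` is an array of { name, seconds }; `runnerCount` is decided elsewhere.
function distributeByTime(tests, runnerCount) {
  const runners = Array.from({ length: runnerCount }, () => ({ total: 0, tests: [] }));
  const sorted = [...tests].sort((a, b) => b.seconds - a.seconds);
  for (const test of sorted) {
    // Pick the runner with the smallest accumulated time so far.
    const target = runners.reduce((min, r) => (r.total < min.total ? r : min));
    target.tests.push(test.name);
    target.total += test.seconds;
  }
  return runners;
}

// Made-up sample data.
const sample = [
  { name: "checkout", seconds: 120 },
  { name: "search", seconds: 90 },
  { name: "login", seconds: 60 },
  { name: "footer", seconds: 5 },
  { name: "header", seconds: 5 },
  { name: "about", seconds: 5 },
];

const plan = distributeByTime(sample, 2);
console.log(plan.map((r) => r.total)); // runner totals: 135 s and 150 s
```

With the same six tests, the two runners finish within 15 seconds of each other, even though one runner holds four tests and the other holds two.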

Pros and Cons: Playwright Sharding vs RunWright

| Aspect | 🚫 Playwright Sharding | ✅ RunWright |
|--------|------------------------|-------------|
| Timely feedback on pull requests | 🐌 Test runs that take a long time cannot be run with every pull request. | ⚡️ Fast and predictable run times make it possible to run system tests with every PR. |
| Trust in Tests | 📉 Tests that aren't run with each PR don't get fixed with each PR. They are often run after new changes are already merged into the main branch. As seen frequently, such tests give false positives due to new changes and break the team's trust in them. | 📈 Tests that are run with every PR get fixed with the PRs. They provide timely feedback to developers, give true positives, and improve the team's trust in them. |
| Maintenance Fatigue | 😩 Tests that are not fixed with PRs get passed on to QAs. When this happens frequently, which it often does, it results in maintenance fatigue in QAs. QAs find themselves demotivated and stuck in this never-ending cycle of fixing broken tests, with little to no time to do anything else that is meaningful. | 😇 When developers are responsible for fixing the tests that are broken due to their own changes, it frees up time for testers in the team to do more meaningful work such as exploratory testing, writing new tests for missing functionality, learning new ways of testing, and mentoring team members on testing and automation. |
| To increase Test Coverage or not? | 📉 Increased test run times create pressure on the team to limit test suite growth and over-optimize existing tests rather than adding new tests to increase test coverage. | 📈 When teams have a solution and setup that can always finish tests in a fixed time (say 2 to 5 minutes), it encourages them to write new tests to increase test coverage for missing functionality and not worry about over-optimization to keep run times in check. |
| Runner Scaling Efficiency | 📉 Adding more runners has diminishing returns and doesn't guarantee proportional time savings. | 💡 Smart auto-scaling based on test load gives consistent and directly proportional performance benefits. |
| Costs and returns | 💸 As we have seen, with more added runners, the infrastructure costs grow in proportion but with diminishing performance results. | 💰 Infrastructure costs are always in proportion to our test run demands, and we only pay for what we use. Nothing more. Nothing less. |
| Scalability Potential | 🔒 Approach doesn't scale well with an increased number of tests. | 🚀 Excellent scalability that grows efficiently with test suite expansion, always keeping total run time fixed to our desired times (say 2 to 5 minutes regardless of total tests to run). |

Key Takeaway: RunWright transforms system testing from a burden into an enabler, allowing teams to maintain fast feedback loops while scaling their test suites confidently.

** At the time of writing this document, there are no other known solutions (paid or open source) that can do this using Playwright and GitHub.

💡 So how does it work?

To build a solution that is "time aware" and that can "auto-scale" based on the "current test load," there are a few things that we need.

🔁 i.e.:

T_i = TestRunTimeForEachTest(i) = execution time of test i (from state.json)
- We get this value from the state.json file, which is generated by a custom state-reporter.js reporter and committed by a post-commit hook.
N = total number of tests to run.
- We get the test scope by running the playwright command with the --list option.
TargetRunTime = total desired time to complete the run (in minutes)
- We get this as input from the user.
TotalLoad = Σ T_i = total test load (in terms of test run time)
- We fill each runner so that the sum of T_i assigned to it stays within TargetRunTime.
- Note that the total run time for each runner also depends on the number of parallel threads, which is explained in more detail in the next section.
Cores = number of cores per runner.
- GitHub-hosted Linux runners have 4 cores by default for public repositories.
- GitHub-hosted Linux runners have 2 cores by default for private repositories.
- For enterprise projects, it is possible to request larger runners with more cores.
- For Linux runners, the action can calculate the cores at run time with this command: NUM_CORES=$(nproc)
Threads (Parallel threads per runner).
- The recommended number of threads per runner is half the number of cores, i.e. Threads = Cores / 2.
Runners = Total number of required runners.
- We calculate the optimal required runners as shown in the next section by using all the above available information.
- The runners are provided as a dynamic GitHub matrix.
  - GitHub's fromJSON expression and the GITHUB_OUTPUT file make it possible to pass a dynamic matrix from one job to another.
  - Note: Passing matrix values through other mechanisms (such as environment variables or workflow inputs) is not straightforward. This is why dynamic matrices remain a bit of a mystery, and why most teams end up hardcoding the matrix in their workflows.
- Pro Tip: Users can cap the number of runners at a sensible limit (say 20) in their caller workflow to avoid spinning up hundreds of runners when a large suite of long tests is combined with a short target run time. Note that this limit cannot be passed as an input variable, so it must be hardcoded in the workflow file.
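
As an illustration of how per-test run times could end up in state.json, here is a minimal sketch of a custom Playwright reporter. It is not the actual state-reporter.js shipped with this action: the `outputFile` option and the key layout are assumptions made for the example. It relies only on the `onTestEnd` and `onEnd` reporter hooks and on `result.duration`, which Playwright reports in milliseconds.

```javascript
// A minimal sketch of a duration-recording reporter (illustrative only).
const fs = require("fs");

class StateReporter {
  constructor(options = {}) {
    // Hypothetical option; defaults to state.json in the working directory.
    this.outputFile = options.outputFile || "state.json";
    this.durations = {};
  }

  // Playwright calls this each time a test finishes (retries included,
  // so the last attempt's duration wins).
  onTestEnd(test, result) {
    // Key layout is an assumption: "<file> › <title>".
    const key = `${test.location.file} › ${test.title}`;
    this.durations[key] = result.duration; // milliseconds
  }

  // Playwright calls this once after the whole run.
  onEnd() {
    fs.writeFileSync(this.outputFile, JSON.stringify(this.durations, null, 2));
  }
}

module.exports = StateReporter;
```

A reporter like this can be registered through the `reporter` option in the Playwright config, for example `reporter: [['./state-reporter.js']]`.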

📐 Equation

Since we know every individual TestRunTimeForEachTest(i) from state.json, the total workload is:

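$$\text{TotalLoad} = \sum_{i=1}^{N} T_i$$

Total parallel capacity available on the runners:

$$\text{Capacity} = \text{Runners} \times \text{Threads} \times \text{TargetRunTime}$$

Equating Load and Capacity:

$$\sum_{i=1}^{N} T_i = \text{Runners} \times \text{Threads} \times \text{TargetRunTime}$$

Solving for Runners, rounded up to a whole number of runners:

$$\text{Runners} = \left\lceil \frac{\sum_{i=1}^{N} T_i}{\text{Threads} \times \text{TargetRunTime}} \right\rceil$$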
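
The relationship between load and capacity described in this section can be sketched in a few lines of JavaScript. This is an illustration under stated assumptions, not the action's actual source: the `requiredRunners` function, the `maxRunners` cap, the output name `matrix`, and the sample numbers are all hypothetical. The only GitHub mechanism it relies on is appending `name=value` lines to the file named by the `GITHUB_OUTPUT` environment variable.

```javascript
const fs = require("fs");

// Runners = ceil(TotalLoad / (Threads * TargetRunTime)), clamped to [1, maxRunners].
function requiredRunners(testSeconds, targetMinutes, threads, maxRunners) {
  const totalLoad = testSeconds.reduce((a, b) => a + b, 0); // sum of T_i, in seconds
  const capacityPerRunner = threads * targetMinutes * 60; // seconds of work per runner
  const runners = Math.ceil(totalLoad / capacityPerRunner);
  return Math.min(Math.max(runners, 1), maxRunners);
}

// Made-up workload: 3000 tests of 6 s each, 2-minute target, 2 threads per runner.
const testSeconds = new Array(3000).fill(6);
const runners = requiredRunners(testSeconds, 2, 2, 100);

// Matrix as a JSON array of runner indexes: [1, 2, ..., runners].
const matrix = JSON.stringify(Array.from({ length: runners }, (_, i) => i + 1));

// Inside a GitHub Actions step, GITHUB_OUTPUT names the file that holds step outputs.
if (process.env.GITHUB_OUTPUT) {
  fs.appendFileSync(process.env.GITHUB_OUTPUT, `matrix=${matrix}\n`);
}

console.log(runners); // 75
```

Here the total load is 18,000 seconds and each runner can absorb 2 × 120 = 240 seconds of work within the target, which gives 75 runners. A downstream job could then read the value with `fromJSON` on the producing job's `matrix` output in its `strategy.matrix` definition.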

Finally, we piece all this information together in this custom [runwright
