Sitediff

SiteDiff makes it easy to see differences between two versions of a website.

SiteDiff CLI

Warning: SiteDiff 1.2.0 requires at least Ruby 3.1.2.

Warning: SiteDiff 1.0.0 introduces some backwards incompatible changes.

Introduction

SiteDiff makes it easy to see how a website changes. It can compare two similar sites or it can show how a single site changed over time. It helps identify undesirable changes to the site's HTML and it's a useful tool for conducting QA on re-deployments, site upgrades, and more!

When you run SiteDiff, it produces an HTML report showing whether pages on your site have changed or not. For pages that have changed, you can see a colorized diff showing exactly what changed, or compare the visual differences side-by-side in a browser.

SiteDiff supports a range of normalization / sanitization rules. These allow you to eliminate spurious differences, narrowing down differences to the ones that materially affect the site.

Installation

SiteDiff is fairly easy to install. Please refer to the installation docs.

Demo

After installing all dependencies, including version 2 of the bundler gem, you can quickly see what SiteDiff can do by running the following commands:

git clone https://github.com/evolvingweb/sitediff
cd sitediff
bundle install
bundle exec thor fixture:serve

Then visit http://localhost:13080/ to view the report.

SiteDiff shows you an overview of all the pages and clearly indicates which pages have changed and which have not.

When you click on a changed page, you see a colorized diff of the page's markup showing exactly what changed on the page.

Usage

Here are some instructions on getting started with SiteDiff. To see a list of commands that SiteDiff offers, you can run:

sitediff help

To get help for a particular command, say, diff, you can run:

sitediff help diff

Getting started

To use SiteDiff on your site, create a configuration for your site:

sitediff init http://mysite.example.com

SiteDiff will generate a configuration file named sitediff.yaml by default.

You can open the configuration file sitediff/sitediff.yaml to see the default configuration generated by SiteDiff. The configuration reference section explains the contents of this file and helps you customize it to your requirements.

Then get SiteDiff to crawl your site by using:

sitediff crawl

SiteDiff will then crawl your site, finding pages and caching their contents. A list of discovered paths will be saved to a paths.txt file.

Now, you can make alterations to your site. For example, change a word on your site's front page. After you're done, you can check what actually changed:

sitediff diff

For each page, SiteDiff will report whether it did or did not change. For pages that changed, it will display a diff. You can also see an HTML version of the report using the following command:

sitediff serve

SiteDiff will start an internal web server and open a report page in your browser. For each page, you can see the diff and a side-by-side view of the old and new versions.

You can now see if the changes were as you expected, or if some things didn't quite work out as you hoped. If you noticed unexpected changes, congratulations: SiteDiff just helped you find an issue you would have otherwise missed!

As you fix any issues, you can continue to alter your site and run sitediff diff to check the changes against the old version. Once you're satisfied with the state of your site, you can inform SiteDiff that it should re-cache your site:

sitediff store

This takes a snapshot of your website and the next time you run sitediff diff, it will use this new version as the reference for comparison.

Happy diffing!

Comparing 2 sites

Sometimes you have two sites that you want to compare, for example a production site hosted on a public server and a development site hosted on your computer. SiteDiff can handle this situation, too! Just inform SiteDiff that there are two sites to compare:

sitediff init http://mysite.example.com http://localhost/mysite

Then when you run sitediff diff, it will compare the cached version of the first site with the current version of the second site.

If both the first and second sites may be changing, you should tell SiteDiff not to cache either site:

sitediff diff --cached=none

Spurious diffs

Sometimes sites have spurious differences that you don't want to show up in a comparison. For example, many sites protect against Cross-Site Request Forgery using a semi-random token. Since this token changes on each HTTP GET, you probably don't care about such a change.

To help with issues such as this, SiteDiff allows you to normalize the HTML it fetches as it compares pages. In the sitediff.yaml configuration file, you can add "sanitization rules", which specify either DOM transformations or regular expression substitutions.

Here's an example of a rule you might add to remove CSRF-protection tokens generated by Django:

dom_transform:
  - title: Remove CSRF tokens
    type: remove
    selector: input[name=csrfmiddlewaretoken]
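
To illustrate why normalization removes spurious diffs, here is a minimal Python sketch of the idea. It is not SiteDiff's implementation: the regular expression, the helper names, and the sample markup are all assumptions made for the example. It strips a Django-style CSRF input from two otherwise identical pages and compares them before and after.

```python
import difflib
import re

# Hypothetical pattern for a Django-style hidden CSRF input; real markup may differ.
CSRF_INPUT = re.compile(r'<input[^>]*name="csrfmiddlewaretoken"[^>]*>')

def normalize(html):
    """Drop the CSRF input so its per-request value cannot cause a diff."""
    return CSRF_INPUT.sub("", html)

def changed_lines(before, after):
    """Return only the added/removed lines of a unified diff, without file headers."""
    diff = difflib.unified_diff(before.splitlines(), after.splitlines(), lineterm="")
    return [
        line for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]

# Two fetches of the same page; only the token value differs (sample data).
old = '<form>\n<input type="hidden" name="csrfmiddlewaretoken" value="abc123">\n<p>Hello</p>\n</form>'
new = '<form>\n<input type="hidden" name="csrfmiddlewaretoken" value="zzz999">\n<p>Hello</p>\n</form>'

print(changed_lines(old, new))                        # the two token lines differ
print(changed_lines(normalize(old), normalize(new)))  # nothing left to report
```

A DOM-based rule like the one above is usually more robust than a regular expression, since it does not depend on attribute order or whitespace in the markup.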

You can use one of the presets to apply framework-specific sanitization. Currently, SiteDiff only comes with Drupal-specific presets.

See the preset section for more details.

Command Line Options

Finding configuration files

By default, SiteDiff puts everything in the sitediff folder. You can use the --directory flag (-C in the examples below) to specify a different directory.

sitediff init -C my_project_folder https://example.com
sitediff diff -C my_project_folder
sitediff serve -C my_project_folder

Specifying paths

When you run sitediff diff, you can specify which pages to look at in 2 ways:

  1. The option --paths /foo /bar ....

    If you're trying to fix one page in particular, specifying just that one path will make sitediff diff run quickly!

  2. The option --paths-file FILE with a newline-delimited text file.

This is particularly useful when you're trying to eliminate all diffs. SiteDiff writes a failures.txt file (sitediff/failures.txt with the default directory) listing every path that had differences, so as you fix differences, you can run:

sitediff diff --paths-file sitediff/failures.txt
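
If you want to work through failures one section of the site at a time, you can build a smaller newline-delimited paths file yourself. The following Python sketch shows one way to do that. The helper names, the sample paths, and the output filename are hypothetical, and the sketch assumes the failures file holds one path per line.

```python
import tempfile
from pathlib import Path

def select_paths(failures_text, prefix):
    """Keep only the failing paths that start with the given prefix."""
    return [line for line in failures_text.splitlines() if line.startswith(prefix)]

def write_paths_file(paths, destination):
    """Write one path per line, dropping blanks and duplicates but keeping order."""
    cleaned = (p.strip() for p in paths)
    unique = list(dict.fromkeys(p for p in cleaned if p))
    Path(destination).write_text("".join(f"{p}\n" for p in unique), encoding="utf-8")
    return unique

# Hypothetical contents of a failures file; real paths depend on your site.
failures = "/blog/first-post\n/about\n/blog/second-post\n"

subset = select_paths(failures, "/blog/")
target = Path(tempfile.gettempdir()) / "blog-failures.txt"
write_paths_file(subset, target)
```

The resulting file can then be passed to sitediff diff with the --paths-file option.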

Debugging rules

When a sanitization rule isn't working quite right for you, you might run sitediff diff many times over. If fetching all the pages is taking too long, try adding the option --cached=all. This tells SiteDiff not to re-fetch the content, but just compare previously cached versions — it's a lot faster!

Including and Excluding URLs

By default, SiteDiff crawls pages that are linked with HTML anchors (the <A HREF syntax). Most links point to HTML pages, but some point to binary files such as PDF documents and images.

Using the option --exclude='.*\.pdf' ensures the crawler skips links to documents with a .pdf extension. Note that the regular expression is applied to the path of the URL, not the base of the URL.

For example --include='.*\.com' will not match http://www.google.com/, because the path of that URL is / while the base is www.google.com.
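
The distinction between path and base can be sketched in Python as follows. This only illustrates the matching rule described above and is not SiteDiff's code: the path_matches helper is hypothetical, and Python's regular expression dialect may differ in details from the one SiteDiff uses.

```python
import re
from urllib.parse import urlsplit

def path_matches(url, pattern):
    """Test the regex against the URL's path only, ignoring scheme and host."""
    path = urlsplit(url).path or "/"
    return re.search(pattern, path) is not None

# The host contains ".com", but the path is just "/", so there is no match.
print(path_matches("http://www.google.com/", r".*\.com"))               # False

# Here the path itself ends in ".pdf", so the pattern matches.
print(path_matches("http://example.com/files/report.pdf", r".*\.pdf"))  # True
```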

paths / paths-file

SiteDiff allows you to specify a list of paths that you want it to work with. Alternatively, it can crawl the entire site and detect all paths.

  • Running sitediff init configures SiteDiff for crawling and seeing differences.

  • Running
