AutoFR: Automated Filter Rule Generation for Adblocking

We introduce AutoFR, a reinforcement learning (RL) framework to fully automate the process of filter rule creation to block ads and minimize visual breakage optimized per-site. The implementation of the framework is based on the paper, "AutoFR: Automated Filter Rule Generation for Adblocking" (USENIX Security 2023). If you use AutoFR for publication, please cite us.

AutoFR Dataset

The dataset and its detailed description are available online. In summary, the dataset contains 1042 zip files, one per site. Each zip file includes the raw collected data of outgoing HTTP requests, AdGraphs, annotated site snapshots, the action space, filter rules, and more.

This includes a Top5k_rules.csv file that shows all the filter rules created within each zip file.

Users must sign a consent form (at the bottom of the web page) before accessing the dataset.

How It Works

AutoFR is the first framework to balance the trade-off between blocking ads and avoiding visual breakage. The user gives AutoFR its inputs (e.g., the website to generate rules for and the breakage tolerance threshold w). AutoFR then runs our RL algorithm, which is based on multi-armed bandits, and generates filter rules that block ads while adhering to the given threshold w.
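The idea of a bandit that trades off blocked ads against breakage can be illustrated with a minimal sketch. This is not AutoFR's implementation: the reward function, the exploration constant c, the rule names, and the feedback numbers are all assumptions made for illustration. Here each candidate rule is an arm, and a rule is penalized whenever the fraction of the page left intact falls below w.

```python
import math


def reward(ads_blocked, breakage, w):
    """Hypothetical reward: -1 if the rule breaks the page beyond the
    tolerance w, otherwise the fraction of ads it blocked."""
    if 1.0 - breakage < w:
        return -1.0
    return ads_blocked


def ucb_score(q, pulls, total_pulls, c=1.4):
    """Upper confidence bound: estimated value plus an exploration bonus."""
    if pulls == 0:
        return math.inf  # try every rule at least once
    return q + c * math.sqrt(math.log(total_pulls) / pulls)


def run_bandit(feedback, w, rounds=50, c=1.4):
    """feedback maps a rule to (ads_blocked, breakage), both in [0, 1].
    In this sketch the feedback is fixed; a real run would measure it
    by visiting the site with the rule applied."""
    q = {rule: 0.0 for rule in feedback}
    pulls = {rule: 0 for rule in feedback}
    for t in range(1, rounds + 1):
        rule = max(feedback, key=lambda r: ucb_score(q[r], pulls[r], t, c))
        r = reward(*feedback[rule], w)
        pulls[rule] += 1
        q[rule] += (r - q[rule]) / pulls[rule]  # running mean of rewards
    return q


def select_rules(q):
    """Keep only the rules whose estimated value is positive."""
    return [rule for rule, value in q.items() if value > 0]


if __name__ == "__main__":
    # Made-up feedback for three made-up rules.
    feedback = {
        "||ads.example.com^": (0.9, 0.0),     # blocks ads, no breakage
        "||cdn.example.com^": (0.5, 0.6),     # blocks ads, breaks the page
        "||static.example.com^": (0.0, 0.0),  # blocks nothing
    }
    print(select_rules(run_bandit(feedback, w=0.9)))
```

With w close to 1, the second rule is rejected even though it blocks ads, because it removes too much of the page.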

AutoFR Implementation

AutoFR Example Workflow (Fig. 4 of paper): INITIALIZE (a–c, Alg. 1): (a) spawns n=10 docker instances and visits the site until it finishes loading; (b) extracts the outgoing requests from all visits and builds the action space; (c) extracts the raw graph and annotates it to denote visible ads, images, and text, using JS and Selenium. Once all 10 site snapshots are annotated, we run the RL portion of the AutoFR procedure (steps 1–4). Lastly, AutoFR outputs the filter rules at step 5, e.g., ||s.yimg.com/rq/darla/4-10-0/html/r-sf.html.
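Step (b) above, turning outgoing requests into an action space, can be sketched as follows. This is a simplified illustration, not AutoFR's ActionSpace class: the function names are hypothetical, and the registrable domain is approximated by the last two host labels, which is wrong for suffixes such as co.uk (a real implementation would consult the Public Suffix List).

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def candidate_rules(url: str) -> List[str]:
    """Return candidate filter rules for one request URL, from the
    coarsest (registrable domain) to the finest (host plus path)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    # Assumption: approximate the registrable domain by the last two labels.
    domain = ".".join(labels[-2:])
    rules = ["||" + domain + "^"]
    if host != domain:
        rules.append("||" + host + "^")
    if parts.path not in ("", "/"):
        rules.append("||" + host + parts.path)
    return rules


def build_action_space(urls: Iterable[str]) -> List[str]:
    """Collect the unique candidate rules over all requests, keeping
    the order in which they were first seen."""
    seen = {}
    for url in urls:
        for rule in candidate_rules(url):
            seen.setdefault(rule, None)
    return list(seen)


if __name__ == "__main__":
    requests = [
        "https://s.yimg.com/rq/darla/4-10-0/html/r-sf.html",
        "https://s.yimg.com/",
    ]
    for rule in build_action_space(requests):
        print(rule)
```

For the request behind the example rule above, the finest candidate is ||s.yimg.com/rq/darla/4-10-0/html/r-sf.html, with ||s.yimg.com^ and ||yimg.com^ as coarser alternatives.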

For more information, see Background Information.

Running AutoFR

Follow the instructions below to run AutoFR. Preview the dependencies.

See Debugging Tips to help solve some common problems.

Setup

We assume you satisfy the hardware and OS dependencies.
Install the core dependencies.

$ sudo apt-get install git python3 python3-dev python3-pip

$ pip3 install virtualenv

Install docker using its official instructions.
Clone the repository.

$ git clone https://github.com/UCI-Networking-Group/AutoFR.git

If you are an artifact reviewer, run the following inside the cloned directory.

$ git checkout artifact-review

Initialize the submodules, also inside the cloned directory.

$ git submodule update --init --recursive

Navigate to the project directory using a terminal window.
Create a virtual environment and activate it. The "[/save-path/autofrenv]" below is a placeholder for the path and name of your environment. For example, if you keep your virtual environments under ~/.virtualenvs, use "~/.virtualenvs/autofrenv".

$ virtualenv --python=python3 [/save-path/autofrenv]

$ source [/save-path/autofrenv]/bin/activate

Install AutoFR dependencies.

$ pip3 install -e .

Build the docker container.

$ docker build -t flg-ad-highlighter-adgraph --build-arg USER_ID=$(id -u) --build-arg GROUP_ID=$(id -g) -f framework-with-ad-highlighter/DockerAdgraphfile .

Create the output directories that AutoFR expects. See Understanding the Output for a description.

$ mkdir temp_graphs; mkdir -p data/output/

Done. You are now ready to use AutoFR.

Create Filter Rules

Make sure you have followed the setup instructions.
Open up the AutoFR project directory using a terminal window.
Activate your virtual environment.
Choose a site that has ads with AdChoice transparency logos. We use https://cricbuzz.com as an example here.
Choose how many docker instances you can start in parallel. This depends on the number of cores on your system. Pass it using the --chunk_threshold argument. Below, we use 6 as an example. The --init_state_iterations 10 argument means that the run must collect 10 site snapshots.

$ python scripts/autofr_controlled.py --site_url "https://cricbuzz.com" --chunk_threshold 6 --init_state_iterations 10

The filter rules are presented at the end of the run.

Explore other possible inputs you can give scripts/autofr_controlled.py by running:

$ python scripts/autofr_controlled.py --help

usage: autofr_controlled.py [-h] --site_url SITE_URL [--output_directory OUTPUT_DIRECTORY] [--chunk_threshold CHUNK_THRESHOLD]
                            [--gamma GAMMA] [--confidence_ucb CONFIDENCE_UCB] [--w_threshold W_THRESHOLD]
                            [--iteration_threshold ITERATION_THRESHOLD] [--init_state_iterations INIT_STATE_ITERATIONS]
                            [--default_q_value DEFAULT_Q_VALUE] [--reward_func_name {RewardByCasesVer1}]
                            [--bandit_klass_name {DomainHierarchyMABControlled}] [--action_space_klass_name {ActionSpace}]
                            [--log_level LOG_LEVEL]  

We run AutoFR-C.

options:
  -h, --help            show this help message and exit
  --site_url SITE_URL   Site to test
  --output_directory OUTPUT_DIRECTORY
                        output directory for saving agent
  --chunk_threshold CHUNK_THRESHOLD
                        How many times at once we will spawn a browser instance (reduce this number if your machine cannot
                        handle many parallel processes)
  --gamma GAMMA         How much do we care about future rewards. Default is 1/n. If passed in, it will be treated as a float
                        value
  --confidence_ucb CONFIDENCE_UCB
                        Confidence level for UCB calculation
  --w_threshold W_THRESHOLD
                        Preference to avoid visual breakage. Between 0 and 1, use number closer to 1 if you really care about
                        avoiding breakage.
  --iteration_threshold ITERATION_THRESHOLD
                        Multiplier to how many iterations per round
  --init_state_iterations INIT_STATE_ITERATIONS
                        Number of site snapshots required for AutoFR to run (reduce this number if the process cannot detect ads
                        easily for the website)
  --default_q_value DEFAULT_Q_VALUE
                        whether we do initializing only. New filter rules will be outputted
  --reward_func_name {RewardByCasesVer1}
                        Name of reward function
  --bandit_klass_name {DomainHierarchyMABControlled} 
                        Name of bandit control class 
  --action_space_klass_name {ActionSpace}
                        Name of action space class   
  --log_level LOG_LEVEL
                        Log level
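If you want to launch runs from another Python program, the command line can be assembled from the options listed above. The sketch below is a hypothetical wrapper, not part of AutoFR: build_command and run_autofr are names invented here, and the defaults of 6 and 10 are taken from the example in this README rather than from the script itself. It assumes you call it from the AutoFR project directory with the virtual environment active.

```python
import subprocess
from typing import List, Optional

SCRIPT = "scripts/autofr_controlled.py"


def build_command(
    site_url: str,
    chunk_threshold: int = 6,
    init_state_iterations: int = 10,
    w_threshold: Optional[float] = None,
    output_directory: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list using only flags shown in --help."""
    cmd = [
        "python", SCRIPT,
        "--site_url", site_url,
        "--chunk_threshold", str(chunk_threshold),
        "--init_state_iterations", str(init_state_iterations),
    ]
    if w_threshold is not None:
        # The help text describes w_threshold as a value between 0 and 1.
        if not 0.0 <= w_threshold <= 1.0:
            raise ValueError("w_threshold must be between 0 and 1")
        cmd += ["--w_threshold", str(w_threshold)]
    if output_directory is not None:
        cmd += ["--output_directory", output_directory]
    return cmd


def run_autofr(site_url: str, **options) -> None:
    """Run the script and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_command(site_url, **options), check=True)


# Example (starts docker instances, so it is left commented out):
# run_autofr("https://cricbuzz.com", chunk_threshold=6, w_threshold=0.9)
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as & or ?.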

Debugging Tips

Thank you for trying AutoFR. Here are some debugging tips.

InvalidSiteFeedbackException: Expected 10 init states but got X

If you get an exception such as InvalidSiteFeedbackException: Expected 10 init states but got 0, AutoFR was not able to collect valid site snapshots of the website. A valid site snapshot requires that ads are detected (counter of ads > 0). There are a few things you can try.

A. Reduce Required Site Snapshots:

If the exception says that some site snapshots were collected, like Expected 10 init states but got 4, you may want to reduce the number of required site snapshots by passing a different value for --init_state_iterations.

$ python scripts/autofr_controlled.py --site_url "https://cricbuzz.com" --chunk_threshold 6 --init_state_iterations 4

B. Reduce Parallel Processes:

Your machine may not be able to handle the number of parallel processes. Try reducing it by changing --chunk_threshold.

$ python scripts/autofr_controlled.py --site_url "https://cricbuzz.com" --chunk_threshold 2 --init_state_iterations 4
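Tips A and B can be combined into a small helper that reads the exception message and proposes arguments for the next attempt. This is only a sketch of one possible heuristic: suggest_retry is a hypothetical name, and the policy (reuse the number of snapshots that were collected, or halve --chunk_threshold when none were) is an assumption, not something AutoFR does for you.

```python
import re

# Matches the message format shown in the exception above.
INIT_STATE_ERROR = re.compile(r"Expected (\d+) init states but got (\d+)")


def suggest_retry(message, chunk_threshold):
    """Return suggested values for the next run, or None if the
    message is not an init-state error."""
    match = INIT_STATE_ERROR.search(message)
    if match is None:
        return None
    expected, got = int(match.group(1)), int(match.group(2))
    if got > 0:
        # Tip A: require only as many snapshots as were collected.
        return {
            "init_state_iterations": got,
            "chunk_threshold": chunk_threshold,
        }
    # Tip B: no snapshots at all, so try fewer parallel instances.
    return {
        "init_state_iterations": expected,
        "chunk_threshold": max(1, chunk_threshold // 2),
    }


if __name__ == "__main__":
    msg = "InvalidSiteFeedbackException: Expected 10 init states but got 4"
    print(suggest_retry(msg, chunk_threshold=6))
```

If a retry with fewer parallel instances still collects no snapshots, the problem is more likely the one described in tip C.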

C. Can Ad Highlighter Detect Ads for the Given Site?

It could simply mean that Ad Highlighter cannot detect any ads for the given website. There are many possible reasons: the website may have changed over time and no longer serve ads that Ad Highlighter can detect, or it may detect that you are using a crawler (AutoFR uses Selenium) and purposefully not serve ads.

To verify whether Ad Highlighter can detect ads for the given website, download the code of the web extension and install it manually on Chrome.

Download the code of the web extension.
Open Chrome > Manage Extensions > Load unpacked
Choose the perceptual-adblocker directory (this is Ad Highlighter). This should install the web extension.
Remember to disable any adblocker that is active in your browser.
Visit the website that you are testing with AutoFR and check whether Ad Highlighter identifies its ads (there should be a reddish overlay on those ads).
Once you are done verifying, make sure to disable Ad Highlighter and re-activate any adblockers that you disabled.

If Ad Highlighter does not identify any ads for the given website, this means that you need to updat
