
BlackWeb

<!-- markdownlint-disable MD033 -->


<table align="center"> <tr> <td align="center"> <span>English</span> | <a href="README-es.md">Español</a> </td> </tr> </table>

BlackWeb is a project that collects and unifies public blocklists of domains (porn, downloads, drugs, malware, spyware, trackers, bots, social networks, warez, weapons, etc.) to make them compatible with Squid-Cache.

DATA SHEET


| ACL | Blocked Domains | File Size |
| :---: | :---: | :---: |
| blackweb.txt | 4818910 | 120.1 MB |

GIT CLONE


git clone --depth=1 https://github.com/maravento/blackweb.git

HOW TO USE


blackweb.txt is already updated and optimized for Squid-Cache. Download it, extract it to the path of your preference, and activate the Squid-Cache rule.

Download

wget -q -c -N https://raw.githubusercontent.com/maravento/blackweb/master/blackweb.tar.gz && cat blackweb.tar.gz* | tar xzf -

If Multiparts Exist

#!/bin/bash

# Variables
url="https://raw.githubusercontent.com/maravento/blackweb/master/blackweb.tar.gz"
wgetd="wget -q -c --timestamping --no-check-certificate --retry-connrefused --timeout=10 --tries=4 --show-progress"

# TMP folder
output_dir="bwtmp"
mkdir -p "$output_dir"

# Download
if $wgetd "$url"; then
  echo "File downloaded: $(basename "$url")"
else
  echo "Main file not found. Searching for multiparts..."

  # Multipart suffixes from aa to zz
  all_parts_downloaded=true
  for part in {a..z}{a..z}; do
    part_url="${url}.${part}"
    if $wgetd "$part_url"; then
      echo "Part downloaded: $(basename $part_url)"
    else
      echo "Part not found: $part"
      all_parts_downloaded=false
      break
    fi
  done

  if $all_parts_downloaded; then
    # Rebuild the original file in the current directory
    cat blackweb.tar.gz.* > blackweb.tar.gz
    echo "Multipart file rebuilt"
  else
    echo "Multipart process cannot be completed"
    exit 1
  fi
fi

# Unzip the file to the output folder
tar -xzf blackweb.tar.gz -C "$output_dir"

echo "Done"

Checksum

wget -q -c -N https://raw.githubusercontent.com/maravento/blackweb/master/blackweb.tar.gz && cat blackweb.tar.gz* | tar xzf -
wget -q -c -N https://raw.githubusercontent.com/maravento/blackweb/master/blackweb.txt.sha256
LOCAL=$(sha256sum blackweb.txt | awk '{print $1}'); REMOTE=$(awk '{print $1}' blackweb.txt.sha256); echo "$LOCAL" && echo "$REMOTE" && [ "$LOCAL" = "$REMOTE" ] && echo OK || echo FAIL

BlackWeb Rule for Squid-Cache


Edit:

/etc/squid/squid.conf

And add the following lines:

# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS

# Block Rule for Blackweb
acl blackweb dstdomain "/path_to/blackweb.txt"
http_access deny blackweb
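After editing squid.conf, it is worth validating the syntax before applying it. A minimal sketch, assuming `squid` is on the PATH (these commands usually require root):

```shell
# Validate squid.conf syntax, then apply it without a full restart.
# (Sketch: requires Squid to be installed; typically run as root.)
if command -v squid >/dev/null 2>&1; then
  squid -k parse && squid -k reconfigure
else
  echo "squid not installed"
fi
```

`squid -k parse` reports configuration errors without touching the running service; `squid -k reconfigure` reloads the ACL files without dropping client connections.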

Advanced Rules

BlackWeb contains millions of domains, therefore it is recommended:

Allow Rule for Domains

Use allowdomains.txt to exclude essential domains or subdomains, such as .accounts.google.com, .yahoo.com, .github.com, etc. According to Squid's documentation, the subdomains accounts.google.com and accounts.youtube.com may be used by Google for authentication within its ecosystem. Blocking them could disrupt access to services like Gmail, Drive, Docs, and others.

acl allowdomains dstdomain "/path_to/allowdomains.txt"
http_access allow allowdomains
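For reference, entries in allowdomains.txt use Squid's dot-notation, one domain per line (a hypothetical excerpt; the distributed file may differ):

```
.accounts.google.com
.accounts.youtube.com
.github.com
```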
Block Rule for Domains

Use blockdomains.txt to block any other domain not included in blackweb.txt.

acl blockdomains dstdomain "/path_to/blockdomains.txt"
http_access deny blockdomains
Block Rule for gTLD, sTLD, ccTLD, etc

Use blocktlds.txt to block gTLD, sTLD, ccTLD, etc.

acl blocktlds dstdomain "/path_to/blocktlds.txt"
http_access deny blocktlds

Input:

.bardomain.xxx
.subdomain.bardomain.xxx
.bardomain.ru
.bardomain.adult
.foodomain.com
.foodomain.porn

Output:

.foodomain.com
Block Rule for Punycode

Use this rule to block Punycode (RFC 3492) / IDN non-ASCII TLDs or domains, to prevent IDN homograph attacks. For more information, see welivesecurity: Homograph attacks.

acl punycode dstdom_regex -i \.xn--.*
http_access deny punycode

Input:

.bücher.com
.mañana.com
.google.com
.auth.wikimedia.org
.xn--fiqz9s
.xn--p1ai

ASCII Output:

.google.com
.auth.wikimedia.org
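The regex can be exercised outside Squid with grep. Note that browsers convert IDNs to their xn-- (Punycode) form before sending the request, so the Unicode names in the Input above would reach the proxy already encoded and be denied as well; the simulation below only shows the literal match:

```shell
# Simulate the dstdom_regex match: lines containing ".xn--" are denied.
# (Browsers send IDNs already converted to this xn-- form.)
printf '%s\n' .google.com .auth.wikimedia.org .xn--fiqz9s .xn--p1ai |
  grep -iE '\.xn--.*'
```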
Block Rule for Words

Use this rule to block words (optional; it can generate false positives).

# Download ACL:
sudo wget -P /etc/acl/ https://raw.githubusercontent.com/maravento/vault/refs/heads/master/blackshield/acl/squid/blockwords.txt
# Squid Rule to Block Words:
acl blockwords url_regex -i "/etc/acl/blockwords.txt"
http_access deny blockwords

Input:

.bittorrent.com
https://www.google.com/search?q=torrent
https://www.google.com/search?q=mydomain
https://www.google.com/search?q=porn
.mydomain.com

Output:

https://www.google.com/search?q=mydomain
.mydomain.com
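A quick way to preview the effect (and the false-positive risk) is to filter the sample URLs with the same words. Assuming blockwords.txt contains terms like `torrent` and `porn`, `grep -v` prints what would still be allowed:

```shell
# Simulate url_regex word blocking: -v prints only the URLs that survive.
# Note that ".bittorrent.com" is caught by "torrent" -- a false positive.
printf '%s\n' \
  '.bittorrent.com' \
  'https://www.google.com/search?q=torrent' \
  'https://www.google.com/search?q=mydomain' \
  'https://www.google.com/search?q=porn' \
  '.mydomain.com' |
  grep -ivE 'torrent|porn'
```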
Streaming (Optional)

Use streaming.txt to block streaming domains not included in blackweb.txt (for example: .youtube.com, .googlevideo.com, .ytimg.com, etc.).

acl streaming dstdomain "/path_to/streaming.txt"
http_access deny streaming

Note: This list may contain overlapping domains. It is important to manually clean it according to the proposed objective. Example:

  • If your goal is to block Facebook, keep the primary domains and remove specific subdomains.
  • If your goal is to block features, like Facebook streaming, keep the specific subdomains and remove the primary domains to avoid impacting overall site access. Example:
# Block Facebook
.fbcdn.net
.facebook.com

# Block some Facebook streaming content
.z-p3-video.flpb1-1.fna.fbcdn.net

Advanced Rules Summary

# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS

# Allow Rule for Domains
acl allowdomains dstdomain "/path_to/allowdomains.txt"
http_access allow allowdomains

# Block Rule for Punycode
acl punycode dstdom_regex -i \.xn--.*
http_access deny punycode

# Block Rule for gTLD, sTLD, ccTLD
acl blocktlds dstdomain "/path_to/blocktlds.txt"
http_access deny blocktlds

# Block Rule for Domains
acl blockdomains dstdomain "/path_to/blockdomains.txt"
http_access deny blockdomains

# Block Rule for Patterns (Optional)
# https://raw.githubusercontent.com/maravento/vault/refs/heads/master/blackshield/acl/squid/blockpatterns.txt
acl blockwords url_regex -i "/path_to/blockpatterns.txt"
http_access deny blockwords

# Block Rule for web3 (Optional)
# https://raw.githubusercontent.com/maravento/vault/refs/heads/master/blackshield/acl/web3/web3domains.txt
acl web3 dstdomain "/path_to/web3domains.txt"
http_access deny web3

# Block Rule for Blackweb
acl blackweb dstdomain "/path_to/blackweb.txt"
http_access deny blackweb

BLACKWEB UPDATE


⚠️ WARNING: BEFORE YOU CONTINUE

This section only explains how the update and optimization process works; it is not necessary for users to run it. The process can take time and consume significant hardware and bandwidth resources, so it is recommended to run it on test equipment.

Bash Update

The update process of blackweb.txt consists of several steps and is executed in sequence by the script bwupdate.sh. The script will request privileges when required.

wget -q -N https://raw.githubusercontent.com/maravento/blackweb/master/bwupdate/bwupdate.sh && chmod +x bwupdate.sh && ./bwupdate.sh

Dependencies

The update requires Python 3.x and Bash 5.x. It also requires the following dependencies:

wget git curl libnotify-bin perl tar rar unrar unzip zip gzip python-is-python3 idn2 iconv

Make sure Squid is installed correctly. If you have any problems, run the following script (sudo ./squid_install.sh):

#!/bin/bash

# kill old version
while pgrep squid > /dev/null; do
    echo "Waiting for Squid to stop..."
    killall -s SIGTERM squid &>/dev/null
    sleep 5
done

# squid remove (if exist)
apt purge -y squid* &>/dev/null
rm -rf /var/spool/squid* /var/log/squid* /etc/squid* /dev/shm/* &>/dev/null

# squid install (you can use 'squid-openssl' or 'squid')
apt install -y squid-openssl squid-langpack squid-common squidclient squid-purge

# create log
# create log dir and files
mkdir -p /var/log/squid
for log in access cache store deny; do
    touch "/var/log/squid/${log}.log"
done

# permissions
chown -R proxy:proxy /var/log/squid

# enable service
systemctl enable squid.service
systemctl start squid.service
echo "Done"

Capture Public Blocklists

Captures domains from downloaded public blocklists (see SOURCES) and unifies them into a single file.

Domains Debugging

Removes overlapping domains ('.sub.example.com' is a subdomain of '.example.com'), normalizes entries to the Squid-Cache format, and excludes false positives (google, hotmail, yahoo, etc.) using an allowlist (debugwl.txt).

Input:

com
.com
.domain.com
domain.com
0.0.0.0 domain.com
12
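The overlap-removal step can be sketched in a few lines of awk (a simplified illustration, not the project's actual bwupdate.sh logic): load every entry into a set, then drop any entry whose parent domain is also present.

```shell
# Simplified sketch of overlap removal (not the project's actual code):
# pass 1 remembers every entry; pass 2 drops entries whose parent is present.
printf '%s\n' .example.com .sub.example.com .other.net > domains.txt

awk '
NR == FNR { set[$0]; next }        # pass 1: remember every entry
{
  p = $0
  # walk up the label chain: .sub.example.com -> .example.com -> .com
  while (sub(/^\.[^.]+/, "", p) && p != "")
    if (p in set) next             # a parent entry exists: drop this one
  print                            # no parent found: keep it
}' domains.txt domains.txt
```

Here `.sub.example.com` is dropped because `.example.com` is already in the list, leaving `.example.com` and `.other.net`.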