SiteScraper
The Site Scraper Tool is an ethical hacking program developed in Python that enables users to clone websites for educational purposes by copying HTML, CSS, JavaScript, and PHP.
Note: Use this tool responsibly and only on sites where you have explicit permission, as unauthorized scraping can lead to legal issues.
Disclaimer
SiteScraper should only be used on websites you own or have explicit permission to test and analyze. Unauthorized use on external sites without permission may violate laws and terms of service. The author is not responsible for any misuse or legal consequences resulting from the use of this tool.
Compatibility
- Linux (Debian, RHEL, Arch)
Installation
Clone the repository:
git clone https://github.com/s-r-e-e-r-a-j/SiteScraper.git
Navigate to the SiteScraper directory
cd SiteScraper
Install the required libraries:
pip3 install -r requirements.txt
Note for Kali, Parrot, Ubuntu 23.04+ users:
If you see an error like:
error: externally-managed-environment
then use:
pip3 install -r requirements.txt --break-system-packages
Navigate to the 'Site Scraper' subdirectory inside the repository:
cd 'Site Scraper'
Install the tool:
sudo python3 install.py
When prompted, enter y to install.
Usage
Run SiteScraper from the command line with the following options:
sitescraper <URL> [options]
Command-Line Options
Option Description
<URL> The URL of the website to clone
-d, --depth (Optional) Set the maximum crawl depth (default: 3)
-o, --output (Optional) Set the output directory (default: website_clone). You can also give a full path, for example: -o /home/kali/Desktop/result
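As a rough illustration of how the options above fit together, here is a minimal argparse sketch that mirrors the documented flags and defaults. It is not taken from SiteScraper's source; the function name build_parser is hypothetical, and the real tool may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="sitescraper",
        description="Clone a website for offline study.",
    )
    # Positional argument: the site to clone.
    parser.add_argument("url", metavar="URL", help="URL of the website to clone")
    # Optional crawl depth; the default of 3 comes from the option table.
    parser.add_argument("-d", "--depth", type=int, default=3,
                        help="maximum crawl depth (default: 3)")
    # Optional output directory; the default comes from the option table.
    parser.add_argument("-o", "--output", default="website_clone",
                        help="output directory (default: website_clone)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.url, args.depth, args.output)
```

Omitting -d or -o in this sketch falls back to the documented defaults.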
Example
To clone a website to a maximum depth of 2 and save it in a directory named my_clone on the Desktop, use the following command:
sitescraper https://example.com -d 2 -o /home/kali/Desktop/my_clone
After the cloning process is complete, a directory named after the domain (e.g., http.example.com) will be created inside my_clone.
To view the cloned website, open the index.html file in a browser.
If you see .php files in the directory, it means the website has a PHP backend, and you need to start a PHP server to run it properly.
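If the clone is large, checking for .php files by eye can be tedious. The following is a small sketch of one way to check from Python; the helper name has_php_files is hypothetical and is not part of SiteScraper.

```python
from pathlib import Path


def has_php_files(clone_dir: str) -> bool:
    """Return True if any .php file exists anywhere under clone_dir."""
    # rglob walks the directory tree recursively; any() stops at the first match.
    return any(Path(clone_dir).rglob("*.php"))


if __name__ == "__main__":
    # Example path taken from the usage example in this README.
    print(has_php_files("/home/kali/Desktop/my_clone/http.example.com"))
```

If this prints True, follow the PHP server steps in the next section.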
Starting the PHP Server
- Navigate to the Cloned Website Directory
cd /home/kali/Desktop/my_clone/http.example.com
- Start the PHP Server
Replace yourmachineipaddress with your actual local IP (e.g., 192.168.1.5):
php -S yourmachineipaddress:8080
Example:
php -S 192.168.1.5:8080
- Open the Cloned Website in a Browser
In your web browser, enter:
http://yourmachineipaddress:8080
Example:
http://192.168.1.5:8080
Now, you should be able to access and interact with the cloned website.
How It Works
SiteScraper follows these steps:
- Initial Crawl: Downloads the main page of the target site.
- Recursive Crawling: Finds all internal links, then recursively crawls and saves them.
- Asset Handling: Downloads and saves linked assets (CSS, JS, images).
- File Structure Preservation: Saves files with the same structure as the original website, maintaining directories and paths.
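The crawl steps above can be sketched in a few lines of standard-library Python. This is an illustration of the general technique, not SiteScraper's actual code: it uses a breadth-first queue rather than recursion, it assumes the start page counts as depth 0, and the names LinkParser, internal_links, crawl, and the fetch callback are all hypothetical. Asset downloading and saving to disk are left out to keep the sketch short.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def internal_links(base_url, html):
    """Return absolute links on the same host as base_url, without fragments."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(base_url).netloc
    found = []
    for href in parser.links:
        # Resolve relative links and drop any "#fragment" part.
        absolute = urldefrag(urljoin(base_url, href)).url
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found


def crawl(start_url, fetch, max_depth=3):
    """Depth-limited crawl. fetch(url) must return HTML text, or None on failure."""
    seen = {start_url}
    queue = deque([(start_url, 0)])  # assumption: the start page is depth 0
    pages = {}
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        if depth >= max_depth:
            continue  # keep the page, but do not follow its links
        for link in internal_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

Passing the fetch function in as a parameter keeps the sketch free of network code, so it can be tried against an in-memory dictionary of pages (for example, crawl(url, pages_dict.get)) before pointing it at a site you have permission to test.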
Uninstallation
cd SiteScraper
cd 'Site Scraper'
sudo python3 install.py
When prompted, enter n to uninstall.
License
This project is licensed under the MIT License.
