Webscreenshot
A simple script to screenshot a list of websites
webscreenshot
Description
A simple script to screenshot a list of websites, based on the url-to-image PhantomJS script.
Features
- Integrating url-to-image 'lazy-rendering' for AJAX resources
- Fully functional on Windows and Linux systems
- Cookie and custom HTTP header definition support for the PhantomJS renderer
- Multiprocessing, and killing of unresponsive processes after a user-definable timeout
- Accepting several input target formats
- Customizing screenshot size (width, height), format and quality
- Mapping useful PhantomJS options such as ignoring SSL errors, proxy definition and proxy authentication, and HTTP Basic Authentication
- Supports multiple renderers:
  - PhantomJS, which is legacy and abandoned but still produces the best results
  - Chromium, Chrome and Edge Chromium, which will replace PhantomJS but currently have some limitations: screenshotting an HTTPS website that lacks a valid certificate (for instance, a self-signed one) will produce an empty screenshot.
    The reason is that the --ignore-certificate-errors option no longer works and will not work again: the solution is to use a proper webdriver, but to date webscreenshot doesn't aim to support this rather complex method, which requires some third-party tools.
  - Firefox can also be used as a renderer but has some serious limitations (so don't use it for the moment):
    - Multiple screenshots cannot be taken at the same time: the Firefox process cannot run as multiple instances
    - No incognito mode: using webscreenshot will pollute your browsing history
- Embedding the screenshot URL in the image (requires ImageMagick)
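The "multiprocessing and killing of unresponsive processes" feature can be pictured with the minimal sketch below. It is not webscreenshot's code: it assumes each screenshot is one renderer subprocess, supervises those subprocesses with a thread pool sized like -w (default 4) instead of the multiprocessing module the tool itself relies on, and kills any subprocess that outlives a timeout like -t (default 30 seconds). The `run_renderer` and `screenshot_all` helpers and the stand-in commands are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_renderer(cmd, timeout):
    """Run one renderer command; return True on success, False on failure or timeout.

    subprocess.run kills the child process when it exceeds `timeout` seconds,
    then raises TimeoutExpired, which we treat as a failed screenshot.
    """
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def screenshot_all(commands, workers=4, timeout=30):
    """Run every command with at most `workers` in parallel; return one bool per command."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: run_renderer(cmd, timeout), commands))


if __name__ == "__main__":
    # Hypothetical stand-ins for renderer invocations: one fast, one that hangs.
    fast = [sys.executable, "-c", "print('ok')"]
    hanging = [sys.executable, "-c", "import time; time.sleep(10)"]
    print(screenshot_all([fast, hanging], workers=2, timeout=1))
```

Running it prints [True, False]: the first command exits normally, while the second is killed after one second instead of blocking its worker for ten. Threads are enough here because the heavy lifting already happens in separate renderer processes.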
Usage
Put your targets in a text file and pass it with the -i option, or pass a single URL as a positional argument.
By default, screenshots are written to the ./screenshots/ directory, relative to the current working directory.
Accepted input formats are the following:
http(s)://domain_or_ip:port(/resource)
domain_or_ip:port(/resource)
domain_or_ip(/resource)
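The normalization visible in the verbose example further down ('google.fr' formatted as 'http://google.fr:80', 'https://duckduckgo.com/robots.txt' as 'https://duckduckgo.com:443/robots.txt') can be approximated with the minimal sketch below. It is an illustration written from this README, not webscreenshot's parser: `normalize_target` and `expand_targets` are hypothetical names, the handling of -p, -s and -m is inferred from their help descriptions only, and skipping blank lines is an assumption.

```python
from urllib.parse import urlsplit

# Assumed defaults: HTTP when no scheme is given, standard ports otherwise.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_target(raw, port=None, ssl=False):
    """Turn one input line into 'scheme://host:port/resource'.

    port mimics -p (same port for every target), ssl mimics -s (force HTTPS).
    """
    raw = raw.strip()
    if "://" not in raw:
        raw = "http://" + raw  # 'domain_or_ip[:port][/resource]' forms
    parts = urlsplit(raw)
    scheme = "https" if ssl else parts.scheme
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    host = parts.hostname
    if not host:
        raise ValueError(f"missing host in {raw!r}")
    if ":" in host:  # re-add brackets around IPv6 literals
        host = f"[{host}]"
    final_port = port or parts.port or DEFAULT_PORTS[scheme]
    resource = parts.path + (f"?{parts.query}" if parts.query else "")
    return f"{scheme}://{host}:{final_port}{resource}"


def expand_targets(lines, port=None, ssl=False, multiprotocol=False):
    """Normalize every non-blank line; with multiprotocol (like -m), emit HTTP and HTTPS."""
    targets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if multiprotocol:
            bare = line.split("://", 1)[-1]  # drop any scheme, then try both
            targets.append(normalize_target("http://" + bare, port))
            targets.append(normalize_target("https://" + bare, port))
        else:
            targets.append(normalize_target(line, port, ssl))
    return targets
```

Since iterating over an open file yields its lines, a target file such as list.txt can be passed straight to `expand_targets` from inside a `with open("list.txt") as f:` block.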
Options
webscreenshot.py version 2.96
usage: webscreenshot [-h] [-i INPUT_FILE] [-o OUTPUT_DIRECTORY] [-w WORKERS] [-v] [--no-error-file] [-z SINGLE_OUTPUT_FILE] [-p PORT]
[-s] [-m] [-r {phantomjs,chrome,chromium,edge,firefox}] [--renderer-binary RENDERER_BINARY] [--no-xserver]
[--window-size WINDOW_SIZE] [-f {pdf,png,jpg,jpeg,bmp,ppm}] [-q [0-100]] [--ajax-max-timeouts AJAX_MAX_TIMEOUTS]
[--crop CROP] [--custom-js CUSTOM_JS] [-l] [--label-size LABEL_SIZE] [--label-bg-color LABEL_BG_COLOR]
[--imagemagick-binary IMAGEMAGICK_BINARY] [-c COOKIE] [-a HEADER] [-u HTTP_USERNAME] [-b HTTP_PASSWORD]
[-P PROXY] [-A PROXY_AUTH] [-T PROXY_TYPE] [-t TIMEOUT]
[URL]
options:
-h, --help show this help message and exit
Main parameters:
URL Single URL target given as a positional argument
-i, --input-file INPUT_FILE
<INPUT_FILE> text file containing the target list. Ex: list.txt
-o, --output-directory OUTPUT_DIRECTORY
<OUTPUT_DIRECTORY> (optional): screenshots output directory (default './screenshots/')
-w, --workers WORKERS
<WORKERS> (optional): number of parallel execution workers (default 4)
-v, --verbosity <VERBOSITY> (optional): verbosity level, repeat it to increase the level { -v INFO, -vv DEBUG } (default
verbosity ERROR)
--no-error-file <NO_ERROR_FILE> (optional): do not write a file with the list of URL of failed screenshots (default false)
-z, --single-output-file SINGLE_OUTPUT_FILE
<SINGLE_OUTPUT_FILE> (optional): name of a file which will be the single output of all inputs. Ex. test.png
Input processing parameters:
-p, --port PORT <PORT> (optional): use the specified port for each target in the input list. Ex: -p 80
-s, --ssl <SSL> (optional): enforce SSL/TLS for every connection
-m, --multiprotocol <MULTIPROTOCOL> (optional): perform screenshots over HTTP and HTTPS for each target
Screenshot renderer parameters:
-r, --renderer {phantomjs,chrome,chromium,edge,firefox}
<RENDERER> (optional): renderer to use among 'phantomjs' (legacy but best results), 'chrome', 'chromium',
'edge', 'firefox' (version > 57) (default 'phantomjs')
--renderer-binary RENDERER_BINARY
<RENDERER_BINARY> (optional): path to the renderer executable if it cannot be found in $PATH
--no-xserver <NO_X_SERVER> (optional): if you are running without an X server, will use xvfb-run to execute the renderer
(by default, trying to detect if DISPLAY environment variable exists
Screenshot image parameters:
--window-size WINDOW_SIZE
<WINDOW_SIZE> (optional): width and height of the screen capture (default '1200,800')
-f, --format {pdf,png,jpg,jpeg,bmp,ppm}
<FORMAT> (optional, phantomjs only): specify an output image file format, "pdf", "png", "jpg", "jpeg", "bmp"
or "ppm" (default 'png')
-q, --quality [0-100]
<QUALITY> (optional, phantomjs only): specify the output image quality, an integer between 0 and 100 (default
75)
--ajax-max-timeouts AJAX_MAX_TIMEOUTS
<AJAX_MAX_TIMEOUTS> (optional, phantomjs only): per AJAX request, and max URL timeout in milliseconds
(default '1400,1800')
--crop CROP <CROP> (optional, phantomjs only): rectangle <t,l,w,h> to crop the screen capture to (default to WINDOW_SIZE:
'0,0,w,h'), only numbers, w(idth) and h(eight). Ex. "10,20,w,h"
--custom-js CUSTOM_JS
<CUSTOM_JS> (optional, phantomjs only): path of a file containing JavaScript code to be executed before
taking the screenshot. Ex: js.txt
Screenshot label parameters:
-l, --label <LABEL> (optional): for each screenshot, create another one displaying inside the target URL (requires
imagemagick)
--label-size LABEL_SIZE
<LABEL_SIZE> (optional): font size for the label (default 60)
--label-bg-color LABEL_BG_COLOR
<LABEL_BACKGROUND_COLOR> (optional): label imagemagick background color (default NavajoWhite)
--imagemagick-binary IMAGEMAGICK_BINARY
<LABEL_BINARY> (optional): path to the imagemagick binary (magick or convert) if it cannot be found in $PATH
HTTP parameters:
-c, --cookie COOKIE <COOKIE_STRING> (optional): cookie string to add. Ex: -c "JSESSIONID=1234; YOLO=SWAG"
-a, --header HEADER <HEADER> (optional): custom or additional header. Repeat this option for every header. Ex: -a "Host:
localhost" -a "Foo: bar"
-u, --http-username HTTP_USERNAME
<HTTP_USERNAME> (optional): specify a username for HTTP Basic Authentication.
-b, --http-password HTTP_PASSWORD
<HTTP_PASSWORD> (optional): specify a password for HTTP Basic Authentication.
Connection parameters:
-P, --proxy PROXY <PROXY> (optional): specify a proxy. Ex: -P http://proxy.company.com:8080
-A, --proxy-auth PROXY_AUTH
<PROXY_AUTH> (optional): provides authentication information for the proxy. Ex: -A user:password
-T, --proxy-type PROXY_TYPE
<PROXY_TYPE> (optional): specifies the proxy type, "http" (default), "none" (disable completely), or
"socks5". Ex: -T socks
-t, --timeout TIMEOUT
<TIMEOUT> (optional): renderer execution timeout in seconds (default 30 sec)
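The --crop semantics described above (a 't,l,w,h' rectangle in which the letters w and h stand for the --window-size width and height, defaulting to '0,0,w,h') can be illustrated with the minimal sketch below. `parse_window_size` and `parse_crop` are hypothetical helpers written from the help text alone; accepting w and h in any position is an assumption, and webscreenshot's own validation may differ.

```python
def parse_window_size(value="1200,800"):
    """Parse a 'width,height' string (like --window-size) into two positive ints."""
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError(f"window size must be 'width,height', got {value!r}")
    width, height = int(parts[0]), int(parts[1])
    if width <= 0 or height <= 0:
        raise ValueError(f"window size must be positive, got {value!r}")
    return width, height


def parse_crop(value, window_size):
    """Resolve a 't,l,w,h' crop string into four ints.

    The literal letters 'w' and 'h' are replaced by the window width and height;
    every other field must be a non-negative integer.
    """
    width, height = window_size
    substitutions = {"w": width, "h": height}
    fields = [field.strip().lower() for field in value.split(",")]
    if len(fields) != 4:
        raise ValueError("crop must have 4 comma-separated fields: t,l,w,h")
    result = []
    for field in fields:
        if field in substitutions:
            result.append(substitutions[field])
        elif field.isdecimal():
            result.append(int(field))
        else:
            raise ValueError(f"invalid crop field: {field!r}")
    return tuple(result)
```

With the default window size, the help text's example "10,20,w,h" resolves to (10, 20, 1200, 800).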
Examples
list.txt
--------
http://google.fr
https://216.58.213.131
216.58.213.131
https://duckduckgo.com/robots.txt
Default execution with a list
-----------------------------
$ python webscreenshot.py -i list.txt
webscreenshot.py version 2.3
[+] 4 URLs to be screenshot
[+] 4 actual URLs screenshot
[+] 0 error(s)
Default execution with a single URL
-----------------------------------
$ python webscreenshot.py -v google.fr
webscreenshot.py version 2.3
[INFO][General] 'google.fr' has been formatted as 'http://google.fr:80' with supplied overriding options
[+] 1 URLs to be screenshot
[INFO][http://google.fr:80] Screenshot OK
[+] 1 actual URLs screenshot
[+] 0 error(s)
Execution with increased verbosity
-----------------------------------
$ python webscreenshot.py -i list.txt -v
webscreenshot.py version 2.3
[INFO][General] 'http://google.fr' has been formatted as 'http://google.fr:80' with supplied overriding options
[INFO][General] 'https://216.58.213.131' has been formatted as 'https://216.58.213.131:443' with supplied overriding options
[INFO][General] '216.58.213.131' has been formatted as 'http://216.58.213.131:80' with supplied overriding options
[INFO][General] 'https://duckduckgo.com/robots.txt' has been formatted as 'https://duckduckgo.com:443/robots.txt' with supplied overriding options
[+] 4 URLs to be screenshot
[INFO][https://duckduckgo.com:443/robots.txt] Screenshot OK
[INFO][http://216.58.213.131:80] Screenshot OK
[INFO][https://216.58.213.131:443] Screenshot OK
[INFO][http://google.fr:80] Screenshot OK
[+] 4 actual URLs screenshot
[+] 0 error(s)
Results
-------
$ ls -l screenshots/
total 187
-rwxrwxrwx 1 root root 53805 May 19 16:04 http_216.58.213.131_80.png
-rwxrwxrwx 1 root root 53805 May 19 16:05 http_google.fr_80.png
-rwxrwxrwx 1 root ro
