D3DShot
Extremely fast and robust screen capture on Windows with the Desktop Duplication API
D3DShot is a pure Python implementation of the Windows Desktop Duplication API. It leverages DXGI and Direct3D system libraries to enable extremely fast and robust screen capture functionality for your Python scripts and applications on Windows.
D3DShot:
- Is by far the fastest way to capture the screen with Python on Windows 8.1+
- Is very easy to use. If you can remember 10-ish methods, you know the entire thing.
- Covers all common scenarios and use cases:
- Screenshot to memory
- Screenshot to disk
- Screenshot to memory buffer every X seconds (threaded; non-blocking)
- Screenshot to disk every X seconds (threaded; non-blocking)
- High-speed capture to memory buffer (threaded; non-blocking)
- Captures to PIL Images out of the box. Gracefully adds output options if NumPy or PyTorch can be found.
- Detects displays in just about any configuration: Single monitor, multiple monitors on one adapter, multiple monitors on multiple adapters.
- Handles display rotation and scaling for you
- Supports capturing specific regions of the screen
- Is robust and very stable. You can run it for hours / days without performance degradation
- Is even able to capture DirectX 11 / 12 exclusive fullscreen applications and games!
TL;DR Quick Code Samples
Screenshot to Memory
import d3dshot
d = d3dshot.create()
d.screenshot()
Out[1]: <PIL.Image.Image image mode=RGB size=2560x1440 at 0x1AA7ECB5C88>
Screenshot to Disk
import d3dshot
d = d3dshot.create()
d.screenshot_to_disk()
Out[1]: './1554298682.5632973.png'
Screen Capture for 5 Seconds and Grab the Latest Frame
import d3dshot
import time
d = d3dshot.create()
d.capture()
time.sleep(5)  # Capture is non-blocking so we wait explicitly
d.stop()
d.get_latest_frame()
Out[1]: <PIL.Image.Image image mode=RGB size=2560x1440 at 0x1AA044BCF60>
Screen Capture the Second Monitor as NumPy Arrays for 3 Seconds and Grab the 4 Latest Frames as a Stack
import d3dshot
import time
d = d3dshot.create(capture_output="numpy")
d.display = d.displays[1]
d.capture()
time.sleep(3)  # Capture is non-blocking so we wait explicitly
d.stop()
frame_stack = d.get_frame_stack((0, 1, 2, 3), stack_dimension="last")
frame_stack.shape
Out[1]: (1080, 1920, 3, 4)
This is barely scratching the surface... Keep reading!
Requirements
- Windows 8.1+ (64-bit)
- Python 3.6+ (64-bit)
Installation
pip install d3dshot
D3DShot leverages DLLs that are already available on your system so the dependencies are very light. Namely:
- comtypes: Internal use. To preserve developer sanity while working with COM interfaces.
- Pillow: Default Capture Output. Also used to save to disk as PNG and JPG.
These dependencies will automatically be installed alongside D3DShot; no need to worry about them!
Extra Step: Laptop Users
Windows has a quirk when using Desktop Duplication on hybrid-GPU systems. Please see the wiki article before attempting to use D3DShot on your system.
Concepts
Capture Outputs
The desired Capture Output is defined when creating a D3DShot instance. It defines the type of all captured images. By default, all captures will return PIL.Image objects. This is a good option if you mostly intend to take screenshots.
# Captures will be PIL.Image in RGB mode
d = d3dshot.create()
d = d3dshot.create(capture_output="pil")
D3DShot is however quite flexible! As your environment meets certain optional sets of requirements, more options become available.
If NumPy is available
# Captures will be np.ndarray of dtype uint8 with values in range (0, 255)
d = d3dshot.create(capture_output="numpy")
# Captures will be np.ndarray of dtype float64 with normalized values in range (0.0, 1.0)
d = d3dshot.create(capture_output="numpy_float")
If NumPy and PyTorch are available
# Captures will be torch.Tensor of dtype uint8 with values in range (0, 255)
d = d3dshot.create(capture_output="pytorch")
# Captures will be torch.Tensor of dtype float64 with normalized values in range (0.0, 1.0)
d = d3dshot.create(capture_output="pytorch_float")
If NumPy and PyTorch are available + CUDA is installed and torch.cuda.is_available()
# Captures will be torch.Tensor of dtype uint8 with values in range (0, 255) on device cuda:0
d = d3dshot.create(capture_output="pytorch_gpu")
# Captures will be torch.Tensor of dtype float64 with normalized values in range (0.0, 1.0) on device cuda:0
d = d3dshot.create(capture_output="pytorch_float_gpu")
Trying to use a Capture Output for which your environment does not meet the requirements will result in an error.
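Since an unsupported Capture Output results in an error, it can help to derive the name from what is actually importable. The following is a minimal sketch, not part of D3DShot: pick_capture_output is a hypothetical helper, preferring PyTorch over NumPy is just one possible policy, and it only checks that the packages can be found (it does not check CUDA, so it never returns the GPU variants).

```python
import importlib.util


def pick_capture_output(prefer_float=False, available=None):
    """Return a capture_output name based on which optional packages are importable.

    Hypothetical helper (not part of D3DShot). `available` lets callers
    override the detection, e.g. for testing.
    """
    if available is None:
        available = {
            name
            for name in ("numpy", "torch")
            if importlib.util.find_spec(name) is not None
        }

    if {"numpy", "torch"} <= available:
        base = "pytorch"
    elif "numpy" in available:
        base = "numpy"
    else:
        # Pillow is installed alongside D3DShot, so "pil" is always an option
        return "pil"

    return base + "_float" if prefer_float else base
```

The result can then be passed straight through, e.g. d3dshot.create(capture_output=pick_capture_output()).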
Singleton
Windows only allows 1 instance of Desktop Duplication per process. To make sure we fall in line with that limitation to avoid issues, the D3DShot class acts as a singleton. Any subsequent calls to d3dshot.create() will always return the existing instance.
d = d3dshot.create(capture_output="numpy")
# Attempting to create a second instance
d2 = d3dshot.create(capture_output="pil")
# Only 1 instance of D3DShot is allowed per process! Returning the existing instance...
# Capture output remains 'numpy'
d2.capture_output.backend
# Out[1]: <d3dshot.capture_outputs.numpy_capture_output.NumpyCaptureOutput at 0x2672be3b8e0>
d == d2
# Out[2]: True
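For readers unfamiliar with the pattern, the behavior above can be sketched in a few lines. This is an illustration only and not D3DShot's actual implementation; Capturer and this create function are hypothetical stand-ins.

```python
class Capturer:
    """Hypothetical stand-in for a class limited to one instance per process."""

    _instance = None

    def __init__(self, capture_output):
        self.capture_output = capture_output


def create(capture_output="pil"):
    # Build the instance on the first call only; later calls get the
    # existing instance back and their arguments are ignored
    if Capturer._instance is None:
        Capturer._instance = Capturer(capture_output)
    else:
        print("Only 1 instance is allowed per process! Returning the existing instance...")
    return Capturer._instance


d = create(capture_output="numpy")
d2 = create(capture_output="pil")  # prints the notice; capture_output stays "numpy"
```

The practical consequence is the same as in the example above: decide on the Capture Output before the first call to create.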
Frame Buffer
When you create a D3DShot instance, a frame buffer is also initialized. It is meant as a thread-safe, first-in, first-out way to hold a certain quantity of captures and is implemented as a collections.deque.
By default, the size of the frame buffer is set to 60. You can customize it when creating your D3DShot object.
d = d3dshot.create(frame_buffer_size=100)
Be mindful of RAM usage with larger values; you will be dealing with uncompressed images, which can use up to 100 MB each depending on the resolution.
The frame buffer can be accessed directly with d.frame_buffer, but using the utility methods instead is recommended.
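A rough estimate of the buffer's footprint is plain arithmetic. The sketch below assumes 3 channels per pixel and ignores any per-object overhead; frame_bytes is a hypothetical helper, not part of D3DShot.

```python
def frame_bytes(width, height, channels=3, bytes_per_value=1):
    """Size in bytes of one uncompressed frame."""
    return width * height * channels * bytes_per_value


# One 2560x1440 frame as uint8 (1 byte per value) and as float64 (8 bytes per value)
uint8_frame = frame_bytes(2560, 1440)
float64_frame = frame_bytes(2560, 1440, bytes_per_value=8)

buffer_size = 60  # the default frame buffer size

print(f"uint8:   {uint8_frame / 1e6:.1f} MB per frame, "
      f"{uint8_frame * buffer_size / 1e9:.2f} GB for a full buffer")
print(f"float64: {float64_frame / 1e6:.1f} MB per frame, "
      f"{float64_frame * buffer_size / 1e9:.2f} GB for a full buffer")
```

At this resolution a uint8 frame is about 11 MB and a float64 frame about 88 MB, so a full default buffer of float captures would need over 5 GB.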
The buffer is used by the following methods:
- d.capture()
- d.screenshot_every()
It is always automatically cleared before starting one of these operations.
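The bounded behavior of such a buffer can be illustrated with a plain collections.deque. This sketch assumes the newest frame is pushed on the left so that index 0 is always the most recent capture; that ordering is an assumption made for the example, and strings stand in for real frames.

```python
from collections import deque

frame_buffer = deque(maxlen=3)  # tiny size for illustration; D3DShot defaults to 60

for i in range(5):
    # appendleft puts the newest item at index 0; once the deque is full,
    # the oldest item is dropped from the right end automatically
    frame_buffer.appendleft(f"frame-{i}")

latest = frame_buffer[0]
contents = list(frame_buffer)

frame_buffer.clear()  # comparable to the automatic clear before a new capture
```

After the loop only the three newest frames remain, which is why a larger frame_buffer_size trades memory for history.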
Displays
When you create a D3DShot instance, your available displays will automatically be detected along with all their relevant properties.
d.displays
Out[1]:
[<Display name=BenQ XL2730Z (DisplayPort) adapter=NVIDIA GeForce GTX 1080 Ti resolution=2560x1440 rotation=0 scale_factor=1.0 primary=True>,
<Display name=BenQ XL2430T (HDMI) adapter=Intel(R) UHD Graphics 630 resolution=1920x1080 rotation=0 scale_factor=1.0 primary=False>]
By default, your primary display will be selected. At all times you can verify which display is set to be used for capture.
d.display
Out[1]: <Display name=BenQ XL2730Z (DisplayPort) adapter=NVIDIA GeForce GTX 1080 Ti resolution=2560x1440 rotation=0 scale_factor=1.0 primary=True>
Selecting another display for capture is as simple as setting d.display to another value from d.displays:
d.display = d.displays[1]
d.display
Out[1]: <Display name=BenQ XL2430T (HDMI) adapter=Intel(R) UHD Graphics 630 resolution=1080x1920 rotation=90 scale_factor=1.0 primary=False>
Display rotation and scaling are detected and handled for you by D3DShot:
- Captures on rotated displays will always be in the correct orientation (i.e. matching what you see on your physical displays)
- Captures on scaled displays will always be in full, non-scaled resolution (e.g. 1280x720 at 200% scaling will yield 2560x1440 captures)
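The expected capture size follows from simple arithmetic on the display properties. The helper below is a hypothetical sketch (capture_resolution is not part of D3DShot, and how the library computes this internally is not shown here); it assumes the width and height passed in describe the display before rotation.

```python
def capture_resolution(width, height, scale_factor=1.0, rotation=0):
    """Expected capture size for a display, after scaling and rotation.

    Hypothetical helper: width/height are the logical, unrotated size.
    """
    full_width = round(width * scale_factor)
    full_height = round(height * scale_factor)
    if rotation in (90, 270):
        # A quarter turn swaps the two dimensions
        full_width, full_height = full_height, full_width
    return full_width, full_height


scaled = capture_resolution(1280, 720, scale_factor=2.0)  # 200% scaling
rotated = capture_resolution(1920, 1080, rotation=90)     # portrait orientation
```

Here scaled is (2560, 1440), matching the example above, and rotated is (1080, 1920).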
Regions
All capture methods (screenshots included) accept an optional region kwarg. The expected value is a 4-tuple of integers structured like this:
(left, top, right, bottom) # values represent pixels
For example, if you want to only capture a 200px by 200px region offset by 100px from both the left and top, you would do:
d.screenshot(region=(100, 100, 300, 300))
If you are capturing a scaled display, the region will be computed against the full, non-scaled resolution.
If you go through the source code, you will notice that the region cropping happens after a full display capture. That might seem sub-optimal but testing has revealed that copying a region of the GPU D3D11Texture2D to the destination CPU D3D11Texture2D using CopySubresourceRegion is only faster when the region is very small. In fact, it doesn't take long for larger regions to actually start becoming slower than the full display capture using this method. To make things worse, it adds a lot of complexity by having the surface pitch not match the buffer size and treating rotated displays differently. It was therefore decided that it made more sense to stick to CopyResource in all cases and crop after the fact.
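Cropping after a full capture amounts to slicing rows and columns. The sketch below uses a list of rows as a stand-in for a captured frame so it runs without any dependencies; crop_region is a hypothetical helper, not D3DShot's code, and it assumes right and bottom are exclusive (consistent with the 200px by 200px example above).

```python
def crop_region(pixels, region):
    """Crop a row-major 2D grid of pixels to (left, top, right, bottom).

    `pixels` is a list of rows standing in for a full display capture.
    right and bottom are treated as exclusive bounds.
    """
    left, top, right, bottom = region
    height, width = len(pixels), len(pixels[0])
    if not (0 <= left < right <= width and 0 <= top < bottom <= height):
        raise ValueError(f"region {region} does not fit a {width}x{height} capture")
    # Slice the rows first, then the columns within each row
    return [row[left:right] for row in pixels[top:bottom]]


# A fake 400x400 capture where each pixel records its own (x, y) position
full = [[(x, y) for x in range(400)] for y in range(400)]
cropped = crop_region(full, (100, 100, 300, 300))
```

The cropped grid is 200 rows of 200 pixels, and its top-left pixel is the one that sat at (100, 100) in the full capture.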
Usage
Create a D3DShot instance
import d3dshot
d = d3dshot.create()
create accepts 2 optional kwargs:
- capture_output: Which capture output to use. See the Capture Outputs section under Concepts
- frame_buffer_size: The maximum size the frame buffer can grow to. See the Frame Buffer section under Concepts
Do NOT import the D3DShot class directly and attempt to initialize it yourself! The create helper function initializes and validates a bunch of things for you behind the scenes.
Once you have a D3DShot instance in scope, we can start doing stuff with it!
List the detected displays
d.displays
Select a display for capture
Your primary display is selected by default but if you have a multi-m
