Pghoard
PostgreSQL® backup and restore service
Install / Use
/learn @Aiven-Open/PghoardREADME
PGHoard |BuildStatus|_
.. |BuildStatus| image:: https://github.com/aiven/pghoard/actions/workflows/build.yml/badge.svg?branch=main .. _BuildStatus: https://github.com/aiven/pghoard/actions .. image:: https://codecov.io/gh/aiven/pghoard/branch/main/graph/badge.svg?token=nLr7M7hvCx :target: https://codecov.io/gh/aiven/pghoard
pghoard is a PostgreSQL® backup daemon and restore tooling that stores backup data in cloud object stores.
Features:
- Automatic periodic basebackups
- Automatic transaction log (WAL/xlog) backups (using either
pg_receivexlog,archive_commandor experimental PG native replication protocol support withwalreceiver) - Optional Standalone Hot Backup support
- Cloud object storage support (AWS S3, Google Cloud, OpenStack Swift, Azure, Ceph)
- Backup restoration directly from object storage, compressed and encrypted
- Point-in-time-recovery (PITR)
- Initialize a new standby from object storage backups, automatically configured as a replicating hot-standby
Fault-resilience and monitoring:
- Persists over temporary object storage connectivity issues by retrying transfers
- Verifies WAL file headers before upload (backup) and after download (restore), so that e.g. files recycled by PostgreSQL are ignored
- Automatic history cleanup (backups and related WAL files older than N days)
- "Archive sync" tool for detecting holes in WAL backup streams and fixing them
- "Archive cleanup" tool for deleting obsolete WAL files from the archive
- Keeps statistics updated in a file on disk (for monitoring tools)
- Creates alert files on disk on problems (for monitoring tools)
Performance:
- Parallel compression and encryption
- WAL pre-fetching on restore
Overview
PostgreSQL Point In Time Replication (PITR) consists of a having a database basebackup and changes after that point go into WAL log files that can be replayed to get to the desired replication point.
PGHoard supports multiple operating models. The basic mode where you have a
separate backup machine, pghoard can simply connect with
pg_receivexlog to receive WAL files from the database as they're
written. Another model is to use pghoard_postgres_command as a
PostgreSQL archive_command. There is also experimental support for PGHoard to
use PostgreSQL's native replication protocol with the experimental
walreceiver mode.
With both modes of operations PGHoard creates periodic basebackups using
pg_basebackup that is run against the database in question.
The PostgreSQL write-ahead log (WAL) and basebackups are compressed with Snappy (default), Zstandard (configurable, level 3 by default) or LZMA (configurable, level 0 by default) in order to ensure good compression speed and relatively small backup size. For performance critical applications it is recommended to test compression algorithms to find the most suitable trade-off for the particular use-case. E.g. Snappy is fast but yields larger compressed files, Zstandard (zstd) on the other hand offers a very wide range of compression/speed trade-off.
Optionally, PGHoard can encrypt backed up data at rest. Each individual file is encrypted and authenticated with file specific keys. The file specific keys are included in the backup in turn encrypted with a master RSA private/public key pair.
PGHoard supports backing up and restoring from either a local filesystem or from various object stores (AWS S3, Azure, Ceph, Google Cloud and OpenStack Swift.)
In case you just have a single database machine, it is heavily recommended to utilize one of the object storage services to allow backup recovery even if the host running PGHoard is incapacitated.
Requirements
PGHoard can backup and restore PostgreSQL versions 9.6 and above, but is only tested and actively developed with version 12 and above.
The daemon is implemented in Python and is tested and developed with version 3.10 and above. The following Python modules are required:
- psycopg2_ to look up transaction log metadata
- requests_ for the internal client-server architecture
.. _psycopg2: http://initd.org/psycopg/
.. _requests: http://www.python-requests.org/en/latest/
Optional requirements include:
- azure_ for Microsoft Azure object storage (patched version required, see link)
- botocore_ for AWS S3 (or Ceph-S3) object storage
- google-api-client_ for Google Cloud object storage
- cryptography_ for backup encryption and decryption (version 0.8 or newer required)
- snappy_ for Snappy compression and decompression
- zstandard_ for Zstandard (zstd) compression and decompression
- systemd_ for systemd integration
- swiftclient_ for OpenStack Swift object storage
- paramiko_ for sftp object storage
.. _azure: https://github.com/aiven/azure-sdk-for-python/tree/aiven/rpm_fixes
.. _botocore: https://github.com/boto/botocore
.. _google-api-client: https://github.com/google/google-api-python-client
.. _cryptography: https://cryptography.io/
.. _snappy: https://github.com/andrix/python-snappy
.. _zstandard: https://github.com/indygreg/python-zstandard
.. _systemd: https://github.com/systemd/python-systemd
.. _swiftclient: https://github.com/openstack/python-swiftclient
.. _paramiko: https://github.com/paramiko/paramiko
Developing and testing PGHoard also requires the following utilities: flake8_, pylint_ and pytest_.
.. _flake8: https://flake8.readthedocs.io/
.. _pylint: https://www.pylint.org/
.. _pytest: http://pytest.org/
PGHoard has been developed and tested on modern Linux x86-64 systems, but should work on other platforms that provide the required modules.
Vagrant
The Vagrantfile can be used to setup a vagrant development environment. The vagrant environment has python 3.10, 3.11 and 3.12 virtual environments and installations of postgresql 13, 14, 15, 16, 17 and 18.
By default vagrant up will start a Virtualbox environment. The Vagrantfile will also work for libvirt, just prefix
VAGRANT_DEFAULT_PROVIDER=libvirt to the vagrant up command.
Any combination of Python (3.10, 3.11 and 3.12) and Postgresql (13, 14, 15, 16, 17, 18)
Bring up vagrant instance and connect via ssh::
vagrant up vagrant ssh vagrant@ubuntu2004:~$ cd /vagrant
Test with Python 3.11 and Postgresql 17::
vagrant@ubuntu2004:~$ source ~/venv3.11/bin/activate vagrant@ubuntu2004:~$ PG_VERSION=17 make unittest vagrant@ubuntu2004:~$ deactivate
Test with Python 3.12 and Postgresql 18::
vagrant@ubuntu2004:~$ source ~/venv3.12/bin/activate vagrant@ubuntu2004:~$ PG_VERSION=18 make unittest vagrant@ubuntu2004:~$ deactivate
And so on
Building
To build an installation package for your distribution, go to the root directory of a PGHoard Git checkout and run:
Debian::
make deb
This will produce a .deb package into the parent directory of the Git
checkout.
Fedora::
make rpm
This will produce a .rpm package usually into rpm/RPMS/noarch/.
Python/Other::
python setup.py bdist_egg
This will produce an egg file into a dist directory within the same folder.
Installation
To install it run as root:
Debian::
dpkg -i ../pghoard*.deb
Fedora::
dnf install rpm/RPMS/noarch/*
On Linux systems it is recommended to simply run pghoard under
systemd::
systemctl enable pghoard.service
and eventually after the setup section, you can just run::
systemctl start pghoard.service
Python/Other::
easy_install dist/pghoard-1.7.0-py3.6.egg
On systems without systemd it is recommended that you run pghoard
under Supervisor_ or other similar process control system.
.. _Supervisor: http://supervisord.org
Setup
After this you need to create a suitable JSON configuration file for your installation.
-
Make sure PostgreSQL is configured to allow WAL archival and retrieval.
postgresql.confshould havewal_levelset toarchiveor higher andmax_wal_sendersset to at least1(archive_commandmode) or at least2(pg_receivexlogandwalreceivermodes), for example::wal_level = archive max_wal_senders = 4Note that changing
wal_levelormax_wal_senderssettings requires restarting PostgreSQL. -
Create a suitable PostgreSQL user account for
pghoard::CREATE USER pghoard PASSWORD 'putyourpasswordhere' REPLICATION;
-
Edit the local
pg_hba.confto allow access for the newly created account to thereplicationdatabase from the primary and standby nodes. For example::TYPE DATABASE USER ADDRESS METHOD
host replication pghoard 127.0.0.1/32 md5
After editing, please reload the configuration with either::
SELECT pg_reload_conf();
or by sending directly a
SIGHUPto the PostgreSQLpostmasterprocess. -
Fill in the created user account and primary/standby addresses into the configuration file
pghoard.jsonto the sectionbackup_sites. -
Fill in the possible object storage user credentials into the configuration file
pghoard.jsonunder sectionobject_storagein case you wishpghoardto back up into the cloud. -
Now copy the same
pghoard.jsonconfiguration to the standby node if there are any.
Other possible configuration settings are covered in more detail under the
Configuration keys_ section of this README.
- If all has been set up correctly up to this point,
pghoardshould now be ready to be started.
Backing up your database
PostgreSQL backups consist of full database backups, basebackups, plus write ahead logs and related metadata, WAL. Both basebackups and WAL are required to create and restore a consistent database (does not apply for standalone hot backups).
To enable backups with PGHoard the pghoard daemon must be running
locally. The daemon will periodically take full basebackups of the database
files to th
