MyHoard

MyHoard is a daemon for creating, managing and restoring MySQL backups. The backup data can be stored in any of the supported cloud object storages. It is functionally similar to the pghoard backup daemon for PostgreSQL.

Features

Automatic periodic full backup
Automatic binary log backup in near real-time
Cloud object storage support (AWS S3, Google Cloud Storage, Azure)
Encryption and compression
Backup restoration from object storage
Point-in-time-recovery (PITR)
Automatic backup history cleanup based on number of backups and/or backup age
Purging local binary logs once they're backed up and not needed by other MySQL servers (requires external system to provide executed GTID info for the standby servers)
Almost no extra local disk space requirements for creating and restoring backups
Incremental backups

Fault-resilience and monitoring:

Handles temporary object storage connectivity issues by retrying all operations
Metrics via statsd using Telegraf® tag extensions
Unexpected exception reporting via Sentry
State reporting via HTTP API
Full internal state stored on local file system to cope with process and server restarts

Overview

There are a number of existing tools and scripts for managing MySQL backups, so why have yet another tool? As far as taking a full (or incremental) snapshot of MySQL goes, Percona XtraBackup does a very good job and is in fact what MyHoard uses internally. Where things usually get more complicated is when you want to back up and restore binary logs so that you can do point-in-time recovery and reduce the data loss window. Also, as good as Percona XtraBackup is for taking and restoring a backup, you still need all sorts of scripts and timers around it to actually execute it, and if anything goes wrong, e.g. because of network issues, it's up to you to retry.

Often binary log backup is based on just uploading the binary log files using some simple scheduled file copying mechanism, and restoring them is left as an afterthought, usually just consisting of "download all the binlogs and then use mysqlbinlog to replay them". In addition to lacking proper automation that would make the process repeatable and safe, this approach also does not work in some cases: for binary log restoration with mysqlbinlog to be safe you need to have all binary logs on local disk. For change-heavy environments this may be much more than the size of the actual database, and if the server disk is sized based on the database size the binary logs may simply not fit on the disk.

MyHoard uses an alternative approach for binary log restoration, which is based on presenting the backed up binary logs as relay logs in batches via direct relay log index manipulation and having the regular SQL slave thread apply them as if they were replicated from another MySQL server. This allows applying them in batches so there's very little extra disk space required during restoration and this would also allow applying them in parallel (though that requires more work, currently there are known issues with using slave-parallel-workers value other than 0, i.e. multithreading must currently be disabled).

Existing tooling also doesn't pay much attention to real-life HA environments and failovers, where the backup responsibilities need to be switched from one server to another while still producing an uninterrupted sequence of backed up transactions that can be restored to any point in time, including the time around the failover. This requires something much more sophisticated than just blindly uploading all local binary logs.

MyHoard aims to provide a single solution daemon that takes care of all of your MySQL backup and restore needs. It handles creating, managing and restoring backups in multi-node setups where master nodes may frequently be going away (either because of rolling forward updates or actual server failure). You just need to create a fairly simple configuration file, start the systemd service on the master and any standby servers, and make one or two HTTP requests to get the daemon into the correct state; it will then start automatically doing the right things.

Basic usage

On the very first master after you've initialized MySQL database and started up MyHoard you'd do this:

curl -XPUT -H "Content-Type: application/json" -d '{"mode": "active"}' \
  http://localhost:16001/status

This tells MyHoard to switch to active mode where it starts backing up data on this server. If there are no existing backups it will immediately create the first one.
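The same request can also be issued from Python. The following is a minimal sketch using only the standard library; the helper name build_status_request is hypothetical (it is not part of MyHoard), and the URL and payload are copied from the curl example above. The sketch only constructs the request; sending it requires a running MyHoard instance.

```python
import json
import urllib.request

# URL taken from the curl example above; adjust it to your deployment.
STATUS_URL = "http://localhost:16001/status"


def build_status_request(mode, url=STATUS_URL):
    """Build (but do not send) a PUT request asking MyHoard to switch mode.

    Hypothetical helper for illustration; it is not part of MyHoard.
    """
    body = json.dumps({"mode": mode}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_status_request("active")
# To actually send it against a running daemon:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode())
```

Keeping request construction separate from sending makes it possible to inspect or log the payload before it reaches the daemon.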

On a new standby server you'd first install MySQL and MyHoard but not start or initialize MySQL (i.e. don't do mysqld --initialize). After starting the MyHoard service you'd do this:

curl http://localhost:16001/backup  # lists all available backups
curl -XPUT -H "Content-Type: application/json" \
  -d '{"mode": "restore", "site": "mybackups", "stream_id": "backup_id", "target_time": null}' \
  http://localhost:16001/status

This tells MyHoard to fetch the given backup, restore it, start the MySQL server once finished, and switch to observe mode where it keeps on observing what backups are available and what transactions have been backed up but doesn't do any backups itself. Because binary logging is expected to be enabled also on the standby server MyHoard does take care of purging any local binary logs that contain only transactions that have been backed up. If you wanted to restore to a specific point in time you'd just give a timestamp like "2019-05-22T11:19:02Z" and restoration will be performed up until the last transaction before the target time.
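When the restore request is generated by a script, the body can be built from a datetime rather than a hand-written string. This is a sketch only: build_restore_payload is a hypothetical helper, the field names mirror the curl example above, and the timestamp format is assumed to match the "2019-05-22T11:19:02Z" example.

```python
import json
from datetime import datetime, timezone


def build_restore_payload(site, stream_id, target_time=None):
    """Return the JSON body for a restore request.

    Hypothetical helper; field names mirror the curl example above.
    target_time is an optional timezone-aware datetime. None becomes
    JSON null, as in the curl example (no point-in-time target given).
    """
    if target_time is not None:
        if target_time.tzinfo is None:
            raise ValueError("target_time must be timezone-aware")
        # Normalize to UTC and format like "2019-05-22T11:19:02Z".
        target_time = target_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return json.dumps(
        {"mode": "restore", "site": site, "stream_id": stream_id, "target_time": target_time}
    )


payload = build_restore_payload(
    "mybackups", "backup_id", datetime(2019, 5, 22, 11, 19, 2, tzinfo=timezone.utc)
)
print(payload)
```

Rejecting naive datetimes avoids silently interpreting a local time as UTC, which would shift the recovery point.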

If the master server fails for any reason you'd do this on one of the standby servers:

curl -XPUT -H "Content-Type: application/json" -d '{"mode": "promote"}' \
  http://localhost:16001/status

This updates the object storage to indicate this server is now the master and any updates from the old master should be ignored by any other MyHoard instances. (The old master could still be alive at this point but e.g. responding so slowly that it is considered to be unavailable yet it might be able to accept writes and back those up before going totally away and those transactions must be ignored when restoring backups in the future because they have not been replicated to the new master server.) After the initial object storage state update is complete MyHoard switches itself to active mode and resumes uploading binary logs to the currently active backup stream starting from the first binary log that contains transactions that have not yet been backed up.

Requirements

MyHoard requires Python 3.10 or later and some additional components to operate:

Currently MyHoard only works on Linux and expects MySQL service to be managed via systemd.

MyHoard requires MySQL to be used and configured in a specific manner in order for it to work properly:

Single writable master, N read only standbys
Binary logging enabled both on master and on standbys
binlog_format set to ROW
Global transaction identifiers (GTIDs) enabled
Use of only InnoDB databases
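Some of the requirements above can be checked before starting MyHoard. The sketch below is an assumption-laden illustration, not something MyHoard is stated to do: check_mysql_settings is a hypothetical helper that inspects a mapping of MySQL global variables (for example, collected from SHOW GLOBAL VARIABLES) and covers only binary logging, binlog_format and GTID mode. The variable names log_bin, binlog_format and gtid_mode are standard MySQL system variables.

```python
def check_mysql_settings(variables):
    """Return a list of problems found in a mapping of MySQL global variables.

    `variables` is assumed to map variable names to string values.
    Hypothetical helper for illustration; it is not part of MyHoard.
    """
    expected = {
        "log_bin": "ON",         # binary logging enabled
        "binlog_format": "ROW",  # row-based binary logging
        "gtid_mode": "ON",       # global transaction identifiers enabled
    }
    problems = []
    for name, wanted in expected.items():
        actual = str(variables.get(name, "<missing>")).upper()
        if actual != wanted:
            problems.append(f"{name} is {actual}, expected {wanted}")
    return problems


print(check_mysql_settings({"log_bin": "ON", "binlog_format": "MIXED", "gtid_mode": "ON"}))
```

The topology and storage engine requirements (single writable master, InnoDB only) are not covered by this check.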

Configuration options

myhoard.json has an example configuration that shows the structure of the config file and has reasonable default values for many of the settings. Below is the full list of settings and the effect of each.

backup_settings.backup_age_days_max

Maximum age of backups. Any backup that has been closed (marked as final with no more binary logs being uploaded to it) more than this number of days ago will be deleted from storage, unless total number of backups is below the minimum number of backups.

backup_settings.backup_count_max

Maximum number of backups to keep. Because new backups can be requested manually it is possible to end up with a large number of backups. If the total number goes above this limit, backups will be deleted even if they are not older than backup_age_days_max days.

backup_settings.backup_count_min

Minimum number of backups to keep. If for example the server is powered off and then back on a month later, all existing backups would be very old. However, in that case it is usually not desirable to immediately delete all old backups. This setting allows specifying a minimum number of backups that should always be preserved regardless of their age.
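The way the three retention settings interact can be sketched as follows. This is one reading of the descriptions above, not MyHoard's actual implementation: backups_to_delete is a hypothetical function, and it assumes backups are removed oldest first, that a backup is removable when it is too old or when there are too many backups, and that deletion stops once only the minimum count remains.

```python
from datetime import datetime, timedelta, timezone


def backups_to_delete(closed_at_times, now, age_days_max, count_max, count_min):
    """Pick backups to delete, oldest first.

    closed_at_times: datetimes at which each backup was closed.
    Illustrative reading of the three settings; not MyHoard's code.
    """
    remaining = sorted(closed_at_times)  # oldest first
    cutoff = now - timedelta(days=age_days_max)
    to_delete = []
    # Never go below count_min, regardless of age.
    while len(remaining) > count_min:
        too_many = len(remaining) > count_max
        too_old = remaining[0] < cutoff
        if not (too_many or too_old):
            break
        to_delete.append(remaining.pop(0))
    return to_delete


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
closed = [now - timedelta(days=d) for d in (40, 20, 10, 5, 1)]
doomed = backups_to_delete(closed, now, age_days_max=14, count_max=10, count_min=3)
print([(now - t).days for t in doomed])
```

In this example the two backups older than 14 days are selected, and the remaining three are kept because of the minimum count.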

backup_hour

The hour of day at which to take new full backup. If backup interval is less than 24 hours this is used as base for calculating the backup times. E.g. if backup interval was 6 hours and backup hour was 4, backups would be taken at hours 4, 10, 16 and 22.
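The arithmetic in the example above can be written out as a short sketch. daily_backup_times is a hypothetical helper, not MyHoard's scheduler; it takes the interval in minutes (as in backup_interval_minutes), also accepts a minute-of-hour offset (as in backup_minute), and assumes the interval divides a day evenly so that the pattern repeats daily.

```python
def daily_backup_times(backup_hour, backup_minute, interval_minutes):
    """Return sorted (hour, minute) pairs at which backups start each day.

    Assumes interval_minutes evenly divides 1440 (minutes per day).
    Hypothetical helper for illustration only.
    """
    if interval_minutes <= 0 or 1440 % interval_minutes != 0:
        raise ValueError("interval must be a positive divisor of 1440")
    base = backup_hour * 60 + backup_minute
    # Step through one full day from the base time, wrapping at midnight.
    starts = sorted(
        (base + k * interval_minutes) % 1440 for k in range(1440 // interval_minutes)
    )
    return [divmod(start, 60) for start in starts]


print(daily_backup_times(4, 0, 360))  # prints [(4, 0), (10, 0), (16, 0), (22, 0)]
```

With a 6 hour interval (360 minutes) and a backup hour of 4, this reproduces the hours 4, 10, 16 and 22 given above.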

backup_minute

The minute of hour at which to take new full backup.

backup_interval_minutes

The interval in minutes at which to take new backups. Individual binary logs are backed up as soon as they're created so there's usually no need to have very frequent full backups. Note: If this value is not a factor of 1440 (the number of minutes in 1 day) then the backup_hour and backup_minute settings cannot be changed once the first backup has been taken, because a cycle that does not line up with whole days means the hour and minute of the backup will not be the same each day.
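To see why such intervals drift, consider the start times produced by repeatedly adding the interval to the first backup time. The sketch below is illustrative only; is_factor_of_day and backup_start_times are hypothetical helpers, and the 1000 minute interval and start date are made-up example values.

```python
from datetime import datetime, timedelta


def is_factor_of_day(interval_minutes):
    """True if the interval evenly divides 1440 minutes (1 day)."""
    return 1440 % interval_minutes == 0


def backup_start_times(first_backup, interval_minutes, count):
    """Return the first `count` start times, spaced by the interval."""
    return [first_backup + timedelta(minutes=interval_minutes * k) for k in range(count)]


print(is_factor_of_day(360), is_factor_of_day(1000))
for start in backup_start_times(datetime(2024, 1, 1, 4, 0), 1000, 4):
    print(start.strftime("%Y-%m-%d %H:%M"))
```

With a 1000 minute interval the backups start at 04:00, 20:40, 13:20 and 06:00 on successive cycles, so no single hour and minute describes the schedule.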

forced_binlog_rotation_interval

How frequently, in seconds, to force creation of new binary log if one hasn't been created otherwise. Th
