pgcmp ** Purpose

This toolset is intended to perform comparisons ("reconciliation") of schemas between databases to determine if they are /equivalent/. There are several contexts where this is useful, each with a somewhat different understanding of /equivalent/.
1. Build process
  - The build process has database schema in two forms:
    - Build From Scratch :: A deployable database is built from scratch, from an empty database
    - Upgrade From Earlier Version :: An older version is upgraded to the latest version using a set of /upgrade scripts/.
  - We need to ensure that the results of building from scratch are /identical/ to those that come from upgrading the old version using upgrade scripts.
  - In this case, we do not expect /any/ variations, aside from the fact that OIDs will vary between databases. These databases will be built within the same smoke test environment, and thus have the same sets of users, roles, and such, and there should not be any additional components introduced (as might be the case in production)
  - It is therefore considered a build-breaking error to find /any/ variation between the two database schemas
2. Validating upgrades
  - When DBAs are planning to upgrade an instance, using the /upgrade scripts/, it is essential that their starting point represents a schema that is /equivalent/ to the previous version. Otherwise, if the starting point is /wrong/, then the upgrade scripts may not provide a proper upgrade to the new version of the schema.
  - In this environment, there are a number of /acceptable/ kinds of differences that should be accommodated between the "production" schema and a "development" schema:
    - Replication :: Development does not include replication, but in production, Slony schema, tables, functions, and such will exist, and the reconciliation should be able to exclude these differences.
    - DBA Tooling :: DBAs add additional tooling into production databases, generally in their own schemas, to help with monitoring.
    - Supplementary Applications :: Some applications have add-ons, and, when considering the main application, we may wish to exclude add-ons.
    - Specialized Ownership :: Often, different users are used in production than are used in development.
  - As a result, when reconciling the "old" schema version against the version in production, it will be necessary to accept the need for /reconciling entries/ so that DBAs are not left puzzling over hundreds of differences that may be reasonably /expected/.
3. Validating replicas in sync
  - When deploying an upgrade, it is not sufficient to know that the "master" node has an agreeable schema; it is essential that /all/ replicas have suitable schemas.
  - In the above analysis, it was necessary to accommodate there being considerable difference between the "development" schema and the "production" schema. When comparing replicas, differences should be considered rather more dangerous. Differences to accept/accommodate include:
    - Node-specific deployment :: In some cases, add-ons are only installed on certain nodes in the cluster.
    - Origin/Replica differences :: Slony defines some triggers that run only on origin nodes (/e.g./ - ~logtrigger()~), and others that run only on replica nodes (/e.g./ - ~denyaccess()~). ** The Comparison Script ~pgcmp~ tooling consists of pair of scripts:
This script pulls data about the schema from the two databases that are to be compared, loads that data into a third database where that data is compared in order, and performs a reconciliation of the similarities and differences.
- pgcmp-dump :: This tool extracts schema data from a database, putting it into a data file for later analysis
- pgcmp :: This tool uses a pair of files generated by ~pgcmp-dump~, and looks for differences.

*** Security Considerations This script requires /read-only/ access to schema data in ~INFORMATION_SCHEMA~ and ~pg_catalog~ in both databases that are to be compared.

This script requires /write/ access to a third database in which
it constructs a series of tables used to perform the
reconciliation.

*** Running pgcmp components

**** pgcmp-dump

~pgcmp-dump~ uses a series of environment variables to control its activity.

Database connection information
- PGDB :: Database name
- PPGGHOST :: Database host
- PGUSER :: DB User
- PGPORT :: Port number
- PGPGURI :: Since PostgreSQL 9.2, [[https://www.postgresql.org/docs/9.2/static/libpq-connect.html][libpq supports PGURIs]] as a singular connection string. ~pgcmp~ prefers the use of these URIs. If the individual parameters above are offered, they will be constructed into a URI; if ~PGURI~ is populated, it will be used instead.
PGCLABEL :: Indicates an identifer for the database's data to report in analysis output
PGCMPOUTPUT :: The name of the file in which to store output, that is, the set of metadata about the database schema that will be used later for comparison with the schema of another database.
PGCMPIGNORESLONYTRIGGERS :: defaults to ~false~; if ~true~, then ignore differences involving triggers of the form created by Slony-I

#+BEGIN_EXAMPLE $ PGURI=postgresql://postgres@localhost/test1 PGCMPOUTPUT=/tmp/test-pgcmp-file1 PGCLABEL=db1 ../pgcmp-dump pgcmp-dump - extracting schema data from database PGURI=[postgresql://postgres@localhost/test1]

Data file containing output: PGCMPOUTPUT[/tmp/test-pgcmp-file1] Label: PGCLABEL=[db1]

Extracted schema data for postgresql://postgres@localhost/test1 -rw-r--r-- 1 cbbrowne cbbrowne 7350 Nov 3 12:12 /tmp/test-pgcmp-file1 #+END_EXAMPLE

**** pgcmp

~pgcmp~ uses a pair of input files as produced by ~pgcmp-dump~, loads them into a comparison database, and checks for differences.

It uses a series of environment variables to control its activity.

PGBINDIR :: Indicates where to find ~psql~
Database connection information
- PGDB :: Database name
- PPGGHOST :: Database host
- PGUSER :: DB User
- PGPORT :: Port number
- PGURI :: Since PostgreSQL 9.2, [[https://www.postgresql.org/docs/9.2/static/libpq-connect.html][libpq supports PGURIs]] as a singular connection string. ~pgcmp~ prefers the use of these URIs. If the individual parameters above are offered, they will be constructed into a URI; if ~PGURI~ is populated, it will be used instead.
PGCLABEL1 :: Label identifying data for the first database
PGCLABEL2 :: Label identifying data for the second database
PGCOMITSCHEMAS :: Indicates (as a SQL clause suitable to insert VALUES into a table), a set of schema labels from ~pg_catalog.pg_namespace~ that should be omitted from analysis. As identifiers, they may require doublequotes consistent with the function ~pg_catalog.quote_ident()~. ~pg_catalog~ and ~information_schema~ were automatically left out in ~pgcmp-dump~, so they do not need to be included.
Data file information
- PGCWORKDIR :: Work directory where work files are placed/found, defaults to ~/tmp~
- PGCEXPLANATIONS :: Contains records intended to explain expected variations
  - Structure is a Postgres ~COPY~ of data for a table of expected differences, defined as follows: #+BEGIN_EXAMPLE create table expected_differences ( object_type text, object_name text, difference_type text, difference_cause text, primary key(object_type, object_name, difference_type, difference_cause) ); #+END_EXAMPLE
- PGCFULLOUTPUT :: Contains full output of all records, those that match and those that do not.
  - Structure is a Postgres ~COPY~ of data from a table with the following structure: #+BEGIN_EXAMPLE create table fulloutput ( object_name text, object_type text, label1 text, object_definition1 text, label2 text, object_definition2 text, difference_type text, difference_cause text, importance integer ); #+END_EXAMPLE
- PGCUNEXPLAINED :: Contains the set of records that did not match between schemas that did not have explanations provided.
  - Structure is a Postgres COPY of data from a table with the following structure: #+BEGIN_EXAMPLE create table unexplained_items ( object_name text, object_schema text, object_type text, label1 text, id1 integer, object_definition1 text, label2 text, id2 integer, object_definition2 text, difference_type text, importance integer ); #+END_EXAMPLE
- PGCBADEXPLAIN :: Contains the set of records that did not match between schemas where explanations offered did not match the problem.
  - Structure is a Postgres COPY of data from a table with the following structure: #+BEGIN_EXAMPLE create table badexplanations_items ( object_type text, object_schema text, object_name text, difference_type text, difference_cause text, importance integer ); #+END_EXAMPLE
- PGCMPINPUT1 :: Input file containing metadata about the schema for the first database
- PGCMPINPUT2 :: Input file containing metadata about the schema for the second database

**** Interpretations of differences

object_type :: Indicates which kind of object had a dif

Pgcmp

Install / Use

README

Related Skills