Demodb
Demonstration Database
Install / Use
/learn @postgrespro/DemodbREADME
Demo Database Generator
🌐 English | Русский
Purpose
The demo database generator program allows you to create a database with the demo database structure and adapt its content to your needs. Thanks to some parameters, you can change the data characteristics, for example, route network or airline fleet.
The program can also be used as a load generator. For example, to demonstrate monitoring systems or to practice query optimization.
Installation
-
Clone the
demodbrepository. -
In
psql, connect to any database, except fordemo.Warning
If the
demodatabase already exists, it will be deleted and all of its data will be lost during the installation. -
Make sure that you are in the
demodbrepository directory:\! pwdIf necessary, change the directory, using
\cd. -
Start installation:
\i install.sql
The commands create the demo database and two schemas in it:
gen— for the generator objectsbookings— for the created demo database objects
To work with the generator, you need the following extensions:
btree_gist— to implement a temporary keyearthdistanceandcube— to calculate the great circle distancedblink— to launch parallel processes
As a rule, these extensions are included in the standard PostgreSQL installation package.
Quick Start
Launch Generation
The demo database generation is always performed by parallel processes (even if there is only one process).
To launch generation, run the generate procedure, specifying the start and end simulation time.
For example, to generate a database for a one-year period, run the following command:
CALL generate( now(), now() + interval '1 year' );
To speed up generation, you can run several parallel processes:
CALL generate( now(), now() + interval '1 year', 4 );
Check generation status
To quickly check the generation status, run the following command:
SELECT busy();
Possible values:
t(true) — generation is in processf(false) — generation is completed
To get more details, see the gen.log generator log:
SELECT * FROM gen.log ORDER BY at DESC LIMIT 30 \watch 60
Abort generation
To abort generation, run the following command:
CALL abort();
Note
A broken connection does not cancel generation.
Complete generation
-
To make sure that generation is complete, check its status:
SELECT busy();If generation is completed successfully, the output will be
f(false). -
Check the created demo database:
\i check.sql
Export database
You can export the demo database as an SQL script.
To create an SQL script, in the repository directory, run the following command as an OS user:
./export.sh > `date +%Y%m%d`.sql
Continue generation
You can continue generation from where it stopped.
For example, to generate data for another three months, run the following command:
CALL continue( now() + interval '1 year 3 month', 4 );
Setup
To customize data generation, change the required configuration parameters at the database level:
ALTER DATABASE demo SET parameter-name = value;
Note
You cannot change configuration parameters at the session level (
SET) since parallel processes require access to their values.
Some internal settings are not available at the user level: they are implemented as immutable functions or constants in the source code (incorrect values can lead to generation errors or unexpected results).
The available configuration parameters are described below.
gen.connstr
Connection string to the demo database.
Default value: dbname=demo (connection to a local server). If necessary, you can specify any value for this parameter supported by libpq.
gen.airlines_name
Airline name. Returned only as part of the bookings.version function output.
Default value: PostgresPro.
gen.airlines_code
Airline code. Used as a prefix to a ticket number (bookings.tickets.ticket_no).
Default value: PG.
gen.traffic_coeff
Factor for converting relative passenger traffic (gen.airports_data.traffic) into a number of bookings made from an airport within a week.
gen.domestic_frac
Fraction of flights with flight segments within one country. This is a target value. Used to build a flight graph.
Default value: 0.9.
gen.roundtrip_frac
Fraction of round trip bookings. This is a target value. The real number is less than the target since return tickets are not always available.
Default value: 0.9.
gen.delay_frac
Fraction of delayed flights.
Default value: 0.05.
gen.cancel_frac
Fraction of cancelled flights.
Default value: 0.005.
gen.exchange
Factor for converting the flight duration (in minutes) into the ticket price in the chosen currency. Depending on the travel class, an additional multiplying factor is applied (get_price function).
Default value: 50.
gen.max_pass_per_booking
Maximum number of passengers per booking.
Default value: 5.
gen.min_transfer
Minimum time in hours between connecting flights.
Default value: 2.
Note
The lower the limit is, the more flights become available for route planning. At the same time, the risk of getting late for a connecting flight is growing.
gen.max_transfer
Maximum time in hours between connecting flights.
Default value: 48.
Note
The higher the limit is, the higher the chances are that the passenger will get to their destination by our airline.
gen.max_hops
Maximum number of flight connections in one ticket.
Default value: 4.
Note
The higher the limit is, the higher the chances are that the flight is chosen for a certain route.
gen.log_severity
Priority of messages to be logged in the gen.log log table. The highest priority is 0.
Default value: 0 (only important messages are logged). It it recommended to change the default value only for debugging purposes.
bookings.lang
Language in which names of aircraft models, countries, cities, and airports are displayed. The parameter is set during the database deployment. The Russian (ru) and English (en) languages are available.
Default value: en.
Software Interface
Procedures and functions that allow users to work with the generator.
generate
Launch demo database generation.
Parameters:
start_date(timestamptz) — simulation start time for generationend_date(timestamptz) — simulation end time for generationjobs(integer) — number of parallel processes (default value:1)
Example:
CALL generate(
start_date => date_trunc( 'day', now() ),
end_date => date_trunc( 'day', now() + interval '1 year' ),
jobs => 4
);
continue
Continue demo database generation from where it stopped after the previous generate or continue call.
Parameters:
end_date(timestamptz) — simulation end time for generationjobs(integer) — number of parallel processes (default value:1)
The simulation start time is set automatically. It corresponds to the end of the previous generation process.
Example:
CALL continue(
end_date => date_trunc( 'day', now() + interval '2 years' ),
jobs => 4
);
busy
Show the current generation status.
Possible values:
t(true) — generation is in progressf(false) — generation is completed
Example:
SELECT busy();
abort
Cancel generation before it is completed.
Example:
CALL abort();
After generation is aborted, you can launch a new generation process using generate. It is not recommended to resume generation using continue since it does not guarantee correct generation.
get_passenger_name
Return a random passenger`s name from a chosen country.
Parameter:
country(text) — country code
Possible default values: RU (Russia), CN (China), IN (India), US (the USA), CA (Canada), JP (Japan), FR (France), DE (Germany), IT (Italy), GB (the UK), CL (Chile), SE (Sweden), NP (Nepal), FI (Finland), NZ (New Zealand), AT (Austria), and CZ (the Czech Republic).
Example that returns ten random Nepalese names:
CALL calc_names_cume_dist(); -- execute once
SELECT get_passenger_name('NP') FROM generate_series(1,10);
Note
- To add a new country, add new names to the
firstnames.datandlastnames.datreference files (for more information, refer to the «Questions and Answers» section).- The function is not necessary to work with the demo database but can be useful for other databases that require random reasonably distributed names.
Log Table
The generator logs messages in the gen.log table that you can check after generation or use to monitor the generation process:
SELECT * FROM gen.log ORDER BY at DESC LIMIT 30 \watch 60
The log table stores all the messages with the priority equal to or higher than the gen.severity parameter value. By default, only important messages with the 0 priority are logged. For debugging, you can set a non-zero value for the parameter.
Key messages and their meaning are listed below:
-
Job N (connname=connection): result
Work process N is launched using connection.
-
day: one day in time
time real time was required to calculate day simulation day. Displayed once in a simulation day.
-
New bookings: B (forceoneway F), nopath P, noseat S
During the simulation day, B bookings were created. F of them were planned as round-trip (considering the
gen.roundtrip_fractarget parameter) but remained one-way since a return ticket
