Kunlun
KunlunBase is a distributed relational database management system(RDBMS) with complete NewSQL capabilities and robust transaction ACID guarantees and is compatible with standard SQL. Applications which used PostgreSQL or MySQL can work with KunlunBase as-is without any code change or rebuild because KunlunBase supports both PostgreSQL and MySQL connection protocols and DML SQL grammars. MySQL DBAs can quickly work on a KunlunBase cluster because we use MySQL as storage nodes of KunlunBase. KunlunBase can elastically scale out as needed, and guarantees transaction ACID under error conditions, and KunlunBase fully passes TPC-C, TPC-H and TPC-DS test suites, so it not only support OLTP workloads but also OLAP workloads. Application developers can use KunlunBase to build IT systems that handles terabytes of data, without any effort on their part to implement data sharding, distributed transaction processing, distributed query processing, crash safety, high availability, strong consistency, horizontal scalability. All these powerful features are provided by KunlunBase. KunlunBase supports powerful and user friendly cluster management, monitor and provision features, can be readily used as DBaaS.
Install / Use
/learn @zettadb/KunlunREADME
KunlunBase Introduction
Directions
For more information and resources of KunlunBase, such as software, documentation, bug reports and features planned/under development, and release notes, please visit our website Download KunlunBase and use it for free at our download site
For latest release notes, see our online release notes Follow this document to install a KunlunBase cluster in a few steps using a script and a GUI web application, it's much easier, faster and more convenient than building and installing its components manually one by one.
To build kunlun-server node program from source, use build.sh directly or refer to it for instructions. To build kunlun-storage from source, see kunlun-storage/INSTALL.kunlun.md for instructions. To build cluster_mgr from source, see cluster_mgr/README for instructions.
Refer to INSTALL.kunlun.md to install a KunlunBase cluster in a totally manual but inconvenient way.
What is KunlunBase?
KunlunBase is a distributed relational database management system aimed to help users store and access massive amount (tera-bytes up to peta-bytes) of relational data and serve massive concurrent relational data access(read and/or write) workloads with stable low latency and high throughput. KunlunBase provides robust transaction ACID guarantees, high scalability, high availability, transparent data partitioning, elastic&transparent horizontal scale-out capabilities, and standard SQL plus MySQL and PostgreSQL DML SQL dialects. Clients can access distributed and/or partitioned data stored in KunlunBase the same way as using PostgreSQL and/or MySQL. All of these features altogether are known as NewSQL capabilities. In one word, KunlunBase is a NewSQL OLTP distributed RDBMS with complete OLAP functionality.
Users and applications could connect to KunlunBase using JDBC and ODBC, and C/C++ client libraries provided in community PostgreSQL and MySQL distributions, as well as PostgreSQL and MySQL client libraries for most programming languages, such as php/python/go/rust/ruby/c#/ASP/.net etc.
KunlunBase is SQL compatible, it can correctly execute all test cases in TPC-C, TPC-H and TPC-DS with excellent performance, and passes all SQL compatibility test cases contained in PostgreSQL and MySQL.
Consequently, users and applications can interact with KunlunBase exactly the same way they would do with a community MySQL and/or PostgreSQL database instance, using either standard SQL or private DML SQL extensions of MySQL and/or PostgreSQL, and get all the above NewSQL benefits without any work or effort on the client application side --- no need to modify application code or even rebuild it which were using MySQL or PostgreSQL. Furthermore, applications can utilize object relational mapping(ORM) tools like Hibernate and Mybatis to access relational data with KunlunBase so as to avoid manually writing SQL statements in application code, because of the excellent SQL compatibility of KunlunBase.
With KunlunBase, software application architects and developers can quickly design and develop robust, highly available and highly scalable information systems that are capable of processing and utilize hundreds of terabytes of data with little extra engineering effort --- all the technical&engineering challenges are conquered in KunlunBase, which greatly reduces the cost, difficulty, and timespan required to develop such powerful systems, eliminates the inherent risks originating from such challenges, and improves the overall quality and reliability(availability, robustness, stability, scalability, and performance) of such systems.
Visit www.kunlunbase.com to get the software, see its documentation, use cases, and interact with users in the forum at forum.kunlunbase.com.
Architecture
A KunlunBase cluster consists of two types of components: one or more kunlun-server nodes, and one or more storage shards. And it also shares with other KunlunBase clusters a group of cluster_mgr instances and a metadata shard. A storage or metadata shard consists of one primary instance and two or more replica instances and KunlunBase guarantees high availability to all such shards as detailed below.
This piece of software is KunlunBase's kunlun-server component, it interacts with clients to for connection validation, access control, SQL query processing, global transaction management, global deadlock detection&resolution, and DDL replication, etc.
Kunlun-server is currently developped based on PostgreSQL-11.5. In order to support some advanced features such as automatic DDL synchronization, distributed transactions processing, etc, we modified PostgreSQL code extensively rather than simply using its FDW. We modified PostgreSQL in a modular and least intrusive way so that we can easily keep upgrading with official upstream PostgreSQL releases.
A kunlun-server instance listens on a PostgreSQL port and a MySQL port configured during cluster installation. And it accepts and validates client connections requests connected from an application with either PostgreSQL or MySQL protocols. And when a connection is validated and established, it communicates with the client using either PostgreSQL or MySQL protocols respectively.
A kunlun-server node receives SQL statements from connected client connections and execute them by interacting with the cluster's storage shards to read and/or write data. Users can add more kunlun-server nodes any time as their workloads grow, each and every kunlun-server node can serve user read/write requests. A KunlunBase cluster's kunlun-server nodes locally has all the meta-data of all database objects(tables, views, materialized views, sequences, stored procs/functions, users/roles, triggers, and priviledges etc), but they don't store user data locally. Instead, kunlun-server nodes store user data into storage shards.
To execute a client SQL query, a kunlun-server instance goes through standard PostgreSQL query processing steps --- it parses the client SQL query, optimizes it and as an extension for remote data(data stored in kunlun-storage shards), we developed extra plan nodes which form one or more SQL queries to send to the target storage shards containing portions of target data needed for the client SQL query. And if the query is a SELECT or an INSERT/DELETE/UPDATE...RETURNING statement instead of a bare INSERT/DELETE/UPDATE statement, the kunlun-server node gets partial results from all target storage shards, and assembles them properly into final result by executing the rest of the query plan, and reply the final result to the client.
User data is stored in one or more storage shards instead of kunlun-server nodes. Each storage shard stores a subset of all user data in the KunlunBase cluster, and data in different storage shards don't overlap(i.e. share nothing). Users can extend or shrink the NO. of shards as their data volumns and workloads grow or shrink, and initiate a series of table move operations to select the right group of tables on existing shards for KunlunBase to move to the newly added shards, with the guidance of KunlunBase.
A storage shard is a MySQL binlog replication cluster, which currently uses either MySQL group replication or our proprietary fullsync replication to achieve high availability.
In KunlunBase we require using our kunlun-storage software to deploy storage shards and metadata shard. Kunlun-storage is a deeply engineered branch of percona-mysql-8.0 with supporting features required by KunlunBase's components. Additionally, kunlun-storage contains fixes of all community MySQL-8.0 XA transaction crash safety bugs as well as kunlun-storage contains some performance improvement.
The primary node of each kunlun-storage shard receives from kunlun-server nodes DML SQL queries to insert/update/delete user data, or return target user data. And it executes such SQL queries and return results to the requesting kunlun-server node.
Also, KunlunBase supports automatical read-write-split(RWS) --- executing read only queries in replica nodes to storage shards in order to decrease resource contention in primary nodes and utilize computing resources where the replica nodes are deployed. Consequently kunlun-server nodes could also send SELECT queries to replicas of any kunlun-storage shards under user configured conditions for data consistency and replication latency.
A KunlunBase cluster needs a meta-data shard, which is also a kunlun-storage cluster. It stores the meta-data of one or more KunlunBase clusters.
Finally we have a cluster of cluster_mgr instances which maintain correct running status for one or more KunlunBase clusters, and do extra work related to high availability, scale out, cluster data backup and restore, cluster management(e.g. kunlun-server or kunlun-storage instance installation), and so on.
Advantages
KunlunBase distributed database cluster is built for high scalability, high availability, ACID guarantees of distributed transactions, and full-fledged distributed query processing and elastic horizontal scalability, and compatibility with standard SQL besides the private DML SQL dialects of MySQL and PostgreSQL, as detailed below.
Highly Scalable
KunlunBase clusters are highly scalable. It not only scales up but also scales out: users can add more kunlun-server nodes any time to have more query processing power, and every kunlun-server node is equivalent to its peers and can serve both write and read workloads; And users(DBAs) can add more storage shards for more data storage and transaction processing capability and KunlunBase will automatically move parts of user data to the new shards to balance workloads after users issue the table move jobs with guidance of KunlunBase's XPanel utility.
DBAs can also deploy more kunlun-server nodes for analytical OLAP work
