Partitioning with PostgreSQL v11 (repost)

Original article:

https://rsbeoriginal.medium.com/partitioning-with-postgresql-v11-6fe5388c6e98

What: Partitioning is splitting one table into multiple smaller tables.
When: It is useful when we have a large table and a few columns appear frequently in the WHERE clause of queries against it. Take the Book table in a library management system: the inventory of books can be huge and will keep growing, but most queries over the Book table will be about the book’s status (borrowed/not borrowed or available/not available). Since most queries filter on the status attribute, it would be better to split the Book table based on status.

The attribute on which the table is split is called the Partition Key.

Why: The most obvious benefit of partitioning is query performance, provided we can identify a partition key that is used in most queries. Another benefit is more efficient use of storage. For example, if data is partitioned by time or usage, older or less-used data can be migrated to cheaper or slower storage media.

Partitioning won’t help much if the partitions are highly skewed, that is, if most of the rows end up in a single partition.

How: That’s what this article is all about!

Range Partition: It can be used when we want to partition on ranges of an attribute’s values. For example, employee data can be partitioned by age into 20–30, 30–40, 40–50, etc., or Medium could partition its articles by month of publication.
List Partition: It can be used when the attribute takes a small set of discrete values. For example, the status attribute in the book table can have 2 values (borrowed/not borrowed), so the book table can have one partition for each status.
Hash Partition: It can be used to distribute the data among the partitions when we aren’t sure whether range or list partitioning would give a uniform distribution, but we have growing data that should be spread evenly. It is done by specifying a modulus and a remainder for each partition, and it works with any data type that supports hashing.

In this article, we’ll be using PostgreSQL 11.
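As a rough illustration of the three strategies, the declarations might look like the sketch below. The tables employee, book and account and their columns are hypothetical; they only echo the examples above and are not part of this article's running example.

```sql
-- Range: each partition holds a contiguous range of ages.
-- Bounds are lower-inclusive and upper-exclusive, so age 30 goes to employee_30_40.
CREATE TABLE employee (
    id   bigint NOT NULL,
    name text,
    age  int NOT NULL
) PARTITION BY RANGE (age);

CREATE TABLE employee_20_30 PARTITION OF employee FOR VALUES FROM (20) TO (30);
CREATE TABLE employee_30_40 PARTITION OF employee FOR VALUES FROM (30) TO (40);

-- List: each partition holds an explicit set of values.
CREATE TABLE book (
    id     bigint NOT NULL,
    title  text,
    status text NOT NULL
) PARTITION BY LIST (status);

CREATE TABLE book_borrowed     PARTITION OF book FOR VALUES IN ('BORROWED');
CREATE TABLE book_not_borrowed PARTITION OF book FOR VALUES IN ('NOT_BORROWED');

-- Hash: a row goes to the partition whose remainder matches
-- the hash of its key modulo the modulus.
CREATE TABLE account (
    id     bigint NOT NULL,
    holder text
) PARTITION BY HASH (id);

CREATE TABLE account_p0 PARTITION OF account FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE account_p1 PARTITION OF account FOR VALUES WITH (MODULUS 2, REMAINDER 1);
```

With range bounds written this way, the 20–30 and 30–40 groups do not overlap even though both mention 30.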

The running example is a table named process (not an ideal table name, as it’s a verb).

We’ll start by creating 2 tables:

process: normal table
process_partition: partitioned table with status as the partition key

Both tables will contain:

id: auto-incremented id
name
status: one of OPEN, IN_PROGRESS and DONE

Let’s create the process table first, with id as its primary key.
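A minimal sketch of such a table is shown here. Only the column names and the primary key come from the text; the column types (bigserial, varchar(255)) are assumptions.

```sql
-- Unpartitioned baseline table.
-- Column types are assumed; the text only names the columns and the primary key.
CREATE TABLE process (
    id     bigserial,        -- auto-incremented id
    name   varchar(255),
    status varchar(255),     -- OPEN, IN_PROGRESS or DONE
    PRIMARY KEY (id)
);
```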

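Next comes the process_partition table, which is partitioned by list on status. It gets one partition per status value: process_partition_open for OPEN status, process_partition_in_progress for IN_PROGRESS status and process_partition_done for DONE status.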
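The partitioned table and its three partitions could be declared roughly as follows. The partition names are the ones used in this article; the column types are assumptions, as is the exact shape of the primary key.

```sql
-- Parent table: stores no rows itself, it only routes them to partitions.
-- Column types are assumed, not taken from the article.
CREATE TABLE process_partition (
    id     bigserial,
    name   varchar(255),
    status varchar(255),
    PRIMARY KEY (id, status)   -- includes the partition key
) PARTITION BY LIST (status);

-- One partition per status value.
CREATE TABLE process_partition_open
    PARTITION OF process_partition FOR VALUES IN ('OPEN');
CREATE TABLE process_partition_in_progress
    PARTITION OF process_partition FOR VALUES IN ('IN_PROGRESS');
CREATE TABLE process_partition_done
    PARTITION OF process_partition FOR VALUES IN ('DONE');
```

A row whose status matches none of the listed values would be rejected unless a DEFAULT partition is also created.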

If we want to use status as the partition key, we are forced to add it to the primary key as well. Similarly, if we have any unique constraint, we’ll need to add the partition key there too. If we don’t, we’ll get an error while creating the table.


Error: the partition key is not included in the primary key
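The failing case can be reproduced with a sketch like the one below. The table name process_partition_bad is hypothetical and the column types are assumed; the error text in the comments is what PostgreSQL 11 is expected to report, so treat the exact wording as approximate.

```sql
-- Hypothetical table, used only to trigger the error.
CREATE TABLE process_partition_bad (
    id     bigserial,
    name   varchar(255),
    status varchar(255),
    PRIMARY KEY (id)           -- status, the partition key, is missing here
) PARTITION BY LIST (status);

-- Expected to fail with an error along these lines:
--   ERROR:  insufficient columns in PRIMARY KEY constraint definition
--   DETAIL:  PRIMARY KEY constraint on table "process_partition_bad"
--            lacks column "status" which is part of the partition key.
```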

Now we’ll add 10000 rows for each of the 3 statuses to the process table, using an anonymous DO $$ ... END; $$ block.
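One way to write such a block is sketched here. The generated name values are an assumption, since the text does not say what names were inserted.

```sql
-- Insert 10000 rows per status into the unpartitioned table.
DO $$
DECLARE
    statuses text[] := ARRAY['OPEN', 'IN_PROGRESS', 'DONE'];
    s text;
BEGIN
    FOREACH s IN ARRAY statuses LOOP
        FOR i IN 1..10000 LOOP
            -- The name format is illustrative only.
            INSERT INTO process (name, status)
            VALUES ('Process ' || s || ' ' || i, s);
        END LOOP;
    END LOOP;
END; $$;
```

Running the same block with process replaced by process_partition would populate the partitioned table in the same way.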

30000 rows added in process table

Similarly, we’ll add the same rows to the process_partition table with another DO block.

30000 rows added in process_partition table

10000 rows in process_partition_open for OPEN status

The process_partition parent table itself has 0 rows

The same counts hold for process_partition_in_progress and process_partition_done.

So, based on the status column, each row is added to its respective partition, and the total rows of the master table are the aggregation of all partition tables.
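These counts can be checked with queries along the following lines. This is a sketch rather than the article's own queries; it relies on the tableoid system column and the ONLY keyword, both standard PostgreSQL features.

```sql
-- Rows per partition: tableoid identifies the physical table each row lives in.
SELECT tableoid::regclass AS partition_name, count(*) AS row_count
FROM process_partition
GROUP BY tableoid
ORDER BY partition_name;

-- Total across all partitions, queried through the parent.
SELECT count(*) FROM process_partition;

-- ONLY restricts the query to the parent itself, which stores no rows.
SELECT count(*) FROM ONLY process_partition;
```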

So now let’s insert a new row with status OPEN, then change its status and observe the behaviour.

The 'Moving process' row is added to process_partition_open with id 30001

The new row is stored in process_partition_open.

Now let’s update this row from OPEN to IN_PROGRESS status.

After changing status, the row moves to process_partition_in_progress

The row now shows up in the process_partition_in_progress table. So, we observe that a row moves from one partition to another when its status, i.e. the partition key, changes.
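The whole experiment can be sketched as below. The name value 'Moving process' is inferred from the caption above and should be treated as an assumption, as should the exact WHERE clauses. Moving a row between partitions on UPDATE is supported from PostgreSQL 11 onwards; PostgreSQL 10 raised an error instead.

```sql
-- Insert a row with status OPEN; it is routed to process_partition_open.
INSERT INTO process_partition (name, status)
VALUES ('Moving process', 'OPEN');

-- Check which partition holds the row.
SELECT id, status, tableoid::regclass AS stored_in
FROM process_partition
WHERE name = 'Moving process';

-- Change the partition key; PostgreSQL 11 moves the row to the matching partition.
UPDATE process_partition
SET status = 'IN_PROGRESS'
WHERE name = 'Moving process' AND status = 'OPEN';

-- The same row should now be reported from process_partition_in_progress.
SELECT id, status, tableoid::regclass AS stored_in
FROM process_partition
WHERE name = 'Moving process';
```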

To compare query performance, we run the same query, filtering on status, against both tables. Below are the results for 30000 rows and after the data is increased 10x to 300000 rows.

For 30000 rows, the same query has less than half the query cost compared to the unpartitioned table


For 300000 rows also, the same query has less than half the query cost compared to the unpartitioned table

With the partition key in the WHERE clause, only the process_partition_open table is scanned.
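A comparison of this kind can be run with EXPLAIN, as in the sketch below. The exact queries behind the results are not shown here, so these are assumed equivalents, and the reported costs will vary with data and configuration.

```sql
-- Refresh planner statistics so the cost estimates reflect the loaded data.
ANALYZE process;
ANALYZE process_partition;

-- Unpartitioned table: the plan has to consider the whole table.
EXPLAIN SELECT * FROM process WHERE status = 'OPEN';

-- Partitioned table: the filter is on the partition key,
-- so the plan should mention only process_partition_open.
EXPLAIN SELECT * FROM process_partition WHERE status = 'OPEN';
```

EXPLAIN prints the planner's estimated cost without running the query; EXPLAIN ANALYZE would execute it and report actual timings as well.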

Query without the partition key

Here the partition key is not in the WHERE clause, so Postgres doesn’t know which partition to scan and scans all the partitions. This case becomes similar to the unpartitioned table because the query does not use the partition key.
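A query of this shape, filtering on name only, illustrates the case. The literal 'Moving process' is just an example value.

```sql
-- No condition on status, so no partition can be ruled out:
-- the plan is expected to list all three partitions.
EXPLAIN SELECT * FROM process_partition WHERE name = 'Moving process';
```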

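This partition selection is controlled by the enable_partition_pruning setting, which is on by default. We can check it with: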

SHOW enable_partition_pruning;

enable_partition_pruning = on

With pruning on, the query scans only the process_partition_open table because it contains all rows whose status is OPEN.

We can turn off enable_partition_pruning using the statement below.

SET enable_partition_pruning = off;

enable_partition_pruning = off

With pruning off, all partitions of the process_partition table are scanned.
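Put together, the experiment for a single session might look like this sketch. The SELECT is an assumed stand-in for the query used in the article; SET changes the setting only for the current session, and RESET restores the default.

```sql
-- Pruning enabled (the default): only the matching partition should appear in the plan.
SHOW enable_partition_pruning;
EXPLAIN SELECT * FROM process_partition WHERE status = 'OPEN';

-- Pruning disabled for this session: per the article, all partitions appear in the plan.
SET enable_partition_pruning = off;
EXPLAIN SELECT * FROM process_partition WHERE status = 'OPEN';

-- Restore the default so later queries are not affected.
RESET enable_partition_pruning;
```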

That’s it, folks!

Sub-Partitioning and Attach/Detach partitions

References

https://www.postgresql.org/docs/11/ddl-partitioning.html#DDL-PARTITIONING-DECLARATIVE-BEST-PRACTICES