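WAL plays the same role as the redo log in Oracle and the redo log in MySQL. In 9.6 and earlier the directory was named pg_xlog; from 10 onward the files live in the pg_wal directory. The default WAL segment size is 16MB. It can be set at initdb time (the --wal-segsize option, available since version 11) and in principle should not be changed afterwards. The contents of the binary log can be inspected with pg_waldump. For a walkthrough of the WAL structure see https://www.cnblogs.com/abclife/p/13708947.html (it is not entirely accurate; for example, its decoding of the physical file ID from an LSN is wrong). The physical structure of WAL is as follows: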

[Figure: clog (commit log) internals in PostgreSQL and their relationship to WAL and commit]
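Since decoding an LSN into a physical file is easy to get wrong, here is a minimal sketch of the arithmetic. It assumes the conventional layout: the segment number is the LSN divided by the segment size, and the file name is three 8-digit hex fields (timeline, high part, low part). The names toy_lsn_to_segno and toy_wal_file_name are hypothetical helpers written for this note, not PostgreSQL functions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: number of the WAL segment that contains this LSN. */
uint64_t toy_lsn_to_segno(uint64_t lsn, uint32_t wal_segsz)
{
    /* The segment size is assumed to be a power of two (16MB by default). */
    assert(wal_segsz > 0 && (wal_segsz & (wal_segsz - 1)) == 0);
    return lsn / wal_segsz;
}

/*
 * Hypothetical helper: 24-hex-digit segment file name.
 * Returns a pointer to a static buffer, so it is not reentrant.
 */
const char *toy_wal_file_name(uint32_t tli, uint64_t lsn, uint32_t wal_segsz)
{
    static char name[25];
    uint64_t segno = toy_lsn_to_segno(lsn, wal_segsz);
    /* How many segments fit into one 4GB "xlog id". */
    uint64_t segs_per_id = UINT64_C(0x100000000) / wal_segsz;

    snprintf(name, sizeof(name), "%08" PRIX32 "%08" PRIX32 "%08" PRIX32,
             tli,
             (uint32_t) (segno / segs_per_id),
             (uint32_t) (segno % segs_per_id));
    return name;
}
```

With a 16MB segment size, an LSN written as 0/16B3748 (the 64-bit value 0x16B3748) falls into segment 1 on timeline 1, so the sketch yields 000000010000000000000001.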

   The WAL archiving process is described at https://wiki.moritetu.xyz/?PostgreSQL/%E8%A7%A3%E6%9E%90/WAL%E3%82%A2%E3%83%BC%E3%82%AB%E3%82%A4%E3%83%96

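   clog (short for Commit Log; the PostgreSQL transaction-commit-log manager, implemented mainly in clog.c) records the status of each transaction. The status is updated every time a transaction commits or rolls back (via CommitTransactionCommand(void)), and the PostgreSQL server consults it to determine a transaction's state. The data is stored in the pg_xact directory. Each file is 256KB and each transaction takes 2 bits, so one file holds 262144 × 4 = 1,048,576 transactions. For a newly modified row, the transaction status is recorded only in clog, so the first visibility check after the modification has to look it up there. clog lookups are comparatively expensive, which is why there has been a long discussion about optimizing clog access.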
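The per-file capacity can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the default 8KB block size and 32 pages per segment file; all toy_ names are made up for illustration.

```c
#include <stdint.h>

enum {
    TOY_BLCKSZ = 8192,            /* assumed default page size in bytes */
    TOY_BITS_PER_XACT = 2,        /* two status bits per transaction */
    TOY_PAGES_PER_SEGMENT = 32    /* assumed pages per pg_xact file */
};

/* Transactions whose status fits in one page: 8192 * 8 / 2. */
uint32_t toy_xacts_per_page(void)
{
    return (uint32_t) TOY_BLCKSZ * 8u / (uint32_t) TOY_BITS_PER_XACT;
}

/* Size of one pg_xact file in bytes. */
uint32_t toy_segment_bytes(void)
{
    return (uint32_t) TOY_BLCKSZ * (uint32_t) TOY_PAGES_PER_SEGMENT;
}

/* Transactions whose status fits in one pg_xact file. */
uint32_t toy_xacts_per_segment(void)
{
    return toy_xacts_per_page() * (uint32_t) TOY_PAGES_PER_SEGMENT;
}

/* Files needed to cover the whole 32-bit xid space. */
uint32_t toy_segments_for_full_xid_space(void)
{
    return (uint32_t) ((UINT64_C(1) << 32) / toy_xacts_per_segment());
}
```

Under these assumptions one page covers 32768 transactions, one 262144-byte file covers 1,048,576 transactions, and 4096 files would cover every possible 32-bit xid.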

  A transaction's status in clog is one of:

/*
 * Possible transaction statuses --- note that all-zeroes is the initial
 * state.
 *
 * A "subcommitted" transaction is a committed subtransaction whose parent
 * hasn't committed or aborted yet.
 */
typedef int XidStatus;

#define TRANSACTION_STATUS_IN_PROGRESS        0x00
#define TRANSACTION_STATUS_COMMITTED        0x01
#define TRANSACTION_STATUS_ABORTED            0x02
#define TRANSACTION_STATUS_SUB_COMMITTED    0x03

  These are defined in clog.h.

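  PostgreSQL's MVCC keeps old row versions in the table files themselves rather than in a separate undo area, so rows created by a transaction are not removed even when that transaction rolls back. Because clog records each transaction's outcome, other transactions can consult it when evaluating xmin and xmax and decide whether or not to filter such rows out (this matters mainly when xmax = 0, because the inserting transaction may have committed or may not have committed yet).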

pg_xact (named pg_clog in 9.6 and earlier; the source file is still clog.c)

[postgres@hs-10-20-30-194 pg_xact]$ ll
total 13208
-rw------- 1 postgres postgres 262144 May 24 17:26 0000
-rw------- 1 postgres postgres 262144 May 24 17:26 0001
-rw------- 1 postgres postgres 262144 May 24 17:27 0002
-rw------- 1 postgres postgres 262144 May 24 17:27 0003
-rw------- 1 postgres postgres 262144 May 24 17:27 0004
-rw------- 1 postgres postgres 262144 May 24 17:28 0005
-rw------- 1 postgres postgres 262144 May 24 17:28 0006
-rw------- 1 postgres postgres 262144 May 24 17:28 0007
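The file names in the listing can be related back to transaction IDs. The sketch below assumes the default 8KB block size, 32 pages per file, and a file name that is the segment number printed as four hex digits; toy_clog_segno and toy_clog_segment_name are hypothetical names used only here.

```c
#include <stdint.h>
#include <stdio.h>

/* 8192 bytes/page * 4 xacts/byte * 32 pages/file (assumed defaults). */
#define TOY_XACTS_PER_SEGMENT (8192u * 4u * 32u)

/* Hypothetical helper: which pg_xact file holds the status of this xid. */
uint32_t toy_clog_segno(uint32_t xid)
{
    return xid / TOY_XACTS_PER_SEGMENT;
}

/*
 * Hypothetical helper: the file name for that segment.
 * Returns a pointer to a static buffer, so it is not reentrant.
 */
const char *toy_clog_segment_name(uint32_t xid)
{
    static char name[16];

    snprintf(name, sizeof(name), "%04X", (unsigned int) toy_clog_segno(xid));
    return name;
}
```

Under these assumptions, xids 0 through 1048575 map to file 0000, and a file named 0007 covers xids 7340032 through 8388607.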

  Interaction between clog and WAL: this requires understanding the full life cycle of a transaction first.


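  At the access method (AM) layer, the xlog interfaces are called to insert WAL records into the WAL buffers. After PortalDrop has cleaned up the finished execution, the main entry point exec_simple_query()->finish_xact_command() calls CommitTransactionCommand()->CommitTransaction()->RecordTransactionCommit()->XactLogCommitRecord(), which calls XLogInsert() to insert the commit WAL record. RecordTransactionCommit() then calls XLogFlush() to flush the commit record to disk (in the synchronous commit case), and finally calls TransactionIdCommitTree() to update clog. The clog update goes through TransactionIdCommitTree->TransactionIdSetTreeStatus->TransactionIdSetPageStatus->TransactionIdSetPageStatusInternal, which finds the slotno for the pageno (the buffers are managed by SLRU, a simple least-recently-used scheme) and calls TransactionIdSetStatusBit. That function computes the offset from the xid and updates the transaction status with bit operations.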
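The bit manipulation at the end of that chain can be sketched as follows. This is a simplified model, not the code from clog.c: it works on a plain byte array, assumes the default 8KB page, and skips locking, LSN tracking and the checks on legal state transitions. All toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BLCKSZ          8192   /* assumed default page size */
#define TOY_BITS_PER_XACT   2
#define TOY_XACTS_PER_BYTE  4
#define TOY_XACTS_PER_PAGE  (TOY_BLCKSZ * TOY_XACTS_PER_BYTE)
#define TOY_XACT_BITMASK    ((1 << TOY_BITS_PER_XACT) - 1)

/* One zero-initialized page: every transaction starts as IN_PROGRESS (0). */
unsigned char toy_page[TOY_BLCKSZ];

/* Overwrite the two status bits that belong to xid within the page. */
void toy_set_status(unsigned char *page, uint32_t xid, int status)
{
    uint32_t byteno = (xid % TOY_XACTS_PER_PAGE) / TOY_XACTS_PER_BYTE;
    int bshift = (int) (xid % TOY_XACTS_PER_BYTE) * TOY_BITS_PER_XACT;
    unsigned char byteval = page[byteno];

    assert(status >= 0 && status <= TOY_XACT_BITMASK);
    byteval &= (unsigned char) ~(TOY_XACT_BITMASK << bshift); /* clear old bits */
    byteval |= (unsigned char) (status << bshift);            /* store new bits */
    page[byteno] = byteval;
}

/* Read the two status bits that belong to xid within the page. */
int toy_get_status(const unsigned char *page, uint32_t xid)
{
    uint32_t byteno = (xid % TOY_XACTS_PER_PAGE) / TOY_XACTS_PER_BYTE;
    int bshift = (int) (xid % TOY_XACTS_PER_BYTE) * TOY_BITS_PER_XACT;

    return (page[byteno] >> bshift) & TOY_XACT_BITMASK;
}
```

For example, marking xid 5 as committed (0x01) touches byte 1 of the page and sets it to 0x04, while the neighbouring xids 4, 6 and 7 in the same byte keep their status.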

/*
 * Note: because TransactionIds are 32 bits and wrap around at 0xFFFFFFFF,
 * CLOG page numbering also wraps around at 0xFFFFFFFF/CLOG_XACTS_PER_PAGE,
 * and CLOG segment numbering at
 * 0xFFFFFFFF/CLOG_XACTS_PER_PAGE/SLRU_PAGES_PER_SEGMENT.  We need take no
 * explicit notice of that fact in this module, except when comparing segment
 * and page numbers in TruncateCLOG (see CLOGPagePrecedes).
 */

/* We need two bits per xact, so four xacts fit in a byte */
#define CLOG_BITS_PER_XACT    2
#define CLOG_XACTS_PER_BYTE 4   /* transactions per byte */
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)    /* transactions per block: 32768 with the default 8KB BLCKSZ */
#define CLOG_XACT_BITMASK    ((1 << CLOG_BITS_PER_XACT) - 1)   /* 0x03, i.e. binary 11 */

#define TransactionIdToPage(xid)    ((xid) / (TransactionId) CLOG_XACTS_PER_PAGE)    /* page holding the xid: xid / 32768 */
#define TransactionIdToPgIndex(xid) ((xid) % (TransactionId) CLOG_XACTS_PER_PAGE)    /* position of the xid within its page: xid % 32768 */
#define TransactionIdToByte(xid)    (TransactionIdToPgIndex(xid) / CLOG_XACTS_PER_BYTE)  /* byte offset within the page */
#define TransactionIdToBIndex(xid)    ((xid) % (TransactionId) CLOG_XACTS_PER_BYTE)     /* position of the xid within its byte: xid % 4 */
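A worked decomposition may make these macros easier to follow. The sketch below recomputes the same quantities for one xid, assuming the default 8KB block size; struct toy_clog_loc and toy_locate are hypothetical names introduced for this example.

```c
#include <stdint.h>

#define TOY_XACTS_PER_BYTE 4u
#define TOY_XACTS_PER_PAGE (8192u * TOY_XACTS_PER_BYTE)  /* 32768, assumed 8KB page */

/* Where the two status bits of a transaction live. */
struct toy_clog_loc {
    uint32_t page;     /* clog page number */
    uint32_t pgindex;  /* position of the xid within the page */
    uint32_t byteno;   /* byte offset within the page */
    uint32_t bindex;   /* position of the xid within that byte (0..3) */
};

struct toy_clog_loc toy_locate(uint32_t xid)
{
    struct toy_clog_loc loc;

    loc.page = xid / TOY_XACTS_PER_PAGE;
    loc.pgindex = xid % TOY_XACTS_PER_PAGE;
    loc.byteno = loc.pgindex / TOY_XACTS_PER_BYTE;
    loc.bindex = xid % TOY_XACTS_PER_BYTE;
    return loc;
}
```

For xid 100000 this gives page 3, in-page position 1696, byte 424 and in-byte position 0, so its status occupies the two lowest bits of byte 424 on page 3.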

 

/* We store the latest async LSN for each group of transactions */

#define CLOG_XACTS_PER_LSN_GROUP 32 /* keep this a power of 2 */
#define CLOG_LSNS_PER_PAGE (CLOG_XACTS_PER_PAGE / CLOG_XACTS_PER_LSN_GROUP)

#define GetLSNIndex(slotno, xid) ((slotno) * CLOG_LSNS_PER_PAGE + \
((xid) % (TransactionId) CLOG_XACTS_PER_PAGE) / CLOG_XACTS_PER_LSN_GROUP)
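The index computed by GetLSNIndex can be checked by hand. The following sketch mirrors the macro with plain integers, assuming 32768 transactions per page; toy_lsn_index is a hypothetical name.

```c
#include <stdint.h>

#define TOY_XACTS_PER_PAGE      32768u  /* assumed 8KB page, 4 xacts per byte */
#define TOY_XACTS_PER_LSN_GROUP 32u
#define TOY_LSNS_PER_PAGE       (TOY_XACTS_PER_PAGE / TOY_XACTS_PER_LSN_GROUP)

/* Index of the LSN entry shared by the 32 transactions around xid. */
uint32_t toy_lsn_index(uint32_t slotno, uint32_t xid)
{
    return slotno * TOY_LSNS_PER_PAGE
        + (xid % TOY_XACTS_PER_PAGE) / TOY_XACTS_PER_LSN_GROUP;
}
```

Each buffer slot owns 1024 LSN entries. For the page held in slot 2, xid 100000 sits at in-page position 1696, which is group 53, so its entry is 2 × 1024 + 53 = 2101.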

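  Because clog updates are made in memory and are not flushed to disk right away, two questions arise. 1. Where is clog used during restart recovery? 2. Where is it consulted when checking tuple visibility?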

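  Whenever a tuple is fetched, its xmin and xmax are checked for commit status. If the hint bits in t_infomask are not set, the backend looks the status up in the clog buffers, as in this call stack: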

TransactionIdGetStatus clog.c:654
TransactionLogFetch transam.c:79
TransactionIdDidCommit transam.c:129
HeapTupleSatisfiesMVCC heapam_visibility.c:1058
HeapTupleSatisfiesVisibility heapam_visibility.c:1695
heapgetpage heapam.c:476
heapgettup_pagemode heapam.c:917
heap_getnextslot heapam.c:1390
table_scan_getnextslot tableam.h:906
SeqNext nodeSeqscan.c:80
ExecScanFetch execScan.c:133
ExecScan execScan.c:182
ExecSeqScan nodeSeqscan.c:112
ExecProcNodeFirst execProcnode.c:454
ExecProcNode executor.h:248
ExecutePlan execMain.c:1632
standard_ExecutorRun execMain.c:350
CitusExecutorRun multi_executor.c:214
pgss_ExecutorRun pg_stat_statements.c:1043
pgsk_ExecutorRun pg_stat_kcache.c:1034
pgqs_ExecutorRun pg_qualstats.c:661
explain_ExecutorRun auto_explain.c:334
ExecutorRun execMain.c:292
PortalRunSelect pquery.c:912
PortalRun pquery.c:756
exec_simple_query postgres.c:1325
PostgresMain postgres.c:4415
BackendRun postmaster.c:4527
BackendStartup postmaster.c:4211
ServerLoop postmaster.c:1740
PostmasterMain postmaster.c:1413
main main.c:231
__libc_start_main 0x00007f3353efd555
_start 0x0000000000483aa9

 

/*
 * information stored in t_infomask:
 */
#define HEAP_HASNULL            0x0001    /* has null attribute(s) */
#define HEAP_HASVARWIDTH        0x0002    /* has variable-width attribute(s) */
#define HEAP_HASEXTERNAL        0x0004    /* has external stored attribute(s) */
#define HEAP_HASOID_OLD            0x0008    /* has an object-id field */
#define HEAP_XMAX_KEYSHR_LOCK    0x0010    /* xmax is a key-shared locker */
#define HEAP_COMBOCID            0x0020    /* t_cid is a combo cid */
#define HEAP_XMAX_EXCL_LOCK        0x0040    /* xmax is exclusive locker */
#define HEAP_XMAX_LOCK_ONLY        0x0080    /* xmax, if valid, is only a locker */

 /* xmax is a shared locker */
#define HEAP_XMAX_SHR_LOCK    (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)

#define HEAP_LOCK_MASK    (HEAP_XMAX_SHR_LOCK | HEAP_XMAX_EXCL_LOCK | \
                         HEAP_XMAX_KEYSHR_LOCK)
#define HEAP_XMIN_COMMITTED        0x0100    /* t_xmin committed */
#define HEAP_XMIN_INVALID        0x0200    /* t_xmin invalid/aborted */
#define HEAP_XMIN_FROZEN        (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)
#define HEAP_XMAX_COMMITTED        0x0400    /* t_xmax committed */
#define HEAP_XMAX_INVALID        0x0800    /* t_xmax invalid/aborted */
#define HEAP_XMAX_IS_MULTI        0x1000    /* t_xmax is a MultiXactId */
#define HEAP_UPDATED            0x2000    /* this is UPDATEd version of row */
#define HEAP_MOVED_OFF            0x4000    /* moved to another place by pre-9.0
                                         * VACUUM FULL; kept for binary
                                         * upgrade support */
#define HEAP_MOVED_IN            0x8000    /* moved from another place by pre-9.0
                                         * VACUUM FULL; kept for binary
                                         * upgrade support */
#define HEAP_MOVED (HEAP_MOVED_OFF | HEAP_MOVED_IN)

#define HEAP_XACT_MASK            0xFFF0    /* visibility-related bits */
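The hint bits above are what allow a visibility check to skip clog. The sketch below classifies the xmin state from an infomask value and reports whether a clog lookup would still be needed. It is a simplification of the real visibility code: it ignores xmax, snapshots and the HEAP_MOVED bits, and the toy_ names are hypothetical.

```c
#include <stdint.h>

/* Same bit values as HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID. */
#define TOY_HEAP_XMIN_COMMITTED 0x0100
#define TOY_HEAP_XMIN_INVALID   0x0200
#define TOY_HEAP_XMIN_FROZEN    (TOY_HEAP_XMIN_COMMITTED | TOY_HEAP_XMIN_INVALID)

enum toy_xmin_state {
    TOY_XMIN_UNKNOWN,    /* no hint bit set: status must come from clog */
    TOY_XMIN_COMMITTED,
    TOY_XMIN_INVALID,    /* inserting transaction aborted or invalid */
    TOY_XMIN_FROZEN      /* both bits set */
};

enum toy_xmin_state toy_classify_xmin(uint16_t infomask)
{
    uint16_t bits = (uint16_t) (infomask & TOY_HEAP_XMIN_FROZEN);

    if (bits == TOY_HEAP_XMIN_FROZEN)
        return TOY_XMIN_FROZEN;
    if (bits == TOY_HEAP_XMIN_COMMITTED)
        return TOY_XMIN_COMMITTED;
    if (bits == TOY_HEAP_XMIN_INVALID)
        return TOY_XMIN_INVALID;
    return TOY_XMIN_UNKNOWN;
}

/* Returns 1 when the xmin status is not cached in the tuple header. */
int toy_xmin_needs_clog_lookup(uint16_t infomask)
{
    return toy_classify_xmin(infomask) == TOY_XMIN_UNKNOWN;
}
```

A tuple whose infomask is 0x0902 already carries the committed hint for xmin, so no lookup is needed, whereas a freshly inserted tuple with infomask 0x0002 would send the reader to clog.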

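  Whenever a new clog page (BLCKSZ bytes like every other page in PostgreSQL, 8KB by default) is initialized to zeroes, clog.c emits a WAL record. Other writes to clog come from the recording of transaction commit or abort in xact.c, which generates its own WAL records for these events and re-performs the status update on redo. For a synchronous commit, the XLOG is guaranteed to be flushed before the commit is recorded in clog, so the WAL rule is satisfied automatically. For an asynchronous commit, the latest LSN affecting each clog page must be tracked so that the corresponding XLOG can be flushed before the page is written out. For more detail on clog see https://www.interdb.jp/pg/pgsql05.html. For clog cleanup see https://www.interdb.jp/pg/pgsql06.html#_6.4. ; it is carried out by vacuum freeze.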
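The asynchronous-commit bookkeeping can be modelled roughly as follows. This is a toy model under stated assumptions, not the SLRU implementation: it keeps one LSN per group of 32 transactions in a small global array, uses plain 64-bit integers as LSNs, and only reports how far WAL would have to be flushed before a page could be written. Every toy_ name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SLOTS          4u      /* number of buffer slots in this toy model */
#define TOY_XACTS_PER_PAGE 32768u  /* assumed 8KB page, 4 xacts per byte */
#define TOY_GROUP_SIZE     32u     /* transactions sharing one LSN entry */
#define TOY_LSNS_PER_PAGE  (TOY_XACTS_PER_PAGE / TOY_GROUP_SIZE)

/* Latest async-commit LSN seen for each group; 0 means "none recorded". */
uint64_t toy_group_lsn[TOY_SLOTS * TOY_LSNS_PER_PAGE];

/* Remember the commit record LSN of an asynchronously committed xid. */
void toy_note_async_commit(uint32_t slotno, uint32_t xid, uint64_t commit_lsn)
{
    uint32_t idx;

    assert(slotno < TOY_SLOTS);
    idx = slotno * TOY_LSNS_PER_PAGE
        + (xid % TOY_XACTS_PER_PAGE) / TOY_GROUP_SIZE;
    if (toy_group_lsn[idx] < commit_lsn)
        toy_group_lsn[idx] = commit_lsn;   /* keep only the newest LSN */
}

/* WAL must be flushed at least this far before the page in slotno is written. */
uint64_t toy_page_flush_lsn(uint32_t slotno)
{
    uint64_t max_lsn = 0;
    uint32_t i;

    assert(slotno < TOY_SLOTS);
    for (i = 0; i < TOY_LSNS_PER_PAGE; i++)
    {
        uint64_t lsn = toy_group_lsn[slotno * TOY_LSNS_PER_PAGE + i];

        if (lsn > max_lsn)
            max_lsn = lsn;
    }
    return max_lsn;
}
```

Grouping 32 transactions per entry keeps the array small at the cost of occasionally flushing slightly more WAL than strictly necessary, which is always safe.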


Some structured descriptions can be found at https://blog.csdn.net/weixin_39540651/article/details/115677138.

Other directories:

pg_logical

pg_commit_ts

pg_multixact

pg_subtrans

pg_snapshots

pg_replslot

pg_dynshmem

Directory layout in 9.6

Directory layout in 10.0+ (unchanged through 14)

https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
