Posted: 2021-07-27 06:12:49
Question:
I'm trying to identify the transactions involved in serializable isolation violations on Redshift, such as:
ERROR: 1023
DETAIL: Serializable isolation violation on table - 4117431, transactions forming the cycle are: 246544535, 246540473 (pid:1777)
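To compare the IDs in a message like this against the system tables, it can help to pull them out programmatically. A minimal Python sketch, assuming the DETAIL line always has exactly the wording shown above (`parse_detail` and `SerializableViolation` are hypothetical helper names, not part of any Redshift tooling):

```python
import re
from typing import List, NamedTuple


class SerializableViolation(NamedTuple):
    table_id: int
    xids: List[int]
    pid: int


# Assumption: the wording matches the observed DETAIL line; other Redshift
# versions may phrase the message differently.
_DETAIL_RE = re.compile(
    r"Serializable isolation violation on table - (?P<table>\d+), "
    r"transactions forming the cycle are: (?P<xids>[\d,\s]+?) \(pid:(?P<pid>\d+)\)"
)


def parse_detail(detail: str) -> SerializableViolation:
    """Extract the table id, the xids in the cycle and the pid from a DETAIL line."""
    m = _DETAIL_RE.search(detail)
    if m is None:
        raise ValueError("not a serializable isolation violation message")
    # int() tolerates the leading space left by splitting on ","
    xids = [int(x) for x in m.group("xids").split(",")]
    return SerializableViolation(int(m.group("table")), xids, int(m.group("pid")))


v = parse_detail(
    "DETAIL: Serializable isolation violation on table - 4117431, "
    "transactions forming the cycle are: 246544535, 246540473 (pid:1777)"
)
print(v)  # SerializableViolation(table_id=4117431, xids=[246544535, 246540473], pid=1777)
```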
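To understand this better, I played with the toy example from the AWS documentation here: https://docs.aws.amazon.com/redshift/latest/dg/c_serial_isolation.html#c_serial_isolation-serializable-isolation-troubleshooting
The error message seems to include a transaction ID that is not one of the concurrent transactions I am currently running. Am I misunderstanding something?
I ran 2 experiments to check this:
Experiment 1
Transaction 1 (T1) - user: user_a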
mydb=> begin;
BEGIN
mydb=*> select * from test.sl;
id
----
1
3
7
2
(4 rows)
mydb=*> insert into test.sl2 values (7);
INSERT 0 1
mydb=*> end;
COMMIT
Transaction 2 (T2) - user: user_b
mydb=# begin;
BEGIN
mydb=*# select * from test.sl2;
id
----
11
3
9
8
(4 rows)
mydb=*# insert into test.sl values (6);
ERROR: 1023
DETAIL: Serializable isolation violation on table - 4117431, transactions forming the cycle are: 246544535, 246540473 (pid:1777)
mydb=!# end;
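The failure above comes from the two sessions forming a cycle of read/write dependencies: T1 reads test.sl and writes test.sl2, while T2 reads test.sl2 and writes test.sl. The simplified Python model below illustrates that idea as textbook serialization-graph testing, assuming table-level read and write sets and that every listed transaction overlaps in time with the others. It is not Redshift's actual implementation, and `rw_edges` / `has_cycle` are hypothetical names:

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

# Each transaction is described only by the tables it reads and writes.
Txn = Tuple[Set[str], Set[str]]  # (read set, write set)


def rw_edges(txns: Dict[str, Txn]) -> List[Tuple[str, str]]:
    """Edge a -> b when a read a table that concurrent b wrote (a must precede b)."""
    edges = []
    for a, b in permutations(txns, 2):
        reads_a, _ = txns[a]
        _, writes_b = txns[b]
        if reads_a & writes_b:
            edges.append((a, b))
    return edges


def has_cycle(nodes, edges) -> bool:
    """Depth-first search for a cycle in a directed graph."""
    graph = {n: [] for n in nodes}
    for a, b in edges:
        graph[a].append(b)
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}

    def visit(n) -> bool:
        color[n] = GREY
        for m in graph[n]:
            # A GREY neighbour is still on the DFS stack, so we found a back edge.
            if color[m] == GREY or (color[m] == WHITE and visit(m)):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in nodes)


# T1 and T2 from experiment 1, assumed to be fully concurrent.
txns = {
    "T1": ({"test.sl"}, {"test.sl2"}),  # select from test.sl, insert into test.sl2
    "T2": ({"test.sl2"}, {"test.sl"}),  # select from test.sl2, insert into test.sl
}
edges = rw_edges(txns)
print(edges)                   # [('T1', 'T2'), ('T2', 'T1')]
print(has_cycle(txns, edges))  # True
```

In this model, removing either insert breaks one edge and the cycle disappears, which matches the AWS toy example's point that only the combination of both read/write pairs is non-serializable.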
Debugging
mydb=# select xid,
pid,
starttime,
endtime,
sequence,
case
when xid in (select xact_id from stl_tr_conflict) then 1
else 0
end as aborted,
trim(text) as text
from svl_statementtext
where xid in (246544535, 246540473) order by xid, sequence, starttime;
xid | pid | starttime | endtime | sequence | aborted | text
-----------+-------+----------------------------+----------------------------+----------+---------+-----------------------------------
246540473 | 31342 | 2021-07-26 10:02:35.975449 | 2021-07-26 10:02:35.975451 | 0 | 0 | begin;
246540473 | 31342 | 2021-07-26 10:02:40.219189 | 2021-07-26 10:02:40.713895 | 0 | 0 | select * from test.sl;
246540473 | 31342 | 2021-07-26 10:03:02.616113 | 2021-07-26 10:03:02.628287 | 0 | 0 | insert into test.sl2 values (11);
246540473 | 31342 | 2021-07-26 10:03:32.585407 | 2021-07-26 10:03:33.036425 | 0 | 0 | COMMIT
246544535 | 1777 | 2021-07-26 10:14:40.687421 | 2021-07-26 10:14:40.687423 | 0 | 1 | begin;
246544535 | 1777 | 2021-07-26 10:15:46.711658 | 2021-07-26 10:15:46.71843 | 0 | 1 | select * from test.sl2;
246544535 | 1777 | 2021-07-26 10:16:03.639541 | 2021-07-26 10:16:03.6423 | 0 | 1 | insert into test.sl values (6);
(7 rows)
Here I can see that xid = 246540473 is not one of the concurrent transactions (T1 or T2).
So I tested it again.
Experiment 2
T1 - user: user_a
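The claim that 246540473 was not concurrent with the failing transaction can be checked directly from the timestamps above. A minimal Python sketch using those rows; it assumes each xid's lifetime runs from its first statement start to its last statement end in svl_statementtext, which can undercount (for example, the aborted transaction's rollback is not logged there). Since 246540473 committed before 246544535 even began, that undercount cannot change the result here. `spans` and `overlap` are hypothetical helpers:

```python
from datetime import datetime
from typing import Dict, Tuple

FMT = "%Y-%m-%d %H:%M:%S.%f"

# (xid, starttime, endtime) copied from the svl_statementtext output above.
rows = [
    (246540473, "2021-07-26 10:02:35.975449", "2021-07-26 10:02:35.975451"),  # begin;
    (246540473, "2021-07-26 10:02:40.219189", "2021-07-26 10:02:40.713895"),  # select
    (246540473, "2021-07-26 10:03:02.616113", "2021-07-26 10:03:02.628287"),  # insert
    (246540473, "2021-07-26 10:03:32.585407", "2021-07-26 10:03:33.036425"),  # COMMIT
    (246544535, "2021-07-26 10:14:40.687421", "2021-07-26 10:14:40.687423"),  # begin;
    (246544535, "2021-07-26 10:15:46.711658", "2021-07-26 10:15:46.71843"),   # select
    (246544535, "2021-07-26 10:16:03.639541", "2021-07-26 10:16:03.6423"),    # insert
]


def spans(rows) -> Dict[int, Tuple[datetime, datetime]]:
    """Earliest statement start and latest statement end per xid."""
    out: Dict[int, Tuple[datetime, datetime]] = {}
    for xid, start, end in rows:
        s, e = datetime.strptime(start, FMT), datetime.strptime(end, FMT)
        if xid in out:
            s, e = min(s, out[xid][0]), max(e, out[xid][1])
        out[xid] = (s, e)
    return out


def overlap(a, b) -> bool:
    """True when two [start, end] intervals share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


s = spans(rows)
print(overlap(s[246540473], s[246544535]))  # False
```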
mydb=> begin;
BEGIN
mydb=*> select * from test.sl;
id
----
2
1
3
7
(4 rows)
mydb=*> insert into test.sl2 values (12);
INSERT 0 1
mydb=*>
T2 - user: user_b
mydb=# begin;
BEGIN
mydb=*# select * from test.sl2;
id
----
8
3
9
11
7
(5 rows)
mydb=*# insert into test.sl values (13);
ERROR: 1023
DETAIL: Serializable isolation violation on table - 4117431, transactions forming the cycle are: 246549376, 246544529 (pid:6733)
mydb=!#
This time, though, I recorded the transaction IDs by querying svv_transactions and filtering on txn_owner before ending either transaction.
mydb=# select * from svv_transactions where txn_owner in ('user_b', 'user_a') limit 10;
txn_owner | txn_db | xid | pid | txn_start | lock_mode | lockable_object_type | relation | granted
-----------+---------+-----------+------+----------------------------+-----------------+----------------------+----------+---------
user_a | mydb | 246549373 | 6727 | 2021-07-26 10:46:20.116482 | AccessShareLock | relation | 252024 | t
user_a | mydb | 246549373 | 6727 | 2021-07-26 10:46:20.116482 | AccessShareLock | relation | 4117431 | t
user_a | mydb | 246549373 | 6727 | 2021-07-26 10:46:20.116482 | ExclusiveLock | transactionid | | t
user_b | mydb | 246549376 | 6733 | 2021-07-26 10:46:23.702597 | AccessShareLock | relation | 252024 | t
user_b | mydb | 246549376 | 6733 | 2021-07-26 10:46:23.702597 | AccessShareLock | relation | 4117498 | t
user_b | mydb | 246549376 | 6733 | 2021-07-26 10:46:23.702597 | ExclusiveLock | transactionid | | t
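I can see that the transaction IDs in experiment 2 are 246549373 and 246549376.
The error message gives me 246549376, which makes sense.
But the second ID, 246544529, does not; it comes from experiment 1.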
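The same comparison can be scripted. svv_transactions returns one row per lock, so the rows need collapsing to distinct xids before checking them against the cycle reported in the error. A minimal Python sketch with the values above hard-coded (in practice the rows would come from a database driver; `xids_by_owner` and `not_open` are hypothetical helpers):

```python
from typing import Dict, List, Set

# (txn_owner, xid) pairs from the svv_transactions output above; one row per lock.
lock_rows = [
    ("user_a", 246549373), ("user_a", 246549373), ("user_a", 246549373),
    ("user_b", 246549376), ("user_b", 246549376), ("user_b", 246549376),
]
cycle_xids = [246549376, 246544529]  # from the DETAIL line of experiment 2


def xids_by_owner(rows) -> Dict[str, Set[int]]:
    """Collapse one-row-per-lock output into the distinct xids held by each owner."""
    out: Dict[str, Set[int]] = {}
    for owner, xid in rows:
        out.setdefault(owner, set()).add(xid)
    return out


def not_open(cycle, owners) -> List[int]:
    """Cycle xids that do not belong to any transaction open at query time."""
    open_xids = set().union(*owners.values())
    return [x for x in cycle if x not in open_xids]


owners = xids_by_owner(lock_rows)
print(owners)                        # {'user_a': {246549373}, 'user_b': {246549376}}
print(not_open(cycle_xids, owners))  # [246544529]
```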
mydb=# select xid,
pid,
starttime,
endtime,
sequence,
case
when xid in (select xact_id from stl_tr_conflict) then 1
else 0
end as aborted,
trim(text) as text
from svl_statementtext
where xid in (246549376, 246544529, 246549373)
order by xid, starttime, sequence;
xid | pid | starttime | endtime | sequence | aborted | text
-----------+------+----------------------------+----------------------------+----------+---------+-----------------------------------
246544529 | 1779 | 2021-07-26 10:14:37.052255 | 2021-07-26 10:14:37.052257 | 0 | 0 | begin;
246544529 | 1779 | 2021-07-26 10:15:43.173474 | 2021-07-26 10:15:43.185421 | 0 | 0 | select * from test.sl;
246544529 | 1779 | 2021-07-26 10:15:56.973818 | 2021-07-26 10:15:56.986552 | 0 | 0 | insert into test.sl2 values (7);
246544529 | 1779 | 2021-07-26 10:16:42.137115 | 2021-07-26 10:16:42.674209 | 0 | 0 | COMMIT
246549373 | 6727 | 2021-07-26 10:44:37.179593 | 2021-07-26 10:44:37.179594 | 0 | 0 | begin;
246549373 | 6727 | 2021-07-26 10:46:20.119846 | 2021-07-26 10:46:20.352005 | 0 | 0 | select * from test.sl;
246549373 | 6727 | 2021-07-26 10:47:00.662191 | 2021-07-26 10:47:00.674989 | 0 | 0 | insert into test.sl2 values (12);
246549376 | 6733 | 2021-07-26 10:44:38.798094 | 2021-07-26 10:44:38.798095 | 0 | 1 | begin;
246549376 | 6733 | 2021-07-26 10:46:23.705674 | 2021-07-26 10:46:23.715201 | 0 | 1 | select * from test.sl2;
246549376 | 6733 | 2021-07-26 10:47:07.167762 | 2021-07-26 10:47:07.17054 | 0 | 1 | insert into test.sl values (13);
(10 rows)
Why doesn't it give me 246549373? What am I not understanding?
Tags: database concurrency amazon-redshift