【Question title】: Efficient atomic bulk refresh operation in MariaDB for many to many relation
【Posted】: 2020-09-16 22:30:50
【Question】:

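I need to link a list of unique pages in the pages table to the page_providers table in a many-to-many relation through the page_xref_page_provider table. I am having trouble designing an efficient atomic bulk refresh operation that covers the following: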

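  1. A new list of pages is received from a page provider. Some pages on that list may be the same as pages already recorded in the database (same Url), some may have been removed from the list, and some may have been added.
  2. The database holds per-page statistics, so I should not delete an old page (identified by its unique Url) as long as it is still on the list of at least one page provider.
  3. If the updated list from the current page provider no longer contains a page it previously contained, and no other page provider has that page on its list, the page should be deleted from the pages table.
  4. Pages that are not yet recorded when I receive the list must be added to the pages table and cross-referenced in page_xref_page_provider.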

What I have tried:

-- We use IGNORE to handle duplicate URLs on the list we received from the current page provider
-- pages_temp is a temporary table whose creation I have omitted
INSERT IGNORE INTO pages_temp (Url, Host, Port) VALUES (?, ?, ?);

BEGIN;

-- In the DB client program, we take the last inserted ID and the number of rows affected by the
--   following query to get the range of newly inserted IDs
INSERT IGNORE INTO pages (Url, Host, Port) SELECT Url, Host, Port FROM pages_temp;

-- This doesn't work (wrong syntax), could you correct me here?
-- When preparing this statement, we parameterize it with the current PageProviderID, the
--   last inserted ID (which is actually the first ID in the bulk) and the number of rows inserted
--   plus the first ID in the bulk.
INSERT INTO page_xref_page_provider (PageProviderID, PageID) SELECT ?, i BETWEEN ? AND ?;

-- This query is parametrized with the current page provider ID
DELETE pxpp FROM page_xref_page_provider AS pxpp
JOIN pages ON pxpp.PageID = pages.ID AND pxpp.PageProviderID=?
WHERE pages.Url NOT IN (SELECT Url FROM pages_temp);

-- This seems inefficient because the subquery also fetches the relations not affected by the current
--   list of pages / page provider
DELETE FROM pages WHERE pages.ID NOT IN (SELECT DISTINCT PageID FROM page_xref_page_provider);

COMMIT;

【Comments】:

Tags: mysql sql performance transactions mariadb


【Solution 1】:

Avoid NOT IN ( SELECT ... ). In some situations it performs terribly. A LEFT JOIN or EXISTS may be faster.
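
As a rough sketch of what that could look like for the two DELETE statements in the question, the first one below uses a LEFT JOIN ... IS NULL anti-join and the second uses NOT EXISTS. It reuses the table and column names from the question and assumes that pages_temp.Url is declared NOT NULL and that page_xref_page_provider has an index starting with PageID; neither is confirmed by the question.

```sql
-- Unlink pages that the current provider no longer lists.
-- The ? placeholder is the current PageProviderID.
DELETE pxpp
FROM page_xref_page_provider AS pxpp
JOIN pages AS p ON p.ID = pxpp.PageID
LEFT JOIN pages_temp AS t ON t.Url = p.Url
WHERE pxpp.PageProviderID = ?
  AND t.Url IS NULL;          -- no matching row in the received list

-- Remove pages that no provider references any more.
DELETE FROM pages
WHERE NOT EXISTS (
    SELECT 1
    FROM page_xref_page_provider AS pxpp
    WHERE pxpp.PageID = pages.ID
);
```

Whether either form actually beats NOT IN depends on the server version and the indexes, so it is worth comparing the EXPLAIN output for both before settling on one.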

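Do the tables have AUTO_INCREMENT ids? If so, be aware that IGNORE "burns" ids: an INSERT IGNORE can consume AUTO_INCREMENT values even for rows it skips as duplicates, leaving gaps in the id sequence.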
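
Because of such gaps, the last inserted ID plus the affected-row count may not describe exactly the rows that were just added. One possible way around this, sketched below, is to insert only the URLs that are missing and then build the cross-references by joining on Url instead of relying on an ID range. It assumes pages.Url has a unique key, (PageProviderID, PageID) is the primary or a unique key of page_xref_page_provider, and no other session inserts the same URLs concurrently.

```sql
-- Insert only URLs that are not yet in pages, so duplicates never reach the INSERT.
INSERT INTO pages (Url, Host, Port)
SELECT t.Url, t.Host, t.Port
FROM pages_temp AS t
LEFT JOIN pages AS p ON p.Url = t.Url
WHERE p.ID IS NULL;

-- Link every page on the received list to the current provider.
-- The ? placeholder is the current PageProviderID.
-- IGNORE skips links that already exist (assumes the unique key described above).
INSERT IGNORE INTO page_xref_page_provider (PageProviderID, PageID)
SELECT ?, p.ID
FROM pages AS p
JOIN pages_temp AS t ON t.Url = p.Url;
```

If several writers can refresh at the same time, keeping IGNORE on the first statement as a safety net may still be preferable to a duplicate-key error, at the cost of occasional gaps.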

Here is a discussion of high-speed ingestion techniques: http://mysql.rjweb.org/doc.php/staging_table

Performance tips for many-to-many mapping tables: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
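
For illustration, a commonly recommended layout for a mapping table is a composite primary key over both ids plus a secondary index in the opposite column order, with no separate AUTO_INCREMENT id. The definition below is a hypothetical sketch: the column types and the index name are assumptions, since the question does not show the actual schema.

```sql
-- Hypothetical definition; column types are assumptions, not taken from the question.
CREATE TABLE page_xref_page_provider (
    PageProviderID INT UNSIGNED NOT NULL,
    PageID         INT UNSIGNED NOT NULL,
    -- Serves lookups by provider and enforces one row per (provider, page) pair.
    PRIMARY KEY (PageProviderID, PageID),
    -- Serves lookups by page, such as checking whether any provider still lists it.
    INDEX idx_page_provider (PageID, PageProviderID)
) ENGINE=InnoDB;
```

With this shape, INSERT IGNORE on the mapping table has no AUTO_INCREMENT column to burn, and both directions of the relation can be resolved from an index.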

【Discussion】:
