【Posted】: 2020-09-16 22:30:50
【Question】:
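I need to link a list of unique pages in the pages table to the page_providers table in a many-to-many relationship through the page_xref_page_provider table. I am struggling to design an efficient, atomic bulk-refresh operation that covers the following:
- A new list of pages is received from a page provider. Some pages on that list may be the same as pages already recorded in the database (same Url), some pages may have been removed from the list, and some may have been added.
- The database holds per-page statistics, so I should not delete an old page (identified by its unique Url) as long as it is still on the list of at least one page provider.
- If the updated list from the current page provider no longer contains a page that it previously contained, and no other page provider has this page on its list, the page should be deleted from the pages table.
- Pages that are not yet recorded when I receive the list must be added to the pages table and cross-referenced in page_xref_page_provider.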
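The question does not show the table definitions. For reference, a minimal sketch of a schema consistent with the description might look like the following. Only the table names and the columns Url, Host, Port, ID, PageProviderID and PageID come from the question; the column types, lengths, key names and foreign keys are assumptions made for illustration.

```sql
-- Hypothetical schema; types, lengths and key names are assumed, not from the question.
CREATE TABLE pages (
    ID   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    Url  VARCHAR(255) NOT NULL,          -- assumed length
    Host VARCHAR(255) NOT NULL,
    Port SMALLINT UNSIGNED NOT NULL,
    UNIQUE KEY uq_pages_url (Url)        -- the question states that Url is unique
);

CREATE TABLE page_providers (
    ID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
);

-- Many-to-many link table: one row per (provider, page) pair.
CREATE TABLE page_xref_page_provider (
    PageProviderID INT UNSIGNED NOT NULL,
    PageID         INT UNSIGNED NOT NULL,
    PRIMARY KEY (PageProviderID, PageID),
    KEY idx_pxpp_page (PageID),          -- supports lookups by page
    FOREIGN KEY (PageProviderID) REFERENCES page_providers (ID),
    FOREIGN KEY (PageID) REFERENCES pages (ID)
);
```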
What I have tried:
-- We use IGNORE to handle duplicate URLs on the list we received from the current page provider
-- pages_temp is a temporary table whose creation I have omitted
INSERT IGNORE INTO pages_temp (Url, Host, Port) VALUES (?, ?, ?);
BEGIN;
-- In the DB client program, we read the last inserted ID and the number of rows affected
-- by the following query, so as to get the range of newly inserted IDs
INSERT IGNORE INTO pages (Url, Host, Port) SELECT Url, Host, Port FROM pages_temp;
-- This doesn't work (wrong syntax), could you correct me here?
-- When preparing this statement, we parameterize it with the current PageProviderID, the
-- last inserted ID (which is actually the first ID in the bulk) and the number of rows inserted
-- plus the first ID in the bulk.
INSERT INTO page_xref_page_provider (PageProviderID, PageID) SELECT ?, i BETWEEN ? AND ?;
-- This query is parametrized with the current page provider ID
DELETE pxpp FROM page_xref_page_provider AS pxpp
JOIN pages ON pxpp.PageID = pages.ID AND pxpp.PageProviderID=?
WHERE pages.Url NOT IN (SELECT Url FROM pages_temp);
-- This seems inefficient because the subquery also fetches the relations not affected by the
-- current list of pages / page provider
DELETE FROM pages WHERE pages.ID NOT IN (SELECT DISTINCT PageID FROM page_xref_page_provider);
COMMIT;
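One possible way to avoid the ID-range bookkeeping is sketched below. The range approach cannot cover pages that already existed before this refresh, because INSERT IGNORE skips them and their IDs fall outside the newly inserted range. Joining pages to pages_temp on Url finds both new and existing pages. This is a sketch under assumptions, not a tested solution: it assumes pages.Url has a unique key and that page_xref_page_provider has a unique key on (PageProviderID, PageID), so that INSERT IGNORE skips links that are already present. Both placeholders take the current page provider ID.

```sql
-- Sketch only. Assumes a UNIQUE key on pages.Url and a unique key on
-- page_xref_page_provider (PageProviderID, PageID).
BEGIN;

-- Add pages that are not recorded yet; rows with an existing Url are skipped.
INSERT IGNORE INTO pages (Url, Host, Port)
SELECT Url, Host, Port FROM pages_temp;

-- Link the current provider (placeholder) to every page on its new list,
-- whether the page was just inserted or already existed.
INSERT IGNORE INTO page_xref_page_provider (PageProviderID, PageID)
SELECT ?, p.ID
FROM pages AS p
JOIN pages_temp AS t ON t.Url = p.Url;

-- Remove the current provider's links to pages that are no longer on its list.
DELETE pxpp
FROM page_xref_page_provider AS pxpp
JOIN pages AS p ON p.ID = pxpp.PageID
LEFT JOIN pages_temp AS t ON t.Url = p.Url
WHERE pxpp.PageProviderID = ?
  AND t.Url IS NULL;

-- Remove pages that no provider references any more.
DELETE p
FROM pages AS p
LEFT JOIN page_xref_page_provider AS pxpp ON pxpp.PageID = p.ID
WHERE pxpp.PageID IS NULL;

COMMIT;
```

The final DELETE still considers every row in pages, but written as an anti-join it can use an index on page_xref_page_provider.PageID if one exists. Whether that is faster than the NOT IN form depends on the server version and the data, so it is worth comparing both with EXPLAIN.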
【Discussion】:
- This might be a better question for DBA Exchange.
Tags: mysql sql performance transactions mariadb