【Question title】: Column based intersect on two tables
【Posted】: 2016-02-13 01:04:55
【Question description】:

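I'm trying to do something like a column-based intersect on two tables. The tables are:

  • LogTag: a log can have zero or more tags
  • MatchingRule: a matching rule consists of one or more tags that define the rule

A log can have zero or more rules that match it. I will pass in a MatchingRuleID and expect to get back all logs that match that rule.

Expected result: a result set of matching LogIDs. For example, passing in MatchingRuleID = 30 should return LogID 101, and MatchingRuleID = 31 should return LogIDs 101 and 100.

Also, the LogTag table may have millions of rows, so an efficient query is preferred.

Question: how do I find all LogIDs that match a given rule's definition, i.e. the logs that carry every tag in the rule?

Schema: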

CREATE TABLE dbo.Tag
(
    TagID INT,
    TagName NVARCHAR(50)
)
INSERT INTO dbo.Tag (TagID, TagName)
VALUES (1, 'tag1'), (2, 'tag2'), (3, 'tag3')

CREATE TABLE dbo.LogTag
(
    LogID INT,
    TagID INT
)
INSERT INTO dbo.LogTag (LogID, TagID)
VALUES (100, 1), (101, 1), (101, 2), (101, 3), (101, 4), (102, 2), (102, 3)  

CREATE TABLE dbo.MatchingRule
(
    MatchingRuleID INT,
    TagID INT
)
INSERT INTO dbo.MatchingRule (MatchingRuleID, TagID)
VALUES (30, 1), (30, 2), (30, 3), (31, 1)

【Question discussion】:

  • What is the expected result?
  • @Squirrel, edited to clarify
  • Where is the expected result?

Tags: sql-server tsql sql-server-2012


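【Solution 1】:

It's important to have appropriate clustered indexes on the tables. I put an alternative index for #log_tag in the comments, which might improve performance on large sets. Since I don't have a suitable sample to test with, you'll have to verify which one performs best.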

CREATE TABLE #tag(tag_id INT PRIMARY KEY,tag_name NVARCHAR(50));
INSERT INTO #tag (tag_id,tag_name)VALUES
    (1,'tag1'),(2,'tag2'),(3,'tag3');

-- Try this key for large sets: PRIMARY KEY(tag_id,log_id));
CREATE TABLE #log_tag(log_id INT,tag_id INT,PRIMARY KEY(log_id,tag_id))
INSERT INTO #log_tag (log_id,tag_id)VALUES
    (100,1),(101,1),(101,2),(101,3),(101,4),(102,2),(102,3);

CREATE TABLE #matching_rule(matching_rule_id INT,tag_id INT,PRIMARY KEY(matching_rule_id,tag_id));
INSERT INTO #matching_rule(matching_rule_id,tag_id)VALUES
    (30,1),(30,2),(30,3),(31,1);

DECLARE @matching_rule_id INT=31;

;WITH required_tags AS (
    SELECT tag_id
    FROM #matching_rule
    WHERE matching_rule_id=@matching_rule_id
)
SELECT lt.log_id
FROM required_tags AS rt 
     INNER JOIN #log_tag AS lt ON
         lt.tag_id=rt.tag_id
GROUP BY lt.log_id
HAVING COUNT(*)=(SELECT COUNT(*) FROM required_tags);

DROP TABLE #log_tag;
DROP TABLE #matching_rule;
DROP TABLE #tag;

The results match the expected results for both 30 and 31.
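The script above gives every table a primary key, so a (log_id, tag_id) pair can appear only once and COUNT(*) equals the number of distinct matched tags. The question's dbo.LogTag and dbo.MatchingRule have no keys, though. If duplicate pairs are possible there, one hedged variant is to count distinct tags on both sides. The sketch below uses keyless temp tables, a hypothetical duplicated (100,1) row and a hypothetical rule 32 = {tag1, tag2} (neither is in the question's data) to show why: with plain COUNT(*), log 100 would wrongly match rule 32.

```sql
-- Sketch only: keyless temp tables, like the question's dbo tables
IF OBJECT_ID('tempdb..#log_tag') IS NOT NULL DROP TABLE #log_tag;
IF OBJECT_ID('tempdb..#matching_rule') IS NOT NULL DROP TABLE #matching_rule;
IF OBJECT_ID('tempdb..#result') IS NOT NULL DROP TABLE #result;

CREATE TABLE #log_tag(log_id INT, tag_id INT);
-- The question's rows plus a hypothetical duplicate (100,1)
INSERT INTO #log_tag (log_id,tag_id) VALUES
    (100,1),(100,1),(101,1),(101,2),(101,3),(101,4),(102,2),(102,3);

CREATE TABLE #matching_rule(matching_rule_id INT, tag_id INT);
-- The question's rules plus a hypothetical rule 32 = {1,2}
INSERT INTO #matching_rule (matching_rule_id,tag_id) VALUES
    (30,1),(30,2),(30,3),(31,1),(32,1),(32,2);

DECLARE @matching_rule_id INT=32;

WITH required_tags AS (
    -- DISTINCT so a repeated rule row cannot raise the required count
    SELECT DISTINCT tag_id
    FROM #matching_rule
    WHERE matching_rule_id=@matching_rule_id
)
SELECT lt.log_id
INTO #result
FROM required_tags AS rt
     INNER JOIN #log_tag AS lt ON
         lt.tag_id=rt.tag_id
GROUP BY lt.log_id
-- COUNT(DISTINCT) so a repeated (log_id, tag_id) row cannot stand in for a missing tag
HAVING COUNT(DISTINCT lt.tag_id)=(SELECT COUNT(*) FROM required_tags);

SELECT log_id FROM #result;  -- only 101; HAVING COUNT(*) would also return 100
```

If the tables do get unique keys, as in the script above, the plain COUNT(*) version is already safe and avoids the extra distinct aggregation.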

[Execution plan for the indexes used in the script; image not included]
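The comment in Solution 1's script suggests clustering the tag table on (tag_id, log_id) for large sets. Applied to the question's keyless dbo.LogTag and dbo.MatchingRule, one possible layout is sketched below on temp copies so that it runs on its own. The index names are made up, and the UNIQUE indexes assume no duplicate pairs exist (building them fails otherwise). Leading with TagID lets the join seek one key range per required tag, the companion (LogID, TagID) index serves lookups that start from a log, and uniqueness is what keeps HAVING COUNT(*) = ... from being inflated by repeated rows. Whether this beats the log-first key on real data still has to be checked against actual execution plans.

```sql
-- Sketch: temp copies of the question's heap tables; index names are hypothetical
IF OBJECT_ID('tempdb..#LogTag') IS NOT NULL DROP TABLE #LogTag;
IF OBJECT_ID('tempdb..#MatchingRule') IS NOT NULL DROP TABLE #MatchingRule;

CREATE TABLE #LogTag (LogID INT NOT NULL, TagID INT NOT NULL);
INSERT INTO #LogTag (LogID, TagID)
VALUES (100, 1), (101, 1), (101, 2), (101, 3), (101, 4), (102, 2), (102, 3);

CREATE TABLE #MatchingRule (MatchingRuleID INT NOT NULL, TagID INT NOT NULL);
INSERT INTO #MatchingRule (MatchingRuleID, TagID)
VALUES (30, 1), (30, 2), (30, 3), (31, 1);

-- Tag-first clustering: each required tag becomes one range seek on LogTag
CREATE UNIQUE CLUSTERED INDEX CIX_LogTag_TagID_LogID ON #LogTag (TagID, LogID);
-- Log-first companion index for queries that start from a LogID
CREATE UNIQUE NONCLUSTERED INDEX IX_LogTag_LogID_TagID ON #LogTag (LogID, TagID);
-- Rule lookups read one contiguous range per MatchingRuleID
CREATE UNIQUE CLUSTERED INDEX CIX_MatchingRule ON #MatchingRule (MatchingRuleID, TagID);
```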

【Discussion】:

【Solution 2】:

Try this query:

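DECLARE @InputMatchingRuleId INT = 30;

WITH CTE1 AS
(
    -- Number the rule's own tags 1..n, then attach every log that carries each tag
    SELECT MR.RN, LT.TagID, LT.LogID
    FROM (
        SELECT TagID, DENSE_RANK() OVER (ORDER BY TagID) AS RN
        FROM MatchingRule
        WHERE MatchingRuleID = @InputMatchingRuleId
    ) AS MR
    INNER JOIN LogTag LT ON LT.TagID = MR.TagID
),
CTE2 AS
(
    -- Anchor: logs that have the rule's first tag
    SELECT 1 AS RN2, C1.LogID FROM CTE1 C1 WHERE C1.RN = 1
    UNION ALL
    -- Recursive step: extend the chain while the same log also has the next tag
    SELECT C2.RN2 + 1 AS RN2, C2.LogID
    FROM CTE1 C1 INNER JOIN CTE2 C2 ON C1.RN = C2.RN2 + 1 AND C1.LogID = C2.LogID
)
-- A log matches only if its chain reaches the last of the rule's tags
SELECT DISTINCT LogID FROM CTE2
WHERE RN2 = (SELECT COUNT(DISTINCT TagID) FROM MatchingRule
             WHERE MatchingRuleID = @InputMatchingRuleId);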

【Discussion】:

【Solution 3】:

Note: this only works on SQL Server 2008+

Here's the query I came up with:

DECLARE @RuleID INT
SELECT @RuleID = 30

SELECT LogID
FROM LogTag lt
    INNER JOIN (
        SELECT TagID, MatchingRuleID, COUNT(*) OVER (PARTITION BY MatchingRuleID) TagCount
        FROM MatchingRule
    ) mr
    ON lt.TagID = mr.TagID
        AND mr.MatchingRuleID = @RuleID
GROUP BY LogID, TagCount
HAVING COUNT(*) = TagCount
      

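So basically I join LogTag to every TagID in the specified matching rule, group the matches by LogID, and then check whether the number of tags each log matched in LogTag equals the rule's tag count from MatchingRule (computed by the windowed COUNT, then filtered and grouped).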
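To make that comparison visible, the sketch below runs the same join for rule 30 without the HAVING filter and keeps both counts side by side. Temp tables stand in for the question's dbo tables, and the MatchedTags, RequiredTags and IsMatch columns are illustrative additions. Log 100 matches 1 of the 3 required tags, 102 matches 2, and 101 matches all 3, so HAVING COUNT(*) = TagCount keeps only 101.

```sql
-- Sketch: temp copies of the question's tables and data
IF OBJECT_ID('tempdb..#LogTag') IS NOT NULL DROP TABLE #LogTag;
IF OBJECT_ID('tempdb..#MatchingRule') IS NOT NULL DROP TABLE #MatchingRule;
IF OBJECT_ID('tempdb..#counts') IS NOT NULL DROP TABLE #counts;

CREATE TABLE #LogTag (LogID INT, TagID INT);
INSERT INTO #LogTag (LogID, TagID)
VALUES (100, 1), (101, 1), (101, 2), (101, 3), (101, 4), (102, 2), (102, 3);

CREATE TABLE #MatchingRule (MatchingRuleID INT, TagID INT);
INSERT INTO #MatchingRule (MatchingRuleID, TagID)
VALUES (30, 1), (30, 2), (30, 3), (31, 1);

DECLARE @RuleID INT = 30;

-- Same join as above, but the HAVING test is exposed as columns
SELECT lt.LogID,
       COUNT(*)    AS MatchedTags,   -- rule tags this log carries
       mr.TagCount AS RequiredTags,  -- all tags in the rule
       CASE WHEN COUNT(*) = mr.TagCount THEN 1 ELSE 0 END AS IsMatch
INTO #counts
FROM #LogTag lt
    INNER JOIN (
        SELECT TagID, MatchingRuleID,
               COUNT(*) OVER (PARTITION BY MatchingRuleID) AS TagCount
        FROM #MatchingRule
    ) mr
    ON lt.TagID = mr.TagID
        AND mr.MatchingRuleID = @RuleID
GROUP BY lt.LogID, mr.TagCount;

SELECT LogID, MatchedTags, RequiredTags, IsMatch FROM #counts ORDER BY LogID;
-- 100 | 1 | 3 | 0
-- 101 | 3 | 3 | 1
-- 102 | 2 | 3 | 0
```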

【Discussion】:

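【Solution 4】:

It should be something like this:

declare @MatchingRuleID int = 30;

with rules as
(
    -- one row per TagID of the rule; cnt = sum of the per-tag row counts
    select  TagID, cnt = sum(count(*)) over()
    from    dbo.MatchingRule
    where   MatchingRuleID  = @MatchingRuleID
    group by TagID
)
select  LogID
from    rules r
    inner join dbo.LogTag lt    on  r.TagID = lt.TagID
group by LogID, cnt
having  count(*) = r.cnt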
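In the rules CTE, count(*) is evaluated per TagID group first, and sum(...) over() then adds those group counts across the whole grouped result, so every row carries the rule's row count. The sketch below isolates that step for rule 30 with a hypothetical duplicated (30, 1) row, which the question's keyless dbo.MatchingRule would allow. In that case the sum becomes 4 while at most 3 distinct tags can ever match, so no log would qualify; count(*) over(), which counts the TagID groups themselves, stays at 3. The RowsPerTag, SumCnt and GroupCnt column names are illustrative.

```sql
-- Sketch: a keyless temp copy of dbo.MatchingRule
if object_id('tempdb..#MatchingRule') is not null drop table #MatchingRule;
if object_id('tempdb..#rules') is not null drop table #rules;

create table #MatchingRule (MatchingRuleID int, TagID int);
-- the question's rows plus a hypothetical duplicate (30, 1)
insert into #MatchingRule (MatchingRuleID, TagID)
values (30, 1), (30, 1), (30, 2), (30, 3), (31, 1);

declare @MatchingRuleID int = 30;

select  TagID,
        RowsPerTag = count(*),              -- rows in each TagID group (2 for tag 1)
        SumCnt     = sum(count(*)) over(),  -- sum of the group counts: 4
        GroupCnt   = count(*) over()        -- number of TagID groups: 3
into    #rules
from    #MatchingRule
where   MatchingRuleID = @MatchingRuleID
group by TagID;

select TagID, RowsPerTag, SumCnt, GroupCnt from #rules order by TagID;
```

Without duplicate rule rows both expressions give the same value, so the choice only matters if MatchingRule is not keyed on (MatchingRuleID, TagID).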
        

【Discussion】:

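【Solution 5】:

select distinct l.LogID
from dbo.MatchingRule r
inner join dbo.LogTag l on l.TagID = r.TagID
where r.MatchingRuleID = 31

Another way is to identify all the tags first, and then:

declare @Tags table (TagID int);
insert into @Tags (TagID) select distinct TagID from dbo.MatchingRule where MatchingRuleID = 31;

select distinct l.LogID
from dbo.LogTag l
where exists(select 1 from @Tags t where t.TagID = l.TagID)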
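For a rule with several tags, EXISTS on its own keeps every log that carries *any* of them (for rule 30 that would be 100, 101 and 102). One way to keep the EXISTS style but require *every* tag is the classic double NOT EXISTS form of relational division: keep a log if none of the rule's tags is missing from it. This is a different technique from the queries above; the sketch runs on temp copies of the question's tables and data.

```sql
-- Sketch: keyless temp copies of the question's tables
if object_id('tempdb..#LogTag') is not null drop table #LogTag;
if object_id('tempdb..#MatchingRule') is not null drop table #MatchingRule;
if object_id('tempdb..#result') is not null drop table #result;

create table #LogTag (LogID int, TagID int);
insert into #LogTag (LogID, TagID)
values (100, 1), (101, 1), (101, 2), (101, 3), (101, 4), (102, 2), (102, 3);

create table #MatchingRule (MatchingRuleID int, TagID int);
insert into #MatchingRule (MatchingRuleID, TagID)
values (30, 1), (30, 2), (30, 3), (31, 1);

declare @MatchingRuleID int = 30;

-- keep a log only if none of the rule's tags is missing from it
select  l.LogID
into    #result
from    (select distinct LogID from #LogTag) l
where   not exists (
            select 1
            from   #MatchingRule r
            where  r.MatchingRuleID = @MatchingRuleID
              and  not exists (
                       select 1
                       from   #LogTag lt
                       where  lt.LogID = l.LogID
                         and  lt.TagID = r.TagID));

select LogID from #result;  -- 101; with @MatchingRuleID = 31 it would be 100 and 101
```

One behavioral difference: for a rule with no tags at all, the outer NOT EXISTS is vacuously true and every log in LogTag is returned, whereas the GROUP BY/HAVING versions return nothing.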
          

【Discussion】:
