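【Question title】: Comment threading with score factoring
【Posted】: 2012-03-01 03:37:40
【Question】:

I'm banging my head against something and I wonder whether someone more skilled than I am could help.

My goal is to create a comment thread that takes a comment scoring system into account.

First I'll explain where I am at the moment.

Let's say we have a comment thread for an article, as in the example below. The number in parentheses is the ID of that comment. IDs are assigned automatically by the database and increase chronologically with each additional comment posted. The number of dashes before the comment text represents the comment depth.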

(01)"This is a top level comment." 
(02)-"This is a second level comment. A reply to the top level comment above."
(06)-"This is also a second level comment / another reply to comment 01."
(07)--"This is a reply to comment 06."
(03)"This is a different top level comment."
(05)-"This is a reply to the comment above."
(08)--"This is a reply to that comment in turn."
(10)---"This is a deeper comment still."
(04)"This is one more top level comment."
(09)-"This is one more reply."

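My first problem was storing this data in a way that lets it be returned in the correct order. If you simply store a depth field and order by depth, you get all the top-level comments first, then all the second-level comments, and so on. That isn't right: the comments must come back with their full parentage intact.

One way to achieve this is to store the full parentage of each comment.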

Comment ID  | Parentage
     01     |              (Comment 01 has no parent because it is top level)
     02     | 01-          (Comment 02 was a reply to comment 01)
     03     | 
     04     |              
     05     | 03-
     06     | 01-
     07     | 01-06-       (Comment 07 has two ancestors 01 and then 06)
     08     | 03-05-
     09     | 04-
     10     | 03-05-08-

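Adding another comment record is as simple as taking the parentage of the comment you are replying to and appending that comment's ID to form the new parentage. For example, if I were to reply to comment 10, I would take its parentage (03-05-08-) and append its ID (10-). The database would automatically identify the new record as comment 11, and we would get: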

Comment ID  | Parentage
     01     | 
     02     | 01- 
     03     | 
     04     |              
     05     | 03-
     06     | 01-
     07     | 01-06-
     08     | 03-05-
     09     | 04-
     10     | 03-05-08-
     11     | 03-05-08-10-
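
To make the append step concrete, here is a minimal Python sketch of it. The `parentage` dictionary and the `add_reply` helper are hypothetical stand-ins for the table and the insert statement, and the two-digit zero-padding is an assumption taken from the example IDs above.

```python
# Hypothetical in-memory stand-in for the comments table: ID -> parentage string.
parentage = {
    1: "", 2: "01-", 3: "", 4: "", 5: "03-",
    6: "01-", 7: "01-06-", 8: "03-05-", 9: "04-", 10: "03-05-08-",
}


def add_reply(parent_id: int) -> int:
    """Store a reply to parent_id and return the new comment's ID."""
    new_id = max(parentage) + 1  # stands in for the database auto-increment
    # New parentage = the parent's parentage + the parent's own zero-padded ID.
    parentage[new_id] = f"{parentage[parent_id]}{parent_id:02d}-"
    return new_id


new_id = add_reply(10)
print(new_id, parentage[new_id])  # 11 03-05-08-10-
```

Note that fixed-width padding matters: without it, a plain string sort would place "10-" before "9-".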

Now, when we order the comments for display, we order on the concatenation of Parentage and Comment ID, which gives us:

Order by CONCAT(Parentage, ID)

Comment ID  | Parentage    |   CONCAT(Parentage, ID)
     01     |              |   01-
     02     | 01-          |   01-02-
     06     | 01-          |   01-06-
     07     | 01-06-       |   01-06-07-
     03     |              |   03-
     05     | 03-          |   03-05-
     08     | 03-05-       |   03-05-08-
     10     | 03-05-08-    |   03-05-08-10-
     11     | 03-05-08-10- |   03-05-08-10-11-
     04     |              |   04-
     09     | 04-          |   04-09-

This yields exactly the same list as the first demonstration, with comment 11, which we added later, inserted in the correct place:

(01)"This is a top level comment." 
(02)-"This is a reply to the top level comment."
(06)-"This is another reply that was posted later than the first."
(07)--"This is a reply to the second level comment directly above."
(03)"This is a different top level comment."
(05)-"This is a reply to the comment above."
(08)--"This is a reply to the comment above."
(10)---"This is a deeper comment still."
(11)----"THIS COMMENT WAS ADDED IN THE EARLIER EXAMPLE."
(04)"This is one more top level comment."
(09)-"This is one more reply."

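Indentation can be done by checking the length of the CONCAT string and multiplying len(CONCAT(Parentage, ID)) by a set number of pixels. Great, we have a system for storing comments that recognises their parentage.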
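
The ordering and indentation described above can be sketched in Python as follows. This is only an illustration: `rows`, `sort_key`, `indent_px` and `PX_PER_LEVEL` are hypothetical names, and the depth calculation assumes every ID is zero-padded to two digits so that each path segment is exactly three characters long.

```python
# Comment ID -> parentage, including comment 11 from the example above.
rows = {
    1: "", 2: "01-", 3: "", 4: "", 5: "03-", 6: "01-",
    7: "01-06-", 8: "03-05-", 9: "04-", 10: "03-05-08-", 11: "03-05-08-10-",
}

PX_PER_LEVEL = 20  # hypothetical indent width per depth level


def sort_key(comment_id: int) -> str:
    # Equivalent of CONCAT(Parentage, ID) with a zero-padded ID.
    return f"{rows[comment_id]}{comment_id:02d}-"


def indent_px(comment_id: int) -> int:
    # Each segment ("NN-") is 3 characters; the last segment is the comment itself.
    depth = len(sort_key(comment_id)) // 3 - 1
    return depth * PX_PER_LEVEL


ordered = sorted(rows, key=sort_key)
print(ordered)  # [1, 2, 6, 7, 3, 5, 8, 10, 11, 4, 9]
for cid in ordered:
    print(" " * (indent_px(cid) // 10) + f"({cid:02d})")
```

Subtracting one segment makes top-level comments start at zero indent; multiplying the raw string length instead would simply shift everything right by a constant amount.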

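Now the problem:

Not all comments are equal. A comment scoring system is needed to distinguish the good comments. Let's say each comment has an upvote button. While we want to retain the parentage, if a comment has two direct replies at the same level then we want to show the one with the most upvotes first. I'll add some votes in [square brackets] below.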

(01)"This is a top level comment." [6 votes]
(02)-"This is a reply to the top level comment." [2 votes]
(06)-"This is another reply that was posted later than the first." [30 votes]
(07)--"This is a reply to the second level comment directly above." [5 votes]
(03)"This is a different top level comment." [50 votes]
(05)-"This is a reply to the comment above." [4 votes]
(08)--"This is a reply to the comment above." [0 votes]
(10)---"This is a deeper comment still." [0 votes]
(11)----"THIS COMMENT WAS ADDED IN THE EARLIER EXAMPLE." [0 votes]
(04)"This is one more top level comment." [2 votes]
(09)-"This is one more reply." [0 votes]

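In this example, comments (01) and (03) are both top level, but (03) has [50 votes] while (01) has only [6 votes]. (01) appears first only because it was posted earlier and was therefore assigned a smaller ID. Likewise, (02) and (06) are both replies to (01), but they need to be reordered so that (06), which has the most votes, rises to the top.

I am completely and utterly stuck trying to achieve this.

I imagine any sorting/reordering and indexing would best be done when a comment is voted on rather than at page load, so that page load time stays as fast as possible, but beyond that I have absolutely no idea!

Any ideas or light you can shed on possible avenues would really take a load off! Thanks as always for your help.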

----------------------------------------------- ----------------------------------

Edit: regarding @Paddy's solution,

When I run the expression provided by @Paddy below on mock data, the first error I get is:

"The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified." 

This can be resolved by adding 'top 100%' to the SELECT in the recursive member definition. Once that is done, I get the error:

'CommentTree' has more columns than were specified in the column list.

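This can be resolved by adding a 'Level' column to the CommentTree specification. The data then prints, but it returns all the top-level comments first, followed by something that resembles (but doesn't actually match) the correct sort order.

The data is returned like this: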

ParentId  |  CommentId  |  Comment  |  Vote  | Level
NULL      |      1      | Text here |   6    |  0
NULL      |      3      | Text here |   50   |  0     
NULL      |      4      | Text here |   2    |  0    
4         |      9      | Text here |   0    |  1    
3         |      5      | Text here |   4    |  1    
5         |      8      | Text here |   0    |  2    
8         |      10     | Text here |   0    |  3   
10        |      11     | Text here |   0    |  4    
1         |      2      | Text here |   2    |  1    
1         |      6      | Text here |   30   |  1     
6         |      7      | Text here |   5    |  2    

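Am I doing something wrong, or has @Paddy missed something? Please accept my apologies; recursive functions are very new to me.

【Question discussion】:

  • @Tudor That is the main reason this has been almost impossible to Google! I don't know a better word than threading for "the concept of a multi-level comment structure". Any ideas?
  • @Atheist for Paytheist - how are you getting the data to the front end? Getting the exact order in SQL might be tricky, but I think you have all the data you need to build a tree for your display.
  • Don't you just want ORDER BY Parentage, Vote DESC, ID, or am I missing something?

Tags: mysql sql sorting hierarchical-data comments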


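【Solution 1】:

The code below looks like a good fit for your task. It's a bit complicated, but implementing it in a single SELECT was a challenge for me. You could split it into several SELECTs that prefetch into temporary tables (for performance), or keep it all together.

Thanks for the question, it was very interesting!

Note that the ParentID of root nodes must be 0, not NULL.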

DECLARE @a TABLE (
    CommentID  INT,
    ParentID INT,
    Comment VARCHAR(100),
    Vote INT
)


INSERT @a
VALUES
    (1, 0, '', 6),
    (3, 0, '', 50),
    (4, 0, '', 2),
    (9, 4, '', 0),
    (5, 3, '', 4),
    (8, 5, '', 0),
    (10, 8, '', 0),
    (11, 10, '', 0),
    (2, 1, '', 2),
    (6, 1, '', 30),
    (7, 6, '', 5)

;WITH CTE_1 (ParentId, CommentId, Comment, Vote, Level, LevelPriority, Path)    -- prepare base info
AS
(
    SELECT c.ParentId, c.CommentId, c.Comment, c.Vote, 0 AS Level, ROW_NUMBER() OVER(ORDER BY c.Vote DESC), CAST('/' + CAST(c.CommentId AS VARCHAR(32)) AS VARCHAR(MAX)) + '/'
    FROM @a AS c
    WHERE ParentId = 0

    UNION ALL

    -- rank each reply among its siblings, highest vote first
    SELECT c.ParentId, c.CommentId, c.Comment, c.Vote, Level + 1 AS Level, ROW_NUMBER() OVER(PARTITION BY c.ParentID ORDER BY c.Vote DESC), d.Path + CAST(c.CommentId AS VARCHAR(32)) + '/'
    FROM @a AS c
    INNER JOIN CTE_1 AS d
        ON c.ParentID = d.CommentID
),
CTE_2 (ParentId, CommentId, Comment, Vote, Level, LevelPriority, ChildCount)    -- count number of children
AS
(
    SELECT p.ParentId, p.CommentId, p.Comment, p.Vote, p.Level, p.LevelPriority, COUNT(*)
    FROM CTE_1 AS p
    INNER JOIN CTE_1 AS c
        ON c.Path LIKE p.Path + '%'
    GROUP BY 
        p.ParentId, p.CommentId, p.Comment, p.Vote, p.Level, p.LevelPriority
),
CTE_3 (ParentId, CommentId, Comment, Vote, Level, LevelPriority, OverAllPriority, ChildCount) -- calculate overall priorities
AS
(
    SELECT c.ParentId, c.CommentId, c.Comment, c.Vote, c.Level, c.LevelPriority, 1 AS OverAllPriority, ChildCount
    FROM CTE_2 AS c
    WHERE Level = 0 AND LevelPriority = 1

    UNION ALL

    SELECT c.ParentId, c.CommentId, c.Comment, c.Vote, c.Level, c.LevelPriority, 
        CASE 
            WHEN c.ParentID = d.CommentID THEN d.OverAllPriority + 1
            ELSE d.OverAllPriority + d.ChildCount
        END,
        c.ChildCount
    FROM CTE_2 AS c
    INNER JOIN CTE_3 AS d
        ON 
            (c.ParentID = d.CommentID AND c.LevelPriority = 1) 
            OR (c.ParentID = d.ParentID AND d.LevelPriority + 1 = c.LevelPriority)
)
SELECT ParentId, CommentId, Comment, Vote
FROM CTE_3
ORDER BY OverAllPriority

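In this query I do the following:

  1. In CTE_1 I calculate the sort position among comments that share the same parent (based on votes) and build the tree path, which collects information about all nodes in the hierarchy.
  2. In CTE_2 I calculate the number of descendants of each node, plus 1 for the node itself. The tree path makes it possible to count descendants at all levels in a single SELECT.
  3. In CTE_3 I calculate the overall sort position based on 3 simple rules:
    1. The topmost row has position = 1
    2. The first child node has position = parent_position + 1
    3. The next sibling must come after all descendants of the previous sibling, so it has position = prev_sibling_position + prev_sibling_number_of_descendants (where the count includes the previous sibling itself, as in step 2)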
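
For readers who find the three position rules easier to follow outside SQL, here is a small Python sketch of the same idea. It is not the query above translated line by line; `children`, `subtree_size`, `assign` and `position` are hypothetical names, and ties in votes are broken by input order, which is an assumption.

```python
from collections import defaultdict

# (comment_id, parent_id, votes); parent_id 0 marks a root, as in the sample data.
comments = [
    (1, 0, 6), (3, 0, 50), (4, 0, 2), (9, 4, 0), (5, 3, 4), (8, 5, 0),
    (10, 8, 0), (11, 10, 0), (2, 1, 2), (6, 1, 30), (7, 6, 5),
]

children = defaultdict(list)
for cid, pid, votes in comments:
    children[pid].append((votes, cid))
for siblings in children.values():
    siblings.sort(key=lambda vc: -vc[0])  # highest vote first (LevelPriority)


def subtree_size(cid):
    # Plays the role of ChildCount: all descendants plus the node itself.
    return 1 + sum(subtree_size(child) for _, child in children[cid])


position = {}


def assign(parent_id, first_position):
    pos = first_position
    for _, cid in children[parent_id]:
        position[cid] = pos
        assign(cid, pos + 1)      # rule 2: first child sits right below its parent
        pos += subtree_size(cid)  # rule 3: next sibling follows the whole subtree


assign(0, 1)  # rule 1: the top-voted root gets position 1
ordered = sorted(position, key=position.get)
print(ordered)  # [3, 5, 8, 10, 11, 1, 6, 7, 2, 4, 9]
```

The result is simply a depth-first walk of the tree with siblings visited in vote order; the position arithmetic is what lets SQL express that walk without recursion in the application.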

Edit: the same solution, but without CTEs.

DECLARE @a TABLE (
    CommentID  INT,
    ParentID INT,
    Comment VARCHAR(100),
    Vote INT
)

INSERT @a
VALUES
    (1, 0, '', 6),
    (3, 0, '', 50),
    (4, 0, '', 2),
    (9, 4, '', 0),
    (5, 3, '', 4),
    (8, 5, '', 0),
    (10, 8, '', 0),
    (11, 10, '', 0),
    (2, 1, '', 2),
    (6, 1, '', 30),
    (7, 6, '', 5)


DECLARE @rows INT

DECLARE @temp_table TABLE (
    CommentID  INT,
    ParentID INT,
    Comment VARCHAR(100),
    Vote INT,
    LevelPriority INT, 
    Path VARCHAR(MAX),
    ChildCount INT NULL,
    OverAllPriority INT NULL
)

INSERT @temp_table (CommentID, ParentID, Comment, Vote, LevelPriority, Path)
SELECT CommentID, ParentID, Comment, Vote, ROW_NUMBER() OVER(ORDER BY Vote DESC), '/' + CAST(CommentId AS VARCHAR(32)) + '/'
FROM @a
WHERE ParentID = 0

SELECT @rows = @@ROWCOUNT

WHILE @rows > 0
BEGIN

    INSERT @temp_table (CommentID, ParentID, Comment, Vote, LevelPriority, Path)
    SELECT a.CommentID, a.ParentID, a.Comment, a.Vote, ROW_NUMBER() OVER(PARTITION BY a.ParentID ORDER BY a.Vote DESC), c.Path + CAST(a.CommentId AS VARCHAR(32)) + '/'
    FROM @a AS a
    INNER JOIN @temp_table AS c
        ON a.ParentID = c.CommentID
    WHERE NOT
        a.CommentID IN (SELECT CommentID FROM @temp_table)  

    SELECT @rows = @@ROWCOUNT
END

UPDATE c
SET ChildCount = a.cnt
FROM (
    SELECT p.CommentID, COUNT(*) AS cnt 
    FROM @temp_table AS p
    INNER JOIN @temp_table AS c
        ON c.Path LIKE p.Path + '%'
    GROUP BY 
        p.CommentID
) AS a
INNER JOIN @temp_table AS c
    ON a.CommentID = c.CommentID

UPDATE @temp_table
SET OverAllPriority = 1
WHERE ParentID = 0 AND LevelPriority = 1

SELECT @rows = @@ROWCOUNT

WHILE @rows > 0
BEGIN

    UPDATE c
    SET 
        OverAllPriority = CASE 
            WHEN c.ParentID = p.CommentID THEN p.OverAllPriority + 1
            ELSE p.OverAllPriority + p.ChildCount
        END
    FROM @temp_table AS p
    INNER JOIN @temp_table AS c
        ON (c.ParentID = p.CommentID AND c.LevelPriority = 1) 
            OR (p.ParentID = c.ParentID AND p.LevelPriority + 1 = c.LevelPriority)
    WHERE
        c.OverAllPriority IS NULL  
        AND p.OverAllPriority IS NOT NULL

    SELECT @rows = @@ROWCOUNT
END


SELECT * FROM @temp_table 
ORDER BY OverAllPriority

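【Discussion】:

  • This is MS SQL 2008+. I believe the syntax should be close enough. I've added some comments describing the query logic.
  • I see. This may help you convert the query: stackoverflow.com/questions/1382573/…
  • That seems like quite a lot of work. It would be good to know how well this performs, for obvious reasons..
  • It isn't efficient for big tables. Give me a few minutes and I'll rewrite it. Unfortunately I'm not very good with MySQL, so I'll try to keep it as simple as possible.
  • I've updated the answer. It now has no CTEs and the code is closer to MySQL. As for performance, there are two loops and an INNER JOIN ON LIKE, so there may be some problems. But those can only be solved in a real system, not a theoretical one. For example, if you had the tree path or the number of descendants precomputed in the table, the query would become simpler.
【Solution 2】:

Although it isn't directly related to your question, my suggestion is to change to the Nested Set Model. I know it's a lot of rework, but sooner or later you'll realise it's the best option :)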
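
The answer gives no details, so the following is only a rough Python sketch of how nested-set bounds might be assigned with votes taken into account. Visiting siblings in descending vote order while numbering is an assumption, not something stated in the answer, and `bounds` and `number` are hypothetical names.

```python
from collections import defaultdict

# (comment_id, parent_id, votes); parent_id 0 marks a root.
comments = [
    (1, 0, 6), (3, 0, 50), (4, 0, 2), (9, 4, 0), (5, 3, 4), (8, 5, 0),
    (10, 8, 0), (11, 10, 0), (2, 1, 2), (6, 1, 30), (7, 6, 5),
]

children = defaultdict(list)
for cid, pid, votes in comments:
    children[pid].append((votes, cid))

bounds = {}  # comment_id -> (lft, rgt)


def number(cid, counter):
    """Assign lft/rgt to cid and its subtree; return the next free counter."""
    lft = counter
    counter += 1
    # Assumption: siblings are numbered in descending vote order.
    for _, child in sorted(children[cid], key=lambda vc: -vc[0]):
        counter = number(child, counter)
    bounds[cid] = (lft, counter)
    return counter + 1


counter = 1
for _, root in sorted(children[0], key=lambda vc: -vc[0]):
    counter = number(root, counter)

# Display order is then just "ORDER BY lft".
ordered = sorted(bounds, key=lambda cid: bounds[cid][0])
print(ordered)    # [3, 5, 8, 10, 11, 1, 6, 7, 2, 4, 9]
print(bounds[3])  # (1, 10)
```

With this layout the read side is a single sort on the left bound, while a vote that changes the order of siblings means renumbering the affected subtrees. That matches the idea in the question of doing the work at vote time rather than at page load.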

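【Discussion】:

  • While I won't mark this as the correct answer, because I specified SQL and Stack Overflow has clear guidelines on that, your solution is actually the route I ended up taking, and I've come to realise you couldn't be more right. Anyone trying to solve this problem will eventually realise that 莫斯蒂 is unequivocally correct.
【Solution 3】:

Use a table definition like this (with a self-referencing key):

Comment ID  |   Parent ID    |   Comment    |  Vote

You can then use a recursive common table expression (in MS SQL) to get the results: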

WITH CommentTree (ParentId, CommentId, Comment, Vote)
AS
(
-- Anchor member definition
    SELECT c.ParentId, c.CommentId, c.Comment, c.Vote,
        0 AS Level
    FROM dbo.Comments AS c
    WHERE ParentId IS NULL
    UNION ALL
-- Recursive member definition
    SELECT c.ParentId, c.CommentId, c.Comment, c.Vote,
        Level + 1 AS Level
    FROM dbo.Comments AS c
    INNER JOIN CommentTree AS d
        ON c.ParentID = d.CommentID
    Order by C.Vote
)
SELECT ParentId, CommentId, Comment, Vote FROM CommentTree

CTE reference:

http://msdn.microsoft.com/en-us/library/ms186243.aspx

【Discussion】:

  • Hi @Paddy, thanks for your help. I tried what you suggested and have edited my original post with the results :)