【Question title】: Complex SQL query optimization
【Posted】: 2011-05-26 13:13:24
【Question】:

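I'm trying to optimize an SQL query. Can you help me?

Basically, every user has friends through the friendships table, and every user has many feed_events through the user_feed_events table. I'm trying to list the feed_events of a given user's friends. That shouldn't be impossible, right? :)

As you can see, the query's performance depends on how many friends the user has. Right now, for a user with 150 friends, it takes almost 7 seconds to execute.

Update: this is how my friendships table is built: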

create_table "friendships", :force => true do |t|
t.integer  "user_id",     :null => false
t.integer  "friend_id",   :null => false
t.datetime "created_at"
t.datetime "accepted_at"
end

add_index "friendships", ["friend_id"], :name => "index_friendships_on_friend_id"
add_index "friendships", ["user_id"], :name => "index_friendships_on_user_id"

First I have Rails give me a comma-separated list of the user IDs of the user's friends, then I use this string in the actual query.

friends_id = current_user.friends.collect {|f| f.id}.join(",")

sql = "
SELECT 
DISTINCT feed_events.id, 
feed_events.event_type, 
feed_events.type_id, 
feed_events.data, 
feed_events.created_at, 
feed_events.updated_at, 
user_feed_events.user_id  
FROM feed_events 
LEFT JOIN user_feed_events 
ON feed_events.id = user_feed_events.feed_event_id 
WHERE user_feed_events.user_id IN (#{friends_id}) 
ORDER BY feed_events.created_at DESC"

Then I execute the query (paginated and limited to 30 results per page):

@events = FeedEvent.paginate_by_sql(sql, :page => params[:page], :per_page => 30)
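
One thing to watch with an interpolated ID list: when the user has no friends the string is empty, and `IN ()` is a syntax error in PostgreSQL. A minimal sketch of a guard, assuming a hypothetical helper named `in_list` (it is not part of Rails or of any plugin mentioned here):

```ruby
# Hypothetical helper: turns an array of ids into the body of an SQL IN (...) list.
# Integer() raises on anything that is not a whole number, so stray strings
# never reach the SQL text.
def in_list(ids)
  ints = ids.map { |id| Integer(id) }
  # IN (NULL) matches no rows, which is the desired result for "no friends".
  ints.empty? ? "NULL" : ints.uniq.join(",")
end

friends_id = in_list([1, 7, 9])
where_clause = "WHERE user_feed_events.user_id IN (#{friends_id})"
puts where_clause
```

With an empty array the clause becomes `IN (NULL)`, so the query stays valid and simply returns no rows.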

Update #2: this is the EXPLAIN ANALYZE output:

    SQL> EXPLAIN ANALYZE (SELECT  DISTINCT feed_events.id,  feed_events.event_type,  feed_events.type_id,  feed_events.data,  feed_events.created_at,  feed_events.updated_at,  user_feed_events.user_id   FROM user_feed_events  INNER JOIN feed_events  ON feed_events.id = user_feed_events.feed_event_id  WHERE user_feed_events.user_id IN (1,7,9,8,14,15,20,35,40,39,41,42,57,84,98,109,121,74,129,64,137,77,172,182,206,201,284,31,94,232,311,168,30,114,50,174,419,403,438,464,423,513,351,349,385,622,751,359,809,838,844,962,831,786,896,1001,992,998,990,256,67,623,957,1226,1060,1009,1490,132,1467,1672,619,1459,1466,993,1599,1365,607,1381,1714,1154,2032,2230,2240,2354,598,2345,1804,634,1900,2652,1975,2164,1759,3288,1004,3487,3507,3542,3566,514,3787,3137,3803,3090,4012,855,17,2026,1463,335,1000,935,5,12,10,13,19,18,16,22,34,27,29,59,126,90,46,23,63,291,134,229,107,439,521)  ORDER BY feed_events.created_at DESC)
    +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |                                                                                                                                                                                                                                                                                                          QUERY PLAN                                                                                                                                                                                                                                                                                                          |
    +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Unique  (cost=6090.87..6162.93 rows=18014 width=389) (actual time=1641.210..1733.010 rows=29691 loops=1)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
    |   ->  Sort  (cost=6090.87..6099.88 rows=18014 width=389) (actual time=1641.206..1670.882 rows=29694 loops=1)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
    |         Sort Key: feed_events.created_at, feed_events.id, feed_events.event_type, feed_events.type_id, feed_events.data, feed_events.updated_at, user_feed_events.user_id                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
    |         Sort Method:  quicksort  Memory: 17755kB                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
    |         ->  Hash Join  (cost=3931.63..5836.21 rows=18014 width=389) (actual time=258.541..361.345 rows=29694 loops=1)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
    |               Hash Cond: (user_feed_events.feed_event_id = feed_events.id)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
    |               ->  Bitmap Heap Scan on user_feed_events  (cost=926.64..2745.66 rows=18014 width=8) (actual time=6.930..42.367 rows=29694 loops=1)                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
    |                     Recheck Cond: (user_id = ANY ('{1,7,9,8,14,15,20,35,40,39,41,42,57,84,98,109,121,74,129,64,137,77,172,182,206,201,284,31,94,232,311,168,30,114,50,174,419,403,438,464,423,513,351,349,385,622,751,359,809,838,844,962,831,786,896,1001,992,998,990,256,67,623,957,1226,1060,1009,1490,132,1467,1672,619,1459,1466,993,1599,1365,607,1381,1714,1154,2032,2230,2240,2354,598,2345,1804,634,1900,2652,1975,2164,1759,3288,1004,3487,3507,3542,3566,514,3787,3137,3803,3090,4012,855,17,2026,1463,335,1000,935,5,12,10,13,19,18,16,22,34,27,29,59,126,90,46,23,63,291,134,229,107,439,521}'::integer[]))     |
    |                     ->  Bitmap Index Scan on index_user_feed_events_on_user_id  (cost=0.00..925.74 rows=18014 width=0) (actual time=6.836..6.836 rows=29694 loops=1)                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
    |                           Index Cond: (user_id = ANY ('{1,7,9,8,14,15,20,35,40,39,41,42,57,84,98,109,121,74,129,64,137,77,172,182,206,201,284,31,94,232,311,168,30,114,50,174,419,403,438,464,423,513,351,349,385,622,751,359,809,838,844,962,831,786,896,1001,992,998,990,256,67,623,957,1226,1060,1009,1490,132,1467,1672,619,1459,1466,993,1599,1365,607,1381,1714,1154,2032,2230,2240,2354,598,2345,1804,634,1900,2652,1975,2164,1759,3288,1004,3487,3507,3542,3566,514,3787,3137,3803,3090,4012,855,17,2026,1463,335,1000,935,5,12,10,13,19,18,16,22,34,27,29,59,126,90,46,23,63,291,134,229,107,439,521}'::integer[])) |
    |               ->  Hash  (cost=2848.84..2848.84 rows=44614 width=385) (actual time=251.490..251.490 rows=44663 loops=1)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
    |                     ->  Seq Scan on feed_events  (cost=0.00..2848.84 rows=44614 width=385) (actual time=0.035..77.044 rows=44663 loops=1)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
    | Total runtime: 1780.200 ms                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
    +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    SQL>

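Update #3: The problem is that my Rails app uses the has_many_friends plugin (https://github.com/swemoney/has_many_friends), which takes care of my friendships. It works like this: I am user_id #6 and I ask user_id #10 for friendship. When user #10 accepts my friendship, a new row is added to the table with user_id = 6 and friend_id = 10. If user #10 asks me for friendship, the row is user_id = 10 and friend_id = 6.

This means that to find friends_by_me I need to search for "user_id = 6", and to find friends_for_me I need "friend_id = 6". To find all my friends, I need to search both columns. This makes building the join very complicated! How would you handle this?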
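
The two-directional lookup described above can be sketched in plain Ruby. This is only an in-memory illustration: the rows, the dates, and the helper name `friend_ids` are made up, and filtering on `accepted_at` is an assumption based on the column in the table definition.

```ruby
# Toy stand-in for the friendships table; all rows and ids are invented.
FRIENDSHIPS = [
  { user_id: 6,  friend_id: 10, accepted_at: "2011-05-01" }, # #6 asked #10
  { user_id: 12, friend_id: 6,  accepted_at: "2011-05-02" }, # #12 asked #6
  { user_id: 6,  friend_id: 15, accepted_at: nil },          # not accepted yet
  { user_id: 3,  friend_id: 4,  accepted_at: "2011-05-03" }  # unrelated pair
].freeze

# Hypothetical helper: collects accepted friends from both columns.
def friend_ids(rows, me)
  accepted = rows.reject { |r| r[:accepted_at].nil? }
  by_me  = accepted.select { |r| r[:user_id] == me }.map { |r| r[:friend_id] }   # friends_by_me
  for_me = accepted.select { |r| r[:friend_id] == me }.map { |r| r[:user_id] }   # friends_for_me
  (by_me + for_me).uniq.sort
end

p friend_ids(FRIENDSHIPS, 6)
```

In SQL terms, `by_me` and `for_me` correspond to two SELECTs over the same table, one per column, combined into a single list of friend IDs.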

The only option I can think of is:

"(SELECT 
DISTINCT feed_events.id, 
feed_events.event_type, 
feed_events.type_id, 
feed_events.data, 
feed_events.created_at, 
feed_events.updated_at, 
user_feed_events.user_id 
FROM feed_events 
INNER JOIN user_feed_events 
ON feed_events.id = user_feed_events.feed_event_id 
INNER JOIN friendships 
ON user_feed_events.user_id = friendships.user_id 
WHERE friendships.user_id = 6 
AND friendships.accepted_at IS NOT NULL)

UNION DISTINCT

(SELECT 
DISTINCT additional_feed_events.id, 
additional_feed_events.event_type, 
additional_feed_events.type_id, 
additional_feed_events.data, 
additional_feed_events.created_at, 
additional_feed_events.updated_at, 
user_feed_events.user_id 
FROM feed_events AS additional_feed_events 
INNER JOIN user_feed_events 
ON additional_feed_events.id = user_feed_events.feed_event_id 
INNER JOIN friendships 
ON user_feed_events.user_id = friendships.friend_id 
WHERE friendships.friend_id = 6 
AND friendships.accepted_at IS NOT NULL) 

ORDER BY feed_events.created_at DESC"

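But this doesn't work at the moment, and I'm not sure it's the right way to do it either!

Thanks, Augusto

【Comments】:

  • Please format your SQL statements so they can be read without scrolling.
  • OK, I formatted them. It should be better now :)
  • Can you show the table definitions? Do you have indexes?
  • Can you show what the friendships table looks like? How are friends matched: friend_id and?
  • I updated my question with the details of the friendships table definition.

Tags: sql ruby-on-rails postgresql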


【Solution 1】:

Why use an IN list at all? Why not start from the selected user? Also, I don't think your LEFT OUTER JOIN is needed:

SELECT 
DISTINCT feed_events.id, 
feed_events.event_type, 
feed_events.type_id, 
feed_events.data, 
feed_events.created_at, 
feed_events.updated_at, 
user_feed_events.user_id  
FROM 
(
  select friend_id from friendships where user_id = YOURUSER
  UNION
  select user_id as friend_id from friendships where friend_id = YOURUSER
) friendship
inner join user_feed_events 
on friendship.friend_id = user_feed_events.user_id
inner join feed_events
on user_feed_events.feed_event_id = feed_events.id
ORDER BY feed_events.created_at DESC

If you want to keep your original statement and just optimize it, use this one:

SELECT 
DISTINCT feed_events.id, 
feed_events.event_type, 
feed_events.type_id, 
feed_events.data, 
feed_events.created_at, 
feed_events.updated_at, 
user_feed_events.user_id  
FROM user_feed_events 
INNER JOIN feed_events 
ON feed_events.id = user_feed_events.feed_event_id 
WHERE user_feed_events.user_id IN (#{friends_id}) 
ORDER BY feed_events.created_at DESC

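This removes the unnecessary LEFT JOIN.

Also, make sure you have created indexes on the columns used as foreign keys.

【Comments】:

  • Thanks Daniel. As I explained in my updated question, the problem is how the friendships table is built. To find all my friends I need to look in both the user_id and friend_id columns! How do I handle that in a join?
  • @Augusto: You can use the second statement as a drop-in replacement for yours. I will update the first statement to reflect your requirements for the friendships table.
  • @Daniel, thank you very much. I'm now using your edited first proposal, which seems more elegant. Still, the query takes 2 seconds to process :( Now I'm checking that all foreign keys are indexed correctly.
  • @Augusto: Please try the second query as well.
  • I tried that too. It works, but the processing time seems to be the same as for the first one :(

【Solution 2】: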

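OK, so the query isn't your problem; your database has to be set up so that this takes no more than a few microseconds. First, the query. It should look like this: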

 SELECT feed_events.id, 
        feed_events.event_type, 
        feed_events.type_id, 
        feed_events.data, 
        feed_events.created_at, 
        feed_events.updated_at, 
        user_feed_events.user_id  

   FROM feed_events
            INNER JOIN
        user_feed_events ON feed_events.id = user_feed_events.feed_event_id
            INNER JOIN
        user_friends     ON user_friends.friend_id = user_feed_events.user_id

  WHERE user_friends.user_id = :user_id  -- the id of the user in question
  ORDER BY feed_events.created_at DESC

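Next, you need to make sure your Id columns are primary keys, and that something like (friend_id, user_id) in the user_friends table has a unique index. By the way, I just made up these names; I tried to guess what you call the table that stores friendships.

【Comments】:

【Solution 3】: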
    select distinct fe.id, fe.event_type,
           fe.type_id, fe.data, fe.created_at,
           fe.updated_at, ufe.user_id
    from friendships as f
        inner join user_feed_events as ufe on f.friend_id = ufe.user_id
        inner join feed_events as fe on ufe.feed_event_id = fe.id
    where f.user_id = 6 and f.accepted_at is not null
    order by fe.created_at desc
    

Not sure whether distinct is really needed here. The query returns the feed events of the specified user's friends... I should hope ;)
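
Whether distinct matters depends on whether the joined tables can contain repeated rows. In the EXPLAIN ANALYZE output from the question, the Unique step reduces 29694 rows to 29691, so it removed only three rows there. A plain-Ruby sketch of how a repeated friendships row would double an event in the join (toy data, and the helper name `join_rows` is made up):

```ruby
# Toy rows standing in for the real tables; all values are invented.
friendships      = [{ user_id: 6, friend_id: 10 },
                    { user_id: 6, friend_id: 10 }] # same pair stored twice
user_feed_events = [{ user_id: 10, feed_event_id: 100 }]

# Nested-loop equivalent of:
#   friendships JOIN user_feed_events ON friendships.friend_id = user_feed_events.user_id
def join_rows(friendships, user_feed_events, me)
  friendships.select { |f| f[:user_id] == me }.flat_map do |f|
    user_feed_events.select { |ufe| ufe[:user_id] == f[:friend_id] }
                    .map { |ufe| [ufe[:feed_event_id], ufe[:user_id]] }
  end
end

rows = join_rows(friendships, user_feed_events, 6)
puts rows.length       # join output without DISTINCT
puts rows.uniq.length  # what DISTINCT would leave
```

If the tables can never hold the same pair twice (for example because of a unique index), the join produces no repeats and distinct only adds sorting work.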

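Edit: this solution is almost identical to the one Daniel Hilgarth proposed.

【Comments】:

【Solution 4】:

Use a sub-SELECT in the WHERE clause to build the list of feed events for the IN() call. Something like this (untested):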

      SELECT fe.id, 
          fe.event_type, 
          fe.type_id, 
          fe.data, 
          fe.created_at, 
          fe.updated_at,
          ufe.user_id  
      FROM feed_events AS fe, user_feed_events AS ufe
      WHERE TRUE = TRUE
          AND fe.id = ufe.feed_event_id
          AND ufe.user_id = :user_id
          AND fe.id IN((
              SELECT ufe.feed_event_id
              FROM user_feed_events AS ufe, user_friends AS uf
              WHERE uf.friend_id = :user_id
          ))
      ORDER BY fe.created_at DESC;
      

I'd be curious to see what EXPLAIN ANALYZE looks like for this.

【Comments】:
