【问题标题】:PostgreSQL query joining with view has poor performancePostgreSQL 查询加入视图性能较差
【发布时间】:2017-11-13 20:37:02
【问题描述】:

我正在尝试创建一个返回用户拥有的文件/文件夹列表的 API,类似于 Google Drive。有关用户文件/文件夹/其他的信息存储在 PostgreSQL 表中。

我创建了一个名为 gc_drive 的视图,它从这些表中获取文件列表。是的,我称它们为表是因为文件、文件夹和虚拟文件有不同的表,所以我使用视图将所有这些记录连接在一起,以便它们有效地显示为文件系统,即使它们不是 100% 的文件系统。

视图性能看起来不错,我已经为必要的列编制了索引,即使视图有超过 270 万条记录以及所有数据组合,它也会按 ID 过滤,并以毫秒为单位返回父级过滤的记录。但是,如果我尝试从 gc_drive 中选择 count(*),则需要 69 秒,因为包含文件、文件夹、资源和其他列表的基表每个都有超过 700 万条记录。

我想将 gc_drive 视图与 fn_resource_permissions 表连接起来,该表包含有关用户是否有权访问资源的信息。

当我使用以下查询获取当前登录用户的权限时,记录几乎立即返回:

select*from 
fn_resource_permissions
WHERE ((fn_resource_permissions.permission_id IN (23,
                                                  24,
                                                  25,
                                                  37,
                                                  36)
AND (fn_resource_permissions.user_id = 2
     OR fn_resource_permissions.user_id = 1)))

--59 records returned

上面的查询返回给我用户拥有的资源的所有权限。我将使用这些记录来过滤 gc_drive 视图。

当我尝试在以下查询中使用这 59 条记录来过滤掉 gc_drive 视图时,这些记录也会立即返回:

SELECT gc_drive.id,
       "fn_resources"."owner_id" AS "fn_resources.owner_id",
       "gc_drive".*
FROM "gc_drive"
JOIN "fn_resources" ON ("gc_drive"."resource_id" = "fn_resources"."id")
WHERE "fn_resources"."archived" = 0
AND   gc_drive.id IN(
    1234,
    1235,
    1236,
    ...
);

问题是当我尝试将上面的 2 个查询连接在一起时:

SELECT gc_drive.id,
       "fn_resources"."owner_id" AS "fn_resources.owner_id",
       "gc_drive".*
FROM "gc_drive"
JOIN "fn_resources" ON ("gc_drive"."resource_id" = "fn_resources"."id")
JOIN "fn_resource_permissions" ON ("fn_resource_permissions"."resource_id" = fn_resources.id)
WHERE "fn_resources"."archived" = 0
AND ((fn_resource_permissions.permission_id IN (23,
                                                24,
                                                25,
                                                37,
                                                36)
          AND (fn_resource_permissions.user_id = 2
               OR fn_resource_permissions.user_id = 1)))
LIMIT 101;

当我查看解释计划时,我可以看到 PostgreSQL 正在物化整个 gc_drive 视图,然后才尝试按资源权限进行过滤。这使得查询在几分钟内运行,而不是在毫秒内。我还尝试将资源权限放入 with 子句中,它也是如此。我知道其中一种解决方案可能是将每个用户的内容分隔在自己的模式中,但我想知道是否有更好更有效的方法来加入 gc_drive 视图和其他表,而无需 PostgreSQL 首先实现整个视图。有没有办法修改查询,以便 PostgreSQL 只能过滤整个记录集。

解释上述查询的分析计划:

    ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                                      QUERY PLAN                                                                                       │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Limit  (cost=4845.56..57547586.51 rows=101 width=722) (actual time=201371.236..201371.236 rows=0 loops=1)                                                                             │
│   ->  Hash Join  (cost=4845.56..681971785.33 rows=1197 width=722) (actual time=201371.234..201371.234 rows=0 loops=1)                                                                 │
│         Hash Cond: ("*SELECT* 1".resource_id = fn_resources.id)                                                                                                                       │
│         ->  Append  (cost=36.10..681748786.17 rows=15867640 width=247) (actual time=17521.979..200235.975 rows=2737742 loops=1)                                                       │
│               ->  Subquery Scan on "*SELECT* 1"  (cost=36.10..533636.75 rows=11252 width=150) (actual time=0.004..0.004 rows=0 loops=1)                                               │
│                     ->  Hash Join  (cost=36.10..533524.23 rows=11252 width=150) (actual time=0.003..0.003 rows=0 loops=1)                                                             │
│                           Hash Cond: (fnr_child.type = fnrt.type)                                                                                                                     │
│                           ->  Nested Loop  (cost=0.00..66613.04 rows=1940 width=122) (actual time=0.003..0.003 rows=0 loops=1)                                                        │
│                                 ->  Nested Loop  (cost=0.00..49607.76 rows=1940 width=100) (actual time=0.002..0.002 rows=0 loops=1)                                                  │
│                                       ->  Nested Loop  (cost=0.00..32602.48 rows=1940 width=104) (actual time=0.002..0.002 rows=0 loops=1)                                            │
│                                             ->  Nested Loop  (cost=0.00..16315.94 rows=1940 width=12) (actual time=0.002..0.002 rows=0 loops=1)                                       │
│                                                   ->  Seq Scan on gc_virtual_file gvf  (cost=0.00..29.40 rows=1940 width=8) (actual time=0.001..0.001 rows=0 loops=1)                 │
│                                                   ->  Index Scan using gc_file_resource_id_key on gc_file gf_child  (cost=0.00..8.38 rows=1 width=4) (never executed)                 │
│                                                         Index Cond: (resource_id = gvf.resource_id)                                                                                   │
│                                             ->  Index Scan using gc_file_resource_id_key on gc_file gf_parent  (cost=0.00..8.38 rows=1 width=92) (never executed)                     │
│                                                   Index Cond: (resource_id = gvf.parent_id)                                                                                           │
│                                       ->  Index Scan using fn_resources_pkey on fn_resources fnr_parent  (cost=0.00..8.75 rows=1 width=4) (never executed)                            │
│                                             Index Cond: (id = gvf.parent_id)                                                                                                          │
│                                 ->  Index Scan using fn_resources_pkey on fn_resources fnr_child  (cost=0.00..8.75 rows=1 width=30) (never executed)                                  │
│                                       Index Cond: (id = gvf.resource_id)                                                                                                              │
│                           ->  Hash  (cost=21.60..21.60 rows=1160 width=36) (never executed)                                                                                           │
│                                 ->  Seq Scan on fn_resource_types fnrt  (cost=0.00..21.60 rows=1160 width=36) (never executed)                                                        │
│                           SubPlan 6                                                                                                                                                   │
│                             ->  Index Scan using users_pkey on fn_users  (cost=0.00..8.28 rows=1 width=13) (never executed)                                                           │
│                                   Index Cond: (id = fnr_child.owner_id)                                                                                                               │
│                           SubPlan 7                                                                                                                                                   │
│                             ->  Index Scan using gc_tables_resource_id_key on gc_tables gt  (cost=0.00..8.28 rows=1 width=18) (never executed)                                        │
│                                   Index Cond: (resource_id = fnr_child.id)                                                                                                            │
│                           SubPlan 8                                                                                                                                                   │
│                             ->  Index Scan using gc_webmapservices_resource_id_key on gc_webmapservices gwms  (cost=0.00..8.27 rows=1 width=418) (never executed)                     │
│                                   Index Cond: (resource_id = fnr_child.id)                                                                                                            │
│                           SubPlan 9                                                                                                                                                   │
│                             ->  Index Scan using gc_tables_resource_id_key on gc_tables gt  (cost=0.00..8.31 rows=1 width=14) (never executed)                                        │
│                                   Index Cond: (resource_id = fnr_child.id)                                                                                                            │
│                           SubPlan 10                                                                                                                                                  │
│                             ->  Index Scan using gc_webmapservices_resource_id_key on gc_webmapservices gwms  (cost=0.00..8.30 rows=1 width=267) (never executed)                     │
│                                   Index Cond: (resource_id = fnr_child.id)                                                                                                            │
│               ->  Subquery Scan on "*SELECT* 2"  (cost=580092.62..681215149.42 rows=15856388 width=247) (actual time=17521.724..199482.039 rows=2737742 loops=1)                      │
│                     ->  Hash Join  (cost=580092.62..681056585.54 rows=15856388 width=247) (actual time=17521.668..198482.521 rows=2737742 loops=1)                                    │
│                           Hash Cond: (fnr_child.type = fnrt.type)                                                                                                                     │
│                           ->  Hash Join  (cost=580056.53..1189954.39 rows=2733860 width=219) (actual time=16768.431..88474.618 rows=2737742 loops=1)                                  │
│                                 Hash Cond: (gf_parent.resource_id = fnr_parent.id)                                                                                                    │
│                                 ->  Hash Join  (cost=388738.70..943959.36 rows=2733860 width=219) (actual time=13619.604..72625.523 rows=2737742 loops=1)                             │
│                                       Hash Cond: (gf_child.path = gf_parent.pathname)                                                                                                 │
│                                       ->  Hash Join  (cost=202378.85..538142.64 rows=2733860 width=168) (actual time=6429.263..53255.517 rows=2737743 loops=1)                        │
│                                             Hash Cond: (fnr_child.id = gf_child.resource_id)                                                                                          │
│                                             ->  Seq Scan on fn_resources fnr_child  (cost=0.00..114415.70 rows=6152170 width=30) (actual time=21.243..19364.289 rows=6119914 loops=1) │
│                                             ->  Hash  (cost=112139.60..112139.60 rows=2733860 width=142) (actual time=5666.626..5666.626 rows=2737743 loops=1)                        │
│                                                   Buckets: 262144  Batches: 2  Memory Usage: 240441kB                                                                                 │
│                                                   ->  Seq Scan on gc_file gf_child  (cost=0.00..112139.60 rows=2733860 width=142) (actual time=0.015..3442.788 rows=2737751 loops=1)  │
│                                       ->  Hash  (cost=112139.60..112139.60 rows=2733860 width=92) (actual time=7173.094..7173.094 rows=2737751 loops=1)                               │
│                                             Buckets: 262144  Batches: 2  Memory Usage: 169423kB                                                                                       │
│                                             ->  Seq Scan on gc_file gf_parent  (cost=0.00..112139.60 rows=2733860 width=92) (actual time=0.007..5057.412 rows=2737751 loops=1)        │
│                                 ->  Hash  (cost=114415.70..114415.70 rows=6152170 width=4) (actual time=3145.424..3145.424 rows=6119914 loops=1)                                      │
│                                       Buckets: 1048576  Batches: 1  Memory Usage: 215154kB                                                                                            │
│                                       ->  Seq Scan on fn_resources fnr_parent  (cost=0.00..114415.70 rows=6152170 width=4) (actual time=0.008..1186.819 rows=6119914 loops=1)         │
│                           ->  Hash  (cost=21.60..21.60 rows=1160 width=36) (actual time=0.021..0.021 rows=21 loops=1)                                                                 │
│                                 Buckets: 1024  Batches: 1  Memory Usage: 1kB                                                                                                          │
│                                 ->  Seq Scan on fn_resource_types fnrt  (cost=0.00..21.60 rows=1160 width=36) (actual time=0.005..0.012 rows=21 loops=1)                              │
│                           SubPlan 1                                                                                                                                                   │
│                             ->  Index Scan using users_pkey on fn_users  (cost=0.00..8.28 rows=1 width=13) (actual time=0.009..0.009 rows=1 loops=2737742)                            │
│                                   Index Cond: (id = fnr_child.owner_id)                                                                                                               │
│                           SubPlan 2                                                                                                                                                   │
│                             ->  Index Scan using gc_file_resource_id_key on gc_file gf  (cost=0.00..8.62 rows=1 width=47) (never executed)                                            │
│                                   Index Cond: (resource_id = fnr_child.id)                                                                                                            │
│                           SubPlan 3                                                                                                                                                   │
│                             ->  Index Scan using gc_file_resource_id_key on gc_file gf  (cost=0.00..8.62 rows=1 width=47) (actual time=0.025..0.025 rows=1 loops=2665337)             │
│                                   Index Cond: (resource_id = fnr_child.id)                                                                                                            │
│                           SubPlan 4                                                                                                                                                   │
│                             ->  Index Scan using gc_file_resource_id_key on gc_file gf  (cost=0.00..8.63 rows=1 width=129) (never executed)                                           │
│                                   Index Cond: (resource_id = fnr_child.id)                                                                                                            │
│                           SubPlan 5                                                                                                                                                   │
│                             ->  Index Scan using gc_file_resource_id_key on gc_file gf  (cost=0.00..8.66 rows=1 width=145) (actual time=0.003..0.003 rows=1 loops=2665337)            │
│                                   Index Cond: (resource_id = fnr_child.id)                                                                                                            │
│         ->  Hash  (cost=4804.79..4804.79 rows=374 width=12) (actual time=412.434..412.434 rows=59 loops=1)                                                                            │
│               Buckets: 1024  Batches: 1  Memory Usage: 3kB                                                                                                                            │
│               ->  Nested Loop  (cost=13.32..4804.79 rows=374 width=12) (actual time=156.510..412.313 rows=59 loops=1)                                                                 │
│                     ->  Bitmap Heap Scan on fn_resource_permissions  (cost=13.32..1454.38 rows=374 width=4) (actual time=155.407..402.734 rows=59 loops=1)                            │
│                           Recheck Cond: ((user_id = 2) OR (user_id = 1))                                                                                                              │
│                           Filter: (permission_id = ANY ('{23,24,25,37,36}'::integer[]))                                                                                               │
│                           ->  BitmapOr  (cost=13.32..13.32 rows=391 width=0) (actual time=88.402..88.402 rows=0 loops=1)                                                              │
│                                 ->  Bitmap Index Scan on resource_permissions_user  (cost=0.00..6.57 rows=196 width=0) (actual time=88.380..88.380 rows=61 loops=1)                   │
│                                       Index Cond: (user_id = 2)                                                                                                                       │
│                                 ->  Bitmap Index Scan on resource_permissions_user  (cost=0.00..6.57 rows=196 width=0) (actual time=0.016..0.016 rows=0 loops=1)                      │
│                                       Index Cond: (user_id = 1)                                                                                                                       │
│                     ->  Index Scan using fn_resources_pkey on fn_resources  (cost=0.00..8.95 rows=1 width=8) (actual time=0.154..0.155 rows=1 loops=59)                               │
│                           Index Cond: (id = fn_resource_permissions.resource_id)                                                                                                      │
│                           Filter: (archived = 0)                                                                                                                                      │
│ Total runtime: 201373.236 ms                                                                                                                                                          │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

非常感谢。

【问题讨论】:

  • 可能按用户分区?
  • Edit您的问题并添加使用explain (analyze, verbose, buffers)生成的执行计划。 Formatted textno screen shots
  • 您已添加使用explain 生成的计划 - 请向我们展示使用explain (analyze, verbose, buffers) 生成的计划
  • @a_horse_with_no_name 我已经添加了查询计划。我认为在这种情况下最好也是唯一的方案可能是将表中的数据拆分到每个用户模式中。

标签: postgresql


【解决方案1】:

您的查询结构使得fn_resourcesfn_resource_permissions 过滤,然后加入gc_drive,尽管gc_drive 中存在相同的数据。

这有点预感,但请尝试在早期使用 fn_resource_permissions 过滤 gc_drive,使用 where 子句,而不是连接上的连接。

select
    gc_drive.id,
    fn_resources.owner_id,
    gc_drive.*
from
    gc_drive
    join fn_resources on(
        gc_drive.resource_id = fn_resources.id
    )
where
    gc_drive.resource_id in(
        select
            fn_resource_permissions.resource_id
        from
            fn_resource_permissions
        where
            fn_resource_permissions.permission_id in (23, 24, 25, 37, 36)
            and fn_resource_permissions.user_id in(1, 2)
    )
    and fn_resources.archived = 0
limit 101;

或者,尝试在加入之前过滤fn_resources

select
    gc_drive.id,
    fn_resources.owner_id,
    gc_drive.*
from
    gc_drive
    join fn_resources on(
        gc_drive.resource_id = fn_resources.id
        and fn_resources.id in(
            select
                fn_resource_permissions.resource_id
            from
                fn_resource_permissions
            where
                fn_resource_permissions.permission_id in (23, 24, 25, 37, 36)
                and fn_resource_permissions.user_id in(1, 2)
        )
    )
where
    fn_resources.archived = 0
limit 101;

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 2013-02-05
    • 1970-01-01
    • 1970-01-01
    • 2016-10-19
    • 1970-01-01
    • 1970-01-01
    • 2020-05-16
    • 1970-01-01
    相关资源
    最近更新 更多