【Posted】: 2016-07-01 13:14:37
【Question】:
I have an updates table in Postgres 9.4.5 that looks like this:
goal_id | created_at | status
1 | 2016-01-01 | green
1 | 2016-01-02 | red
2 | 2016-01-02 | amber
And a goals table like this:
id | company_id
1 | 1
2 | 2
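To experiment with the two samples above, a minimal setup could look like the following. It is only a sketch: it keeps just the columns shown in the samples, stores status as plain text (an assumption, standing in for the real column type), and leaves out every other column and index.

```sql
-- Minimal reproduction of the sample rows above (simplified columns, assumed types).
CREATE TABLE goals (
    id         integer PRIMARY KEY,
    company_id integer NOT NULL
);

CREATE TABLE updates (
    goal_id    integer   NOT NULL REFERENCES goals (id),
    created_at timestamp NOT NULL,
    status     text      NOT NULL  -- assumption: text stands in for the real type
);

INSERT INTO goals (id, company_id) VALUES (1, 1), (2, 2);

INSERT INTO updates (goal_id, created_at, status) VALUES
    (1, '2016-01-01', 'green'),
    (1, '2016-01-02', 'red'),
    (2, '2016-01-02', 'amber');
```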
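I want to create a chart for each company that shows the status of all of its goals, per week.
I think this requires generating a series for the past 8 weeks, finding the most recent update for each goal that came before that week, and then counting the different statuses of the updates found.
What I have so far: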
SELECT EXTRACT(year from generate_series) AS year,
       EXTRACT(week from generate_series) AS week,
       u.company_id,
       COUNT(*) FILTER (WHERE u.status = 'green') AS green_count,
       COUNT(*) FILTER (WHERE u.status = 'amber') AS amber_count,
       COUNT(*) FILTER (WHERE u.status = 'red') AS red_count
FROM generate_series(NOW() - INTERVAL '2 MONTHS', NOW(), '1 week')
LEFT OUTER JOIN (
  SELECT DISTINCT ON(year, week)
         goals.company_id,
         updates.status,
         EXTRACT(week from updates.created_at) week,
         EXTRACT(year from updates.created_at) AS year,
         updates.created_at
  FROM updates
  JOIN goals ON goals.id = updates.goal_id
  ORDER BY year, week, updates.created_at DESC
) u ON u.week = week AND u.year = year
GROUP BY 1,2,3
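But this has two problems. First, the join on u doesn't seem to work the way I thought it would: it appears to join against every row returned from the inner query (?). Second, it only selects the latest update that happened within that week. If need be, it should fetch the latest update from before that week.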
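One plausible reading of the first symptom, offered as an interpretation of the query rather than a verified diagnosis: select-list aliases are not visible inside an ON clause, so the unqualified `week` and `year` there most likely resolve to `u.week` and `u.year`, the only columns of those names in scope. The condition then compares `u` with itself and holds for every row of `u`. Below is a sketch of a join that compares against the series value instead. The alias `w(week_start)` is a name introduced here for illustration, and `isoyear` is used because `week` follows ISO 8601 numbering.

```sql
-- Name the series column explicitly so the ON clause can refer to it.
SELECT w.week_start,
       u.company_id,
       u.status
FROM generate_series(NOW() - INTERVAL '2 MONTHS', NOW(), '1 week') AS w(week_start)
LEFT OUTER JOIN (
    SELECT goals.company_id,
           updates.status,
           EXTRACT(week    FROM updates.created_at) AS week,
           EXTRACT(isoyear FROM updates.created_at) AS year
    FROM updates
    JOIN goals ON goals.id = updates.goal_id
) u ON  u.week = EXTRACT(week    FROM w.week_start)
    AND u.year = EXTRACT(isoyear FROM w.week_start);
```

This addresses only the join condition. It still matches updates made within the same week, so the second problem (falling back to an earlier update) remains open.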
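This is some pretty complicated SQL, and I'd love some input on how to accomplish it.
Table structure and info
The goals table has around 1000 goals at the moment and is growing by about 100 a week: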
Table "goals"
Column | Type | Modifiers
-----------------+-----------------------------+-----------------------------------------------------------
id | integer | not null default nextval('goals_id_seq'::regclass)
company_id | integer | not null
name | text | not null
created_at | timestamp without time zone | not null default timezone('utc'::text, now())
updated_at | timestamp without time zone | not null default timezone('utc'::text, now())
Indexes:
"goals_pkey" PRIMARY KEY, btree (id)
"entity_goals_company_id_fkey" btree (company_id)
Foreign-key constraints:
"goals_company_id_fkey" FOREIGN KEY (company_id) REFERENCES companies(id) ON DELETE RESTRICT
The updates table has around 1000 rows and is growing by about 100 a week:
Table "updates"
Column | Type | Modifiers
------------+-----------------------------+------------------------------------------------------------------
id | integer | not null default nextval('updates_id_seq'::regclass)
status | entity.goalstatus | not null
goal_id | integer | not null
created_at | timestamp without time zone | not null default timezone('utc'::text, now())
updated_at | timestamp without time zone | not null default timezone('utc'::text, now())
Indexes:
"goal_updates_pkey" PRIMARY KEY, btree (id)
"entity_goal_updates_goal_id_fkey" btree (goal_id)
Foreign-key constraints:
"updates_goal_id_fkey" FOREIGN KEY (goal_id) REFERENCES goals(id) ON DELETE CASCADE
Schema | Name | Internal name | Size | Elements | Access privileges | Description
--------+-------------------+---------------+------+----------+-------------------+-------------
entity | entity.goalstatus | goalstatus | 4 | green +| |
| | | | amber +| |
| | | | red | |
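Given the table definitions above, the approach described in the question (a weekly series, the latest update per goal, then a count per status) might be sketched as follows. This is one possible formulation under stated assumptions, not a tested solution: "latest update" is taken to mean the most recent update made up to the end of each week, weeks are ISO weeks in UTC to match the column defaults, and the aliases `w`, `g`, `up` and `u` are introduced here for illustration.

```sql
-- One row per (week, company); each goal contributes its latest known status.
SELECT w.week_start,
       g.company_id,
       COUNT(*) FILTER (WHERE u.status = 'green') AS green_count,
       COUNT(*) FILTER (WHERE u.status = 'amber') AS amber_count,
       COUNT(*) FILTER (WHERE u.status = 'red')   AS red_count
FROM generate_series(
         -- 8 week starts: the current week and the 7 before it (UTC)
         date_trunc('week', timezone('utc', now())) - INTERVAL '7 weeks',
         date_trunc('week', timezone('utc', now())),
         INTERVAL '1 week'
     ) AS w(week_start)
CROSS JOIN goals g
CROSS JOIN LATERAL (
    -- Latest update for this goal made before the following week starts.
    SELECT up.status
    FROM updates up
    WHERE up.goal_id = g.id
      AND up.created_at < w.week_start + INTERVAL '1 week'
    ORDER BY up.created_at DESC
    LIMIT 1
) u
GROUP BY w.week_start, g.company_id
ORDER BY w.week_start, g.company_id;
```

Because the lateral subquery is cross-joined, a goal with no update up to a given week is simply left out of that week's counts. With table sizes in the low thousands, the per-goal lookup should be cheap; on larger data an index on (goal_id, created_at) would probably be worth testing.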
【Comments】:
-
I suspect you want a window function - you can partition by your time slice
-
@Codeman Hm, it looks like you're right. I've never used window functions before. Do you happen to know of any good resources to look at? Thanks!
-
Probably the one I linked you to :)
-
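It would help if you extended the sample data to a couple of dozen rows and added the expected result based on that sample data. That would make it easier to understand the required logic and to verify that a solution is correct. If your real data set is substantial (100K+ rows), it wouldn't hurt to tell us how many rows each table has. It is quite common for the efficiency of a solution to depend on the data distribution.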
-
You should provide the actual table definitions showing data types and constraints. And always your version of Postgres.
Tags: sql postgresql greatest-n-per-group