There are various simpler and faster ways.
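To try the queries in this answer, a small test table is enough. The following is only a sketch: the column types and all sample rows are assumptions made for illustration, not data from the question.

```sql
-- Hypothetical test setup: column types and rows are assumed for illustration.
-- A temporary table is used so that nothing persists after the session ends.
CREATE TEMP TABLE tbl (name text, week int, value int);

INSERT INTO tbl (name, week, value) VALUES
  ('a', 1, 10)
, ('a', 2, 20)
, ('a', 3, 30)
, ('b', 2,  5)
, ('b', 5,  7);
```

Each query should then return one row per name, carrying the values of the earliest and the latest week for that name.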
2x DISTINCT ON
SELECT *
FROM (
SELECT DISTINCT ON (name)
name, week AS first_week, value AS first_val
FROM tbl
ORDER BY name, week
) f
JOIN (
SELECT DISTINCT ON (name)
name, week AS last_week, value AS last_val
FROM tbl
ORDER BY name, week DESC
) l USING (name);
Or shorter:
SELECT *
FROM (SELECT DISTINCT ON (1) name, week AS first_week, value AS first_val FROM tbl ORDER BY 1,2) f
JOIN (SELECT DISTINCT ON (1) name, week AS last_week , value AS last_val FROM tbl ORDER BY 1,2 DESC) l USING (name);
Simple and easy to understand. Also the fastest in my old tests. Detailed explanation for DISTINCT ON:
2x window function, 1x DISTINCT ON
SELECT DISTINCT ON (name)
name, week AS first_week, value AS first_val
, first_value(week) OVER w AS last_week
, first_value(value) OVER w AS last_value
FROM tbl t
WINDOW w AS (PARTITION BY name ORDER BY week DESC)
ORDER BY name, week;
The explicit WINDOW clause only shortens the code; it has no effect on performance.
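For comparison, the same query can be written without a WINDOW clause by repeating the window definition inline. This is just a sketch of the equivalent long form, using the same table and column names as above:

```sql
-- Equivalent long form: the window definition is spelled out in each OVER clause
SELECT DISTINCT ON (name)
       name, week AS first_week, value AS first_val
     , first_value(week)  OVER (PARTITION BY name ORDER BY week DESC) AS last_week
     , first_value(value) OVER (PARTITION BY name ORDER BY week DESC) AS last_value
FROM   tbl t
ORDER  BY name, week;
```

Window functions are evaluated before DISTINCT ON is applied, so the row that survives per name already carries the values of the last week.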
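first_value() of composite type
The aggregate functions min() and max() do not accept composite types as input. You would have to create custom aggregate functions (which is not that hard).
But the window functions first_value() and last_value() do. Building on that, we can devise simple solutions:
Simple query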
SELECT DISTINCT ON (name)
name, week AS first_week, value AS first_value
,(first_value((week, value)) OVER (PARTITION BY name ORDER BY week DESC))::text AS l
FROM tbl t
ORDER BY name, week;
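The output has all the data, but the values for the last week are stuffed into an anonymous record (optionally cast to text). You may need decomposed values.
Decomposed result with opportunistic use of the table type
For that we need a well-known composite type. An adapted table definition would allow using the table type itself directly: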
CREATE TABLE tbl (week int, value int, name text); -- optimized column order
week and value come first, so now we can sort by the table type itself:
SELECT (l).name, first_week, first_val
, (l).week AS last_week, (l).value AS last_val
FROM (
SELECT DISTINCT ON (name)
week AS first_week, value AS first_val
, first_value(t) OVER (PARTITION BY name ORDER BY week DESC) AS l
FROM tbl t
ORDER BY name, week
) sub;
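Decomposed result from a user-defined row type
Changing the table definition is probably not possible in most cases. Instead, register a composite type with CREATE TYPE (permanent) or CREATE TEMP TABLE (for the duration of the session):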
CREATE TEMP TABLE nv(last_week int, last_val int); -- register composite type
SELECT name, first_week, first_val, (l).last_week, (l).last_val
FROM (
SELECT DISTINCT ON (name)
name, week AS first_week, value AS first_val
, first_value((week, value)::nv) OVER (PARTITION BY name ORDER BY week DESC) AS l
FROM tbl t
ORDER BY name, week
) sub;
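The permanent variant with CREATE TYPE could look like the following sketch. The type name nv_type is a hypothetical name chosen here to avoid clashing with the temporary table nv above; the query is otherwise the same.

```sql
-- Hypothetical type name, registered permanently in the current schema
CREATE TYPE nv_type AS (last_week int, last_val int);

SELECT name, first_week, first_val, (l).last_week, (l).last_val
FROM  (
   SELECT DISTINCT ON (name)
          name, week AS first_week, value AS first_val
          -- cast the row to the named type so its fields can be decomposed later
        , first_value((week, value)::nv_type) OVER (PARTITION BY name ORDER BY week DESC) AS l
   FROM   tbl t
   ORDER  BY name, week
   ) sub;

-- Clean up when the type is no longer needed:
-- DROP TYPE nv_type;
```

Unlike the temporary table, the type stays in place after the session ends, so it only needs to be created once.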
Custom aggregate functions first() & last()
Create the functions and aggregates once per database:
CREATE OR REPLACE FUNCTION public.first_agg (anyelement, anyelement)
RETURNS anyelement
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS
'SELECT $1';
CREATE AGGREGATE public.first(anyelement) (
SFUNC = public.first_agg
, STYPE = anyelement
, PARALLEL = safe
);
CREATE OR REPLACE FUNCTION public.last_agg (anyelement, anyelement)
RETURNS anyelement
LANGUAGE sql IMMUTABLE STRICT PARALLEL SAFE AS
'SELECT $2';
CREATE AGGREGATE public.last(anyelement) (
SFUNC = public.last_agg
, STYPE = anyelement
, PARALLEL = safe
);
Then:
SELECT name
, first(week) AS first_week, first(value) AS first_val
, last(week) AS last_week , last(value) AS last_val
FROM (SELECT * FROM tbl ORDER BY name, week) t
GROUP BY name;
Probably the most elegant solution. It is faster with the additional module first_last_agg, which provides a C implementation.
Compare the instructions in the Postgres Wiki.
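The query above relies on the row order produced by the subquery. As an alternative sketch, not benchmarked here, the order can be stated inside each aggregate call with the standard aggregate ORDER BY syntax. It assumes the first() and last() aggregates defined above already exist:

```sql
-- Order is declared per aggregate call instead of in a sorted subquery
SELECT name
     , first(week  ORDER BY week) AS first_week
     , first(value ORDER BY week) AS first_val
     , last(week   ORDER BY week) AS last_week
     , last(value  ORDER BY week) AS last_val
FROM   tbl
GROUP  BY name;
```

This makes the ordering explicit at the cost of more verbose code; whether it is faster or slower than the subquery form would have to be tested on the actual data.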
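Related:
dbfiddle here (showing all)
Old sqlfiddle
In a quick test with EXPLAIN ANALYZE on a table with 50k rows, each of these queries was substantially faster than the currently accepted answer.
There are more ways. Depending on the data distribution, different query styles may be (much) faster still. See: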