[Posted]: 2018-01-16 16:40:37
[Question]:
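In my CENSUS table, I want to group by state and, for each state, get the median county population and the number of counties.
In psql, Redshift and Snowflake, I can do this: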
psql=> SELECT state, count(county), PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY "population2000") AS median FROM CENSUS GROUP BY state;
state | count | median
----------------------+-------+----------
Alabama | 67 | 36583
Alaska | 24 | 7296.5
Arizona | 15 | 116320
Arkansas | 75 | 20229
...
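I'm trying to find a clean way to do this in Standard SQL on BigQuery. I noticed that an undocumented PERCENTILE_CONT analytic function is available, but I have to resort to some significant hacking to make it do what I want.
I'd like to be able to do the same thing, using what I've gathered are the correct arguments: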
SELECT
  state,
  COUNT(county),
  PERCENTILE_CONT(population2000, 0.5) OVER () AS `medPop`
FROM
  CENSUS
GROUP BY
  state;
But this query produces the error:
SELECT list expression references column population2000 which is neither grouped nor aggregated at
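A minimal sketch of one way around this error, assuming the same `census` table with columns `state`, `county` and `population2000`. In BigQuery, as in standard SQL, analytic (window) functions are evaluated after GROUP BY, so PERCENTILE_CONT can only see grouped or aggregated columns there; also, `OVER ()` would compute a single median over the whole result rather than one per state. Dropping GROUP BY, partitioning both expressions by state, and deduplicating with DISTINCT should yield one row per state. This has not been run against the table in the question.

```sql
-- Sketch only: assumes a table `census` with columns state, county, population2000.
-- Both expressions are analytic functions partitioned by state, so no GROUP BY is used;
-- every county row of a state gets the same two values.
SELECT DISTINCT
  state,
  COUNT(county) OVER (PARTITION BY state) AS nCounties,
  PERCENTILE_CONT(population2000, 0.5) OVER (PARTITION BY state) AS medPop
FROM
  census;
-- DISTINCT then collapses those identical per-county rows into one row per state.
```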
I can get the answer I want, but I would be very disappointed if this were the recommended way to do it:
SELECT
  MAX(nCounties) AS nCounties,
  state,
  MAX(medPop) AS medPop
FROM (
  SELECT
    nCounties,
    T1.state,
    (PERCENTILE_CONT(population2000, 0.5) OVER (PARTITION BY T1.state)) AS `medPop`
  FROM
    census T1
  LEFT OUTER JOIN (
    SELECT
      COUNT(county) AS `nCounties`,
      state
    FROM
      census
    GROUP BY
      state) T2
  ON
    T1.state = T2.state) T3
GROUP BY
  state
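A sketch of how the workaround above might be simplified without the self-join, assuming the same `census` table and columns: compute the per-state median as an analytic function in a subquery, then let GROUP BY produce the count, using ANY_VALUE to pick the median (which is identical on every row of a state). This is an illustration, not a verified answer.

```sql
-- Sketch only: assumes a table `census` with columns state, county, population2000.
SELECT
  state,
  COUNT(county) AS nCounties,
  -- medPop is the same on every row of a given state, so any one value is correct.
  ANY_VALUE(medPop) AS medPop
FROM (
  -- Attach the exact per-state median to every county row.
  SELECT
    state,
    county,
    PERCENTILE_CONT(population2000, 0.5) OVER (PARTITION BY state) AS medPop
  FROM
    census) AS t
GROUP BY
  state;
```

If an approximate median is acceptable, the aggregate `APPROX_QUANTILES(population2000, 2)[OFFSET(1)]` can be used directly alongside `COUNT(county)` with `GROUP BY state`. It returns one of the input values rather than an interpolated one, so a result such as Alaska's 7296.5 would not be reproduced exactly.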
Is there a better way to do what I want? Also, will the PERCENTILE_CONT function be documented?
Thanks for reading!
[Discussion]:
Tags: google-bigquery