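Converting comma-separated column values into separate rows
Use unnest() to turn the comma-separated column into separate rows, after first building an array from the string with string_to_array():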
select
user_id,
unnest(string_to_array(zipcodes, ',')) AS zipcode
from people
Test data:
create table people(user_id int, zipcodes text);
insert into people values (1, '22333,12354,45398,12398');
insert into people values (2, '54389,45398,12398');
insert into people values (3, '34534,12398,94385');
Result:
 user_id | zipcode
---------+---------
       1 | 22333
       1 | 12354
       1 | 45398
       1 | 12398
       2 | 54389
       2 | 45398
       2 | 12398
       3 | 34534
       3 | 12398
       3 | 94385
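The same split can also be written with the set-returning function in the FROM clause, which can be easier to extend. This is a minimal sketch against the people table above. The alias z and the btrim() call are additions for illustration and are not part of the original query.

```sql
-- Returns the same rows as the query above, with unnest() moved into FROM.
-- btrim() is an assumption: it only matters if values look like '22333, 12354'.
SELECT
    p.user_id,
    btrim(z.zipcode) AS zipcode
FROM people p
CROSS JOIN LATERAL unnest(string_to_array(p.zipcodes, ',')) AS z(zipcode);
```

With the test data as given, which contains no spaces, btrim() changes nothing and can be dropped.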
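Ranking cities by user density
Use a LEFT JOIN to combine the city information with the zip codes extracted for each user. COUNT() the users and use the window function DENSE_RANK() to assign ranking positions. With DENSE_RANK(), ties share the same position.
Query: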
SELECT
l.city
, COUNT(DISTINCT p.user_id) AS distinct_users -- is distinct really needed?
, DENSE_RANK() OVER (ORDER BY COUNT(DISTINCT p.user_id) DESC) AS city_ranking
FROM location l
LEFT JOIN (
select
user_id,
unnest(string_to_array(zipcodes, ',')) AS zipcode
from people
) p USING ( zipcode )
GROUP BY l.city
ORDER BY city_ranking
Test data:
create table location(zipcode text, city text);
insert into location values
('22333', 'City1'),
('12354', 'City2'),
('45398', 'City3'),
('12398', 'City4'),
('54389', 'City5'),
('34534', 'City6'),
('94385', 'City7');
Result:
 city  | distinct_users | city_ranking
-------+----------------+--------------
 City4 |              3 |            1
 City3 |              2 |            2
 City2 |              1 |            3
 City1 |              1 |            3
 City5 |              1 |            3
 City6 |              1 |            3
 City7 |              1 |            3
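Additional notes
Consider whether you really need to count distinct users per zip code. Can a user have the same zip code more than once?
If so, you could use DISTINCT in the first query, so that you don't need it in the ranking query: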
select distinct
user_id,
unnest(string_to_array(zipcodes, ',')) AS zipcode
from people;
Then remove DISTINCT from the ranking query and you are good to go.
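Put together, the ranking query might then look like the sketch below. It assumes each city maps to exactly one zip code, as in the test data above. If a city can span several zip codes, a user could still be counted more than once per city, and COUNT(DISTINCT p.user_id) would remain the safer choice. The column alias users is an illustrative rename.

```sql
SELECT
    l.city
  , COUNT(p.user_id) AS users  -- duplicate (user_id, zipcode) pairs are removed in the subquery
  , DENSE_RANK() OVER (ORDER BY COUNT(p.user_id) DESC) AS city_ranking
FROM location l
LEFT JOIN (
    SELECT DISTINCT
        user_id,
        unnest(string_to_array(zipcodes, ',')) AS zipcode
    FROM people
) p USING (zipcode)
GROUP BY l.city
ORDER BY city_ranking;
```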