Turn a field of comma-separated zip codes into a list of cities

[Question title]: Turn a field of comma-separated zip codes into a list of cities
[Posted]: 2016-10-09 22:25:38
[Question description]:

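I have two tables:

people, with columns user_id (unique) and zipcodes (comma-separated values; a single value may contain several zip codes);
location, with columns zipcode, city and state (each zip code is associated with one city and state, and the table covers the entire US).

I am trying to create a table that ranks cities by user_id density.

So, first, I would like to get a table that shows, next to each user_id, every city that the user_id is associated with in the people table. One user_id can be associated with multiple cities.

Then I plan to count the unique user_ids per city and rank the cities from the most to the least dense in user_ids.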

[Question discussion]:

Tags: sql postgresql psql

[Solution 1]:

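Turning a comma-separated column value into separate rows

Use unnest() to turn the comma-separated column into separate rows, after first building an array from the string with string_to_array():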

select 
  user_id, 
  unnest(string_to_array(zipcodes, ',')) AS zipcode
from people

Test data:

create table people(user_id int, zipcodes text);
insert into people values (1, '22333,12354,45398,12398');
insert into people values (2, '54389,45398,12398');
insert into people values (3, '34534,12398,94385');

Result:

 user_id | zipcode
---------+---------
       1 | 22333
       1 | 12354
       1 | 45398
       1 | 12398
       2 | 54389
       2 | 45398
       2 | 12398
       3 | 34534
       3 | 12398
       3 | 94385
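
If the stored values may contain spaces after the commas (for example '22333, 12354'), the split zip codes would carry that whitespace and fail to match the location table. A minimal sketch of one way to guard against that, assuming the people table from the test data above and PostgreSQL 9.3 or later for LATERAL; the alias z is only illustrative:

```sql
-- Sketch: split each zipcodes value and trim stray whitespace around every element.
-- Assumes the sample people(user_id, zipcodes) table defined above.
SELECT
    p.user_id,
    trim(z.zipcode) AS zipcode
FROM people AS p
CROSS JOIN LATERAL unnest(string_to_array(p.zipcodes, ',')) AS z(zipcode);
```

Calling unnest() in the FROM clause instead of the SELECT list is a style choice here; for the test data above it should return the same rows as the original query.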

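Ranking cities by user density

Use a LEFT JOIN to combine the city information with the users extracted for each zip code. COUNT() the users and use the window function DENSE_RANK() to assign the ranking positions. With DENSE_RANK(), tied cities share the same position.

Query: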

SELECT
    l.city
  , COUNT(DISTINCT p.user_id) AS distinct_users -- is distinct really needed?
  , DENSE_RANK() OVER (ORDER BY COUNT(DISTINCT p.user_id) DESC) AS city_ranking
FROM location l
LEFT JOIN (
  select 
    user_id, 
    unnest(string_to_array(zipcodes, ',')) AS zipcode
  from people
  ) p USING ( zipcode )
GROUP BY l.city
ORDER BY city_ranking

Test data:

create table location(zipcode text, city text);
insert into location values 
  ('22333', 'City1'), 
  ('12354', 'City2'), 
  ('45398', 'City3'), 
  ('12398', 'City4'), 
  ('54389', 'City5'), 
  ('34534', 'City6'), 
  ('94385', 'City7');

Result:

 city  | distinct_users | city_ranking
-------+----------------+--------------
 City4 |              3 |            1
 City3 |              2 |            2
 City2 |              1 |            3
 City1 |              1 |            3
 City5 |              1 |            3
 City6 |              1 |            3
 City7 |              1 |            3
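
To illustrate how ties are handled, the sketch below compares DENSE_RANK() with RANK() on a few made-up rows. The city names and counts are hypothetical and are not taken from the test data above:

```sql
-- Sketch: RANK() leaves a gap after a tie, DENSE_RANK() does not.
-- The inline VALUES rows are invented for illustration only.
SELECT
    city,
    distinct_users,
    RANK()       OVER (ORDER BY distinct_users DESC) AS rank_with_gaps,
    DENSE_RANK() OVER (ORDER BY distinct_users DESC) AS rank_without_gaps
FROM (VALUES ('CityA', 3), ('CityB', 3), ('CityC', 2)) AS t(city, distinct_users)
ORDER BY distinct_users DESC, city;
```

CityA and CityB tie for position 1 under both functions; CityC then gets 3 from RANK() but 2 from DENSE_RANK(), which is why the ranking in the result above continues at 3 after the cities ranked 1 and 2.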

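Additional notes

Consider whether you really need to count distinct users per zip code. Is it possible for a user to be assigned the same zip code more than once?

If that is the case, you can apply DISTINCT in the first query so that you do not need to do it in the ranking query: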

select distinct
  user_id,
  unnest(string_to_array(zipcodes, ',')) AS zipcode
from people;

Then remove the DISTINCT part from the ranking query and you are good to go.
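
Put together, the ranking query with deduplication moved into the subquery might look like the sketch below. It assumes the people and location test tables defined earlier; the column alias users is only illustrative:

```sql
-- Sketch: deduplicate (user_id, zipcode) pairs first, then count without DISTINCT.
-- Assumes the sample people and location tables from the test data above.
SELECT
    l.city,
    COUNT(p.user_id) AS users,
    DENSE_RANK() OVER (ORDER BY COUNT(p.user_id) DESC) AS city_ranking
FROM location AS l
LEFT JOIN (
    SELECT DISTINCT
        user_id,
        unnest(string_to_array(zipcodes, ',')) AS zipcode
    FROM people
) AS p USING (zipcode)
GROUP BY l.city
ORDER BY city_ranking;
```

This counts each user once per zip code, so it matches COUNT(DISTINCT p.user_id) only when every city has a single zip code, as in the test data. If a city spans several zip codes and a user lists more than one of them, that user would be counted more than once, and keeping COUNT(DISTINCT p.user_id) is the safer choice.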

[Discussion]: