Extracting strings and converting columns to rows in SQL (Redshift): answers

【Question title】: Extracting string and converting columns to rows in SQL (Redshift)
【Posted】: 2018-11-16 22:15:58
【Question description】:

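I have a column named "Description" in a table named "Food". It contains multiple food item names delimited by commas, for example: chicken, soup, bread, coke

How can I extract each item from the column and create multiple rows? Currently the data looks like {FoodID, FoodName, Description} ==> {123, Meal, "chicken, soup, bread, coke"}

What I need is

{FoodID, FoodName, Description} ==> {123, Meal, chicken}, {123, Meal, soup}, {123, Meal, bread}, etc.

In Redshift, I first split the "Description" column like this: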

select FoodID, FoodName, Description,
SPLIT_PART(Description, ',', 1) AS Item1,
SPLIT_PART(Description, ',', 2) AS Item2,
SPLIT_PART(Description, ',', 3) AS Item3,.....till Item10
FROM Food

There can be at most 10 items, hence Item10. What is the best way to turn the columns Item1 through Item10 into rows? I tried UNION ALL, but it takes a long time given the large volume of data.

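【Comments】:

SPLIT_PART with a union seems like a viable option. Really, you shouldn't even be bringing unnormalized data like this into the database. You may just have to take the hit until you can normalize it.

Tags: sql split multiple-columns amazon-redshift rows

【Solution 1】:

Your question is answered here specifically for Redshift. You just need to map your query onto the sample query provided there. Your query would look like the following.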

select  (row_number() over (order by true))::int as n into numbers from food limit 100;

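This creates the numbers table, holding the integers 1 through 100 (provided food has at least 100 rows).

Your query then becomes: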

select foodId, foodName, split_part(Description, ',', n) as descriptions
from food
cross join numbers
where split_part(Description, ',', n) is not null
  and split_part(Description, ',', n) != '';
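To make the cross join concrete, here is a minimal sketch run against a single sample row. The temp table name food_sample is an assumption made purely for illustration; the column names follow the question, and the numbers table is assumed to exist as created by the earlier statement.

```sql
-- Hypothetical sample table mirroring the row from the question.
CREATE TEMP TABLE food_sample (
    FoodID      INT,
    FoodName    VARCHAR(50),
    Description VARCHAR(200)
);

INSERT INTO food_sample VALUES (123, 'Meal', 'chicken,soup,bread,coke');

-- Every row is paired with every n in numbers. SPLIT_PART returns an
-- empty string once n exceeds the number of items, so those pairs
-- are dropped by the WHERE clause.
SELECT f.FoodID,
       f.FoodName,
       SPLIT_PART(f.Description, ',', nums.n) AS Item
FROM food_sample f
CROSS JOIN numbers nums
WHERE SPLIT_PART(f.Description, ',', nums.n) <> ''
ORDER BY f.FoodID, nums.n;
```

With this sample the query should return one row per item (chicken, soup, bread, coke), each carrying FoodID 123 and FoodName Meal. Because SPLIT_PART yields an empty string rather than NULL for a missing part, the `<> ''` test is the condition that actually does the filtering.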

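Now, back to your original concern about performance.

"It takes a long time given the large volume of data."

Given the typical data warehouse use case of high read and seldom write, you should keep the raw food data you described in a staging table, say stg_food.

You should then use a query like the following to do a one-time insert into the actual food table.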

insert into food
select foodId, foodName, split_part(Description, ',', n) as descriptions
from stg_food
cross join numbers
where split_part(Description, ',', n) is not null
  and split_part(Description, ',', n) != '';

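The split is then written once, at load time, which makes your select queries faster.

【Comments】:

Thanks Red Boy. I was able to use the cross join logic to get my solution. I didn't need row_number because there are at most 9 commas (10 items), so I used a number iterator and a cross join.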
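The commenter did not post the final query. One plausible shape for a "number iterator plus cross join", sketched here under assumptions, is a fixed inline list of 1 to 10 in place of a numbers table. The CTE name item_index, the output alias Item, and the use of TRIM to strip the space after each comma are all illustrative choices, not taken from the thread.

```sql
-- Hypothetical sketch: fixed 1..10 index list, since at most 10 items are expected.
WITH item_index AS (
    SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL
    SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL
    SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL
    SELECT 10
)
SELECT f.FoodID,
       f.FoodName,
       TRIM(SPLIT_PART(f.Description, ',', i.n)) AS Item  -- TRIM removes spaces around each item
FROM Food f
CROSS JOIN item_index i
-- Keep only positions that actually hold an item.
WHERE TRIM(SPLIT_PART(f.Description, ',', i.n)) <> '';
```

Compared with ten UNION ALL branches over Food, this reads the table once and multiplies each row by a ten-row list, which is likely why the cross join performed better for the asker.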