Tag key & value using Teradata Regular Expression: answers

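【Question title】: Tag key & value using Teradata Regular Expression
【Posted】: 2020-04-07 21:58:32
【Question】:

I have a Teradata dataset that looks like this:

'Project: Hercules IssueType: Improvement Components: core AffectsVersions: 2.4.1 Priority: Minor Time: 15:25:23 04/06/2020'

I want to extract a tag's value from the string above based on its key.

For example: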

with comm as 
(
select  'Project: Hercules IssueType: Improvement Components: core AffectsVersions: 2.4.1 Priority: Minor' as text
)
select regexp_substr(comm.text,'[^: ]+',1,4)
 from comm where regexp_substr(comm.text,'[^: ]+',1,3) = 'IssueType';

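Is there a way to write this query without changing the positional argument for each tag? I also find the date and time in the last field a bit tricky.

Any help is appreciated.

Thanks.

【Comments】:

What is the exact result you want returned?
Thanks for asking. It would help if the string above could be broken down with a regular expression into the keys and values below, without having to supply positional arguments.
```
Key              Value
===============  ====================
Project          Hercules
IssueType        Improvement
Components       core
AffectsVersions  2.4.1
Priority         Minor
Time             15:25:23 04/06/2020
```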

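Tags: teradata regexp-substr

【Solution 1】:

There is the NVP function for accessing name/value pair data, but to split the string into multiple rows you need strtok_split_to_table or regexp_split_to_table. The tricky part in your case is the delimiters: it would be easier if they were unique characters rather than ' ' and ':', both of which also occur inside the values.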

WITH comm AS 
 (
   SELECT 1 as keycol, -- should be a key column in your table, either numeric or varchar
      'Project: Hercules IssueType: Improvement Components: core AffectsVersions: 2.4.1 Priority: Minor Time: 15:25:23 04/06/2020' AS text
 )
SELECT id, tokennum, token, 
   -- get the key
   StrTok(token,':', 1) AS "Key",
   -- get the value (can't use StrTok because of ':' delimiter)
   Substring(token From Position(': ' IN token)+2) AS "Value"
FROM TABLE
 ( RegExp_Split_To_Table(comm.keycol
                         ,comm.text
                         ,'( )(?=[^ ]+: )' -- assuming names don't contain spaces: split at the last space before ': '
                         , 'c') 
RETURNS (id INT , tokennum INTEGER, token VARCHAR(1000) CHARACTER SET Latin)) AS dt
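
To get back to the original goal of looking up one tag by name, the same split can be filtered on the derived key. This is only a sketch built on the query above: it assumes, as the answer does, that tag names contain no spaces, and `keycol` is a stand-in for whatever key column the real table has. The literal 'Time' is just an example key.

```sql
WITH comm AS
 (
   SELECT 1 AS keycol, -- stand-in for the real key column of your table
      'Project: Hercules IssueType: Improvement Components: core AffectsVersions: 2.4.1 Priority: Minor Time: 15:25:23 04/06/2020' AS text
 )
SELECT
   -- the tag name is everything before the first ':'
   StrTok(token, ':', 1) AS "Key",
   -- the value is everything after the first ': ', so later colons (as in a time) are kept
   Substring(token FROM Position(': ' IN token) + 2) AS "Value"
FROM TABLE
 ( RegExp_Split_To_Table(comm.keycol
                         ,comm.text
                         ,'( )(?=[^ ]+: )' -- split at a space that is directly followed by "name: "
                         ,'c')
   RETURNS (id INT, tokennum INTEGER, token VARCHAR(1000) CHARACTER SET Latin)) AS dt
WHERE StrTok(token, ':', 1) = 'Time'; -- pick the tag by name instead of by position
```

Because the filter is on the tag name, the query does not depend on where the tag sits in the string, and the value of the Time tag is kept whole because the split pattern only breaks at a space followed by a name and ': '.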

【Comments】: