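You need to apply the split on the delimiter twice, as described here.
Finally, use LISTAGG to flatten the values (words) again and finish with some string concatenation.
I've provided a complete example with two input records, so it scales to any number of parsed rows.
You may need to adjust the t2 table, which limits the number of splits. If your keywords can contain NULL values, some special handling is also required, because the '[^!]+' pattern skips empty tokens.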
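As a rough illustration of that special handling, here is a minimal sketch that keeps empty values in place. It is not part of the query below: the sample string, the src CTE, and the use of '!!' as the full delimiter are assumptions made for the example. It relies on the subexpression argument of REGEXP_SUBSTR and on REGEXP_COUNT, both available from Oracle 11g on.

```sql
/* Hypothetical sample: the second value is empty (NULL). */
WITH src AS
  (SELECT '123!!!!word2!!word3' col FROM dual)
SELECT level colnum,
       /* capture everything up to the next '!!' or the end of the string;
          the last argument (1) returns only the first subexpression */
       regexp_substr(col, '(.*?)(!!|$)', 1, level, NULL, 1) val
FROM src
/* number of values = number of delimiters + 1 */
CONNECT BY level <= regexp_count(col, '!!') + 1;
```

With this pattern the empty value should come back as a NULL row at position 2 instead of being skipped, so the following values keep their positions. Note that it splits on the two-character delimiter '!!', whereas '[^!]+' treats every single '!' as a delimiter.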
Query (explained below):
WITH t1 AS
  (/* sample input: two records to parse */
   SELECT 1 id,
          '12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!|| ' col
   FROM dual
   UNION ALL
   SELECT 2 id,
          '22222ACCCC12Y||!!567!!word21 !!word22!! word23!!||!!789!!word24!!word25 !! word26!!||!!2345 !!word27!!word28!! 890!!|| ' col
   FROM dual
  ),
t2 AS
  (/* row generator 1..9: the (max) number of columns per split */
   SELECT rownum colnum
   FROM dual
   CONNECT BY level < 10
  ),
t3 AS
  (/* first pass: split each record on '|' */
   SELECT t1.id,
          t2.colnum,
          regexp_substr(t1.col, '[^|]+', 1, t2.colnum) col
   FROM t1,
        t2
   WHERE regexp_substr(t1.col, '[^|]+', 1, t2.colnum) IS NOT NULL
  ),
first_split AS
  (/* keep only the parts that hold '!!'-delimited values */
   SELECT id, colnum, col FROM t3 WHERE col LIKE '%!!%'
  ),
second_split AS
  (/* second pass: split each part on '!'; TRIM removes the padding blanks */
   SELECT t1.id,
          t1.colnum linenum,
          t2.colnum,
          TRIM(regexp_substr(t1.col, '[^!]+', 1, t2.colnum)) col
   FROM first_split t1,
        t2
   WHERE regexp_substr(t1.col, '[^!]+', 1, t2.colnum) IS NOT NULL
  ),
agg_values AS
  (/* flatten the words of each line back into one comma-separated list */
   SELECT id,
          linenum,
          LISTAGG(col, ', ') WITHIN GROUP (ORDER BY colnum) val_lst
   FROM second_split
   GROUP BY id,
            linenum
  )
SELECT id,
       'array['
       || row_number() over (partition BY id ORDER BY linenum)
       || ']= ('
       || val_lst
       || ')' array_text
FROM agg_values
ORDER BY 1, 2;
Output, as requested:
ID ARRAY_TEXT
1 array[1]= (123, word1, word2, word3)
1 array[2]= (789, word4, word5, word6)
1 array[3]= (2345, word7, word8, 890)
2 array[1]= (567, word21, word22, word23)
2 array[2]= (789, word24, word25, word26)
2 array[3]= (2345, word27, word28, 890)
This is the result of the first_split query. It splits the data into lines.
        ID     COLNUM COL
---------- ---------- ------------------------------------------
         1          2 !!123!!word1 !!word2!! word3!!
         1          3 !!789!!word4!!word5 !! word6!!
         1          4 !!2345 !!word7!!word8!! 890!!
         2          2 !!567!!word21 !!word22!! word23!!
         2          3 !!789!!word24!!word25 !! word26!!
         2          4 !!2345 !!word27!!word28!! 890!!
The second_split query breaks the lines into words.
        ID    LINENUM     COLNUM COL
---------- ---------- ---------- ----------
         1          2          1 123
         1          2          2 word1
         1          2          3 word2
         1          2          4 word3
         1          3          1 789
         1          3          2 word4
         1          3          3 word5
.....
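All that remains is LISTAGG, to get the comma-separated keyword list, and the ROW_NUMBER function, to get consecutively numbered array_ids.
If you want to extract the values into separate columns, use PIVOT instead of LISTAGG. The drawback is that you have to adjust the query to the actual number of values.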
WITH t1 AS
  (/* sample input: two records to parse */
   SELECT 1 id,
          '12322ABCD124A||!!123!!word1 !!word2!! word3!!||!!789!!word4!!word5 !! word6!!||!!2345 !!word7!!word8!! 890!!|| ' col
   FROM dual
   UNION ALL
   SELECT 2 id,
          '22222ACCCC12Y||!!567!!word21 !!word22!! word23!!||!!789!!word24!!word25 !! word26!!||!!2345 !!word27!!word28!! 890!!|| ' col
   FROM dual
  ),
t2 AS
  (/* row generator 1..9: the (max) number of columns per split */
   SELECT rownum colnum
   FROM dual
   CONNECT BY level < 10
  ),
t3 AS
  (/* first pass: split each record on '|' */
   SELECT t1.id,
          t2.colnum,
          regexp_substr(t1.col, '[^|]+', 1, t2.colnum) col
   FROM t1,
        t2
   WHERE regexp_substr(t1.col, '[^|]+', 1, t2.colnum) IS NOT NULL
  ),
first_split AS
  (/* keep only the parts that hold '!!'-delimited values */
   SELECT id, colnum, col FROM t3 WHERE col LIKE '%!!%'
  ),
/* to debug: select * from first_split order by 1,2,3 */
second_split AS
  (/* second pass: split each part on '!'; TRIM removes the padding blanks */
   SELECT t1.id,
          t1.colnum linenum,
          t2.colnum,
          TRIM(regexp_substr(t1.col, '[^!]+', 1, t2.colnum)) col
   FROM first_split t1,
        t2
   WHERE regexp_substr(t1.col, '[^!]+', 1, t2.colnum) IS NOT NULL
  ),
pivot_values AS
  (/* one column per word position; extend the IN list if there are more than 4 values */
   SELECT *
   FROM second_split PIVOT (MAX(col) col FOR (colnum) IN (1 AS "K1", 2 AS "K2", 3 AS "K3", 4 AS "K4"))
  )
SELECT id,
       row_number() over (partition BY id ORDER BY linenum) AS array_id,
       K1_COL,
       K2_COL,
       K3_COL,
       K4_COL
FROM pivot_values
ORDER BY 1, 2;
This gives a relational view:
        ID   ARRAY_ID K1_COL   K2_COL   K3_COL   K4_COL
---------- ---------- -------- -------- -------- --------
         1          1 123      word1    word2    word3
         1          2 789      word4    word5    word6
         1          3 2345     word7    word8    890
         2          1 567      word21   word22   word23
         2          2 789      word24   word25   word26
         2          3 2345     word27   word28   890
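Because the PIVOT IN list has to match the actual number of values, it can help to check that number first. The following is only a sketch under assumptions: the lines CTE and its two sample rows are hypothetical stand-ins for the output of first_split, and REGEXP_COUNT requires Oracle 11g or later.

```sql
/* Hypothetical stand-in for the rows produced by first_split. */
WITH lines AS
  (SELECT 1 id, '!!123!!word1 !!word2!! word3!!' col FROM dual
   UNION ALL
   SELECT 2 id, '!!567!!word21 !!word22!!' col FROM dual
  )
/* count the non-empty '!'-delimited tokens per line and take the maximum */
SELECT MAX(regexp_count(col, '[^!]+')) max_values
FROM lines;
```

The result is the number of K columns the PIVOT IN list needs, and it is also a lower bound for the row count generated in t2.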