【Question title】: Convert column value with html tags into sql view with rows and columns
【Posted】: 2020-07-17 06:57:05
【Question description】:

I have a table named data with a column desc_data. The column's value looks like this:

<span class ="label">A</span><br> <span class ="value">A-Class</span> <span class ="label">B</span><br> <span class ="value">B-Class</span>.

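I want to parse this column value, strip the HTML tags, and split it into a new view using a SQL query (possibly with Regexp_Replace), so that all the label values become columns, i.e.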

<span class ="label">A and <span class ="label">B will become the columns, and

<span class ="value">A-Class and <span class ="value">B-Class will become the respective column values.

The real data is larger and contains many labels and values; this is just a sample to get help. The expected result should be:

View data_view

A          B
A-Class    B-Class

【Question discussion】:

    Tags: html sql oracle plsql


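    【Solution 1】:

    I think it would be more convenient to get the required data as rows rather than columns. You can parse it with xmltable after a small modification of the original html: remove unclosed tags such as <br>, which are not valid XML (this is why <br/> is better):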

    with t as (
      -- your sample data:
      select
        q'[<span class ="label">A</span><br> <span class ="value">A-Class</span> <span class ="label">B</span><br> <span class ="value">B-Class</span>.
      ]' html_data
    from dual
    )
    -- main query:
    select xt.*
    from t
        ,xmltable(
          'let $labels := /root/span[@class eq "label"]
           let $values := /root/span[@class eq "value"]
           for $label at $i in $labels
              return element label {
                 attribute name {$label/text()}, 
                 attribute value {$values[$i]/text()}
              }
           '
          passing
          xmltype(
           --- modify your html to make it compatible with xml:
           '<root>'
           || replace(replace(t.html_data,'<br>'),'&nbsp;')
           ||'</root>'
          )
          columns
             n for ordinality,
             label_name path '@name',
             label_value path '@value'
        ) xt;
    

    Result:

             N LABEL_NAME                     LABEL_VALUE
    ---------- ------------------------------ ------------------------------
             1 A                              A-Class
             2 B                              B-Class
    

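    【Discussion】:

    • Hi Sayan, when I checked against the larger real data, it sometimes throws an error. For example, when the labels and values look like: Domain
      Production Project Team
      Asdsh jsdja kajdhjahdja Grueber, here the label Domain does not get the value Production; it gets the next label's value instead, i.e. Asdsh jsdja kajdhjahdja Grueber, which should go with Project Team.
    • Basically, I need a way to modify let $values := /root/span[@class eq "desc_value"] so that it also covers let $values := /root/span[@class eq "select desc_value"]
    【Solution 2】: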
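One possible way to address the second comment is to match the class attribute with the XQuery function contains() instead of eq, so that both "desc_value" and "select desc_value" are picked up. The sketch below is not from the original answer. The sample row is hypothetical, and the class name "desc_label" is an assumption (only "desc_value" and "select desc_value" appear in the comments). It still pairs labels and values by position, so it assumes every label is followed by exactly one value.

```sql
with t as (
  -- hypothetical sample shaped like the case described in the comments
  select
    q'[<span class="desc_label">Domain</span><br> <span class="select desc_value">Production</span> <span class="desc_label">Project Team</span><br> <span class="desc_value">Asdsh jsdja kajdhjahdja Grueber</span>]' html_data
  from dual
)
select xt.*
from t
    ,xmltable(
      'let $labels := /root/span[contains(@class, "label")]
       let $values := /root/span[contains(@class, "value")]
       for $label at $i in $labels
          return element label {
             attribute name {$label/text()},
             attribute value {$values[$i]/text()}
          }
       '
      passing
      xmltype(
       -- only <br> is stripped here because this sample has no other
       -- XML-incompatible markup; real data may need more clean-up
       '<root>' || replace(t.html_data,'<br>') || '</root>'
      )
      columns
         n for ordinality,
         label_name path '@name',
         label_value path '@value'
    ) xt;
```

With this predicate, Production should be counted as the first value and therefore line up with Domain. If some labels in the real data have no value at all, positional pairing will still drift, and the HTML would need to be inspected before relying on it.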

    You need to split your string recursively by some pattern (for example '/span> <span'). Use the REGEXP_REPLACE() function to extract the required columns, then apply a pivot:

    WITH t(desc_data) AS
    (
     SELECT '<span class ="label">A</span><br> <span class ="value">A-Class</span> <span class ="label">B</span><br> <span class ="value">B-Class</span> <span class ="label">C</span><br> <span class ="value">C-Class</span>'
       FROM dual
    ), t2 AS
    (
    SELECT SUBSTR(desc_data,1,CASE WHEN INSTR(desc_data,'/span> <span',1,level) > 0
                                   THEN INSTR(desc_data,'/span> <span',1,level) + 5 
                                   ELSE LENGTH(desc_data)
                               END
                 ) AS desc_data2
      FROM t
     CONNECT BY level <= REGEXP_COUNT(desc_data,'/span> <span') + 1
    )
    SELECT *
      FROM
      (
       SELECT REGEXP_REPLACE(desc_data2,'(.*"label">)(\S+)(</span>.*)','\2') AS label,
              REGEXP_REPLACE(desc_data2,'(.*"value">)(\S+)(</span>.*)','\2') AS value
         FROM t2 )
     PIVOT ( MAX(VALUE) FOR LABEL IN ('A' AS "A", 'B' AS "B", 'C' AS "C") );
    
    
    A          B          C
    -------    -------    -------
    A-Class    B-Class    C-Class 
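Neither answer shows the final step the question asks for, a view over the data table. A minimal sketch of that step follows, combining the xmltable parsing from Solution 1 with the pivot from Solution 2. It is an illustration under assumptions, not the authors' method: the RID column (the source row's ROWID, used so each row of data pivots separately) and the fixed column list 'A', 'B' are choices made here, and every desc_data value is assumed to become well-formed XML once <br> is removed.

```sql
-- Sketch only: RID and the A/B column list are assumptions, not part of the question.
create or replace view data_view as
select *
from (
  select d.rowid as rid,          -- keeps each source row separate in the pivot
         xt.label_name,
         xt.label_value
  from data d
      ,xmltable(
        'let $labels := /root/span[@class eq "label"]
         let $values := /root/span[@class eq "value"]
         for $label at $i in $labels
            return element label {
               attribute name {$label/text()},
               attribute value {$values[$i]/text()}
            }
         '
        passing xmltype('<root>' || replace(d.desc_data,'<br>') || '</root>')
        columns
           label_name  varchar2(100)  path '@name',
           label_value varchar2(4000) path '@value'
      ) xt
)
pivot ( max(label_value) for label_name in ('A' as "A", 'B' as "B") );
```

The PIVOT clause needs its column list written out, so labels that are not known in advance would require generating the statement dynamically (for example from PL/SQL) rather than a fixed view.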
    


    【Discussion】:
