【Question title】: Parsing string with multiple delimiters into columns
【Posted】: 2022-01-24 19:53:50
【Question description】:

I want to split a string into columns.

My columns should be:

account_id, resource_type, resource_name

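I have a JSON file source that I have been trying to parse through an ADF data flow. That did not work for me, so I flattened the data and brought it into SQL Server (I am open to parsing the values through either ADF or SQL, if someone can show me how). Please see the JSON file at the bottom.

Use this code to query the data I am working with.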

 CREATE TABLE test.test2
 (
     resource_type nvarchar(max) NULL
 )

 INSERT INTO test.test2 ([resource_type]) 
 VALUES 
     ('account_id:224526257458,resource_type:buckets,resource_name:camp-stage-artifactory'),
     ('account_id:535533456241,resource_type:buckets,resource_name:tni-prod-diva-backups'),
     ('account_id:369798452057,resource_type:buckets,resource_name:369798452057-s3-manifests'),
     ('account_id:460085747812,resource_type:buckets,resource_name:vessel-incident-report-nonprod-accesslogs')

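The output I should be able to query in SQL Server should look like this:

account_id      resource_type   resource_name
224526257458    buckets         camp-stage-artifactory
535533456241    buckets         tni-prod-diva-backups

And so on.

Please help, and ask for clarification if needed. Thanks in advance.

EDIT:

Source JSON format: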

{
    "start_date": "2021-12-01 00:00:00+00:00",
    "end_date": "2021-12-31 23:59:59+00:00",
    "resource_type": "all",
    "records": [
        {
            "directconnect_connections": [
                "account_id:227148359287,resource_type:directconnect_connections,resource_name:'dxcon-fh40evn5'",
                "account_id:401311080156,resource_type:directconnect_connections,resource_name:'dxcon-ffxgf6kh'",
                "account_id:401311080156,resource_type:directconnect_connections,resource_name:'dxcon-fg5j5v6o'",
                "account_id:227148359287,resource_type:directconnect_connections,resource_name:'dxcon-fgvfo1ej'"
            ]
        },
        {
            "virtual_interfaces": [
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fgvj25vt'",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fgbw5gs0'",
                "account_id:401311080156,resource_type:virtual_interfaces,resource_name:'dxvif-ffnosohr'",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fg18bdhl'",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-ffmf6h64'",
                "account_id:390251991779,resource_type:virtual_interfaces,resource_name:'dxvif-fgkxjhcj'",
                "account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-ffp6kl3f'"
            ]
        }
    ]
}

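【Comments】:

  • I guess the column names need to be generated dynamically..?
  • I'm not sure I understand. In the query above, the column names are delimited by ":".
  • Column names must be explicitly defined in SQL, @newbie. Will your data always have only these 3 columns?
  • Per the question guide, please show what you have tried and tell us what you found (on this site or elsewhere) and why it didn't meet your needs.
  • Yes, the data always has 3 columns, and that is unlikely to change. I get data like this in JSON files from another department and have no control over the format. Just FYI.

Tags: sql-server parsing azure-data-factory-2 delimited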


【Solution 1】:

Since you don't have a valid JSON string and don't want to get into the business of string manipulation... perhaps this will help.

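-- string_split requires SQL Server 2016+ (database compatibility level 130 or higher)
Select B.*
 From  test.test2 A   -- schema-qualified to match the CREATE TABLE in the question
 Cross Apply ( -- each split row is one 'key:value' pair; stuff() strips the 'key:' prefix
               Select account_id    = max(case when value like 'account_id:%'    then stuff(value,1,11,'') end )
                     ,resource_type = max(case when value like 'resource_type:%' then stuff(value,1,14,'') end )
                     ,resource_name = max(case when value like 'resource_name:%' then stuff(value,1,14,'') end )
                from  string_split(A.resource_type,',')
             )B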

Results

account_id      resource_type   resource_name
224526257458    buckets         camp-stage-artifactory
535533456241    buckets         tni-prod-diva-backups
369798452057    buckets         369798452057-s3-manifests
460085747812    buckets         vessel-incident-report-nonprod-accesslogs
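
To see why the max(case ...) pivot works, it may help to look at what string_split returns for a single value. The following is a minimal sketch, not part of the original answer: @s is a hypothetical variable holding one row of the sample data, and pair_key / pair_value are illustrative column names. The second query cuts each pair at its first ':' rather than hard-coding prefix lengths, which assumes every pair contains a ':'.

```sql
-- Hypothetical variable holding one row of the question's sample data.
DECLARE @s nvarchar(max) =
    N'account_id:224526257458,resource_type:buckets,resource_name:camp-stage-artifactory';

-- Step 1: splitting on ',' yields one row per 'key:value' pair.
SELECT value
FROM   STRING_SPLIT(@s, ',');

-- Step 2: cut each pair at its first ':' (assumes a ':' is always present).
SELECT pair_key   = LEFT(value, CHARINDEX(':', value) - 1)
      ,pair_value = STUFF(value, 1, CHARINDEX(':', value), '')
FROM   STRING_SPLIT(@s, ',');
```

The max(case ...) expressions in the answer then collapse those three rows back into a single row with one column per key. STRING_SPLIT does not guarantee the order of its output rows, which is one reason to match on the key prefix rather than on position.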

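    【Solution 2】:

    Unfortunately, the values in the array are not valid JSON. You can patch them up by adding {} to the beginning/end and adding " on either side of each : and , in the string.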

    DECLARE @json nvarchar(max) = N'{
        "start_date": "2021-12-01 00:00:00+00:00",
        "end_date": "2021-12-31 23:59:59+00:00",
        "resource_type": "all",
        "records": [
            {
                "directconnect_connections": [
                    "account_id:227148359287,resource_type:directconnect_connections,resource_name:''dxcon-fh40evn5''",
                    "account_id:401311080156,resource_type:directconnect_connections,resource_name:''dxcon-ffxgf6kh''",
                    "account_id:401311080156,resource_type:directconnect_connections,resource_name:''dxcon-fg5j5v6o''",
                    "account_id:227148359287,resource_type:directconnect_connections,resource_name:''dxcon-fgvfo1ej''"
                ]
            },
            {
                "virtual_interfaces": [
                    "account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fgvj25vt''",
                    "account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fgbw5gs0''",
                    "account_id:401311080156,resource_type:virtual_interfaces,resource_name:''dxvif-ffnosohr''",
                    "account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fg18bdhl''",
                    "account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-ffmf6h64''",
                    "account_id:390251991779,resource_type:virtual_interfaces,resource_name:''dxvif-fgkxjhcj''",
                    "account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-ffp6kl3f''"
                ]
            }
        ]
    }';
    
    
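    -- TRIM requires SQL Server 2017+; OPENJSON requires compatibility level 130+
    SELECT
      j4.account_id,
      j4.resource_type,
      TRIM('''' FROM j4.resource_name) resource_name   -- strip the surrounding single quotes
    FROM OPENJSON(@json, '$.records') j1               -- array of record objects
    CROSS APPLY OPENJSON(j1.value) j2                  -- the single property of each record
    CROSS APPLY OPENJSON(j2.value) j3                  -- array of 'key:value,...' strings
    -- patch each string into a JSON object, then read it with an explicit schema
    CROSS APPLY OPENJSON('{"' + REPLACE(REPLACE(j3.value, ':', '":"'), ',', '","') + '"}')
      WITH (
        account_id bigint,
        resource_type varchar(100),   -- varchar(20) is too narrow for 'directconnect_connections' (25 characters)
        resource_name varchar(100)
      ) j4;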
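
    The REPLACE expression in the last CROSS APPLY can be checked in isolation. This is a small sketch under stated assumptions: @element and @patched are hypothetical variable names, and the sample string is one array element copied from the question's JSON. It only works as long as the values themselves never contain ':', ',' or '"'.

    ```sql
    -- Hypothetical variable holding one array element from the source JSON.
    DECLARE @element nvarchar(4000) =
        N'account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fgvj25vt''';

    -- Wrap in {" ... "} and turn ':' into '":"' and ',' into '","'.
    DECLARE @patched nvarchar(4000) =
        N'{"' + REPLACE(REPLACE(@element, N':', N'":"'), N',', N'","') + N'"}';

    -- patched_json:
    --   {"account_id":"227148359287","resource_type":"virtual_interfaces","resource_name":"'dxvif-fgvj25vt'"}
    -- is_valid_json: 1
    SELECT patched_json  = @patched
          ,is_valid_json = ISJSON(@patched);
    ```

    The single quotes around the resource name survive the patching, which is why the main query removes them afterwards with TRIM.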
    


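    The first three calls to OPENJSON have no schema, so the result set has three columns: key, value and type. For arrays (j1 and j3), key is the index into the array. For single objects (j2), key is each property name.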
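
    The default key / value / type output can be seen on two tiny literals. This is an illustrative sketch only; the JSON literals below are made up for the example and do not come from the question's data.

    ```sql
    -- An array: [key] is the zero-based index of each element.
    SELECT [key], [value], [type]
    FROM   OPENJSON(N'["first","second"]');

    -- An object: [key] is the property name.
    -- [type] is 1 for a JSON string and 2 for a JSON number.
    SELECT [key], [value], [type]
    FROM   OPENJSON(N'{"name":"dxcon-example","count":2}');
    ```

    In the main query, j1.value and j2.value carry nested JSON (type 5 for an object, type 4 for an array), which is what lets each result be passed to the next OPENJSON call.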

    【Comments】:

    • Thank you for taking the time to answer. I'll take a look. :)