【Question Title】:Find lowest and highest values split into rows from a single string of concatenated values
【Posted】:2017-12-25 21:23:39
【Question Description】:

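This is a follow-up to my question here: the answer uzi provided to that question works well. However, I noticed that a new company, Company3, also uses single data points, such as account 6000, which do not follow the pattern of the earlier companies and make uzi's recursive CTE inapplicable.

I therefore feel the question needs to change, but because this complication affects the solution so heavily, I believe it warrants a new question rather than an edit to my previous one.

I need to read data from an Excel workbook, where the data is stored like this: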

Company       Accounts
Company1      (#3000...#3999)
Company2      (#4000..#4019)+(#4021..#4024)
Company3      (#5000..#5001)+#6000+(#6005..#6010)

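Because some companies, such as Company3, have single-value accounts like #6000, I think that at this step I need to produce a result set that looks like this: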

Company       FirstAcc LastAcc
Company1      3000     3999
Company2      4000     4019
Company2      4021     4024
Company3      5000     5001
Company3      6000     NULL
Company3      6005     6010

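I will then take this table and join it with a table that contains only integers to get the final table, like the one in my linked question.

Does anyone have any ideas?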

【Question Discussion】:

  • Was the problem solved?

Tags: sql-server excel tsql ssis etl


【Solution 1】:

A good T-SQL splitter function makes this quite simple; I recommend delimitedSplit8K. It will also perform better than a recursive CTE. First, the sample data:

-- your sample data
if object_id('tempdb..#yourtable') is not null drop table #yourtable;
create table #yourtable (company varchar(100), accounts varchar(8000));
insert #yourtable values ('Company1','(#3000...#3999)'),
('Company2','(#4000..#4019)+(#4021..#4024)'),('Company3','(#5000..#5001)+#6000+(#6005..#6010)');

And the solution:

select 
  company, 
  firstAcc = max(case when split2.item not like '%)' then clean.Item end),
  lastAcc  = max(case when split2.item     like '%)' then clean.Item end)
from #yourtable t
cross apply dbo.delimitedSplit8K(accounts, '+') split1
cross apply dbo.delimitedSplit8K(split1.Item, '.') split2
cross apply (values (replace(replace(split2.Item,')',''),'(',''))) clean(item)
where split2.item > ''
group by split1.Item, company;

Results:

company   firstAcc   lastAcc
--------- ---------- --------------
Company1  #3000      #3999
Company2  #4000      #4019
Company2  #4021      #4024
Company3  #6000      NULL
Company3  #5000      #5001
Company3  #6005      #6010
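The question asks for integer account numbers and then plans to join the ranges to a table of integers. The query above can be extended along those lines; the following is only a sketch under a few assumptions: `#yourtable` is the sample table created above, `dbo.delimitedSplit8K` is installed and returns an `Item` column, and `numbers` is a hypothetical inline tally built from `sys.all_columns` rather than a real integers table. `try_cast` requires SQL Server 2012 or later.

```sql
-- Sketch only. Assumes #yourtable from the sample data above and that
-- dbo.delimitedSplit8K is installed and exposes an Item column.
with ranges as (
  select
    company,
    firstAcc = max(case when split2.Item not like '%)' then clean.Item end),
    lastAcc  = max(case when split2.Item     like '%)' then clean.Item end)
  from #yourtable t
  cross apply dbo.delimitedSplit8K(t.accounts, '+') split1
  cross apply dbo.delimitedSplit8K(split1.Item, '.') split2
  -- strip ')', '(' and '#', then convert to int (NULL if not numeric)
  cross apply (values (try_cast(replace(replace(replace(
                 split2.Item, ')', ''), '(', ''), '#', '') as int))) clean(Item)
  where split2.Item > ''
  group by split1.Item, company
),
numbers as (
  -- hypothetical inline tally of 1..10000; swap in your own integers table
  select top (10000) n = row_number() over (order by (select null))
  from sys.all_columns a cross join sys.all_columns b
)
select r.company, account = n.n
from ranges r
join numbers n
  -- a single account (lastAcc is NULL) is treated as a one-value range
  on n.n between r.firstAcc and isnull(r.lastAcc, r.firstAcc)
order by r.company, n.n;
```

Treating a NULL `lastAcc` as equal to `firstAcc` in the join keeps the NULL in the range table itself while still returning the single account in the expanded list.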

【Discussion】:

    【Solution 2】:

    I believe the list (#6005..#6010) is represented in your Excel file as #6005#6006#6007#6008#6009#6010. If that is true and there are no gaps, try this query:

    with cte as (
    select 
        company, replace(replace(replace(accounts,'(',''),')',''),'+','')+'#' accounts
    from 
        (values ('company 1','#3000#3001#3002#3003'),('company 2','(#4000#4001)+(#4021#4022)'),('company 3','(#5000#5001)+#6000+(#6005#6006)')) data(company, accounts)
    )
    
    , rcte as (
        select 
            company, stuff(accounts, ind1, ind2 - ind1, '') acc, substring(accounts, ind1 + 1, ind2 - ind1 - 1) accounts
        from 
            cte
            cross apply (select charindex('#', accounts) ind1) ca
            cross apply (select charindex('#', accounts, ind1 + 1) ind2) cb
        union all
        select
            company, stuff(acc, ind1, ind2 - ind1, ''), substring(acc, ind1 + 1, ind2 - ind1 - 1)
        from
            rcte
            cross apply (select charindex('#', acc) ind1) ca
            cross apply (select charindex('#', acc, ind1 + 1) ind2) cb
        where
            len(acc)>1
    )
    
    select
        company, min(accounts) FirstAcc, case when max(accounts)  =min(accounts) then null else max(accounts) end LastAcc
    from (
        select
            company, accounts, accounts - row_number() over (partition by company order by accounts) group_
        from 
            rcte
        ) t
    group by company, group_
    
    option (maxrecursion 0)
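
The grouping in the final SELECT is the core of this query: within a run of consecutive account numbers, the value minus its row number stays constant, so that difference can serve as a group key. A minimal self-contained sketch of just that step, using made-up values rather than data from the Excel file:

```sql
-- Hypothetical sample values, chosen to include a range, a single
-- account, and another range.
select
    min(acc) as FirstAcc,
    case when max(acc) = min(acc) then null else max(acc) end as LastAcc
from (
    select
        acc,
        -- constant within each run of consecutive numbers
        acc - row_number() over (order by acc) as group_
    from (values (5000),(5001),(6000),(6005),(6006),(6007)) v(acc)
) t
group by group_
order by FirstAcc;

-- FirstAcc  LastAcc
-- 5000      5001
-- 6000      NULL
-- 6005      6007
```

Here the differences are 4999, 4999, 5997, 6001, 6001, 6001, which yields three groups. The full query above does the same per company by adding `partition by company`.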
    

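    【Discussion】:

      【Solution 3】:

      I made some edits to @uzi's solution from the other question: I added three more CTEs and used window functions such as LEAD() and ROW_NUMBER() to solve the problem. I don't know whether there is a simpler solution, but I think this works well.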

      with cte as (
      select 
          company, replace(replace(replace(accounts,'(',''),')',''),'+','')+'#' accounts 
      from 
          (values ('company 1','#3000..#3999'),('company 2','(#4000..#4019)+(#4021..#4024)'),('company 3','(#5000..#5001)+#6000+(#6005..#6010)')) data(company, accounts)
      )
      , rcte as (
          select 
              company, stuff(accounts, ind1, ind2 - ind1, '') acc, substring(accounts, ind1 + 1, ind2 - ind1 - 1) accounts
          from 
              cte
              cross apply (select charindex('#', accounts) ind1) ca
              cross apply (select charindex('#', accounts, ind1 + 1) ind2) cb
          union all
          select
              company, stuff(acc, ind1, ind2 - ind1, ''), substring(acc, ind1 + 1, ind2 - ind1 - 1)
          from
              rcte
              cross apply (select charindex('#', acc) ind1) ca
              cross apply (select charindex('#', acc, ind1 + 1) ind2) cb
          where
              len(acc)>1
      ) ,cte2 as (
      
          select company, accounts as  accounts_raw, Replace( accounts,'..','') as accounts,
              LEAD(accounts) OVER(Partition by company ORDER BY accounts) ld,
              ROW_NUMBER() OVER(ORDER BY accounts) rn 
          from rcte
      ) , cte3 as (
      
          Select company,accounts,ld ,rn 
          from cte2 
          WHERE ld not like '%..' 
      ) , cte4 as (
          select * from cte3 where accounts not in (select ld from cte3 t1 where t1.rn < cte3.rn)
      )
      
      SELECT company,accounts,ld from cte4
      UNION
      SELECT DISTINCT company,ld,NULL from cte3 where accounts not in (select accounts from cte4 t1)
      
      option (maxrecursion 0)
      


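      【Discussion】:

        【Solution 4】:

        It looks like you tagged SSIS, so I'll provide a solution using a script task. All the other examples require loading into a staging table.

        1. Use your normal reader (probably Excel) and load the data
        2. Add a Script Transformation Component
        3. Edit the component
        4. Input Columns - check Company and Accounts
        5. Inputs and Outputs - add a new output and name it CompFirstLast
        6. Add three columns to it - Company string, First int, and Last int
        7. Open the script and paste the following code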

          public override void Input0_ProcessInputRow(Input0Buffer Row)
          {
          
          //Create an array for each group to create rows out of by splitting on '+'
          
          string[] SplitForRows = Row.Accounts.Split('+'); //Note single quotes denoting char 
          
          //Deal with each group and create the new Output
          for (int i = 0; i < SplitForRows.Length; i++) //Loop each split column
              {
                  CompFirstLastBuffer.AddRow();
                  CompFirstLastBuffer.Company = Row.Company; //This is static for each incoming row
          
                  //Clean up the string getting rid of (). and leaving a delimited list of #
                  string accts = SplitForRows[i].Replace("(", String.Empty).Replace(")", String.Empty).Replace(".", String.Empty).Substring(1);
          
                  //Split into Array
                  string[] accounts = accts.Split('#');
          
                  // Write out first and last and handle null
                  CompFirstLastBuffer.First = int.Parse(accounts[0]);
          
                  if (accounts.Length == 1)
                      CompFirstLastBuffer.Last_IsNull = true;
                  else
                      CompFirstLastBuffer.Last = int.Parse(accounts[1]);
          
              }
          }
          
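        8. Make sure you use the correct output.

        【Discussion】:

        • If your list is more like what uzi described, (#3001#3002#3003)+, then change the last line from CompFirstLastBuffer.Last = int.Parse(accounts[1]); to CompFirstLastBuffer.Last = int.Parse(accounts[accounts.Length - 1]);
        • Also, instead of returning only one value, you may want to set both First and Last to accounts[0] so that the start and end have the same value. I assume you will be doing some kind of BETWEEN logic later.
        • @Hadi... thanks for cleaning it up. I didn't know how to make it look like code.
        • The first line after an enumeration needs some extra spaces to be formatted as code. You're welcome :)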