【Question title】: SAS group by counters per variable - primary key creation
【Posted】: 2019-09-24 08:32:27
【Question】:

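I have some data that I need to split into 12 or so different groups. There are no keys, and the order of the data is important.

The data contains a number of groups, and those groups have singular and/or nested groups within them. Because the data is in a hierarchical format, each group will be split out. Each "GROUP" has its own record layout, and everything then needs to be joined back together onto one row (or many rows).

Sample data file: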

"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","REMVE","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""

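This is the hierarchy that should be in place once the data has been read in. I am thinking there could be several tables that are joined together later on. (The numbers only illustrate the parent-child levels.)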

1. Transaction [TRANS]
   1.1. Meter Point [MTPNT]
      1.1.1. Asset [ASSET]
         1.1.1.1. Meter [METER]
         1.1.1.2. Converter [CONVE]
         1.1.1.3. Register Details [REGST]
            1.1.1.3.1. Reading [READG]
         1.1.1.4. Market Participant [MKPRT]
         1.1.1.5. Name [NAME]
            1.1.1.5.1. Address [ADDRS]
            1.1.1.5.2. Contact Mechanism [CONTM]
   1.2. Appointment [APPNT]
   1.3. Name [NAME]
      1.3.1. Address [ADDRS]
      1.3.2. Contact Mechanism [CONTM]
   1.4. Market Participant [MKPRT]

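This is gas industry data, so in this flow each MTPNT can have many ASSETs, and each of those ASSETs can have many REGSTs, because REGST is where the meter readings (READG) are held.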

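I have tried using BY-group processing with FIRST. logic, but I have not worked with this type of data before. I need a way to create a key for each grouping so that, once the data is split and the fields are defined, the pieces can be joined back together.

I have also tried manipulating the infile so that all the data for each TRANS appears on one line, but I still have trouble applying the fields, and the ordering is what matters most.

I have managed to get keys for some of the groups, but after splitting they do not fit back together properly.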

data TRANS;
    set mpancreate_a;
    by DataItmGrp NOTSORTED;

    if first.DataItmGrp then 
        do;
            if DataItmGrp = "TRANS" then 
                TRANSKey+1;
        end;
run;

data TRANS;
    set TRANS;
    TRANSKey2 + 1;
    by DataItmGrp NOTSORTED;

    if first.DataItmGrp then
        do;
            if DataItmGrp = "TRANS" then
                TRANSKEY2=1;
        end;


run;

data MTPNT;
    set TRANS;
    by DataItmGrp NOTSORTED;

    if first.DataItmGrp then
        do;
            if DataItmGrp = "MTPNT" then
                MTPNTKEY+1;
        end;
run;

data MTPNT;
    set MTPNT;
    by  MTPNTKEY NOTSORTED;

    if first.MTPNTKEY  and DataItmGrp = "MTPNT" then
        MTPNTKEY2=0;
    MTPNTKEY2+1;
run;

data ASSET;
    set MTPNT;

    IF MTPNTKEY = 0 THEN
        MTPNTKEY2=0;
    by DataItmGrp NOTSORTED;

    if first.DataItmGrp then
        do;
            if DataItmGrp = "ASSET" then
                ASSETKEY+1;
        end;
run;

data ASSET;
    set ASSET;
    by  ASSETKEY NOTSORTED;

    if first.ASSETKEY  and DataItmGrp = "ASSET" then
        ASSETKEY2=0;
    ASSETKEY2+1;

    IF ASSETKEY =0 THEN
        ASSETKEY2=0;
run;

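I would like a counter for each group found, and a counter kept within that particular group, but I cannot work out how to step into and out of the groupings according to the hierarchy above.

My hope is that once I have these keys I can split the data out by group and then join it back together afterwards.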


        _n_     TRANS   TRANS2  MTPNT   MTPNT2
TRANS   1       1       0       0       0
MTPNT   2       2       1       1       1
ASSET   3       3       1       2       1
METER   4       4       1       3       1
READG   5       5       1       4       1
MTPNT   6       6       1       1       2
ASSET   7       7       1       2       2
METER   8       8       1       3       2
READG   9       9       1       4       2
APPNT   10      10      1       5       2
TRANS   11      1       2       6       2
MTPNT   12      2       2       1       3
ASSET   13      3       2       2       3
METER   14      4       2       3       3
READG   15      5       2       4       3
MTPNT   16      6       2       1       4
ASSET   17      7       2       2       4
METER   18      8       2       3       4
READG   19      9       2       4       4
APPNT   20      10      2       5       4   




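【Comments】:

  • Why not assign the numbers above to the codes? You could then aggregate at any level by taking the appropriate substring of that variable.
  • What does the actual data file look like? Because some parts of the hierarchy can occur at different levels (for example NAME/ADDRS and NAME/CONTM at points 1.1.1.5 and 1.3), you may need tables of a generic value-attribute type that carry a node identity or a value-ownership link. There have to be rules for knowing, while the data is being read, which level (or tier) a value belongs to (for example whitespace indentation or special marker characters in the data file).
  • Data added above.
  • @Richard Your comment is correct. Most of my ETL is done with stock DI Studio, which is why I am trying to unpick this.

Tags: sas iterator hierarchy flat-file enumerate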


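【Solution 1】:

Reading hierarchical data from a data file that has no explicit level markers is problematic. My best advice is to work out which salient values you want to extract and in what context you need to know them. For this problem, the simplest first approach is a single monolithic table with categorical variables that capture the path of descent to the salient value (the meter reading).

In the more complex case, the first token on each line drives both the input for that line and the output table it belongs to. Because there are no landmarks for the absolute or relative position within the hierarchy (as with NAME and MKPRT, which appear at more than one level), there is no 100% reliable way to place such records in the hierarchy, and that also affects the placement of items read from subsequent data lines.

Depending on how complex the real-world data is and how closely it follows the rules, you may or may not "miss" reading some values.

Suppose the simpler goal is just to get the meter readings.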

data want;

length tier level1-level6 $8 path $64 meterReadingString $8 dummy $1;
retain level1-level5 path;
attrib readingdate informat=yymmdd10. format=yymmdd10.;

infile cards dsd missover;

input @1 tier @; * held input - do not advance to the next line yet;

* the =: operator is a begins-with test so a tier can reopen after a deeper path was reached;

if tier="TRANS" then do;
  level1 = tier;
  call missing (of level2-level6);
  path = catx("/", of level:);
end;

if tier="MTPNT" and path=:"TRANS" then do;
  level2 = tier;
  call missing (of level3-level6);
  path = catx("/", of level:);
end;

if tier="ASSET" and path=:"TRANS/MTPNT" then do;
  level3 = tier;
  call missing (of level4-level6);
  path = catx("/", of level:);
end;

if tier="METER" and path=:"TRANS/MTPNT/ASSET" then do;
  level4 = tier;
  call missing (of level5-level6);
  path = catx("/", of level:);
end;

if tier="REGST" and path=:"TRANS/MTPNT/ASSET/METER" then do;
  level5 = tier;
  call missing (level6);
  path = catx("/", of level:);
end;

if tier="READG" and path=:"TRANS/MTPNT/ASSET/METER/REGST" then do;
  level6 = tier;
  path = catx("/", of level:);
  input @1 tier readingdate dummy meterReadingString @; * reread line according to tier;

  meterReading = input(meterReadingString, best12.);

  if path = "TRANS/MTPNT/ASSET/METER/REGST/READG" then OUTPUT;
end;

datalines;
"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","REMVE","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""
run;

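You can use this as the basis for a more sophisticated reader that has a distinct output <tier> data set for each tier, or tier path, encountered. Each tier needs its own input statement, similar to the way READG is read.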
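As a rough illustration of that idea, the sketch below writes one data set per tier and carries a retained counter for each level, so the pieces can be joined back together later. It is only a sketch under stated assumptions: the key names (transKey, mtpntKey, assetKey, regstKey), the field names (transRef, mtpntAction, assetAction) and the choice of which fields to read from each record are hypothetical and are not taken from the file specification. Child counters restart under each new parent, so a row is identified by the combination of its own key and its parents' keys.

```sas
data trans (keep=transKey transRef)
     mtpnt (keep=transKey mtpntKey mtpntAction)
     asset (keep=transKey mtpntKey assetKey assetAction)
     readg (keep=transKey mtpntKey assetKey regstKey readingDate meterReadingString);

  length tier $8 transRef $16 mtpntAction assetAction $8
         meterReadingString $8 dummy $1;
  retain transKey mtpntKey assetKey regstKey 0;   * counters survive across lines;
  informat readingDate yymmdd10.;
  format   readingDate yymmdd10.;

  infile datalines dsd missover;
  input @1 tier @;                 * read only the tier token and hold the line;

  select (tier);
    when ("TRANS") do;
      transKey + 1;                * new transaction so restart all child counters;
      mtpntKey = 0; assetKey = 0; regstKey = 0;
      input @1 tier transRef @;    * hypothetical name for the second field;
      output trans;
    end;
    when ("MTPNT") do;
      mtpntKey + 1;
      assetKey = 0; regstKey = 0;
      input @1 tier mtpntAction @; * hypothetical name for the second field;
      output mtpnt;
    end;
    when ("ASSET") do;
      assetKey + 1;
      regstKey = 0;
      input @1 tier dummy assetAction @; * hypothetical name for the third field;
      output asset;
    end;
    when ("REGST") regstKey + 1;   * only counted here and not written out;
    when ("READG") do;
      input @1 tier readingDate dummy meterReadingString @;
      output readg;
    end;
    otherwise;                     * METER, APPNT and other tiers are ignored in this sketch;
  end;

datalines;
"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","REMVE","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""
run;

* rejoin readings to their parent asset using the composite key;
proc sql;
  create table readings_by_asset as
  select a.transKey, a.mtpntKey, a.assetKey, a.assetAction,
         r.regstKey, r.readingDate, r.meterReadingString
  from asset as a
       inner join readg as r
       on  a.transKey = r.transKey
       and a.mtpntKey = r.mtpntKey
       and a.assetKey = r.assetKey;
quit;
```

The same pattern extends to the remaining tiers by adding a counter, a when branch and an output data set for each. Tiers such as NAME and MKPRT, which occur at more than one level, would still need an extra rule to decide which parent they belong to.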
