Posted: 2019-09-24 08:32:27
Question:
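I have some data that needs to be split into 12 or so different groups. There are no keys, and the order of the data matters.
The data contains multiple groups, and those groups contain single and/or nested groups. Because the data is in a hierarchical format, every group has to be split out. Each "GROUP" has its own format, and they then all need to be joined back into one row (or several rows).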
Sample data file:
"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","REMVE","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"ASSET","","INSTL","METER","","CR","E6VG470","LPG",2017,"E6S06769699999","","","LI"
"METER","","U","S1",6.0000,"","",20180216,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00000"
"APPNT","",20180216,,"","123900",""
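Here is a minimal sketch for getting a file like this into SAS without losing its order. It assumes the record type is always the first comma-separated field. The step reads only that field, keeps the whole line in `RawLine`, and stores the original position in `seq`. The dataset and variable names (`raw`, `RawLine`, `seq`) are hypothetical, and DATALINES holding a few lines copied from the sample stands in for the real file.

```sas
/* Hypothetical names: raw, RawLine, seq. DATALINES stands in for the real flow file */
data raw;
  infile datalines dsd truncover;
  length DataItmGrp $5 RawLine $200;
  seq = _n_;                    /* original position in the file */
  input DataItmGrp :$5. @;      /* first field only (DSD strips the quotes); hold the line */
  RawLine = _infile_;           /* whole record, to be parsed per record type later */
  datalines;
"TRANS","23115168","","","OTVST","","23115168","","COMLT","","",20180216,"OAMI","501928",,
"MTPNT","UPDTE",2415799999,"","","17","","",,20180216,
"ASSET","","REPRT","METER","","CR","E6VG470","LPG",2017,"E6S05633099999","","","LI"
"METER","","U","S1",6.0000,"","",20171108,"S",,
"REGST","","METER",5,"SCMH",1.000
"READG",20180216,,"00990"
"APPNT","",20180216,,"","123900",""
;
run;
```

Because `RawLine` keeps the full record, each record type can be parsed later with its own field layout, once the keys have been assigned.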
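This is the hierarchy that should exist once the data is loaded. I'm thinking there may later be several tables that can be joined together. (The numbers show the parent/child levels.)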
1. Transaction [TRANS]
1.1. Meter Point [MTPNT]
1.1.1. Asset [ASSET]
1.1.1.1. Meter [METER]
1.1.1.2. Converter [CONVE]
1.1.1.3. Register Details [REGST]
1.1.1.3.1. Reading [READG]
1.1.1.4. Market Participant [MKPRT]
1.1.1.5. Name [NAME]
1.1.1.5.1. Address [ADDRS]
1.1.1.5.2. Contact Mechanism [CONTM]
1.2. Appointment [APPNT]
1.3. Name [NAME]
1.3.1. Address [ADDRS]
1.3.2. Contact Mechanism [CONTM]
1.4. Market Participant [MKPRT]
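One way to make this outline usable in a DATA step is to give every record a depth (`Level`). The sketch below hard-codes the depths from the outline. For the types that occur at two depths (NAME, ADDRS, CONTM, MKPRT), it applies an unverified rule: they belong to the asset while an ASSET is "open", and the next TRANS, MTPNT or APPNT closes it. The input records and the names `grp`, `levelled` and `inAsset` are hypothetical.

```sas
/* Hypothetical input: record types only, in file order */
data grp;
  input DataItmGrp :$5. @@;
  datalines;
TRANS MTPNT ASSET METER REGST READG NAME ADDRS
APPNT NAME CONTM MKPRT
;
run;

data levelled;
  set grp;
  retain inAsset 0;   /* 1 while an ASSET branch (1.1.1.x) is open */
  select (DataItmGrp);
    when ('TRANS')                   do; Level = 1; inAsset = 0; end;
    when ('MTPNT', 'APPNT')          do; Level = 2; inAsset = 0; end;
    when ('ASSET')                   do; Level = 3; inAsset = 1; end;
    when ('METER', 'CONVE', 'REGST') Level = 4;
    when ('READG')                   Level = 5;
    /* Assumed rule: these sit under the asset only while one is open */
    when ('NAME', 'MKPRT')           Level = ifn(inAsset, 4, 2);
    when ('ADDRS', 'CONTM')          Level = ifn(inAsset, 5, 3);
    otherwise                        Level = .;   /* unknown record type */
  end;
run;
```

Under this rule, a transaction-level NAME (1.3) that follows the assets directly, with no APPNT in between, would be placed under the asset. That is exactly the ambiguity a real rule for the file format would have to settle.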
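This is gas industry data. In this flow each MTPNT can have many ASSETs, and each of those ASSETs can have many REGSTs, because REGST is where the meter readings (READG) are held.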
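I have tried BY-group and FIRST. processing, but I haven't worked with this kind of data before. I need a way to create a key for each grouping so that, once the data is split and the fields are defined, it can be joined back together.
I have tried manipulating the INFILE so that all the data for each TRANS appears on one row, but I still have trouble applying the fields, and preserving the order is paramount.
I have managed to get keys for some of the groups, but after splitting they don't fully fit back together.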
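Here is a minimal sketch of the split-and-rejoin step. It assumes every record already carries its ancestor keys: hypothetical `Key1`-`Key3` counters for transaction, meter point and asset. The data is split by record type into separate tables, then joined back on the shared keys rather than on row order. The dataset names, the `Value` column and the key layout are all assumptions. The values are taken from the sample file, and REGST is skipped for brevity.

```sas
/* Hypothetical keyed records: Key1-Key3 = transaction / meter point / asset
   counters; Value stands in for one parsed field per record type */
data keyed;
  input DataItmGrp :$5. Key1 Key2 Key3 Value :$20.;
  datalines;
TRANS 1 0 0 23115168
MTPNT 1 1 0 2415799999
ASSET 1 1 1 E6S05633099999
READG 1 1 1 00990
ASSET 1 1 2 E6S05633099999
READG 1 1 2 00990
ASSET 1 1 3 E6S06769699999
READG 1 1 3 00000
;
run;

/* Split: one table per record type, each keeping its keys */
data asset(rename=(Value=SerialNo)) readg(rename=(Value=Reading));
  set keyed;
  if DataItmGrp = 'ASSET' then output asset;
  else if DataItmGrp = 'READG' then output readg;
  drop DataItmGrp;
run;

/* Rejoin: each reading goes back to its asset on the shared keys */
proc sql;
  create table asset_readings as
  select a.Key1, a.Key2, a.Key3, a.SerialNo, r.Reading
  from asset as a
       inner join readg as r
       on a.Key1 = r.Key1 and a.Key2 = r.Key2 and a.Key3 = r.Key3
  order by a.Key1, a.Key2, a.Key3;
quit;
```

Reading `Value` as character keeps leading zeros in readings such as `00990`.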
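/* Count TRANS records: TRANSKey identifies the transaction */
data TRANS;
  set mpancreate_a;
  by DataItmGrp NOTSORTED;
  if first.DataItmGrp then do;
    if DataItmGrp = "TRANS" then TRANSKey + 1;
  end;
run;

/* Number the rows within each transaction, restarting at 1 on TRANS */
data TRANS;
  set TRANS;
  TRANSKey2 + 1;
  by DataItmGrp NOTSORTED;
  if first.DataItmGrp then do;
    if DataItmGrp = "TRANS" then TRANSKEY2 = 1;
  end;
run;

/* Count MTPNT records (note: not restarted per transaction) */
data MTPNT;
  set TRANS;
  by DataItmGrp NOTSORTED;
  if first.DataItmGrp then do;
    if DataItmGrp = "MTPNT" then MTPNTKEY + 1;
  end;
run;

/* Number the rows from each MTPNT onwards, restarting at 1 on MTPNT */
data MTPNT;
  set MTPNT;
  by MTPNTKEY NOTSORTED;
  if first.MTPNTKEY and DataItmGrp = "MTPNT" then MTPNTKEY2 = 0;
  MTPNTKEY2 + 1;
run;

/* Count ASSET records; rows before the first MTPNT get MTPNTKEY2 = 0 */
data ASSET;
  set MTPNT;
  IF MTPNTKEY = 0 THEN MTPNTKEY2 = 0;
  by DataItmGrp NOTSORTED;
  if first.DataItmGrp then do;
    if DataItmGrp = "ASSET" then ASSETKEY + 1;
  end;
run;

/* Number the rows from each ASSET onwards; 0 before the first ASSET */
data ASSET;
  set ASSET;
  by ASSETKEY NOTSORTED;
  if first.ASSETKEY and DataItmGrp = "ASSET" then ASSETKEY2 = 0;
  ASSETKEY2 + 1;
  IF ASSETKEY = 0 THEN ASSETKEY2 = 0;
run;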
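I want a counter for every group found, plus a counter kept within that particular group, but I can't work out how to step into and out of the groupings according to the hierarchy above.
My hope is that once I have these keys I can split the data by group and then put it back together afterwards.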
DataItmGrp  _n_  TRANS  TRANS2  MTPNT  MTPNT2
TRANS         1      1       0      0       0
MTPNT         2      2       1      1       1
ASSET         3      3       1      2       1
METER         4      4       1      3       1
READG         5      5       1      4       1
MTPNT         6      6       1      1       2
ASSET         7      7       1      2       2
METER         8      8       1      3       2
READG         9      9       1      4       2
APPNT        10     10       1      5       2
TRANS        11      1       2      6       2
MTPNT        12      2       2      1       3
ASSET        13      3       2      2       3
METER        14      4       2      3       3
READG        15      5       2      4       3
MTPNT        16      6       2      1       4
ASSET        17      7       2      2       4
METER        18      8       2      3       4
READG        19      9       2      4       4
APPNT        20     10       2      5       4
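Here is a sketch of the counters described above, using a slightly different approach from the row counters in the table. It assumes each record already has its depth in `Level` (1 to 5, hard-coded below for illustration). The step keeps one retained counter per depth. A record at depth k increments counter k and resets every deeper counter, which is how the step "leaves" the groups below it. `NodeKey` then identifies the record's node and `ParentKey` its parent. All names are hypothetical.

```sas
/* Hypothetical input: record type plus a pre-assigned depth */
data grp;
  input DataItmGrp :$5. Level @@;
  datalines;
TRANS 1 MTPNT 2 ASSET 3 METER 4 REGST 4 READG 5
ASSET 3 METER 4 REGST 4 READG 5 APPNT 2
TRANS 1 MTPNT 2 ASSET 3 METER 4 REGST 4 READG 5
;
run;

data keyed;
  set grp;
  array k[5] Key1-Key5;          /* one counter per depth in the hierarchy */
  retain Key1-Key5 0;
  length NodeKey ParentKey $40;

  /* Assumes Level is always 1-5; anything else is an array-subscript error */
  k[Level] = k[Level] + 1;       /* a new node starts at this depth */
  do d = Level + 1 to dim(k);    /* leave every deeper group */
    k[d] = 0;
  end;

  /* Build "c1.c2...ck" for this node and "c1...c(k-1)" for its parent */
  ParentKey = '';
  NodeKey = cats(k[1]);
  do d = 2 to Level;
    ParentKey = NodeKey;
    NodeKey = catx('.', NodeKey, k[d]);
  end;
  drop d;
run;
```

In this sample the second ASSET gets `NodeKey` 1.1.2 and its reading gets 1.1.2.2.1. Every READG can therefore be tied back to its REGST and ASSET by key rather than by row order, and the first three counters (`Key1`-`Key3`) serve directly as asset-level join keys. Siblings of different types (METER and REGST) share one counter. That is enough for joining, but it is not a per-type count.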
Comments:
-
Why not assign the numbers above to the codes? Then you can aggregate at any level by taking a particular substring of that variable.
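The suggestion in this comment can be sketched with a format that maps each record type to its outline number. Grouping on a prefix of that number then aggregates at a chosen level. This is only a sketch:

- The format name `$outline` and the other names are hypothetical.
- The types that appear at two outline positions (NAME, ADDRS, CONTM, MKPRT) are left out.
- A fixed-width SUBSTR prefix only works while every outline component is a single digit.

```sas
/* Hypothetical format: record type -> outline number from the hierarchy */
proc format;
  value $outline
    'TRANS' = '1'
    'MTPNT' = '1.1'
    'ASSET' = '1.1.1'
    'METER' = '1.1.1.1'
    'CONVE' = '1.1.1.2'
    'REGST' = '1.1.1.3'
    'READG' = '1.1.1.3.1'
    'APPNT' = '1.2';
run;

data outlined;
  input DataItmGrp :$5. @@;
  length Outline $12 Branch $5;
  Outline = put(DataItmGrp, $outline.);
  /* First 5 characters = first three outline components (single digits only) */
  Branch = substr(Outline, 1, 5);
  datalines;
TRANS MTPNT ASSET METER REGST READG APPNT
;
run;

/* Aggregate at the chosen level: records per branch */
proc sql;
  create table branch_counts as
  select Branch, count(*) as nRecords
  from outlined
  group by Branch;
quit;
```

The outline number says which branch a record type belongs to, not which particular MTPNT or ASSET instance it is under. It therefore complements per-record counters rather than replacing them.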
-
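What does the actual data file look like? Some parts of the hierarchy can appear at different levels (e.g. NAME/ADDRS and NAME/CONTM at points 1.1.1.5 and 1.3). Because of that, you may need tables with a generic value-attribute type, with node-identity or value-ownership links. There also needs to be a rule for knowing, while reading the data, which level (or tier) a value belongs to, for example space indentation or special marker characters in the data file.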
-
Data added above.
-
@Richard Your comment is correct. Most of my ETL is done with stock DI Studio, which is why I'm untangling this.
Tags: sas iterator hierarchy flat-file enumerate