SAS data step / proc sql: insert rows from another table with an auto-increment primary key (answers)

【Question title】: SAS data step/ proc sql insert rows from another table with auto increment primary key
【Posted】: 2013-03-10 23:06:05
【Question】:

I have 2 datasets, as follows:

id name status 
1  A    a
2  B    b
3  C    c

and another dataset:

name status new
C    c      0
D    d      1
E    e      1
F    f      1

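How can I insert all the new rows from the second table into the first? The first table is permanent. The second table is refreshed every month, so I want to add the rows from the monthly table to the permanent table, so that it looks like this: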

id name status
1  A    a
2  B    b
3  C    c
4  D    d
5  E    e
6  F    f

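The problem I'm facing is that I can't continue the id sequence from dataset 1. As far as I have been able to find, SAS datasets have no auto-increment property. Auto-increment can be done with a data step, but I don't know whether a data step can be used when 2 tables are involved like this. The usual SQL would be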

Insert into table1 (name, status) 
select name, status from table2 where new = 1;

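but since SAS datasets do not support auto-increment columns, the inserted rows get no id. I could fix that by running a SAS data step after the proc sql above: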

data table1;
set table1;
if _n_ > 3 then id = _n_;
run;

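This fills in the id column for the new rows, but the code is a bit ugly. Also, id is a primary key that is used as a foreign key in other tables, so I don't want to disturb the ids of the old rows.

I'm still learning SAS, so any help is much appreciated. Thanks in advance.

Additional question: if the second table did not have the new column, is there a way to do what I want (add the new rows from the monthly table (the second) to the permanent table (the first)) with a data step? At the moment I use this ugly proc sql/data step to create the new column: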

proc sql; /* create a temp table from table2 */
create table t2temp as select t2.*, 
(case when t2.name = t1.name and t2.status = t1.status then 0 else 1 end) as new
from table2 as t2 
left join table1 as t1
on t2.name = t1.name and t2.status = t1.status;
drop table table2; /* drop the old table2 with no column "new" */
quit;
data table2;  /* rename the t2temp as table2 */
set t2temp;
run;

【Comments】:

Tags: sas proc-sql datastep

【Solution 1】:

You can do this in a data step. By the way, if you were creating the dataset entirely from scratch, you could just use

id+1;

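to create an auto-numbered field (assuming your data step isn't too complicated). For your case, the `want` step further down keeps track of the current highest ID and gives each row that comes from the new dataset an ID one higher.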
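As a minimal sketch of the sum statement mentioned above: `id + 1;` implicitly retains `id` across rows, starts it at 0, and adds 1 on every iteration of the data step. The dataset names `raw` and `numbered` are made up for illustration and are not part of the question.

```sas
/* illustrative input without an id column */
data raw;
input name $ status $;
datalines;
A a
B b
C c
;;;;
run;

data numbered;
set raw;
id + 1;   /* sum statement: id is retained across rows and starts at 0 */
run;
```

With this input, `numbered` should contain ids 1, 2 and 3 in row order. This only helps when every row is numbered from scratch, which is why the answer uses a different structure for appending to an existing table.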

data have;
input id name $ status $;
datalines;
2  A    a
3  B    b
1  C    c
;;;;
run;

data addon;
input name $ status $ new;
datalines;
C    c      0
D    d      1
E    e      1
F    f      1
;;;;
run;

data want;
retain _maxID;                    *keep the value of _maxID from one row to the next, 
                                   do not reset it;
set have(in=old) addon(in=add);   *in= creates a temporary variable indicating which 
                                   dataset a row came from;
if (old) or (add and new);        *in SAS, as in C etc., 0 and missing (null) are 
                                   false, negative/positive numbers are true;
if add then ID = _maxID+1;        *assigns ID to the new records;
_maxID = max(id,_maxID);          *determines the new maximum ID - 
                                   this structure guarantees it works 
                                   even if the old DS is not sorted;
put id= name=;
drop _maxID;
run;
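Since the question mentions that the first table is permanent and its old rows should not be disturbed, here is an alternative sketch that numbers only the new rows and appends them in place. This is a different technique from the data step above, not the answerer's method. It assumes `have` is not empty, and the names `maxid` and `new_rows` are made up for illustration.

```sas
/* sample data, same shape as in the answer */
data have;
input id name $ status $;
datalines;
2 A a
3 B b
1 C c
;;;;
run;

data addon;
input name $ status $ new;
datalines;
C c 0
D d 1
E e 1
F f 1
;;;;
run;

/* store the current highest id in a macro variable (maxid is an illustrative name) */
proc sql noprint;
select max(id) into :maxid from have;
quit;

/* number only the new rows; _n_ counts 1, 2, 3, ... over the rows that pass where= */
data new_rows;
set addon(where=(new=1));
id = &maxid + _n_;
drop new;
run;

/* append in place; rows already in have are not rewritten */
proc append base=have data=new_rows;
run;
```

With the sample data the highest existing id is 3, so D, E and F should receive ids 4, 5 and 6. Because `proc append` only adds observations to the end of `have`, the existing ids stay exactly as they were.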

To answer the second question:

Yes, you can still do this. One of the easiest ways, if both datasets are sorted by name, is:

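data want;
retain _maxID;
set have(in=old) addon(in=add);   *both datasets must already be sorted by name;
by name;
if (old) or (add and first.name); *keep an ADDON row only when it is the first row for its name;
if add then ID = _maxID+1;
_maxID = max(id,_maxID);
put id= name=;
drop _maxID;
run;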

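first.name is true for the first record with a given value of name. So if HAVE already has a row with that name, ADDON is not allowed to add a new record for it.

This does require name to be unique in HAVE; otherwise you may drop some records. If it is not unique, you need a more complex solution.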

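【Discussion】:

Thanks again Joe, another great answer. I will look into (in=old) and (in=add) some more. May I extend the question a little? I have added the extra question to the main question so it is easier to read. Thanks again, Joe.
Hi Joe, thanks for your answer. I'm sorry for not being clear. Name is not unique. The only thing that is unique in the table is the combination of name and status. For example, name A with status a is unique (only 1 row has these 2 values), name A with status b is unique, and so on.
If the name/status combination is unique, then you can still do this; just use "by name status;" and "first.status".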
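The name/status variant suggested in the last comment could be sketched roughly as follows. One caveat worth noting: when `have` and `addon` are interleaved with a BY statement, new ids are taken from the highest id seen so far in sort order, so a new row that sorts before an existing row with a higher id can receive an id that is already in use. The sketch below therefore reads the maximum id from the whole table first. It is an illustration under assumptions rather than the answerer's code: the sample rows and the names `new_rows` and `maxid` are invented, and `have` is assumed to be non-empty with unique name/status pairs.

```sas
data have;
input id name $ status $;
datalines;
1 A a
2 A b
3 Z z
;;;;
run;

data addon;   /* no "new" column here */
input name $ status $;
datalines;
A b
M m
Z z
;;;;
run;

proc sort data=have;  by name status; run;
proc sort data=addon; by name status; run;

/* keep the addon rows whose name/status pair does not occur in have */
data new_rows;
merge have(in=old keep=name status) addon(in=add);
by name status;
if add and not old and first.status;
run;

/* highest existing id, read from the whole table before any new id is assigned */
proc sql noprint;
select max(id) into :maxid from have;
quit;

data new_rows;
set new_rows;
id = &maxid + _n_;
run;

data want;
set have new_rows;
run;
```

With these rows only M/m is new, and it should get id 4. An interleaving step with `by name status;` would have given it id 3 here, the same id as the existing Z/z row, because only ids 1 and 2 have been read when M/m is reached.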