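【Posted】:2020-06-06 03:42:05
【Question】:
Is it possible to get frequencies for an entire table in SAS? For example, I want to count how many Yes and No values there are across the whole table. Thanks
【Question discussion】: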
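- Are there multiple columns with Y and N values? Are the values coded as 1 for Y and 0 for N?
- There are multiple columns with the same categories, stored as string values.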
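A hash object can do the counting for you. With the suminc: argument tag, every reference to a key (for example through ADD or FIND) adds the value of the suminc variable to that key's summary, and the keysum: argument tag, given when the hash is declared, names the variable that holds that summary when the hash is output. When the suminc variable is always 1, the key summary is the frequency count.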
data have;
  * Words array from Abstract;
  * "How Do I Love Hash Tables? Let Me Count The Ways!";
  * by Judy Loren, Health Dialog Analytic Solutions;
  * SGF 2008 - Beyond the Basics;
  * https://support.sas.com/resources/papers/proceedings/pdfs/sgf2008/029-2008.pdf;
  array words(17) $10 _temporary_ (
    'I' 'love' 'hash' 'tables'
    'You' 'will' 'too' 'after' 'you' 'see'
    'what' 'they' 'can' 'do' '--' 'Judy' 'Loren'
  );
  call streaminit(123);
  do row = 1 to 127;
    attrib RESPONSE1-RESPONSE20 length = $10;
    array RESPONSE RESPONSE1-RESPONSE20;
    do over RESPONSE;
      RESPONSE = words(rand('integer', 1, dim(words)));
    end;
    output;
  end;
run;
data _null_;
  if _n_ = 1 then do;
    length term $10;
    call missing (term);
    retain one 1;
    retain count 0;
    /* each reference to a key adds ONE to its key summary, reported as COUNT */
    declare hash bins(suminc:'one', keysum:'count');
    bins.defineKey('term');
    bins.defineData('term');
    bins.defineDone();
  end;

  set have end=lastrow;
  array response response1-response20;

  do over response;
    /* known term: FIND adds 1; new term: ADD starts its summary at 1 */
    if bins.find(key:response) ne 0 then do;
      bins.add(key:response, data:response);
    end;
  end;

  if lastrow;
  bins.output(dataset:'all_freq');
run;
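The FIND-then-ADD pair above can also be written with the hash REF method, which does the find-or-add in one call and updates the key summary in the same way (ADD starts a new key at the suminc value, FIND adds it again). This is a minimal sketch, assuming a SAS 9.4 release recent enough to support keysum: and rand('integer'); the data set names have_words and all_freq_ref and the short word list are hypothetical, chosen only to keep the example self-contained.

```sas
/* Hypothetical sample data: 127 rows of 20 words drawn from a small list */
data have_words;
  array words(5) $10 _temporary_ ('I' 'love' 'hash' 'tables' 'too');
  array response(20) $10 response1-response20;
  call streaminit(123);
  do row = 1 to 127;
    do j = 1 to dim(response);
      response(j) = words(rand('integer', 1, dim(words)));
    end;
    output;
  end;
  drop j;
run;

data _null_;
  if _n_ = 1 then do;
    length term $10;
    call missing(term);
    retain one 1 count 0;
    /* ONE is added to the key summary on every reference; COUNT reports it */
    declare hash bins(suminc:'one', keysum:'count');
    bins.defineKey('term');
    bins.defineData('term');
    bins.defineDone();
  end;

  set have_words end=lastrow;
  array response(*) response1-response20;

  do j = 1 to dim(response);
    term = response(j);
    /* REF: add TERM if it is new, otherwise find it; either way the summary grows by ONE */
    bins.ref();
  end;

  /* write one row per distinct term with its COUNT */
  if lastrow then bins.output(dataset:'all_freq_ref');
run;
```

Every one of the 127 × 20 = 2540 cells is referenced exactly once, so the COUNT column of all_freq_ref should add up to 2540.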
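Yes. You can put the variables in an array, compute a 0/1 flag for each No/Yes value, and then use SUM to count the 1s and 0s. SUM gives a frequency here only because the flags are 0 and 1: the sum of the flags is the number of Yes values, and the number of No values is the number of flags minus that sum.
Example: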
data have;
  call streaminit(123);
  do row = 1 to 100;
    attrib ANSWER1-ANSWER20 length = $3;
    array ANSWER ANSWER1-ANSWER20;
    do over ANSWER; ANSWER = ifc(rand('uniform') > 0.15,'Yes','No'); end;
    output;
  end;
run;

data want(keep=freq_1 freq_0);
  set have end=lastrow;
  array ANSWER ANSWER1-ANSWER20;
  array X(20) _temporary_;
  /* X(i) is 1 when ANSWERi is 'Yes' and 0 otherwise */
  do over ANSWER; x(_I_) = ANSWER = 'Yes'; end;
  /* sum statements retain and accumulate the counts across all rows */
  freq_1 + sum (of X(*));
  freq_0 + dim(X) - sum (of X(*));
  if lastrow;
run;
【Discussion】:
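Transpose your main data, then run PROC FREQ. This is fully dynamic and scales to any number of questions or responses. You do need all the variables to be the same type, either all character or all numeric.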
*generate fake data;
data have;
  call streaminit(99);
  array q(30) q1-q30;
  do i=1 to 100;
    do j=1 to dim(q);
      q(j) = rand('bernoulli', 0.8);
    end;
    output;
  end;
run;

*flip it to a long format;
proc transpose data=have out=long;
  by I;
  var q1-q30;
run;

*get the summaries needed;
proc freq data=long;
  table col1;
run;
You should get output like this:
The FREQ Procedure

                                 Cumulative    Cumulative
COL1    Frequency     Percent     Frequency       Percent
   0          581       19.37           581         19.37
   1         2419       80.63          3000        100.00
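The same long-format idea applies to character Yes/No answers like the ones described in the question. As a sketch, a DATA step view can stack every cell into a single column instead of using PROC TRANSPOSE, so no long copy of the table is stored; the data set names have_yn, long_yn, and freq_yn and the generated sample data are hypothetical.

```sas
/* Hypothetical sample data: 100 rows, 20 character Yes/No answers */
data have_yn;
  call streaminit(123);
  array answer(20) $3 answer1-answer20;
  do row = 1 to 100;
    do j = 1 to dim(answer);
      answer(j) = ifc(rand('uniform') > 0.15, 'Yes', 'No');
    end;
    output;
  end;
  drop j;
run;

/* Stack every answer into one column; as a view, rows are produced on demand */
data long_yn / view=long_yn;
  set have_yn;
  array answer(*) answer1-answer20;
  length value $3;
  do j = 1 to dim(answer);
    value = answer(j);
    output;
  end;
  keep value;
run;

/* One-way frequency of every value in the whole table */
proc freq data=long_yn;
  tables value / out=freq_yn;
run;
```

The OUT= data set freq_yn has one row per distinct value with its COUNT and PERCENT; with 100 rows of 20 answers the counts total 2000, whichever categories actually occur.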
【Discussion】: