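【Posted】: 2016-08-25 07:19:57
【Question】:
I am looking for a way to use a different scale for each variable passed to a function.
This is a follow-up to the question "A simpler way to achieve a frequency count with mean, sum, length and sd in R".
Given: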
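# create the summary function
summaryStatistics <- function(x, levels) {
  # drop missing values for the numeric summaries
  xx <- na.omit(x)
  # frequency count per level (plus an NA count), followed by the statistics
  c(table(factor(x, levels=levels), useNA='always', exclude=NULL),
    sum=sum(xx),
    length=length(x),  # counts NA entries as well
    mean=mean(xx),
    standard.deviation=sqrt(var(xx)),
    var=(var(xx)),
    median=median(xx),
    min=min(xx),
    max=max(xx),
    quantile=quantile(xx),
    # note: both moments divide by length(x), which includes NA entries
    skew=sum((xx-mean(xx))^3/sqrt(var(xx))^3)/length(x),
    kurtosis=sum((xx-mean(xx))^4/sqrt(var(xx))^4)/length(x) - 3
  )
}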
# create the test data frame
Id <- c(1,2,3,4,5,6,7,8,9,10)
ClassA <- c(1,NA,3,1,1,2,1,4,5,3)
ClassB <- c(2,1,1,3,3,2,1,1,3,3)
R <- c(1,2,3,NA,9,2,4,5,6,7)
S <- c(3,7,NA,9,5,8,7,NA,7,6)
df <- data.frame(Id,ClassA,ClassB,R,S)
# the answer scale for each variable
ClassAAnswers <- c(1:5,NA)
ClassBAnswers <- c(1:5,NA)
RAnswers <- c(0:10,NA)
SAnswers <- c(0:20,NA)
# create the result
result <- setNames(
  nm=c('answer','question','value'),
  as.data.frame(
    as.table(
      simplify2array(
        lapply(
          df[c('R', 'S')],
          summaryStatistics,
          RAnswers
        )
      )
    )
  )
)
# change the order to question, answer, value
result <- result[, c(2, 1, 3)]
# add the filter
result <- cbind(filter='None',result)
# return the result
result
I get:
filter question answer value
1 None R 0 0.0000000
2 None R 1 1.0000000
3 None R 2 2.0000000
4 None R 3 1.0000000
5 None R 4 1.0000000
6 None R 5 1.0000000
7 None R 6 1.0000000
8 None R 7 1.0000000
9 None R 8 0.0000000
10 None R 9 1.0000000
11 None R 10 0.0000000
12 None R <NA> 1.0000000
13 None R sum 39.0000000
14 None R length 10.0000000
15 None R mean 4.3333333
16 None R standard.deviation 2.6457513
17 None R var 7.0000000
18 None R median 4.0000000
19 None R min 1.0000000
20 None R max 9.0000000
21 None R quantile.0% 1.0000000
22 None R quantile.25% 2.0000000
23 None R quantile.50% 4.0000000
24 None R quantile.75% 6.0000000
25 None R quantile.100% 9.0000000
26 None R skew 0.3275692
27 None R kurtosis -1.5333333
28 None S 0 0.0000000
29 None S 1 0.0000000
30 None S 2 0.0000000
31 None S 3 1.0000000
32 None S 4 0.0000000
33 None S 5 1.0000000
34 None S 6 1.0000000
35 None S 7 3.0000000
36 None S 8 1.0000000
37 None S 9 1.0000000
38 None S 10 0.0000000
39 None S <NA> 2.0000000
40 None S sum 52.0000000
41 None S length 10.0000000
42 None S mean 6.5000000
43 None S standard.deviation 1.8516402
44 None S var 3.4285714
45 None S median 7.0000000
46 None S min 3.0000000
47 None S max 9.0000000
48 None S quantile.0% 3.0000000
49 None S quantile.25% 5.7500000
50 None S quantile.50% 7.0000000
51 None S quantile.75% 7.2500000
52 None S quantile.100% 9.0000000
53 None S skew -0.4252986
54 None S kurtosis -1.3028646
Note that S is tabulated on the 0 to 10 scale here, even though SAnswers runs from 0 to 20.
I think the key is lapply.
lapply(df[c('R', 'S')], summaryStatistics, c(0:20))
produces results from 0 to 20 for both R and S.
lapply(df[c('R', 'S')], summaryStatistics, c(0:10))
produces results from 0 to 10 for both R and S.
lapply(df[c('R', 'S')], summaryStatistics, c(0:20,0:10))
gives results on the first scale but none on the second, and emits some warnings.
Warning messages:
1: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated
2: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated
3: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated
4: In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  :
  duplicated levels in factors are deprecated
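A minimal sketch of why a single extra argument cannot carry two scales: lapply forwards every additional argument unchanged to each call, so each column is counted against the same levels vector. The helper countLevels and the toy data are hypothetical stand-ins, not part of the original code.

```r
# Toy columns standing in for R and S (hypothetical data)
cols <- list(R = c(1, 2, NA), S = c(3, 7, 15))

# Hypothetical helper: only the frequency-count part of a summary function
countLevels <- function(x, levels) {
  table(factor(x, levels = levels), useNA = "always")
}

# lapply passes the same 0:20 to both calls, so both results share one scale
same <- lapply(cols, countLevels, 0:20)
sizes <- lengths(same)  # 21 levels plus one NA slot for each column

# Concatenating two scales only yields one vector with repeated values;
# removing the repeats leaves their union, not two separate scales
union_levels <- unique(c(0:20, 0:10))
```

Passing c(0:20, 0:10) therefore hands factor() one vector with duplicated levels, which is what the warnings complain about. As far as I know, more recent R versions reject duplicated levels with an error rather than a warning.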
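How would I change the summary function so that I can pass in one scale for R and another for S, and get results for each variable on its own scale?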
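One possible direction, offered as a sketch rather than a definitive answer: pair each column with its own scale using Map, which calls the function once per (column, scale) pair. The function summarise_scaled is a hypothetical, shortened stand-in for summaryStatistics, and the scales list is an assumed way of organising the per-variable scales.

```r
# Hypothetical, shortened stand-in for summaryStatistics
summarise_scaled <- function(x, levels) {
  xx <- na.omit(x)
  c(table(factor(x, levels = levels), useNA = "always"),
    sum = sum(xx),
    mean = mean(xx))
}

df <- data.frame(
  R = c(1, 2, 3, NA, 9, 2, 4, 5, 6, 7),
  S = c(3, 7, NA, 9, 5, 8, 7, NA, 7, 6)
)

# One scale per variable, named after the column it belongs to.
# NA is left out of the levels; useNA = "always" still adds the NA count.
scales <- list(R = 0:10, S = 0:20)

# Map walks the columns and the scales in parallel
stats <- Map(summarise_scaled, df[names(scales)], scales)

# The per-variable results differ in length, so they cannot form a matrix;
# stack them as long-format data frames instead
result <- do.call(rbind, lapply(names(stats), function(q) {
  data.frame(filter = "None",
             question = q,
             answer = names(stats[[q]]),
             value = unname(stats[[q]]),
             row.names = NULL)
}))
```

Because the two result vectors have different lengths, the simplify2array / as.table route used above would no longer give a matrix, which is why this sketch builds the long-format data frame directly.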
【Discussion】:
Tags: r