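【Posted】:2022-01-26 20:27:57
【Question】:
I am working in the R programming language.
At the following link (https://www.geeksforgeeks.org/how-to-find-the-percentage-of-missing-values-in-a-dataframe-in-r/), I found a way to compute the overall percentage of NA values in a data frame: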
# declaring a data frame in R
data_frame = data.frame(C1 = c(1, 2, NA, 0),
                        C2 = c(NA, NA, 3, 8),
                        C3 = c("A", "V", "j", "y"),
                        C4 = c(NA, NA, NA, NA))
# share of NA cells across the whole data frame, as a percentage
percentage = mean(is.na(data_frame)) * 100
percentage
[1] 43.75
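The one-liner above works because `is.na()` on a data frame returns a logical matrix, and `mean()` of a logical matrix is the share of `TRUE` cells. A minimal sketch that spells out the same arithmetic step by step (the names `na_mask`, `na_count`, and `total_cells` are introduced here only for illustration):

```r
data_frame <- data.frame(C1 = c(1, 2, NA, 0),
                         C2 = c(NA, NA, 3, 8),
                         C3 = c("A", "V", "j", "y"),
                         C4 = c(NA, NA, NA, NA))

na_mask     <- is.na(data_frame)   # logical matrix, one entry per cell
na_count    <- sum(na_mask)        # TRUE counts as 1, so this is the number of NA cells
total_cells <- length(na_mask)     # rows * columns
percentage  <- na_count / total_cells * 100
```

Here 7 of the 16 cells are NA, which gives the 43.75 shown above.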
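My question: is there a way to extend this to compute the percentage of any given element in the data frame?
For example, could this be used to compute the percentage of 0s in the dataset? Or the percentage of cells in which "j" appears? Or the percentage of cells in which "2" appears?
I can do it manually: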
# count percentage of "j" in the data
v1 = nrow(subset(data_frame, C1 == "j"))
v2 = nrow(subset(data_frame, C2 == "j"))
v3 = nrow(subset(data_frame, C3 == "j"))
v4 = nrow(subset(data_frame, C4 == "j"))
percentage = (v1 + v2 + v3 + v4) / (nrow(data_frame) * ncol(data_frame)) * 100
percentage
[1] 6.25
# count percentage of 0 in the data
# (the comparison needs `==`; with a single `=`, subset() receives a named argument and keeps every row)
v1 = nrow(subset(data_frame, C1 == 0))
v2 = nrow(subset(data_frame, C2 == 0))
v3 = nrow(subset(data_frame, C3 == 0))
v4 = nrow(subset(data_frame, C4 == 0))
percentage = (v1 + v2 + v3 + v4) / (nrow(data_frame) * ncol(data_frame)) * 100
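A note on `=` versus `==` inside `subset()`, since it decides whether the manual count for 0 is right. The sketch below contrasts the two forms on the same data; the variable names `rows_named_arg` and `rows_comparison` are illustrative only.

```r
data_frame <- data.frame(C1 = c(1, 2, NA, 0),
                         C2 = c(NA, NA, 3, 8),
                         C3 = c("A", "V", "j", "y"),
                         C4 = c(NA, NA, NA, NA))

# `C1 = 0` is passed to subset() as a named argument, not as a filter,
# so no condition is applied and all rows come back.
rows_named_arg <- nrow(subset(data_frame, C1 = 0))

# `C1 == 0` is a logical comparison; subset() keeps rows where it is TRUE
# and drops rows where it is FALSE or NA.
rows_comparison <- nrow(subset(data_frame, C1 == 0))
```

With a single `=`, every column would report 4 matching rows and the computed percentage would come out as 100, which is clearly not the share of zeros.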
But is there a faster way to do this?
Thanks!
【Discussion】:
Tags: r count data-manipulation na
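One possible way to generalize the `mean(is.na(...))` idea, offered as a sketch rather than a definitive answer: comparing a data frame to a single value with `==` yields a logical matrix, so the matches can be summed and divided by the number of cells. The helper name `percent_of` is hypothetical and not part of any package; `na.rm = TRUE` is used because comparisons against NA cells return NA, and the denominator still counts every cell, matching the manual calculation in the question.

```r
data_frame <- data.frame(C1 = c(1, 2, NA, 0),
                         C2 = c(NA, NA, 3, 8),
                         C3 = c("A", "V", "j", "y"),
                         C4 = c(NA, NA, NA, NA))

# Hypothetical helper: percentage of all cells in `df` that equal `value`.
percent_of <- function(df, value) {
  hits <- df == value                      # logical matrix; NA where the cell is NA
  sum(hits, na.rm = TRUE) / (nrow(df) * ncol(df)) * 100
}

pct_j    <- percent_of(data_frame, "j")    # cells equal to "j"
pct_zero <- percent_of(data_frame, 0)      # cells equal to 0
pct_two  <- percent_of(data_frame, 2)      # cells equal to 2
```

In this data each of the three values occurs in exactly one of the 16 cells, so each call gives 6.25. One caveat: when `value` is a character string, numeric columns are coerced to character for the comparison, so `percent_of(data_frame, "2")` also counts the numeric 2 in `C1`.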