【Question Title】: R: Counting Overall Percentage of 0's in Data
【Posted】: 2022-01-26 20:27:57
【Question】:

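I am working with the R programming language.

At the following link (https://www.geeksforgeeks.org/how-to-find-the-percentage-of-missing-values-in-a-dataframe-in-r/), I found a way to compute the overall percentage of NA values in a data frame: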

# declaring a data frame in R
data_frame = data.frame(C1= c(1, 2, NA, 0),
                        C2= c( NA, NA, 3, 8), 
                        C3= c("A", "V", "j", "y"),
                        C4=c(NA,NA,NA,NA))
  
percentage = mean(is.na(data_frame)) * 100

[1] 43.75

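My question: is there a way to extend this to compute the percentage of any given element in a data frame?

For example, could it be used to compute the percentage of 0s in the dataset? Or the percentage of cells containing "j"? Or the percentage of cells containing "2"?

I can do it manually: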

# count percentage of "j" in the data 

v1 = nrow(subset(data_frame, C1 == "j")) 
v2 = nrow(subset(data_frame, C2 == "j"))
v3 = nrow(subset(data_frame, C3== "j")) 
v4 = nrow(subset(data_frame, C4 == "j"))

percentage = ((v1 + v2 + v3 + v4) / ((nrow(data_frame) * ncol(data_frame)))) * 100

[1] 6.25

# count percentage of 0 in the data
# (the comparison needs "==": "C1 = 0" is passed to subset() as a named argument, so no rows are filtered out)

v1 = nrow(subset(data_frame, C1 == 0))
v2 = nrow(subset(data_frame, C2 == 0))
v3 = nrow(subset(data_frame, C3 == 0))
v4 = nrow(subset(data_frame, C4 == 0))

percentage = ((v1 + v2 + v3 + v4) / ((nrow(data_frame) * ncol(data_frame)))) * 100

But is there a faster way to do this?

Thanks!

【Question Discussion】:

    Tags: r count data-manipulation na


    【Solution 1】:

    You can try using unlist to turn your data frame into a vector:

    vec = unlist(data_frame)
    
    mean(vec %in% "j") * 100 # 6.25
    mean(vec %in% "0") * 100 # 6.25
    mean(vec %in% NA)  * 100 # 43.75
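
    The string "0" matches here because unlist() coerces the mixed column types into a single character vector. A minimal sketch that makes the coercion visible, assuming R >= 4.0 (where data.frame() keeps strings as character rather than converting them to factors); the name pct_zero is only illustrative:

```r
# Same example data as in the question
data_frame <- data.frame(C1 = c(1, 2, NA, 0),
                         C2 = c(NA, NA, 3, 8),
                         C3 = c("A", "V", "j", "y"),
                         C4 = c(NA, NA, NA, NA))

vec <- unlist(data_frame)

# Mixing numeric, character and logical columns yields a character vector,
# so the numeric 0 in C1 is stored as the string "0"
class(vec)

# %in% never returns NA, so mean() directly gives the share of matching cells
pct_zero <- mean(vec %in% "0") * 100
pct_zero
```

    Because %in% returns FALSE rather than NA for missing cells, no na.rm argument is needed with this approach.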
    

    【Comments】:

      【Solution 2】:

      Assuming the data frame has no lists embedded in its cells, there is no need to unlist it:

      data_frame = data.frame(C1= c(1, 2, NA, 0),
                               C2= c( NA, NA, 3, 8), 
                               C3= c("A", "V", "j", "y"),
                               C4=c(NA,NA,NA,NA))
       
      sum(data_frame == 'j', na.rm = TRUE) / prod(dim(data_frame)) * 100
      [1] 6.25
      
      sum(data_frame == 0, na.rm = TRUE) / prod(dim(data_frame)) * 100
      [1] 6.25
      
      sum(is.na(data_frame)) / prod(dim(data_frame)) * 100
      [1] 43.75
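
      The three expressions above could be wrapped in one helper. The sketch below is only an illustration: pct_equal is a hypothetical name, not part of the answer, and it assumes val is a single value. NA needs its own branch because a comparison with == against NA always yields NA.

```r
# Same example data as in the question
data_frame <- data.frame(C1 = c(1, 2, NA, 0),
                         C2 = c(NA, NA, 3, 8),
                         C3 = c("A", "V", "j", "y"),
                         C4 = c(NA, NA, NA, NA))

# Hypothetical helper: percentage of cells in `df` equal to the single value `val`
pct_equal <- function(df, val) {
  n_cells <- prod(dim(df))            # total number of cells (rows * columns)
  if (is.na(val)) {
    hits <- sum(is.na(df))            # `== NA` is always NA, so use is.na()
  } else {
    hits <- sum(df == val, na.rm = TRUE)
  }
  hits / n_cells * 100
}

pct_equal(data_frame, "j")
pct_equal(data_frame, 0)
pct_equal(data_frame, NA)
```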
      

      【Comments】:

        【Solution 3】:

        Here is a tidyverse + base R solution.

        library(tidyverse)
        
        data_frame %>%
          mutate(across(everything(), ~ .x %in% "j")) %>%
          unlist() %>%
          mean() * 100
        

        Output

        [1] 6.25
        

        This can also easily be turned into a function.

        calc <- function(df, val) {
          df %>%
            mutate(across(everything(), ~ .x %in% val)) %>%
            unlist() %>%
            mean() * 100
        }
        

        Output

        calc(data_frame, "j") # 6.25
        calc(data_frame, "0") # 6.25
        calc(data_frame, NA) # 43.75
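
        If the percentages of several values are needed at once, one possible alternative (not part of the answer above) is to tabulate every distinct cell value in a single pass with base R. This is a sketch under the same R >= 4.0 assumption about character columns; vals and pct_table are illustrative names.

```r
# Same example data as in the question
data_frame <- data.frame(C1 = c(1, 2, NA, 0),
                         C2 = c(NA, NA, 3, 8),
                         C3 = c("A", "V", "j", "y"),
                         C4 = c(NA, NA, NA, NA))

# Flatten all cells into one (character) vector
vals <- unlist(data_frame)

# Count each distinct value, keeping NA as its own category,
# then convert the counts to percentages of all cells
pct_table <- prop.table(table(vals, useNA = "ifany")) * 100
pct_table

# Look up individual values by name; the NA entry has a missing name
pct_j  <- as.numeric(pct_table["j"])
pct_na <- as.numeric(pct_table[is.na(names(pct_table))])
```

        All values are reported as strings here, since unlist() coerces the columns to character before tabulating.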
        

        【Comments】:
