【Question title】:Converting table with missing values to matrix of counts
【Posted】:2019-07-19 10:23:53
【Question】:

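I have a table with an unequal number of elements in each row, where each element is a string with a count of 1 or 2 appended to it. I want to create a presence/absence matrix for the strings that keeps the counts (1 or 2) and puts a zero wherever a string is not found.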

From this:

  V1      V2      V3         V4      V5
1  A   cat:2   dog:1    mouse:1 horse:2
2  B   dog:2 mouse:2 dolphin:2        
3  C horse:2                           
4  D   cat:1 mouse:2  dolphin:2   

To this:

  cat dog mouse horse dolphin
A   2   1     1     2       0
B   0   2     2     0       2
C   0   0     0     2       0
D   1   0     2     0       2

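I have looked at the solutions to a similar earlier question, Convert a dataframe to presence absence matrix,

but they create a 0/1 presence/absence matrix that does not include the counts.

Sample data: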

structure(list(V1 = c("A", "B", "C", "D"),
               V2 = c("cat:2", "dog:2", "horse:2", "cat:1"),
               V3 = c("dog:1", "mouse:2", "", "mouse:2"),
               V4 = c("mouse:1", "dolphin:2", "", "dolphin:2"),
               V5 = c("horse:2", "", "", "")),
               .Names = c("V1", "V2", "V3", "V4", "V5"),
               class = "data.frame", row.names = c(NA, -4L))
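Each non-empty cell in the sample data has the form name:count, and missing cells are empty strings. A minimal base-R sketch of splitting such cells into a name and a numeric count with regular expressions (the variable names cells, animal, count and parsed are illustrative, not from the question):

```r
# Cells in the question's "name:count" format; "" marks a missing cell
cells <- c("cat:2", "dog:1", "", "horse:2")

# Keep only non-empty cells
cells <- cells[nzchar(cells)]

# The name is everything before the colon, the count everything after it
animal <- sub(":.*$", "", cells)
count  <- as.numeric(sub("^.*:", "", cells))

# A named numeric vector: names are animals, values are counts
parsed <- setNames(count, animal)
print(parsed)
```

This assumes every non-empty cell contains exactly one colon; a cell without a colon would be kept whole as the name and its count would become NA.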

【Question discussion】:

    Tags: r matrix count


    【Solution 1】:

    Maybe some package makes this easier, but here is a base R solution. It won't be fast on large data, but it gets the job done:

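    # assumes the sample data from the question is stored in DF
    
    #split each "name:count" string at the colon (one list per row)
    tmp <- apply(DF[,-1], 1, strsplit, ":")
    
    #extract the names (first piece); empty cells give NA, which is dropped
    #(called nms rather than names to avoid masking base::names)
    nms <- lapply(tmp, function(x) c(na.omit(sapply(x, "[", 1))))
    uniquenames <- unique(unlist(nms))
    
    #extract the counts (second piece)
    reps <- lapply(tmp, function(x) as.numeric(na.omit(sapply(x, "[", 2))))
    
    #make the counts named vectors; SIMPLIFY = FALSE keeps a list
    #even when every row happens to have the same number of elements
    res <- mapply(setNames, reps, nms, SIMPLIFY = FALSE)
    
    #subset each named vector by all names (absent names give NA)
    #and bind the rows into a matrix
    res <- do.call(rbind, lapply(res, "[", uniquenames))
    
    #cosmetics: set the dimnames and turn the NAs into 0
    colnames(res) <- uniquenames
    rownames(res) <- DF$V1
    res[is.na(res)] <- 0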
    #  cat dog mouse horse dolphin
    #A   2   1     1     2       0
    #B   0   2     2     0       2
    #C   0   0     0     2       0
    #D   1   0     2     0       2
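The zero-filling step in Solution 1 relies on how R indexes a named vector: subsetting by a name that is not present returns NA. A minimal illustration on row A of the sample data (row_counts and all_animals are illustrative names, not part of the answer):

```r
# Counts present in row A of the sample data, as a named vector
row_counts <- c(cat = 2, dog = 1, mouse = 1, horse = 2)
all_animals <- c("cat", "dog", "mouse", "horse", "dolphin")

# Indexing by a name that is absent ("dolphin") returns NA
aligned <- row_counts[all_animals]

# Replace the NA with 0 and restore the names (the missing entry gets an NA name)
aligned[is.na(aligned)] <- 0
names(aligned) <- all_animals
print(aligned)
```

Because every row is indexed by the same all_animals vector, all rows end up with the same length and order, which is what lets rbind() stack them into one matrix.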
    

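    【Discussion】:

      【Solution 2】:

      You can melt the data into long format and then cast it back to wide format, using the counts as the values (the counts must first be converted from character to numeric).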

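      library(reshape2)  # melt(), dcast()
      library(tidyr)     # separate()
      library(dplyr)     # %>%, select()
      
      # assumes the sample data from the question is stored in `data`;
      # empty cells end up with animal = NA, and that "NA" column is dropped at the end
      data %>% 
        melt("V1") %>% 
        separate(value, c("animal", "count"), ":", fill = "left") %>%  
        transform(count = as.numeric(count)) %>% 
        dcast(V1 ~ animal, value.var = "count", fun.aggregate = sum) %>% 
        select(-"NA")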
      
      #   V1 cat dog dolphin horse mouse
      # 1  A   2   1       0     2     1
      # 2  B   0   2       2     0     2
      # 3  C   0   0       0     2     0
      # 4  D   1   0       2     0     2
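If you would rather avoid reshape2, tidyr and dplyr, the same long-then-wide idea can be sketched in base R. This swaps dcast() for xtabs(), which sums count over every (id, animal) pair and fills pairs that never occur with 0. The column names id, value, animal and count, and the objects long, tab and res, are assumptions made for this sketch:

```r
# Sample data from the question
DF <- data.frame(
  V1 = c("A", "B", "C", "D"),
  V2 = c("cat:2", "dog:2", "horse:2", "cat:1"),
  V3 = c("dog:1", "mouse:2", "", "mouse:2"),
  V4 = c("mouse:1", "dolphin:2", "", "dolphin:2"),
  V5 = c("horse:2", "", "", ""),
  stringsAsFactors = FALSE
)

# Long format: one row per (id, cell); unlist() stacks V2..V5 column by column
long <- data.frame(
  id    = rep(DF$V1, times = ncol(DF) - 1),
  value = unlist(DF[, -1], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop empty cells, then split "animal:count"
long <- long[nzchar(long$value), ]
long$animal <- sub(":.*$", "", long$value)
long$count  <- as.numeric(sub("^.*:", "", long$value))

# Wide format: xtabs() sums count per (id, animal); absent pairs become 0
tab <- xtabs(count ~ id + animal, data = long)

# Copy the table into a plain numeric matrix
res <- matrix(tab, nrow = nrow(tab), dimnames = dimnames(tab))
print(res)
```

xtabs() returns an object of class table, so the last step copies it into an ordinary matrix. The columns come out in alphabetical order, as in the dcast() output above.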
      

      【Discussion】:
