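【Question title】: Add/match rows with NA to matrix based on missing unique IDs
【Posted】: 2022-12-18 08:18:40
【Question description】:

I am working with a panel data set and intend to model it as a dynamic affiliation network using SAOM. Unfortunately, the data is quite messy and painful to work with.

I have managed to create an adjacency matrix for each wave of the panel. However, the group grows over time and people also leave. I need every matrix to have the same number of rows, in the same order, based on the unique IDs that show up as row names when the object is inspected in R. Every "added ID" should show 10 across its entire row.

Here is a reproducible example that should make the problem clear and show what I am aiming for. I think this could be solved with a clever use of merge(), but I cannot get it to work: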

wave1 <- matrix(c(0,0,1,1,0,1,1,0,1,1), nrow = 5, ncol = 2, dimnames = list(c("1","2","4","5","9"), c("group1","group2")))
wave2 <- matrix(c(0,1,1,0,1,0,1,1), nrow = 4, ncol = 2, dimnames = list(c("1","4","8","9"), c("group1","group2")))

wave1_c <- matrix(c(0,0,1,1,10,0,1,1,0,0,10,1), nrow = 6, ncol = 2, dimnames = list(c("1","2","4","5","8","9"), c("group1","group2")))
wave2_c <- matrix(c(0,10,1,10,1,0,1,10,0,10,1,1), nrow = 6, ncol = 2, dimnames = list(c("1","2","4","5","8","9"), c("group1","group2")))

Thanks in advance. Apart from the 10s, the numbers in the matrices are arbitrary.

【Question comments】:

  • Note that wave1_c["5",2] should be 1, == wave1["5", 2], not zero.
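With the correction from the comment above applied, the target for wave 1 could be written out as follows. This is only a sketch of the intended result; the name `wave1_target` is introduced here for illustration and does not appear in the question.

```r
# Input from the question.
wave1 <- matrix(c(0,0,1,1,0,1,1,0,1,1), nrow = 5, ncol = 2,
                dimnames = list(c("1","2","4","5","9"), c("group1","group2")))

# Target for wave 1 with the fix from the comment: row "5", column 2 is 1.
# `wave1_target` is a hypothetical name used only for this illustration.
wave1_target <- matrix(c(0,0,1,1,10,0,1,1,0,1,10,1), nrow = 6, ncol = 2,
                       dimnames = list(c("1","2","4","5","8","9"), c("group1","group2")))

# Rows that exist in wave1 must be carried over unchanged.
stopifnot(identical(wave1_target[rownames(wave1), ], wave1))
# The added ID "8" must be 10 in every column.
stopifnot(all(wave1_target["8", ] == 10))
```

The two `stopifnot()` checks express the requirement from the question: existing rows are untouched and added IDs are filled with 10.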

Tags: r matrix networking igraph siena


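【Solution 1】:

Converting wave1 and wave2 to data frames, as I did in my first attempt, is redundant and can be omitted, although at the cost of implicit coercion.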

## merge wave1 and wave2 by row name.
m_df1_df2 <- merge(wave1, wave2, by = 0, all = TRUE)
rownames(m_df1_df2) <- m_df1_df2$Row.names

# rows not in set 1, but in set 2,
# rows not in set 2, but in set 1.
not1_2 <- m_df1_df2[is.na(m_df1_df2$group1.x),][c("group1.x", "group2.x")]
not2_1 <- m_df1_df2[is.na(m_df1_df2$group1.y),][c("group1.y", "group2.y")]

## Same column names. 
colnames(not1_2) <- colnames(wave1)
colnames(not2_1) <- colnames(wave2)

## append.
wave1_c <- rbind(wave1, not1_2)
wave2_c <- rbind(wave2, not2_1)

## order by row name.
wave1_c <- wave1_c[order(row.names(wave1_c)), ]
wave2_c <- wave2_c[order(row.names(wave2_c)), ]

## replace NA by 10.
wave1_c[is.na(wave1_c)] <- 10
wave2_c[is.na(wave2_c)] <- 10

## show result.
wave1_c
wave2_c
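One caveat about the ordering step above: `order(row.names(...))` sorts the IDs as character strings, so an ID such as "10" would be placed before "2". The IDs in the example are all single digits, so the result is not affected there. If the real IDs are purely numeric, a sketch like the following may be safer; the matrix `m` is made-up data for illustration only.

```r
# Made-up matrix whose row names are numeric IDs stored as strings.
m <- matrix(1:6, nrow = 3, ncol = 2,
            dimnames = list(c("10", "2", "1"), c("group1", "group2")))

# Character ordering: compares the strings, not the numbers.
by_char <- m[order(rownames(m)), , drop = FALSE]

# Numeric ordering: convert the IDs before ordering.
# Assumes every row name can be parsed as a number.
by_num <- m[order(as.numeric(rownames(m))), , drop = FALSE]

rownames(by_char)
rownames(by_num)
```

Whichever ordering is chosen, it should be the same for every wave so that the rows line up across matrices.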

【Comments】:

    【Solution 2】:

    A solution using setdiff.

    ## rownames not in set 1, but in set 2,
    ## rownames not in set 2, but in set 1.
    rn_not2_1 <- setdiff(rownames(wave1), rownames(wave2))
    rn_not1_2 <- setdiff(rownames(wave2), rownames(wave1))
    
    ## missing rows to add.
    add_to_1 <- wave2[rn_not1_2,,drop=FALSE]
    add_to_2 <- wave1[rn_not2_1,,drop=FALSE]
    add_to_1[,] <- 10
    add_to_2[,] <- 10
    
    ## append.
    wave1_c <- rbind(wave1, add_to_1)
    wave2_c <- rbind(wave2, add_to_2)
    
    ## order by row name.
    wave1_c <- wave1_c[order(row.names(wave1_c)), ]
    wave2_c <- wave2_c[order(row.names(wave2_c)), ]
    
    ## show result.
    wave1_c
    wave2_c
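The same idea could be extended to any number of waves: take the union of all row names, start from a matrix filled with 10, and copy the known rows in by name. The following is a minimal sketch and not part of the original answer; the helper name `pad_waves` and its `fill` argument are hypothetical, and it assumes that every wave has the same columns and that the IDs are numeric.

```r
# Inputs from the question.
wave1 <- matrix(c(0,0,1,1,0,1,1,0,1,1), nrow = 5, ncol = 2,
                dimnames = list(c("1","2","4","5","9"), c("group1","group2")))
wave2 <- matrix(c(0,1,1,0,1,0,1,1), nrow = 4, ncol = 2,
                dimnames = list(c("1","4","8","9"), c("group1","group2")))

# Hypothetical helper: pad each matrix in `waves` to the union of all IDs.
pad_waves <- function(waves, fill = 10) {
  # All IDs that occur in any wave, in numeric order (assumes numeric IDs).
  ids <- unique(unlist(lapply(waves, rownames)))
  ids <- ids[order(as.numeric(ids))]
  lapply(waves, function(w) {
    # Start from a matrix filled with `fill`, then copy the known rows in.
    out <- matrix(fill, nrow = length(ids), ncol = ncol(w),
                  dimnames = list(ids, colnames(w)))
    out[rownames(w), ] <- w
    out
  })
}

padded <- pad_waves(list(wave1 = wave1, wave2 = wave2))
padded$wave1
padded$wave2
```

Because every output is built from the same `ids` vector, all padded matrices share the same row order, and the results stay matrices rather than data frames.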
    

    【Comments】:

      【Solution 3】:

      A solution in base R using data frames and merge.

      Update.

      both <- merge(wave1, wave2, by = 'row.names', all = TRUE)
      

      Output.

         Row.names group1.x group2.x group1.y group2.y
       1         1        0        1        0        1
       2         2        0        1       NA       NA
       3         4        1        0        1        0
       4         5        1        1       NA       NA
       5         8       NA       NA        1        1
       6         9        0        1        0        1
      
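      dwave1_c <- both[,2:3]; colnames(dwave1_c) <- colnames(wave1)
      dwave2_c <- both[,4:5]; colnames(dwave2_c) <- colnames(wave2)
      ## restore the IDs as row names (merge() moved them into the Row.names column).
      rownames(dwave1_c) <- as.character(both$Row.names)
      rownames(dwave2_c) <- as.character(both$Row.names)
      ## replace NA by 10.
      dwave1_c[is.na(dwave1_c)] <- 10
      dwave2_c[is.na(dwave2_c)] <- 10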
      

      Show the result.

      as.matrix(dwave1_c)
      as.matrix(dwave2_c)
      

      First attempt.

      ## Convert matrix to dataframe.
      df1 <- as.data.frame(wave1)
      df2 <- as.data.frame(wave2)
      
      ## Merge df1 and df2 by row name.
      m_df1_df2 <- merge(df1, df2, by = 'row.names', all = TRUE)
      rownames(m_df1_df2) <- m_df1_df2$Row.names
      
      # Rows not in df1, but in df2,
      # rows not in df2, but in df1
      not1_2 <- m_df1_df2[is.na(m_df1_df2$group1.x),][c("group1.x", "group2.x")] # not in df1, in df2
      not2_1 <- m_df1_df2[is.na(m_df1_df2$group1.y),][c("group1.y", "group2.y")] # not in df2, in df1
      
      ## Same column names.   
      colnames(not1_2) <- colnames(df1)
      colnames(not2_1) <- colnames(df2)
      
      ## append
      df1_c <- rbind(df1, not1_2)
      df2_c <- rbind(df2, not2_1)
      
      ## order by row name
      df1_c <- df1_c[order(row.names(df1_c)), ]
      df2_c <- df2_c[order(row.names(df2_c)), ]
      
      ## replace NA by 10
      df1_c[is.na(df1_c)] <- 10
      df2_c[is.na(df2_c)] <- 10
      as.matrix(df1_c)
      as.matrix(df2_c)
      

      【Comments】: