【Question title】: Apply multiple functions to a list of matrices and output answers in a data frame
【Posted】: 2020-01-21 19:17:23
【Question description】:

I have the following matrix:

mat<- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
   2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
   0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
   0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
   0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
   1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow=16, ncol=6)
dimnames(mat)<- list(c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"), 
          c("1", "2", "3", "4", "5", "6"))

I created a list of matrices using the following function:

lapply(seq_len(ncol(mat) - 1), function(j) do.call(cbind, 
       lapply(seq_len(ncol(mat) - j), function(i) rowSums(mat[, i:(i + j)]))))

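In this function, the columns of the original matrix are combined with a moving-window approach. The window starts at size 2, so the data in two adjacent columns are summed. The window then moves 1 step (1 column) and the next pair of columns is combined. The output is one matrix per window size. The window size then increases to 3 columns, and the results for 3 columns are written to a new matrix. This continues until the window size reaches the total number of columns. For window size w, the resulting matrix therefore has ncol(mat) - w + 1 columns.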
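The moving-window construction can be sketched on a small made-up matrix (toy, 2 rows x 4 columns, purely for illustration). The helper window_sums() is hypothetical: it computes the same sums as the inner lapply()/do.call(cbind, ...) call above, but also labels each window column in the style of the question's combined_cols (e.g. "1_2") and names each list element by its window size.

```r
# Made-up 2 x 4 example matrix (not the question's data)
toy <- matrix(1:8, nrow = 2,
              dimnames = list(c("a", "c"), as.character(1:4)))

# Hypothetical helper: for window size w, column k of the result is the
# row-wise sum of columns k..(k + w - 1); there are ncol(m) - w + 1 windows.
window_sums <- function(m, w) {
  starts <- seq_len(ncol(m) - w + 1)
  out <- vapply(starts,
                function(k) rowSums(m[, k:(k + w - 1), drop = FALSE]),
                numeric(nrow(m)))
  # Label each window by the original columns it combines, e.g. "1_2"
  colnames(out) <- vapply(starts,
                          function(k) paste(colnames(m)[k:(k + w - 1)], collapse = "_"),
                          character(1))
  out
}

sizes  <- 2:ncol(toy)
myList <- setNames(lapply(sizes, function(w) window_sums(toy, w)), sizes)

myList[["2"]]
#   1_2 2_3 3_4
# a   4   8  12
# c   6  10  14
sapply(myList, ncol)  # windows per size: 3 for size 2, 2 for size 3, 1 for size 4
```

vapply() is used instead of do.call(cbind, ...) so the result is always a matrix with a fixed row count, including the single-window case.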

I need to run a series of functions on each matrix in the list and output the answers in a data frame. The functions I need to apply are:

  1. Calculate the total frequency of each row (i.e. the row total). I tried this function:

    freq <- rowSums(mat[i:(i + j),])
    
  2. Calculate the mean frequency of each row (i.e. row total / row length). I tried this function:

    mean_freq <- rowSums(mat[i:(i + j),])/length(mat[i:(i + j),])
    
  3. Multiply the window size by pi * 25.

    total_window_size <- length(ncol(mat) - j)*pi*25
    
  4. Divide the mean frequency of each row by the total window size.

    density <- mean_freq/total_window_size
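Applied to a single element of the list, the four steps can be sketched as below. Two points differ from the attempts above: each step works on the list element itself (rowSums(mat[i:(i + j), ]) subsets rows of the original matrix instead), and the window size is passed in explicitly, because length() of a single number is always 1. window_stats() and the matrix m2 are hypothetical, and dividing by ncol(m) (the number of windows) is an assumption about what "row length" means; the expected tables below divide by 5 for every window size, which equals ncol(m) only for window size 2.

```r
# Hypothetical helper: apply the four steps to one moving-window matrix `m`
# built with window size `w` (number of original columns per window).
window_stats <- function(m, w) {
  freq              <- rowSums(m)                    # 1. total frequency per row
  mean_freq         <- freq / ncol(m)                # 2. averaged over the ncol(m) windows (assumption)
  total_window_size <- w * pi * 25                   # 3. window size * pi * 25
  density           <- mean_freq / total_window_size # 4. mean frequency / total window size
  data.frame(window_size = w, row_names = rownames(m), freq = freq,
             mean_freq = mean_freq, total_window_size = total_window_size,
             density = density, row.names = NULL)
}

# Made-up window-size-2 matrix with 3 windows (not the question's data)
m2 <- matrix(c(3, 0, 2, 1, 0, 1), nrow = 2,
             dimnames = list(c("a", "c"), c("1_2", "2_3", "3_4")))
res <- window_stats(m2, w = 2)
res$freq               # 5 2
res$total_window_size  # 157.0796 157.0796
```

Passing row.names = NULL keeps automatic row numbers, so the row labels live in the row_names column and survive a later rbind().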
    

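Below are the expected results of the functions above for each matrix in this example list (i.e. result_mat1, result_mat2, ...). The data frame result_df combines the results of all the sub-data frames and is the final output I need: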

Data frame for window size 2

result_mat1 <- data.frame( window_size= rep("2",80), 
                     combined_cols= c(rep("1_2",16), rep("2_3",16), rep("3_4",16), rep("4_5",16), rep("5_6",16)),
                     row_names= c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
                     freq=c(6,3,2,2,6,2,1,2,1,2,3,2,1,2,4,2),
                     mean_freq=(c(6,3,2,2,6,2,1,2,1,2,3,2,1,2,4,2)/5), 
                     total_window_size= rep(157.08, 16))
result_mat1$density<- result_mat1$mean_freq/result_mat1$total_window_size             
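Each freq value is the sum of one row over every window of that size. As a hand check, here is the arithmetic for row x of mat, whose six column values are 2, 1, 0, 0, 0, 0 in the question's matrix. It also shows where 157.08 comes from (2 * pi * 25, rounded) and that dividing by 5 matches the number of windows when the window size is 2. The variable names are just for illustration.

```r
# Row x of the question's matrix, columns 1..6
x <- c(2, 1, 0, 0, 0, 0)

# Window size 2: sums over columns 1_2, 2_3, 3_4, 4_5, 5_6
windows2 <- sapply(1:5, function(k) sum(x[k:(k + 1)]))
windows2       # 3 1 0 0 0
sum(windows2)  # freq for row x: 4

length(windows2)  # 5 windows, the divisor used for mean_freq
2 * pi * 25       # 157.0796, shown as 157.08 in result_mat1
```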

Data frame for window size 3

result_mat2 <- data.frame( window_size= rep("3",64), 
                     combined_cols= c(rep("1_2_3",16), rep("2_3_4",16), rep("3_4_5",16), rep("4_5_6",16)),
                     row_names= c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
                     freq=c(6,4,3,3,7,3,1,2,1,2,3,2,1,2,4,2),
                     mean_freq=(c(6,4,3,3,7,3,1,2,1,2,3,2,1,2,4,2)/5), 
                     total_window_size= rep(235.62, 16))
result_mat2$density <- result_mat2$mean_freq/result_mat2$total_window_size

Data frame for window size 4

result_mat3 <- data.frame( window_size= rep("4",48), 
                                 combined_cols= c(rep("1_2_3_4",16), rep("2_3_4_5",16), rep("3_4_5_6",16)),
                                 row_names= c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
                                 freq=c(6,4,3,3,7,3,1,2,1,2,3,2,1,2,4,2),
                                 mean_freq=(c(6,4,3,3,7,3,1,2,1,2,3,2,1,2,4,2)/5), 
                                 total_window_size= rep(314, 16))
result_mat3$density <- result_mat3$mean_freq/result_mat3$total_window_size

Data frame for window size 5

result_mat4 <- data.frame( window_size= rep("5",32), 
                      combined_cols= c(rep("1_2_3_4_5",16), rep("2_3_4_5_6",16)),
                      row_names= c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
                      freq=c(6,3,2,2,6,2,1,2,1,2,3,2,1,2,4,2),
                      mean_freq=(c(6,3,2,2,6,2,1,2,1,2,3,2,1,2,4,2)/5), 
                      total_window_size= rep(392.5, 16))
result_mat4$density <- result_mat4$mean_freq/result_mat4$total_window_size

Data frame for window size 6

result_mat5 <- data.frame( window_size= rep("6",16), 
                      combined_cols= c(rep("1_2_3_4_5_6",16)),
                      row_names= c("a", "c", "f", "h", "i", "j", "l", "m", "p", "q", "s", "t", "u", "v","x", "z"),
                      freq=c(4,2,1,1,3,1,1,1,1,1,2,2,1,1,3,1),
                      mean_freq=(c(4,2,1,1,3,1,1,1,1,1,2,2,1,1,3,1)/5), 
                      total_window_size= rep(471, 16))
result_mat5$density <- result_mat5$mean_freq/result_mat5$total_window_size

Final data frame combining the results of all sub-data frames

result_df <- rbind(result_mat1, result_mat2, result_mat3, result_mat4, result_mat5)    
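The result_mat1 ... result_mat5 definitions above run without error even though window_size and combined_cols have 80 (or 64, 48, ...) entries while row_names and freq have 16: data.frame() recycles any column whose length divides the number of rows. A small made-up example (demo_df is a hypothetical name) shows the effect: the per-row values are repeated for every combined_cols label rather than computed per window.

```r
# Made-up example: 2 window labels x 3 row names = 6 rows
demo_df <- data.frame(combined_cols = rep(c("1_2", "2_3"), each = 3),  # length 6
                      row_names = c("a", "c", "f"),                     # length 3, recycled
                      freq = c(6, 3, 2))                                # length 3, recycled
nrow(demo_df)  # 6
demo_df$freq   # 6 3 2 6 3 2

# Lengths that do not divide the longest column are an error instead
tryCatch(data.frame(a = 1:6, b = 1:4), error = function(e) conditionMessage(e))
# "arguments imply differing number of rows: 6, 4"
```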

I need help applying these 4 functions to each element of the list and outputting the results to a single data frame.

【Question discussion】:

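  • I'm a bit confused. You have this function lapply(seq_len(ncol(mat) - 1), function(j) do.call(cbind, lapply(seq_len(ncol(mat) - j), function(i) rowSums(mat[, i:(i + j)])))) and then you show it by splitting it
  • I don't want to split the list. I want to apply a set of functions to each element of the list to produce result_df. The individual dfs I show for each element (i.e. result_mat1, result_mat2) are my intermediate attempt at producing the final output I need, result_df.
  • @Danielle I'm happy to help, but this question is a bit confusing. Consider breaking it down into smaller parts with a smaller dataset. Why is it so important to know how the list of matrices was obtained?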

Tags: r list function matrix apply


【Solution 1】:

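Here's a start. I'm not sure how combined_cols should be added to each data.frame, since it is a different size (it seems to be longer than every other data.frame column). I'm also not sure all of these calculations are exactly right, but this at least illustrates the crux of the problem: how to loop over a list, build a data.frame of results for each element, and combine them into one big data.frame.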

myList <- lapply(seq_len(ncol(mat) - 1), function(j) do.call(cbind,
                 lapply(seq_len(ncol(mat) - j), function(i) rowSums(mat[, i:(i + j)]))))
myListOutput <- list()

for (i in seq_along(myList)) {
  myMat <- myList[[i]]  # element i holds windows of i + 1 combined columns

  freq <- rowSums(myMat)
  window_size <- rep(as.character(i + 1), length(freq))
  # your final data sample shows dividing by 5 on each one,
  # but your pseudo code shows something to do with the columns
  mean_freq <- rowSums(myMat) / ncol(myMat)
  total_window_size <- rep((i + 1) * pi * 25, length(freq))
  density <- mean_freq / total_window_size

  myDf <- data.frame(window_size, freq, mean_freq, total_window_size, density)

  myListOutput[[i]] <- myDf
}

# stack the per-window data frames into one
result_df <- do.call(rbind, myListOutput)
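For comparison, a sketch of the same loop written with lapply() instead of growing a list in a for loop. It keeps window_size numeric so it can be multiplied directly, adds the row names as a row_names column (as in the question's expected output), and, like the loop above, divides by ncol() of each matrix rather than by a fixed 5; combined_cols is still omitted.

```r
# The example matrix from the question (16 rows x 6 columns, filled column-wise)
mat <- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,
                2,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,
                0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,
                0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,
                0,0,0,0,1,0,0,1,0,1,1,0,0,1,0,1,
                1,1,0,0,0,0,0,0,1,0,1,2,1,0,0,0), nrow = 16, ncol = 6,
              dimnames = list(c("a", "c", "f", "h", "i", "j", "l", "m",
                                "p", "q", "s", "t", "u", "v", "x", "z"),
                              as.character(1:6)))

# List of moving-window matrices, exactly as in the question
myList <- lapply(seq_len(ncol(mat) - 1), function(j) do.call(cbind,
           lapply(seq_len(ncol(mat) - j), function(i) rowSums(mat[, i:(i + j)]))))

# One data frame per list element (element j holds windows of j + 1 columns),
# then stack them with rbind
result_df <- do.call(rbind, lapply(seq_along(myList), function(j) {
  m <- myList[[j]]
  w <- j + 1
  freq <- rowSums(m)
  mean_freq <- freq / ncol(m)
  total_window_size <- w * pi * 25
  data.frame(window_size = w, row_names = rownames(m), freq = freq,
             mean_freq = mean_freq, total_window_size = total_window_size,
             density = mean_freq / total_window_size, row.names = NULL)
}))

dim(result_df)  # 80 6: 16 rows for each of the 5 window sizes
```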

【Discussion】:

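  • Thanks. This successfully shows me how to apply functions to a list of matrices. It omits the combined_cols column, which is fine for this purpose. However, given my question, one function is computed incorrectly, so for consistency with the OP it may be worth correcting your current post. total_window_size should be the product window_size (the number of combined columns) * pi * 25. If I'm reading it correctly, you currently multiply the total number of columns in the newly built matrix by pi * 25.
  • I think I adjusted it?
  • I tried, but if you run that loop on the original data you get the error Error in window_size * pi : non-numeric argument to binary operator. I think you need to index which window size it is.
  • How about (i+1), as above? If this helps you, please consider marking the answer as accepted.
  • Thanks for your help. I've accepted the answer.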
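The "non-numeric argument to binary operator" error in the comments arises because window_size is built as a character vector (rep(as.character(i + 1), ...)), and R cannot multiply a character vector; using the numeric (i + 1) directly, or converting with as.numeric(), avoids it. A minimal reproduction, with msg as an illustrative name:

```r
window_size <- rep(as.character(2), 3)  # character, as built in the answer's loop
msg <- tryCatch(window_size * pi * 25,
                error = function(e) conditionMessage(e))
msg  # "non-numeric argument to binary operator"

# A numeric window size works as intended
as.numeric(window_size) * pi * 25  # 157.0796 157.0796 157.0796
```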