Statistics on a list of data frames (answer)

【Question title】: Statistics on a list of data frames
【Posted】: 2020-02-27 02:46:55
【Question description】:

I have a list of two data frames, d$1 for ctrl patients and d$2 for sick patients. Each df contains the relative abundances of microbes from 3 patients:

List of 2
 $ CTRL  :'data.frame': 3 obs. of  18107 variables:
  ..$ Azorhizobium caulinodans                                           : num [1:3] 1.48e-07 1.62e-06 1.05e-06
  ..$ Buchnera aphidicola                                                : num [1:3] 9.63e-07 1.01e-06 8.09e-07
  ..$ Cellulomonas gilvus                                                : num [1:3] 1.63e-06 5.39e-07 4.05e-07
  ..$ Dictyoglomus thermophilum                                          : num [1:3] 2.30e-06 3.17e-06 1.34e-06
  ..$ Pelobacter carbinolicus                                            : num [1:3] 9.63e-07 3.70e-06 1.38e-06
  ..$ Shewanella colwelliana                                             : num [1:3] 9.63e-07 1.89e-06 1.62e-07
  ..$ Myxococcus fulvus                                                  : num [1:3] 1.78e-06 4.65e-06 1.50e-06
 $ SICK  :'data.frame': 3 obs. of  18107 variables:
  ..$ Azorhizobium caulinodans                                           : num [1:3] 4.24e-07 0.00 1.28e-06
  ..$ Buchnera aphidicola                                                : num [1:3] 5.45e-07 6.02e-07 4.47e-07
  ..$ Cellulomonas gilvus                                                : num [1:3] 3.03e-07 0.00 2.23e-07
  ..$ Dictyoglomus thermophilum                                          : num [1:3] 6.66e-07 2.75e-06 1.96e-06
  ..$ Pelobacter carbinolicus                                            : num [1:3] 9.69e-07 1.72e-07 1.62e-06
  ..$ Shewanella colwelliana                                             : num [1:3] 1.76e-06 6.02e-07 3.91e-07
  ..$ Myxococcus fulvus                                                  : num [1:3] 6.66e-07 8.60e-07 1.56e-06

I'd like to calculate some statistics for each taxon (CTRL vs. SICK) and save the result for each bug (taxon) as a separate df (results.mw). I tried:

results.mw = lapply(mylist, function(d, l)
  {
  # Run wilcoxon by column
    as.data.frame(wilcox.test(d, l, exact = F)$p.value)
  }, d$"CTRL", l$"SICK")

But I got an error:

Error in FUN(X[[i]], ...) : unused argument (l$SICK)

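【Comments】:

Hey Daniel, you got the levels wrong. I can correct that for you, but are you sure you want to run a Wilcoxon test with n=3?
No, I'm not :) But this is a more general question. This dataset is just the start of a larger study, and right now I only have n = 3 per group (actually I have 3 pairs of twins; in each pair one twin is sick and the other is healthy). I'm not sure which test I should use, but for now I'm concentrating on solving this technical problem. Any suggestions are very welcome.
OK, I see. How did you get the abundances? From 16S sequencing? I can write up a "technical solution" for you to explore.
No, they are species-level abundances extracted from WGS.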

Tags: list dataframe statistics

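【Solution 1】:

You need to loop over the taxa rather than over the original list that holds the two data frames. I edited your code slightly below, and it should perform the pairwise (CTRL vs. SICK) test for each taxon. I simulated some data so that it resembles what you have.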

# function to simulate one data frame: 3 samples (rows) x 1000 species (columns)
makeData = function(){
  df = data.frame(matrix(rnorm(1000*3),3,1000))
  colnames(df) = paste("S",1:1000,sep="_")
  rownames(df) = letters[1:3]
  return(df)
}
# create two data.frames
mylist = list(
  CTRL=makeData(),SICK=makeData()
)
# check
str(mylist)
# although you said species are the same
# just to be sure
# we take intersection of species names
SPECIES = intersect(names(mylist$CTRL),names(mylist$SICK))
# loop through species
p = sapply(SPECIES, function(i)
  {
  # Run wilcoxon by species
    wilcox.test(mylist$CTRL[,i],mylist$SICK[,i],exact=FALSE)$p.value
  })
# gives you p-value by species
head(as.data.frame(p))
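
On the original error: lapply hands its extra arguments unchanged to every call of FUN, so the function received X[[i]] as d, the second expression as l, and a third argument it had no parameter for, hence "unused argument". If the goal is to walk the two data frames column by column in parallel, mapply is one way to do it. The following is only a minimal sketch on toy data; make_toy, toy and the taxon_* names are made up for illustration and are not part of the question.

```r
# Hypothetical toy data: 3 patients x 4 taxa per group (not the asker's data)
set.seed(1)
make_toy <- function() {
  df <- as.data.frame(matrix(runif(3 * 4), nrow = 3))
  names(df) <- paste0("taxon_", 1:4)
  df
}
toy <- list(CTRL = make_toy(), SICK = make_toy())

# Keep only taxa present in both groups so the columns line up
shared <- intersect(names(toy$CTRL), names(toy$SICK))

# mapply() walks both column lists in parallel:
# x is one CTRL column, y is the matching SICK column
p <- mapply(
  function(x, y) wilcox.test(x, y, exact = FALSE)$p.value,
  toy$CTRL[shared], toy$SICK[shared]
)

# One row per taxon, in the spirit of the results.mw data frame from the question
results.mw <- data.frame(taxon = shared, p.value = unname(p))
head(results.mw)
```

This gives the same kind of per-taxon p-values as the sapply loop above; which of the two reads better is mostly a matter of taste.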

【Comments】:

It works. Thank you so much! I really appreciate your time.