【Question title】: mapply() unsuccessful for columnwise t.test of two dataframes (R)
【Posted】: 2014-12-03 20:05:39
【Question】:

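I have two data frames and I want to run a t.test on each pair of matching columns. Both data frames are subsets of a larger data frame, so all column names are identical and matched (ncol = ~20000), with nrow(df1) = 25 and nrow(df2) = 23.

Example: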

treatment<-matrix(rnorm(50), ncol=10)
control<-matrix(rnorm(50), ncol=10)

treatment
            [,1]        [,2]       [,3]       [,4]       [,5]       [,6]
[1,]  0.23442246  1.02256703  1.0499998  0.2913643 -1.2083822  0.3778403
[2,] -0.68888047 -0.03961717 -0.9978793 -0.9792061 -0.1831634  0.6140542
[3,] -1.88273887 -0.49701513  0.1845197  0.4385338  1.2249121  0.5444027
[4,]  1.21359446  0.87333933  0.5615304  0.3803339  1.1294489 -0.8777454
[5,] -0.02908159 -1.50296138  0.4624656  0.1335046  1.1665818 -0.4475185
          [,7]      [,8]       [,9]      [,10]
[1,] 0.5987723 0.5910937  0.4334874 -1.4198250
[2,] 0.2027346 0.8078187 -1.0573069  1.0727554
[3,] 0.5490159 0.5109912  1.7247428  1.7745333
[4,] 0.3044544 0.6476548  1.1959365 -0.1220841
[5,] 1.8681375 0.8451147  0.4283893  0.1044125

control
          [,1]       [,2]       [,3]        [,4]        [,5]        [,6]
[1,]  0.6712834 -0.3775649  0.7741285  0.51224345  0.24128336  1.02580198
[2,]  0.3894112 -0.1835289  0.4982122  1.73512459  0.08991013 -0.04406897
[3,]  1.7068503  0.7909355 -0.3341426  0.08780239 -1.11563321  2.09984105
[4,] -0.7634818 -1.3672888  0.2161816 -0.65170516  0.81247509  1.68008404
[5,]  0.5787616  0.1704100 -0.3166737  0.90167409 -2.34854292  0.31571255
           [,7]       [,8]       [,9]      [,10]
[1,] -1.6111883  0.1019497 -0.1975491 -0.3776000
[2,]  0.7533329  1.1540590  1.0050663  2.0137347
[3,]  1.2224161  1.4411853 -0.4801494 -0.3891034
[4,]  0.1905461  0.9767801 -0.1442578 -0.9946735
[5,] -1.9581454 -0.2874181 -1.0421440 -0.6177782

I did some searching on SO and found mapply():

mapply(t.test,treatment,control)
Error in t.test.default(dots[[1L]][[1L]], dots[[2L]][[1L]]) :
  not enough 'x' observations

But when I run t.test on a single pair of columns, it works:

t.test(treatment[,1],control[,1])

  Welch Two Sample t-test
data:  treatment[, 1] and control[, 1]
t = -1.1541, df = 7.492, p-value = 0.284
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.2577187  0.7635152
sample estimates:
mean of x  mean of y
-0.2305368  0.5165649

What is the problem here?

【Comments】:

    Tags: r


    【Solution 1】:

    treatment and control, being matrix objects, are essentially vectors (like c(1,2,3)), so mapply tries to run t.test on each pair of individual numbers. For example:

    treatment[1]
    #[1] 0.7545039
    control[1]
    #[1] -0.3926361
    
    t.test(treatment[1],control[1])
    #Error in t.test.default(dots[[1L]][[1L]], dots[[2L]][[1L]]) : 
    #  not enough 'x' observations
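
The element-wise behaviour can be checked directly. The following is a minimal sketch, not part of the original answer: the seed and the helper name n_per_call are arbitrary choices made here for illustration. It counts how many values each call inside mapply actually receives.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
treatment <- matrix(rnorm(50), ncol = 10)
control   <- matrix(rnorm(50), ncol = 10)

# A matrix is an atomic vector with a dim attribute, so it has 50 elements.
length(treatment)                 # 50
is.list(treatment)                # FALSE

# mapply() pairs elements, not columns: the function is called 50 times,
# each time with a single number from each matrix.
n_per_call <- mapply(function(x, y) length(x), treatment, control)
length(n_per_call)                # 50
all(n_per_call == 1)              # TRUE

# A data frame is a list of columns, so it has one element per column.
length(as.data.frame(treatment))  # 10
```

With a single observation on each side, t.test has nothing to estimate a variance from, which is why it stops with "not enough 'x' observations".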
    

    If you convert the matrices to data.frame objects, each column is treated as one element, and mapply works as expected:

    mapply(t.test,as.data.frame(treatment),as.data.frame(control))
    
    #            V1                                     
    #statistic   -0.7829546                             
    #parameter   7.698139                               
    #p.value     0.4570611                              
    #etc etc 
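
With roughly 20000 columns, the printed matrix is less useful than a single component pulled out of it. A hedged sketch, assuming the usual case where mapply simplifies the results into a matrix of mode list with one row per component of the test object (the variable names res, p_values and t_stats are illustrative):

```r
set.seed(1)  # arbitrary seed for reproducibility
treatment <- as.data.frame(matrix(rnorm(50), ncol = 10))
control   <- as.data.frame(matrix(rnorm(50), ncol = 10))

res <- mapply(t.test, treatment, control)

# Rows are named after the components of the test result
# ("statistic", "parameter", "p.value", ...); columns are V1, V2, ...
rownames(res)

# One row holds one component for every column pair; unlist() turns the
# list of single numbers into a plain numeric vector.
p_values <- unlist(res["p.value", ])
t_stats  <- unlist(res["statistic", ])
```

p_values can then be filtered or passed to p.adjust(), which is worth considering when this many tests are run at once.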
    

    In this case, I'm almost certain that Map is the better choice for readability:

    Map(t.test,as.data.frame(treatment),as.data.frame(control))
    
    #$V1
    #
    #        Welch Two Sample t-test
    #
    #data:  dots[[1L]][[1L]] and dots[[2L]][[1L]]
    #t = -0.783, df = 7.698, p-value = 0.4571
    #alternative hypothesis: true difference in means is not equal to 0
    #95 percent confidence interval:
    # -1.525349  0.756036
    #sample estimates:
    #  mean of x   mean of y 
    #-0.31246928  0.07218723 
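
Since Map returns a plain list of test objects, a single value per column can be extracted with sapply. The sketch below is an illustration under assumptions rather than the answerer's code: it uses the unequal row counts from the question (25 and 23) with only 10 columns, and the names tests, p_map and p_idx are made up here. It also shows an index-based loop that leaves the matrices unconverted.

```r
set.seed(1)  # arbitrary seed for reproducibility
# Unequal row counts, as in the question; only 10 columns for brevity.
treatment <- matrix(rnorm(25 * 10), ncol = 10)
control   <- matrix(rnorm(23 * 10), ncol = 10)

# List of test objects, one per column pair.
tests <- Map(t.test, as.data.frame(treatment), as.data.frame(control))
p_map <- sapply(tests, function(tt) tt$p.value)

# The same tests without converting, looping over column indices instead.
p_idx <- vapply(seq_len(ncol(treatment)),
                function(i) t.test(treatment[, i], control[, i])$p.value,
                numeric(1))

# Both routes run the same tests, so the p-values agree.
all.equal(unname(p_map), p_idx)
```

The differing row counts are not a problem here because each call is an unpaired two-sample test on two independent vectors.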
    
