[Question title]: Performing multiple two-sample t-tests on two lists of data frames that have many columns
[Posted]: 2020-12-29 03:22:57
[Question]:

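I have two lists, each containing four data frames. The data frames in both lists ("loc_list_future" and "loc_list_2019") have 33 columns: "Year", followed by mean precipitation values for 32 different climate models.

The data frames in loc_list_future look like this, but with 32 model columns in total and data running through 2059: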

Year     Model 1    Model 2      Model 3   ...Model 32
2020    714.1101    686.5888    1048.4274       
2021   1018.0095    766.9161     514.2700      
2022    756.7066    902.2542     906.2877       
2023    906.9675    919.5234     647.6630       
2024    767.4008    861.1275     700.2612     
2025    876.1538    738.8370     664.3342       
2026    781.5092    801.2387     743.8965     
2027    876.3522    819.4323     675.3022       
2028    626.9468    927.0774     696.1884       
2029    752.4084    824.7682     835.1566  
...
2059   

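The data frames in loc_list_2019 cover the years 2006 to 2019 but otherwise look the same.

Each data frame represents a geographic location, and both lists contain the same four locations; one list holds the 2006-2019 values and the other holds the future values.

I want to run two-sample t-tests comparing the 2006-2019 values with the future values for each model at each location.

I have another list (loc_list_OBS) whose data frames have only two columns, "Year" and "Mean_Precip" (this is observed data, not model-based, which is why there is only one column for mean precipitation). I have code (see below) that runs two-sample t-tests for the observed data (loc_list_OBS) against the future data (loc_list_future), but I am not sure how to change it to run t-tests on two lists that each have 32 models.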

myfun <- function(x,y)
{
  OBS_Data <- x$Mean_Precip
  #Empty list
  List <- list()
  #Now loop
  for(i in 2:dim(y)[2])
  {
    #Label
    val <- names(y[,i,drop=F])
    Future_Data <- y[,i]
    #Test
    test <- t.test(OBS_Data, Future_Data, alternative = "two.sided") 
    #Save
    List[[i-1]] <- test
    names(List)[i-1] <- val
  }
  return(List)
}

t.stat <- mapply(FUN = myfun,x=loc_list_OBS,y=loc_list_future, SIMPLIFY = FALSE) 

[Question discussion]:

    Tags: r loops statistics t-test


    [Solution 1]:

    I would suggest the following approach. I created dummy data similar to what you have. The code follows:

    #Data before
    dfb <- structure(list(Year = 2010:2019, Model.1 = c(614.1101, 918.0095, 
    656.7066, 806.9675, 667.4008, 776.1538, 681.5092, 776.3522, 526.9468, 
    652.4084), Model.2 = c(586.5888, 666.9161, 802.2542, 819.5234, 
    761.1275, 638.837, 701.2387, 719.4323, 827.0774, 724.7682), Model.3 = c(948.4274, 
    414.27, 806.2877, 547.663, 600.2612, 564.3342, 643.8965, 575.3022, 
    596.1884, 735.1566)), class = "data.frame", row.names = c(NA, 
    -10L))
    #Data after
    dfa <- structure(list(Year = 2020:2029, Model.1 = c(714.1101, 1018.0095, 
    756.7066, 906.9675, 767.4008, 876.1538, 781.5092, 876.3522, 626.9468, 
    752.4084), Model.2 = c(686.5888, 766.9161, 902.2542, 919.5234, 
    861.1275, 738.837, 801.2387, 819.4323, 927.0774, 824.7682), Model.3 = c(1048.4274, 
    514.27, 906.2877, 647.663, 700.2612, 664.3342, 743.8965, 675.3022, 
    696.1884, 835.1566)), class = "data.frame", row.names = c(NA, 
    -10L))
    

    Now the code:

    #Data for lists
    L.before <- list(df1=dfb,df2=dfb,df3=dfb,df4=dfb)
    L.after <- list(df1=dfa,df2=dfa,df3=dfa,df4=dfa)
    

    The function:

    #Function
    myfun <- function(x,y)
    {
      #Create empty list
      List <- list()
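      #Loop over the model columns (column 1 is Year);
      #assumes x and y list the models in the same column order
      for(i in seq_len(ncol(x))[-1])
      {
        name <- names(x)[i]
        before <- x[,i]
        after <- y[,i]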
        #Test
        test <- t.test(before, after, alternative = "two.sided") 
        #Save
        List[[i-1]] <- test
        names(List)[i-1] <- name
      }
      return(List)
    }
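
The loop above pairs columns by position, so it relies on both data frames listing the models in the same order. As a minimal sketch of a variant that pairs columns by name instead (the helper name `compare_models` and its `id_col` argument are hypothetical, and the small data frames are made up for illustration):

```r
# Hypothetical helper: compare_models() is not part of the original answer.
compare_models <- function(x, y, id_col = "Year") {
  # Model columns present in both data frames, matched by name
  models <- intersect(setdiff(names(x), id_col), names(y))
  # One Welch two-sample t-test per shared model column
  tests <- lapply(models, function(m) {
    t.test(x[[m]], y[[m]], alternative = "two.sided")
  })
  setNames(tests, models)
}

# Small made-up data; the column order differs on purpose
before <- data.frame(Year = 2010:2014,
                     Model.1 = c(614, 918, 657, 807, 667),
                     Model.2 = c(587, 667, 802, 820, 761))
after <- data.frame(Year = 2020:2024,
                    Model.2 = c(687, 767, 902, 920, 861),
                    Model.1 = c(714, 1018, 757, 907, 767))

res <- compare_models(before, after)
names(res)
```

For two lists of data frames, `Map(compare_models, list_a, list_b)` should behave like `mapply(..., SIMPLIFY = FALSE)`, since `Map` is a thin wrapper around `mapply` that never simplifies.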
    

    Applying it:

    #Apply
    t.stat <- mapply(FUN = myfun,x=L.before,y=L.after, SIMPLIFY = FALSE)
    

    Some output:

    t.stat[[1]]
    
    $Model.1
    
        Welch Two Sample t-test
    
    data:  before and after
    t = -1.9966, df = 18, p-value = 0.06122
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -205.224021    5.224021
    sample estimates:
    mean of x mean of y 
     707.6565  807.6565 
    
    
    $Model.2
    
        Welch Two Sample t-test
    
    data:  before and after
    t = -2.8054, df = 18, p-value = 0.0117
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -174.88934  -25.11066
    sample estimates:
    mean of x mean of y 
     724.7764  824.7764 
    
    
    $Model.3
    
        Welch Two Sample t-test
    
    data:  before and after
    t = -1.4829, df = 18, p-value = 0.1554
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -241.67613   41.67613
    sample estimates:
    mean of x mean of y 
     643.1787  743.1787 
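
Printing every `htest` object gets unwieldy with 32 models at four locations (128 tests). One possible way to condense a nested result like `t.stat` into a single table is sketched below. The stand-in `t.stat` is built from simulated numbers so the block runs on its own, and the Holm adjustment is an assumption added here for illustration, not something the original answer does.

```r
# Made-up stand-in for t.stat: two locations, two models each
set.seed(1)
make_tests <- function() {
  list(Model.1 = t.test(rnorm(10, 700, 100), rnorm(10, 800, 100)),
       Model.2 = t.test(rnorm(10, 720, 100), rnorm(10, 820, 100)))
}
t.stat <- list(df1 = make_tests(), df2 = make_tests())

# Build one row per location/model pair
summary_tab <- do.call(rbind, lapply(names(t.stat), function(loc) {
  tests <- t.stat[[loc]]
  data.frame(
    location = loc,
    model    = names(tests),
    t        = unname(vapply(tests, function(z) unname(z$statistic), numeric(1))),
    p.value  = unname(vapply(tests, function(z) z$p.value, numeric(1)))
  )
}))

# Optional: adjust for multiple comparisons across all tests
# (Holm is an assumption here; choose the method that suits the analysis)
summary_tab$p.adjusted <- p.adjust(summary_tab$p.value, method = "holm")
summary_tab
```

The same `lapply` over `names(t.stat)` should work on the real result, since each element is a named list of `htest` objects.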
    

    Let me know if that works for you!

    [Discussion]:

    • Yes, that works great! Thanks again, Duck! I really appreciate your help.