【Question title】: R: Which is the tidy way to apply a function over various columns of each row of a data frame?
【Posted】: 2018-09-08 08:51:07
【Question】:

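I want to apply a function to every row of a data frame, where several columns of that row are each passed as a separate argument (not pooled together as with mean, but as distinct parameters).

I would like to know the tidy way to do the following: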

# Data
successes <- c(0,3,6,15,15,17,12,9,22,33)
trials <- c(50,1788,1876,3345,1223,856,342,214,265,257)
prognosis <- 0.01*c(0.05,0.10,0.25,0.5,0.75,1.3,2,3.4,6,10)
test_data = data.frame(successes = successes, trials = trials, 
                       prognosis = prognosis, p_value1 = NA, p_value2 = NA)

for(i in 1: nrow(test_data)){
  test_data$p_value1[i] = binom.test(test_data$successes[i], test_data$trials[i],
                                    test_data$prognosis[i], "less")$p.value
  test_data$p_value2[i] = binom.test(test_data$successes[i], test_data$trials[i],
                                     test_data$prognosis[i], "greater")$p.value
}

【Question comments】:

    Tags: r apply tidyr tibble


    【Solution 1】:

    One possible approach is:

    successes <- c(0,3,6,15,15,17,12,9,22,33)
    trials <- c(50,1788,1876,3345,1223,856,342,214,265,257)
    prognosis <- 0.01*c(0.05,0.10,0.25,0.5,0.75,1.3,2,3.4,6,10)
    test_data = data.frame(successes = successes, trials = trials, 
                           prognosis = prognosis, p_value1 = NA, p_value2 = NA)
    
    
    library(dplyr)
    
    test_data %>%
      rowwise() %>%
      mutate(p_value1 = binom.test(successes, trials, prognosis, "less")$p.value,
             p_value2 = binom.test(successes, trials, prognosis, "greater")$p.value) %>%
      ungroup()
    
    # # A tibble: 10 x 5
    #     successes trials prognosis p_value1 p_value2
    #      <dbl>  <dbl>     <dbl>    <dbl>    <dbl>
    # 1        0.    50.  0.000500    0.975   1.00  
    # 2        3.  1788.  0.00100     0.893   0.266 
    # 3        6.  1876.  0.00250     0.806   0.330 
    # 4       15.  3345.  0.00500     0.396   0.697 
    # 5       15.  1223.  0.00750     0.975   0.0467
    # 6       17.   856.  0.0130      0.966   0.0595
    # 7       12.   342.  0.0200      0.978   0.0447
    # 8        9.   214.  0.0340      0.805   0.306 
    # 9       22.   265.  0.0600      0.950   0.0786
    # 10       33.   257.  0.100       0.943   0.0822
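
The `rowwise()` step matters because `binom.test()` runs a single test per call and does not accept whole columns. A minimal check of that behaviour, assuming the same data as above (the names `whole_column` and `first_row` are only illustrative):

```r
successes <- c(0, 3, 6, 15, 15, 17, 12, 9, 22, 33)
trials <- c(50, 1788, 1876, 3345, 1223, 856, 342, 214, 265, 257)
prognosis <- 0.01 * c(0.05, 0.10, 0.25, 0.5, 0.75, 1.3, 2, 3.4, 6, 10)

# Passing the full vectors: binom.test() rejects an 'x' of length 10,
# so the call signals an error instead of returning ten p-values.
whole_column <- tryCatch(
  binom.test(successes, trials, prognosis, "less"),
  error = function(e) e
)
inherits(whole_column, "error")

# One row at a time works; this is what rowwise() arranges inside mutate().
first_row <- binom.test(successes[1], trials[1], prognosis[1], "less")$p.value
first_row
```

With `rowwise()`, each expression inside `mutate()` sees one row's values, so every `binom.test()` call receives scalars.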
    

    Alternatively, use a vectorised function without rowwise:

    # create function and vectorise it
    GetPvalue = function(s, t, p, alt) {binom.test(s, t, p, alt)$p.value}
    GetPvalue = Vectorize(GetPvalue)
    
    test_data %>%
      mutate(p_value1 = GetPvalue(successes, trials, prognosis, "less"),
             p_value2 = GetPvalue(successes, trials, prognosis, "greater"))
    
    #    successes trials prognosis  p_value1   p_value2
    # 1          0     50    0.0005 0.9753038 1.00000000
    # 2          3   1788    0.0010 0.8933086 0.26613930
    # 3          6   1876    0.0025 0.8061877 0.32975624
    # 4         15   3345    0.0050 0.3963610 0.69722243
    # 5         15   1223    0.0075 0.9748903 0.04667939
    # 6         17    856    0.0130 0.9656352 0.05952219
    # 7         12    342    0.0200 0.9781863 0.04473155
    # 8          9    214    0.0340 0.8047247 0.30581962
    # 9         22    265    0.0600 0.9503332 0.07855963
    # 10        33    257    0.1000 0.9433326 0.08219425
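
`Vectorize()` is a wrapper around `mapply()`, so the same result can be sketched in base R without dplyr. This is only an illustration: the helper name `binom_p` is hypothetical and not part of the original answer.

```r
successes <- c(0, 3, 6, 15, 15, 17, 12, 9, 22, 33)
trials <- c(50, 1788, 1876, 3345, 1223, 856, 342, 214, 265, 257)
prognosis <- 0.01 * c(0.05, 0.10, 0.25, 0.5, 0.75, 1.3, 2, 3.4, 6, 10)
test_data <- data.frame(successes = successes, trials = trials,
                        prognosis = prognosis)

# Hypothetical helper: p-value of one binomial test for one row's values.
binom_p <- function(s, t, p, alt) {
  binom.test(s, t, p, alternative = alt)$p.value
}

# mapply() walks the three columns in parallel, one element of each per call;
# MoreArgs holds the argument that is the same for every row.
test_data$p_value1 <- mapply(binom_p, test_data$successes, test_data$trials,
                             test_data$prognosis,
                             MoreArgs = list(alt = "less"))
test_data$p_value2 <- mapply(binom_p, test_data$successes, test_data$trials,
                             test_data$prognosis,
                             MoreArgs = list(alt = "greater"))
test_data
```

Keeping `alt` in `MoreArgs` makes it explicit that only the three data columns are iterated over, whereas `Vectorize()` with its defaults vectorises over every argument and recycles the length-one string.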
    

    【Comments】:

    • Awesome! Thank you!