【Question title】: Error creating samples with mapply
【Posted】: 2016-10-21 03:54:33
【Question】:

I have a data frame like this:

df <- data.frame(size_upms = c(126, 123, 148),
             electric_mean = c(0.716756756756757,0.647859922178988, 0.726313694267516),
             gas_mean = c(0.273513513513513,0.322679266259033, 0.259554140127389),
             firewood_mean = c(0, 0.00111172873818788,0.00179140127388535))

# df
#  size_upms electric_mean  gas_mean firewood_mean
#1       126     0.7167568 0.2735135   0.000000000
#2       123     0.6478599 0.3226793   0.001111729
#3       148     0.7263137 0.2595541   0.001791401

I want to use mapply to draw a sample for each row, using that row's values as the arguments:

l <- mapply(sample,c("electric","gas","firewood"),df$size_upms,TRUE,
            c(df$electric_mean,df$gas_mean,df$firewood_mean))

But I get this error:

#Error in sample.int(length(x), size, replace, prob) : 
#  too few positive probabilities

However, if I apply the sample function to each row individually, it works:

sample(c("electric","gas","firewood"),df$size_upms[1],TRUE,
   c(df$electric_mean[1],df$gas_mean[1],df$firewood_mean[1]))[1:5]
#[1] "gas"      "electric" "electric" "gas"      "electric"
sample(c("electric","gas","firewood"),df$size_upms[2],TRUE,
   c(df$electric_mean[2],df$gas_mean[2],df$firewood_mean[2]))[1:5]
#[1] "electric" "gas"      "gas"      "gas"      "electric"
sample(c("electric","gas","firewood"),df$size_upms[3],TRUE,
   c(df$electric_mean[3],df$gas_mean[3],df$firewood_mean[3]))[1:5]
#[1] "electric" "electric" "gas"      "electric" "electric"

But I want to use mapply, because I need to apply this to a large data frame.

What am I doing wrong?

【Question discussion】:

    Tags: r mapply


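    【Answer 1】:

    Since this is a row-wise operation, it is simpler to loop over the row indices with lapply. Performance will not differ much between mapply and the other apply-family solutions.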

    lapply(seq_len(nrow(df)), function(i) 
        sample(c("electric","gas","firewood"), df$size_upms[i], TRUE, 
        unlist(c(df$electric_mean[i],df$gas_mean[i],df$firewood_mean[i]))))
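A possible variant of the same row-wise lapply loop, sketched under the assumption that every probability column appears in the same order as the category labels: extracting those columns once as a matrix lets each iteration use row i as its prob vector, so the loop does not have to name each column. The names categories, probs and samples are illustrative and not part of the original answer.

```r
df <- data.frame(size_upms = c(126, 123, 148),
                 electric_mean = c(0.716756756756757, 0.647859922178988, 0.726313694267516),
                 gas_mean = c(0.273513513513513, 0.322679266259033, 0.259554140127389),
                 firewood_mean = c(0, 0.00111172873818788, 0.00179140127388535))

# Hypothetical helper names: labels in the same order as the probability columns
categories <- c("electric", "gas", "firewood")
probs <- as.matrix(df[, c("electric_mean", "gas_mean", "firewood_mean")])

# One sample per row: row i of probs is that row's probability vector
samples <- lapply(seq_len(nrow(df)), function(i)
  sample(categories, df$size_upms[i], replace = TRUE, prob = probs[i, ]))

lengths(samples)  # 126 123 148
```

sample() rescales prob internally, so rows whose means do not sum exactly to 1 are still accepted.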
    

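    The error in the OP's solution comes from the concatenation step. c(df$electric_mean, df$gas_mean, df$firewood_mean) flattens the three columns into one vector of length 9, and mapply walks through that vector one element at a time, recycling the other arguments. Each call to sample therefore receives a single category and a single probability instead of a row's three probabilities, and the 7th call, whose only probability is firewood_mean[1] = 0, fails with "too few positive probabilities". Here, we instead pass the columns as separate arguments and concatenate them inside the anonymous function call. This ensures that at each step the corresponding row element is taken from each column.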

    Map(function(x,y, u, w) sample(c("electric","gas","firewood"), x, 
         TRUE, c(y, u, w)), df$size_upms, df$electric_mean, df$gas_mean, df$firewood_mean)
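To see why the OP's concatenated argument fails, one can replace sample with a function that only records the arguments it receives (a diagnostic sketch; the recorder function and the name calls are hypothetical). With the OP's original argument list, mapply makes 9 calls, each with a single probability, and the 7th call pairs "electric" with the zero probability firewood_mean[1].

```r
df <- data.frame(size_upms = c(126, 123, 148),
                 electric_mean = c(0.716756756756757, 0.647859922178988, 0.726313694267516),
                 gas_mean = c(0.273513513513513, 0.322679266259033, 0.259554140127389),
                 firewood_mean = c(0, 0.00111172873818788, 0.00179140127388535))

# Same arguments as the OP's mapply call, but the function only records them
calls <- mapply(function(x, size, replace, prob) list(x = x, size = size, prob = prob),
                c("electric", "gas", "firewood"), df$size_upms, TRUE,
                c(df$electric_mean, df$gas_mean, df$firewood_mean),
                SIMPLIFY = FALSE, USE.NAMES = FALSE)

length(calls)    # 9: everything is recycled to the length-9 prob vector
str(calls[[7]])  # x = "electric", size = 126, prob = 0: sample() errors here
```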
    

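    Or, as @thelatemail commented, we can save some typing with do.call. unname(df) drops the column names, so Map matches the four columns to x, y, u and w by position rather than by name: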

    do.call(Map, c( function(x,y, u, w) 
        sample(c("electric","gas","firewood"), x, TRUE, c(y,u,w)), unname(df)))
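A quick sanity check of the do.call/Map version, assuming the df from the question: each list element should contain size_upms[i] draws, and because firewood_mean is 0 in the first row, "firewood" should never appear in the first sample. The set.seed(1) call is an addition made only so the run is reproducible.

```r
df <- data.frame(size_upms = c(126, 123, 148),
                 electric_mean = c(0.716756756756757, 0.647859922178988, 0.726313694267516),
                 gas_mean = c(0.273513513513513, 0.322679266259033, 0.259554140127389),
                 firewood_mean = c(0, 0.00111172873818788, 0.00179140127388535))

set.seed(1)  # only for reproducibility; not part of the original answer
samples <- do.call(Map, c(function(x, y, u, w)
  sample(c("electric", "gas", "firewood"), x, TRUE, c(y, u, w)), unname(df)))

lengths(samples)              # 126 123 148
"firewood" %in% samples[[1]]  # FALSE: row 1 has zero firewood probability
table(samples[[2]])           # draw counts per category for row 2
```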
    

    【Comments】:

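    • I think the Map solution is the cleanest.
    • You can even save some typing, e.g. do.call(Map, c( function(x,y, u, w) sample(c("electric","gas","firewood"), x, TRUE, c(y,u,w)), unname(df) ))
    • Great, thanks @akrun for the detailed explanation
    • @Israel, the arguments of the anonymous function correspond, in order, to the input arguments passed to Map, i.e. x will be df$size_upms
    • Thanks a lot @akrun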