[Question title]: Create a data frame containing rows that add up to 100
[Posted]: 2019-01-17 17:39:07
[Question]:

This is my first attempt:

library(dplyr)

step_size <- 5

grid <- expand.grid(
    x1 = seq(0, 100, step_size)
    , x2 = seq(0, 100, step_size)
    , x3 = seq(0, 100, step_size)
)

grid$sum = grid$x1 + grid$x2 + grid$x3
grid$x1 <- (grid$x1 / grid$sum) * 100
grid$x2 <- (grid$x2 / grid$sum) * 100
grid$x3 <- (grid$x3 / grid$sum) * 100
grid$sum <- grid$x1 + grid$x2 + grid$x3

nrow(grid)

result <- distinct(grid) %>% filter(!is.na(sum))

head(result, 20)
nrow(result)

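Basically, I want to create a data frame with as many rows as possible, where each row adds up to 100 and the rows are evenly distributed.

Is there a simpler, better way to do this in R? Thanks!

[Question comments]:

    Tags: r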


    [Answer 1]:

    Using data.table...

    library(data.table)
    
    grid <- expand.grid(
      x1 = seq(0, 100)
      , x2 = seq(0, 100)
      , x3 = seq(0, 100)
    )
    
    setDT(grid)
    
    res <- grid[grid[, rowSums(.SD) == 100], ]
    res[, summation := rowSums(.SD)]
    

    Result:

    > res[, unique(summation)]
    [1] 100
    

    This can also be done in base R, but data.table is faster:

    library(data.table)
    
    grid <- expand.grid(
      x1 = seq(0, 100)
      , x2 = seq(0, 100)
      , x3 = seq(0, 100)
    )
    
    
    grid2 <- expand.grid(
      x1 = seq(0, 100)
      , x2 = seq(0, 100)
      , x3 = seq(0, 100)
    )
    
    setDT(grid)
    
    microbenchmark::microbenchmark(
      data.table = {        
        res <- grid[grid[, rowSums(.SD) == 100], ]
      },
      base = {
        res2 <- grid2[rowSums(grid2) == 100, ]
      }
    )
    
    Unit: milliseconds
           expr      min       lq     mean   median       uq      max neval cld
     data.table 59.41157  89.6700 109.0462 107.7415 124.2675 183.9730   100  a 
           base 65.70521 109.6471 154.1312 125.4238 156.9168 611.0169   100   b
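
    Both variants above build all 101^3 combinations and then discard most of them. A minimal base R sketch that avoids this, assuming exactly three columns and a step size that divides the total: enumerate only x1 and x2 and derive x3 from the sum constraint. The name `simplex_grid` and its arguments are hypothetical, introduced here for illustration only.

    ```r
    # Hypothetical helper (not from the answer above): enumerate every row on a
    # grid with spacing `step` whose three columns sum to `total`.
    simplex_grid <- function(total = 100, step = 5) {
      stopifnot(total %% step == 0)
      vals <- seq(0, total, by = step)
      # Only x1 and x2 are enumerated; x3 is implied by the sum constraint.
      g <- expand.grid(x1 = vals, x2 = vals)
      g$x3 <- total - g$x1 - g$x2
      # Drop combinations where x1 + x2 already exceeds the total.
      g <- g[g$x3 >= 0, ]
      rownames(g) <- NULL
      g
    }

    res <- simplex_grid(total = 100, step = 5)
    unique(rowSums(res))  # every row has the same sum
    nrow(res)             # compare with choose(total / step + 2, 2)
    ```

    The row count can be checked against the stars-and-bars formula: with n = total / step units split over three columns there are choose(n + 2, 2) rows, so step = 1 should give the same rows as the filter in the answer.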
    

      [Answer 2]:

      Here is a simple function. You can specify the number of rows and columns you want, as well as the total each row should sum to.

      func <- function(cols = 3, rows = 10, rowTotal = 100) {
        dt1 <- replicate(n = cols, runif(n = rows))
        dt1 <- data.frame(apply(X = dt1, MARGIN = 2, FUN = function(x) x / rowSums(dt1) * rowTotal))
        return(dt1)
      }
      
      rowSums(func()) # default values (3 cols, 10 rows, each row sums to 100)
      rowSums(func(cols = 5, rows = 10, rowTotal = 50)) # 5 cols, 10 rows, each row sums to 50
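
      One caveat on the function above: rescaling uniform draws makes each row hit the total, but it does not spread the rows uniformly over all possible combinations. If "evenly distributed" is meant in that sense, a possible sketch is to rescale exponential draws instead, which corresponds to a Dirichlet(1, ..., 1) distribution. The name `rsimplex` is hypothetical and not part of the answer.

      ```r
      # Hypothetical helper, not part of the answer above.
      rsimplex <- function(rows = 10, cols = 3, rowTotal = 100) {
        # matrix() keeps the two-dimensional shape even when rows = 1,
        # where replicate() would collapse to a plain vector.
        m <- matrix(rexp(rows * cols), nrow = rows, ncol = cols)
        # Dividing a matrix by a vector of length nrow(m) recycles down the
        # columns, so each row is divided by its own sum.
        as.data.frame(m / rowSums(m) * rowTotal)
      }

      set.seed(1)
      out <- rsimplex(rows = 5, cols = 4, rowTotal = 50)
      rowSums(out)  # each entry equals rowTotal up to floating-point error
      ```

      Because the values are floating point, compare the row sums with all.equal() rather than ==.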
      
