【Question title】:How to use R and dplyr to paste column name as value for lookup
【Posted】:2020-02-18 04:13:25
【Question】:

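I am working with the output coefficients of a glm regression model. I need to build a lookup key by pasting [column name].[factor level] and then return the corresponding value from another data table. The column names have to be dynamic so that I do not have to name every column explicitly. The value returned by the lookup is then multiplied by 1 (for factors) or by the actual numeric value, and all the coef_ columns are summed into a Total column.

I worked through an example in Excel but cannot replicate it in R. var_Factor1 combines the column name and the factor level of each row (using paste) to build the key for the lookup in the next step.

var_Number1 is just the column name, because the column is numeric and has no factor levels.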

library(dplyr)
library(data.table)  # needed for data.table() below

# original data
dt = data.table(
  Factor1  = c("A","B","C"),
  Number1 = c(10, 20,40),
  Factor2 = c("D","H","N"),
  Number2 = c(2, 5,3)
)

# Lookup table
model_coef = data.table(
    Factor1.A   = 10,
    Factor1.B   = 20,
    Factor1.C   = 30,
    Factor2.D   = 40,
    Factor2.H   = 50,
    Factor2.N   = 60,
    Number1 = 200,
    Number2 = 500
)

# initial steps
dt <- dt %>%
  mutate(
    var_Factor1 = paste("Factor1", Factor1, sep = "."),
    var_Number1 = "Number1",
    var_Factor2 = paste("Factor2", Factor2, sep = "."),
    var_Number2 = "Number2"
  ) %>%
  mutate(
    # attempted lookup by key; this is the step that does not work
    coef_Factor1 = model_coef[, var_Factor1]
  )

#The final output should produce (as replicated from Excel)


final_output = data.table (
  Factor1= c("A", "B", "C"),
  Number1= c(10, 20, 40),
  Factor2= c("D", "H", "N"),
  Number2= c(2, 5, 3),
  var_Factor1= c("Factor1.A", "Factor1.B", "Factor1.C"),
  var_Number1= c("Number1", "Number1", "Number1"),
  var_Factor2= c("Factor2.D", "Factor2.H", "Factor2.N"),
  var_Number2= c("Number2", "Number2", "Number2"),
  coef_Factor1= c(10, 20, 30),
  coef_Number1= c(200, 200, 200),
  coef_Factor2= c(40, 50, 60),
  coef_Number2= c(500, 500, 500),
  calc_Factor1= c(10, 20, 30),
  calc_Number1= c(2000, 4000, 8000),
  calc_Factor2= c(40, 50, 60),
  calc_Number2= c(1000, 2500, 1500),
  Total= c(3050, 6570, 9590)
)

【Comments】:

  • Did my solution help?

Tags: r lookup tidyr dplyr


【Answer 1】:

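Trying to generate and manipulate dynamic columns is usually a bad idea. You will probably be better off following tidy data conventions and making the data "long". It also looks like you are mixing data.table with dplyr/tidyverse. In particular, this does not work: mutate(coef_Factor1 = model_coef[, var_Factor1])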
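If a wide lookup by pasted key is still wanted, one possible way to make it work is to flatten the one-row coefficient table into a named vector and index that vector by key. This is only a minimal sketch on a trimmed copy of the question's data (Factor1 and Number1 only); `coef_vec` and `result` are illustrative names that do not appear in the original post, and plain data frames are used here instead of data.table.

```r
library(dplyr)

# Trimmed copy of the question's data, as plain data frames
dt <- data.frame(
  Factor1 = c("A", "B", "C"),
  Number1 = c(10, 20, 40),
  stringsAsFactors = FALSE
)
model_coef <- data.frame(
  Factor1.A = 10, Factor1.B = 20, Factor1.C = 30,
  Number1 = 200
)

# Flatten the one-row coefficient table into a named numeric vector
# (coef_vec is a hypothetical helper name)
coef_vec <- unlist(model_coef)

result <- dt %>%
  mutate(
    # build the "column.level" key for each row
    var_Factor1  = paste("Factor1", Factor1, sep = "."),
    # index the named vector by key, one coefficient per row
    coef_Factor1 = unname(coef_vec[var_Factor1]),
    # numeric columns use the bare column name as key
    coef_Number1 = unname(coef_vec["Number1"])
  )
```

Indexing a named vector by a character vector returns one element per key, which is what the per-row lookup needs.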

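I have tidied your data and revised your code to use dplyr/tidyverse below:

  • Use tibble instead of data.table
  • Restructure the lookup table into a tidy format so that it can be left-joined onto your table correctly
  • Use mutate to do the calculations you describe

Beyond your example, if you have more than 2 "numbers"/"factors" (by the way, your naming/labeling/numbering is confusing), there are ways to generalize further so that the code multiplies coef * number generically for each "number"/combination. Also, your data implies, but does not make clear, that A goes with D, B goes with H, and so on.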

library(tidyverse)

data <- tibble(Factor1  = c("A","B","C"),Number1 = c(10, 20,40),Factor2 = c("D","H","N"),Number2 = c(2, 5,3))
model_coef <- tibble(Factor1.A   = 10,Factor1.B   = 20,Factor1.C   = 30,Factor2.D   = 40,Factor2.H   = 50,Factor2.N   = 60,Number1 = 200,Number2 = 500)

(model_coef_factor1 <- model_coef %>%
    select(Factor1.A:Factor1.C) %>%
    pivot_longer(cols = everything(), names_to = c("number", "factor"), names_sep = "[.]", values_to = "coef_factor1") %>%
    select(-number))
#> # A tibble: 3 x 2
#>   factor coef_factor1
#>   <chr>         <dbl>
#> 1 A                10
#> 2 B                20
#> 3 C                30

(model_coef_factor2 <- model_coef %>%
    select(Factor2.D:Factor2.N) %>%
    pivot_longer(cols = everything(), names_to = c("number", "factor"), names_sep = "[.]", values_to = "coef_factor2") %>%
    select(-number))
#> # A tibble: 3 x 2
#>   factor coef_factor2
#>   <chr>         <dbl>
#> 1 D                40
#> 2 H                50
#> 3 N                60

(final_output <- data %>%
    left_join(model_coef_factor1, by = c("Factor1"="factor")) %>%
    left_join(model_coef_factor2, by = c("Factor2"="factor")) %>%
    mutate(coef_number1 = model_coef$Number1,
           coef_number2 = model_coef$Number2,
           calc_factor1 = coef_factor1,
           calc_number1 = Number1 * coef_number1,
           calc_factor2 = coef_factor2,
           calc_number2 = Number2 * coef_number2,
           total = calc_factor1 + calc_number1 + calc_factor2 + calc_number2) %>%
    select(total, everything()))
#> # A tibble: 3 x 13
#>   total Factor1 Number1 Factor2 Number2 coef_factor1 coef_factor2
#>   <dbl> <chr>     <dbl> <chr>     <dbl>        <dbl>        <dbl>
#> 1  3050 A            10 D             2           10           40
#> 2  6570 B            20 H             5           20           50
#> 3  9590 C            40 N             3           30           60
#> # ... with 6 more variables: coef_number1 <dbl>, coef_number2 <dbl>,
#> #   calc_factor1 <dbl>, calc_number1 <dbl>, calc_factor2 <dbl>,
#> #   calc_number2 <dbl>

Created on 2019-10-23 by the reprex package (v0.3.0)
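The further generalization mentioned above could look something like the following. This is a hedged sketch rather than the answer's own code: it assumes every character column is a factor term (key = column.level, multiplier 1) and every numeric column is a numeric term (key = column name, multiplier = the value). The names `coef_long`, `indexed`, `factor_terms`, `number_terms`, and `totals` are made up for illustration.

```r
library(dplyr)
library(tidyr)

data <- tibble(
  Factor1 = c("A", "B", "C"), Number1 = c(10, 20, 40),
  Factor2 = c("D", "H", "N"), Number2 = c(2, 5, 3)
)
model_coef <- tibble(
  Factor1.A = 10, Factor1.B = 20, Factor1.C = 30,
  Factor2.D = 40, Factor2.H = 50, Factor2.N = 60,
  Number1 = 200, Number2 = 500
)

# Lookup table in long form: one row per coefficient name
coef_long <- model_coef %>%
  pivot_longer(everything(), names_to = "key", values_to = "coef")

# Detect column types once, so no column is named explicitly
factor_cols <- names(data)[vapply(data, is.character, logical(1))]
number_cols <- names(data)[vapply(data, is.numeric, logical(1))]

# Keep track of the original row while the data is reshaped
indexed <- data %>% mutate(row = row_number())

# Character columns: key is "column.level", multiplier is 1
factor_terms <- indexed %>%
  select(row, all_of(factor_cols)) %>%
  pivot_longer(all_of(factor_cols), names_to = "column", values_to = "level") %>%
  mutate(key = paste(column, level, sep = "."), multiplier = 1) %>%
  select(row, key, multiplier)

# Numeric columns: key is the column name, multiplier is the value itself
number_terms <- indexed %>%
  select(row, all_of(number_cols)) %>%
  pivot_longer(all_of(number_cols), names_to = "key", values_to = "multiplier")

# Join every term to its coefficient and sum per original row
totals <- bind_rows(factor_terms, number_terms) %>%
  left_join(coef_long, by = "key") %>%
  group_by(row) %>%
  summarise(total = sum(coef * multiplier))
```

On the example data, `totals$total` comes out as 3050, 6570, 9590, matching the Total column in the question. Adding more factor or numeric columns needs no code change, as long as `model_coef` has a matching coefficient name for every key.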
