【Question title】: Mutate a column of models: "Error: Problem with `mutate()` input `model`. x Input `model` must be a vector, not a `lm` object."
【Posted】: 2020-09-02 01:23:04
【Question】:

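I have a data frame that contains model formula definitions as a column. I would like to mutate a new column in which each row holds a model fitted from that row's formula.

Some data: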

# Set up
library(tidyverse)
library(lubridate)


# Create data
mydf <- data.frame(
  cohort = seq(ymd('2019-01-01'), ymd('2019-12-31'), by = '1 days'),
  n = rnorm(365, 1000, 50) %>% round,
  cohort_cost = rnorm(365, 800, 50)
) %>% 
  crossing(tenure_days = 0:365) %>% 
  mutate(activity_date = cohort + days(tenure_days)) %>% 
  mutate(daily_revenue = rnorm(nrow(.), 20, 1)) %>% 
  group_by(cohort) %>% 
  arrange(activity_date) %>% 
  mutate(cumulative_revenue = cumsum(daily_revenue)) %>% 
  arrange(cohort, activity_date) %>% 
  mutate(payback_velocity = round(cumulative_revenue / cohort_cost, 2)) %>% 
  select(cohort, n, cohort_cost, activity_date, tenure_days, everything())

## wider data
mydf_wide <- mydf %>% 
  select(cohort, n, cohort_cost, tenure_days, payback_velocity) %>% 
  group_by(cohort, n, cohort_cost) %>% 
  pivot_wider(names_from = tenure_days, values_from = payback_velocity, names_prefix = 'velocity_day_')

Now the final, problematic code block. It fails on the last line:

models <- data.frame(
  from = mydf$tenure_days %>% unique,
  to = mydf$tenure_days %>% unique
) %>% 
  expand.grid %>% 
  filter(to > from) %>% 
  filter(from > 0) %>% 
  arrange(from) %>% 
  mutate(mod_formula = paste0('velocity_day_', to, ' ~ velocity_day_', from)) %>% 
  mutate(model = lm(as.formula(mod_formula), data = mydf_wide))

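Error: Problem with `mutate()` input `model`. x Input `model` must be a vector, not a `lm` object. ℹ Input `model` is `lm(as.formula(mod_formula), data = mydf_wide)`.

If I run the last code block without its final line and look at the resulting data frame `models`, it looks like this: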

models %>% head
  from to                     mod_formula
1    1  2 velocity_day_2 ~ velocity_day_1
2    1  3 velocity_day_3 ~ velocity_day_1
3    1  4 velocity_day_4 ~ velocity_day_1
4    1  5 velocity_day_5 ~ velocity_day_1
5    1  6 velocity_day_6 ~ velocity_day_1
6    1  7 velocity_day_7 ~ velocity_day_1

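I tried making it a list column, but as far as I understand I need to group for that, and in this case I would have to group by everything. I modified the last code block: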

models <- data.frame(
  from = mydf$tenure_days %>% unique,
  to = mydf$tenure_days %>% unique
) %>% 
  expand.grid %>% 
  filter(to > from) %>% 
  filter(from > 0) %>% 
  arrange(from) %>% 
  mutate(mod_formula = paste0('velocity_day_', to, ' ~ velocity_day_', from)) %>% 
  group_by_all() %>% 
  nest() %>% 
  mutate(model = lm(as.formula(mod_formula), data = mydf_wide))

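But this results in the same error.

How can I add a new column to `models` that holds, for each row, the linear model fitted from the formula in the `mod_formula` field?

【Discussion】:

    Tags: r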


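    【Solution 1】:

    lm is not vectorized: called inside mutate() on the whole column, it returns a single lm object rather than one value per row, which is what the error complains about. Add rowwise() to fit one model per row, and wrap the result in list() so that it is stored in a list column.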

    library(dplyr)
    
    models <- data.frame(
      from = mydf$tenure_days %>% unique,
      to = mydf$tenure_days %>% unique
    ) %>% 
      expand.grid %>% 
      filter(to > from) %>% 
      filter(from > 0) %>% 
      arrange(from) %>% 
      mutate(mod_formula = paste0('velocity_day_', to, ' ~ velocity_day_', from)) %>%
      rowwise() %>%
      mutate(model = list(lm(as.formula(mod_formula), data = mydf_wide)))
    
    models
    
    #  from    to mod_formula                     model 
    #  <int> <int> <chr>                           <list>
    #1     1     2 velocity_day_2 ~ velocity_day_1 <lm>  
    #2     1     3 velocity_day_3 ~ velocity_day_1 <lm>  
    #3     1     4 velocity_day_4 ~ velocity_day_1 <lm>  
    #4     1     5 velocity_day_5 ~ velocity_day_1 <lm>  
    #5     1     6 velocity_day_6 ~ velocity_day_1 <lm>  
    #6     1     7 velocity_day_7 ~ velocity_day_1 <lm>  
    #...
    #...
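
The same pattern can be tried on a small scale. The following is a minimal sketch, assuming the built-in `mtcars` data and three made-up formulas in place of the asker's `mydf_wide` and `mod_formula` values. It also shows `ungroup()`, which drops the row-wise grouping once the models are built so that later verbs work on the whole table again.

```r
library(dplyr)

# Hypothetical formulas against the built-in mtcars data (not the asker's data)
specs <- tibble(mod_formula = c("mpg ~ wt", "mpg ~ hp", "mpg ~ disp"))

fits <- specs %>%
  rowwise() %>%
  # list() wraps the single lm object so it fits into a list column
  mutate(model = list(lm(as.formula(mod_formula), data = mtcars))) %>%
  # in a rowwise data frame, `model` refers to the lm object of the current row
  mutate(slope = coef(model)[[2]]) %>%
  # drop the rowwise grouping once the per-row work is done
  ungroup()

fits
```

Without `list()`, `mutate()` would again receive a bare `lm` object for each row and fail with the same "must be a vector" error.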
    

    You can also use map instead of rowwise:

    mutate(model = purrr::map(mod_formula, ~lm(.x, data = mydf_wide))) 
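
As a self-contained illustration of the `map` variant, here is a sketch that again assumes the built-in `mtcars` data and made-up formulas rather than the asker's objects. `map()` always returns a list, so no extra `list()` wrapper is needed, and a typed variant such as `map_dbl()` can pull a number out of each fitted model.

```r
library(dplyr)
library(purrr)

# Hypothetical formulas against the built-in mtcars data (not the asker's data)
specs <- tibble(mod_formula = c("mpg ~ wt", "mpg ~ hp"))

fits <- specs %>%
  mutate(
    # one lm per formula string; the result is a list column
    model = map(mod_formula, ~ lm(as.formula(.x), data = mtcars)),
    # extract one number per model into an ordinary numeric column
    r_squared = map_dbl(model, ~ summary(.x)$r.squared)
  )

fits
```

Because no row-wise grouping is involved, the result is a plain tibble and needs no `ungroup()` afterwards.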
    

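    【Comments】:

    • Thanks a lot! Interesting that you don't have to use as.formula and can pass the model definition as a string instead.
    • Yes, lm also works with strings. From ?lm: formula: an object of class "formula" (or one that can be coerced to that class). So here the string is coerced to a formula object.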
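
The coercion mentioned in the comment can be checked directly. This is a small sketch, assuming the built-in `mtcars` data: it fits the same model once from a string and once from a formula object and compares the coefficients.

```r
# Same model specified as a character string and as a formula object
fit_chr <- lm("mpg ~ wt", data = mtcars)
fit_fml <- lm(mpg ~ wt, data = mtcars)

# The two fits should have the same coefficients
same_coefs <- isTRUE(all.equal(coef(fit_chr), coef(fit_fml)))
same_coefs

# The string has been turned into a formula inside the fitted model
inherits(formula(fit_chr), "formula")
```

Calling `as.formula()` explicitly, as in the question, is still a reasonable choice when the conversion should be visible to the reader.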