【Question title】: Mutate factor to new variable in case_when() statement
【Posted】: 2021-11-26 18:00:54
【Question】:

Data setup

I have a dataset that looks something like this simple data frame:

library(dplyr)  # provides tibble(), mutate(), case_when() and %>%

CAD_EXCHANGE <- 1.34
EUR_EXCHANGE <- 0.88
 
df <- tibble(
  shipment = c("A", "B", "C", "D", "E"),
  invoice = c(rep(500, 5)),
  currency = factor(c("USD", "EUR", "CAD", NA, "SDD"))
)
 
df
# A tibble: 5 x 3
  shipment invoice currency
  <chr>      <dbl> <fct>   
1 A            500 USD     
2 B            500 EUR     
3 C            500 CAD     
4 D            500 NA      
5 E            500 SDD     

levels(df$currency)
[1] "CAD" "EUR" "SDD" "USD"

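End goal

I'm trying to convert invoices in certain common foreign currencies (EUR and CAD) to USD, but not every currency, and not rows where the currency is missing (i.e. SDD and NA should be left alone). My final data frame should look like this: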

# A tibble: 5 x 5
  shipment invoice currency invoice_converted currency_converted
  <chr>      <dbl> <fct>                <dbl> <fct>             
1 A            500 USD                    500 USD               
2 B            500 EUR                    568 USD               
3 C            500 CAD                    373 USD               
4 D            500 NA                     500 NA                
5 E            500 SDD                    500 SDD               

Trial 1 -- doesn't work

In the future I may need to convert more than just these few currencies, so I used a case_when() statement. Here is my first attempt:

df_USD1 <- df %>%
  mutate(
    invoice_converted = case_when(
      currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
      currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
      TRUE ~ invoice
    ),
    currency_converted = case_when(currency == "EUR" ~ "USD",
                                   currency == "CAD" ~ "USD",
                                   TRUE ~ currency)
  )

Error: Problem with `mutate()` column `currency_converted`.
i `currency_converted = case_when(...)`.
x must be a character vector, not a `factor` object.

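From the above, I understand that I'm mixing characters and factors when assigning to currency_converted, because my default is TRUE ~ currency (and currency is a factor). So I tried assigning using only factors...

Trial 2 -- works, but not robust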

df_USD2 <- df %>%
  mutate(
    invoice_converted = case_when(
      currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
      currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
      TRUE ~ invoice
    ),
    currency_converted = case_when(
      currency == "EUR" ~ currency[1],
      currency == "CAD" ~ currency[1],
      TRUE ~ currency)
  )

It works, but only because USD happens to be in the first position in my setup for this question, and I can't rely on that.

> df$currency
[1] USD  EUR  CAD  <NA> SDD 
Levels: CAD EUR SDD USD

Trial 3 -- doesn't work

I thought I could subset the factor some other way to get the USD value, but this doesn't work:

df_USD3 <- df %>%
  mutate(
    invoice_converted = case_when(
      currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
      currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
      TRUE ~ invoice
    ),
    currency_converted = case_when(
      currency == "EUR" ~ df$currency[df$currency == "USD"],
      currency == "CAD" ~ df$currency[df$currency == "USD"],
      TRUE ~ currency
    )
  )

Error: Problem with `mutate()` column `currency_converted`.
i `currency_converted = factor(...)`.
x `currency == "EUR" ~ df$currency[df$currency == "USD"]`, `currency == "CAD" ~ df$currency[df$currency == "USD"]` must be length 5 or one, not 2.
Run `rlang::last_error()` to see where the error occurred.

And it seems to be because an NA is returned...

> df$currency[df$currency == "USD"]
[1] USD  <NA>
Levels: CAD EUR SDD USD

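...because if I go back to the original df and replace the NA with some other currency, it works. But obviously I need to be able to keep the NA where it belongs.

I feel like there is some really nice way to do this, but despite reading up on factors and trying different things, I'm missing it. Help?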

【Comments】:

    Tags: r case r-factor


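    【Solution 1】:

    case_when does not do type conversion automatically: currency is a factor, while the values returned by the other conditions in case_when are character. So we can coerce currency to character so that every branch returns the same class, and it should work: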

    library(dplyr)
    df %>%
      mutate(
        invoice_converted = case_when(
          currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
          currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
          TRUE ~ invoice
        ), currency_converted = case_when(currency == "EUR" ~ "USD",
                                       currency == "CAD" ~ "USD",
                                       TRUE ~ as.character(currency)))
    

    -output

    # A tibble: 5 × 5
      shipment invoice currency invoice_converted currency_converted
      <chr>      <dbl> <fct>                <dbl> <chr>             
    1 A            500 USD                    500 USD               
    2 B            500 EUR                    568 USD               
    3 C            500 CAD                    373 USD               
    4 D            500 <NA>                   500 <NA>              
    5 E            500 SDD                    500 SDD             
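
    On Trial 3 in the question: the extra NA comes from how R handles an NA inside a logical index, not from case_when. A minimal base-R sketch of why it appears and a few ways around it (the names usd_which, usd_in and usd_level are made up for illustration):

```r
# Same factor as in the question
currency <- factor(c("USD", "EUR", "CAD", NA, "SDD"))

# The comparison is NA for the missing element ...
currency == "USD"

# ... and an NA in a logical index yields an NA element in the result,
# which is why this has length 2 instead of 1
currency[currency == "USD"]

# which() returns only the positions that are TRUE, so the NA is dropped
usd_which <- currency[which(currency == "USD")]

# %in% never returns NA, so it can be used as the index directly
usd_in <- currency[currency %in% "USD"]

# A length-one "USD" factor built from the existing levels, independent of
# where (or whether) USD occurs in the data
usd_level <- factor("USD", levels = levels(currency))
```

    A length-one factor such as usd_level carries the same levels as currency, so it could presumably stand in for currency[1] in Trial 2 without depending on row order; that part is untested across dplyr versions.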
    

    If we want to keep it as a factor, either wrap the result in factor() after case_when, or use fct_recode directly instead of case_when:

    library(forcats)
    df %>%
      mutate(
        invoice_converted = case_when(
          currency == "EUR" ~ round(invoice / EUR_EXCHANGE),
          currency == "CAD" ~ round(invoice / CAD_EXCHANGE),
          TRUE ~ invoice
        ), currency_converted = fct_recode(currency, USD = "EUR", USD = "CAD"))
    

    -output

    # A tibble: 5 × 5
      shipment invoice currency invoice_converted currency_converted
      <chr>      <dbl> <fct>                <dbl> <fct>             
    1 A            500 USD                    500 USD               
    2 B            500 EUR                    568 USD               
    3 C            500 CAD                    373 USD               
    4 D            500 <NA>                   500 <NA>              
    5 E            500 SDD                    500 SDD          
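
    The other option mentioned above, wrapping the case_when() result in factor(), is not shown in code. A minimal sketch, assuming the same df as in the question (rebuilt here so the block runs on its own; df_factor is just an illustrative name):

```r
library(dplyr)

# Same example data as in the question
df <- tibble(
  shipment = c("A", "B", "C", "D", "E"),
  invoice = rep(500, 5),
  currency = factor(c("USD", "EUR", "CAD", NA, "SDD"))
)

df_factor <- df %>%
  mutate(
    currency_converted = factor(
      # Every branch returns character, so the types agree inside case_when()
      case_when(
        currency %in% c("EUR", "CAD") ~ "USD",
        TRUE ~ as.character(currency)  # NA stays NA, SDD stays SDD
      )
    )
  )

levels(df_factor$currency_converted)
# [1] "SDD" "USD"
```

    Note that factor() derives the levels from the values actually present, so EUR and CAD drop out of the level set here.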
    
