【Question title】: How to do mathematical manipulation on numbers extracted from a string
【Posted】: 2020-11-20 19:34:39
【Question】:

I have a dataset like the following:

structure(list(Commission20 = c("3.3% AND 1.2%", "3.2% 1S $100000 1.1% BALANCE", 
"3.2% AND 1.0% AND 1.1% AND $1000 SELLING BONUS", "3.3% AND 1.2%", 
"3.3% AND 1.2%", "3.0% AND 1.0% BALANCE", "3.2% 1S $100000 1.1% BALANCE", 
"3.2% AND 1.2%", "3.2% AND 1.2%", "3.2% 1ST 1OOK AND 1.1% BALANCE", 
"3.2% AND 1.1%", "3.0% 1ST $100000", "3.0% 1ST $100000", "3.2% 1ST $100000", 
"3.0% 1ST $100000", "3.0% 1ST $100000", "3.0% 1ST $100000", "3.0% 1ST $100000", 
"3.2% 1ST $100000 AND $5000"), First = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_), `cut-off` = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_), Second = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_), Bonus = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_), Fixed = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_)), row.names = c(NA, -19L), class = c("tbl_df", 
"tbl", "data.frame"))

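As you can see, some entries have the form 3.0% 1ST $100000 (the only form I am interested in for now). That amount is simply 3% of $100000, i.e. $3000, so I want to compute 3000 and put it in place of the NA in the Fixed column. In other words, I need not only to extract 3% and $100000, but also to multiply them and write the result into the correct column. The expected result looks like this: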

   Commission20                                   First `cut-off` Second Bonus Fixed
   <chr>                                          <chr> <chr>     <chr>  <chr> <chr>
 1 3.3% AND 1.2%                                  NA    NA        NA     NA    NA   
 2 3.2% 1S $100000 1.1% BALANCE                   NA    NA        NA     NA    NA   
 3 3.2% AND 1.0% AND 1.1% AND $1000 SELLING BONUS NA    NA        NA     NA    NA   
 4 3.3% AND 1.2%                                  NA    NA        NA     NA    NA   
 5 3.3% AND 1.2%                                  NA    NA        NA     NA    NA   
 6 3.0% AND 1.0% BALANCE                          NA    NA        NA     NA    NA   
 7 3.2% 1S $100000 1.1% BALANCE                   NA    NA        NA     NA    NA   
 8 3.2% AND 1.2%                                  NA    NA        NA     NA    NA   
 9 3.2% AND 1.2%                                  NA    NA        NA     NA    NA   
10 3.2% 1ST 1OOK AND 1.1% BALANCE                 NA    NA        NA     NA    NA   
11 3.2% AND 1.1%                                  NA    NA        NA     NA    NA   
12 3.0% 1ST $100000                               NA    NA        NA     NA    3000   
13 3.0% 1ST $100000                               NA    NA        NA     NA    3000   
14 3.2% 1ST $100000                               NA    NA        NA     NA    3200   
15 3.0% 1ST $100000                               NA    NA        NA     NA    3000   
16 3.0% 1ST $100000                               NA    NA        NA     NA    3000   
17 3.0% 1ST $100000                               NA    NA        NA     NA    3000   
18 3.0% 1ST $100000                               NA    NA        NA     NA    3000
19 3.2% 1ST $100000 AND $5000                     NA    NA        NA     NA    NA 

How can I do this?

【Comments】:

    Tags: r replace


    【Solution 1】:

    This code only targets rows whose "Commission20" entry has the format "percent% 1ST $dollars":

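    library(dplyr)
    library(stringr) # string pattern matching and extraction

    # df is assumed to hold the tibble built by the structure(...) call in the question
    df %>%
        mutate(
            # leading percentage, e.g. "3.0" from "3.0% 1ST $100000"
            percent = as.numeric(str_extract(Commission20, "^\\d\\.\\d")),
            # dollar amount that ends the string right after "1ST "; NA for every other format
            dollars = as.numeric(sub("T \\$", "", str_extract(Commission20, "T \\$\\d+$"))),
            # NA in dollars propagates, so only "percent% 1ST $dollars" rows get a value
            Fixed = percent/100 * dollars) %>%
        select(-c(percent, dollars)) # drop the temporary helper columns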
    

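    The code creates temporary columns to hold the percentage and the dollar amount extracted from "Commission20", then uses those two numbers to compute the "Fixed" value. Rows in any other format get no dollar match, so their "Fixed" value stays NA.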
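
    To make the extract-then-multiply step easier to inspect, here is a minimal sketch of the same idea in base R on a plain character vector. It is a variant, not the code above: it uses a single anchored pattern with capture groups, and the names commission, pattern, matched, percent, dollars and fixed are illustrative assumptions.

```r
# Hypothetical sample: three entries copied from the question's Commission20 column
commission <- c("3.0% 1ST $100000", "3.2% 1ST $100000 AND $5000", "3.3% AND 1.2%")

# One percentage, the literal " 1ST ", then a dollar amount and nothing else
pattern <- "^(\\d+\\.\\d+)% 1ST \\$(\\d+)$"
matched <- grepl(pattern, commission)

# Start from NA and fill in only the rows that match the pattern
percent <- rep(NA_real_, length(commission))
dollars <- rep(NA_real_, length(commission))
percent[matched] <- as.numeric(sub(pattern, "\\1", commission[matched]))
dollars[matched] <- as.numeric(sub(pattern, "\\2", commission[matched]))

# Percentage of the dollar amount; NA wherever the format did not match
fixed <- percent / 100 * dollars
```

    Here fixed is 3000 for the first entry and NA for the other two. The result is numeric; if the column has to stay character as in the question's tibble, wrapping it in as.character() would be one option.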

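    The stringr package is a useful tool here for string manipulation. If you need to handle the other commission formats as well, the stringr cheat sheet may come in handy.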
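
    As a starting point for the other formats, the sketch below pulls every percentage out of an entry. It is written in base R so that it runs without extra packages; stringr's str_extract_all() is the closest counterpart. The names commission, hits and rates are illustrative assumptions, and how each rate should map onto the First and Second columns is not specified in the question.

```r
# Hypothetical sample: multi-rate entries copied from the question's data
commission <- c("3.3% AND 1.2%",
                "3.2% AND 1.0% AND 1.1% AND $1000 SELLING BONUS")

# Match every decimal number immediately followed by "%";
# the lookahead (?=%) requires perl = TRUE
hits <- regmatches(commission,
                   gregexpr("\\d+\\.\\d+(?=%)", commission, perl = TRUE))

# One numeric vector of rates per entry
rates <- lapply(hits, as.numeric)
```

    The result is a list with one element per entry, because the number of rates differs from row to row.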

    【Comments】:
