【Question title】: How do I add a column to one dataframe based on a partial string match in another dataframe? [duplicate]
【Posted】: 2019-05-07 22:07:29
【Question】:

I am trying to use one dataset to clean another.

I have a data frame called MiscodedVisits in which the course names were miscoded (human error):

# A tibble: 3 x 3
  EMAIL      SemesterYear Course
  <chr>      <chr>        <chr> 
1 aap@fn.edu S16          CHM212
2 aar@fn.edu S14          PHY000
3 abc@fn.edu F17          PHY000

I also have a data frame of course rosters called Rosters:

# A tibble: 5 x 3
  EMAIL      SemesterYear Course
  <chr>      <chr>        <chr> 
1 aap@fn.edu S17          CHM212
2 aap@fn.edu S16          CHM112
3 aar@fn.edu S14          PHY222
4 abc@fn.edu F17          AST300
5 abc@fn.edu F17          MAT255

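I want to look up each miscoded Course in Rosters (matching on EMAIL and SemesterYear) so that I can add a CorrectedCourse column based on a partial match of the part of the Course string that identifies the subject (CHM, PHY, etc.).

My desired result is for MiscodedVisits to look like: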

# A tibble: 3 x 4
  EMAIL      SemesterYear Course CorrectedCourse
  <chr>      <chr>        <chr>  <chr>          
1 aap@fn.edu S16          CHM212 CHM112         
2 aar@fn.edu S14          PHY000 PHY222         
3 abc@fn.edu F17          PHY000 NA 

What I tried:

A. Mutating a new column CorrectedCourse in MiscodedVisits based on a string match against Rosters$Course:

mutate(CorrectedCourse = DemoPerf$Course[match(EMAIL, DemoPerf$EMAIL) & match(SemesterYear, DemoPerf$SemesterYear)])

This fails with: Error in match(EMAIL, DemoPerf$EMAIL) : object 'EMAIL' not found

B. fuzzy_inner_join(MiscodedVisits, Rosters, by = c(Course = "S\\d{2}"), match_fun = str_detect)

This fails with: Error: Column `col` must be a 1d atomic vector or a list

C. regex_inner_join(MiscodedVisits, Rosters, by = c(Course = "S\\d{2}"))

This fails with: Error: Column `col` must be a 1d atomic vector or a list

【Comments】:

    Tags: r string dplyr


    【Solution 1】:

    You can do this with dplyr and stringr:

    library(stringr)
    library(dplyr)
    
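    # Extract the leading subject letters (e.g. "CHM") as a temporary join key,
    # then join on student, term and subject; rows without a match get NA
    MiscodedVisits %>%
      mutate(code = str_extract(Course, "[A-Z]*")) %>%
      left_join(Rosters %>% mutate(code = str_extract(Course, "[A-Z]*")),
                by = c("EMAIL", "SemesterYear", "code")) %>%
      select(-code)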
    
    #       EMAIL SemesterYear Course.x Course.y
    #1 aap@fn.edu          S16   CHM212   CHM112
    #2 aar@fn.edu          S14   PHY000   PHY222
    #3 abc@fn.edu          F17   PHY000     <NA>
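
The join above returns the columns as Course.x and Course.y. If you want the exact column names from the question (Course and CorrectedCourse), one possible variation is to rename the roster column before joining. This is only a sketch: the data frames are rebuilt from the question's sample data, and `roster_lookup` and `result` are illustrative names that do not appear in the original post.

```r
library(dplyr)
library(stringr)

# Sample data copied from the question
MiscodedVisits <- data.frame(
  EMAIL        = c("aap@fn.edu", "aar@fn.edu", "abc@fn.edu"),
  SemesterYear = c("S16", "S14", "F17"),
  Course       = c("CHM212", "PHY000", "PHY000"),
  stringsAsFactors = FALSE
)

Rosters <- data.frame(
  EMAIL        = c("aap@fn.edu", "aap@fn.edu", "aar@fn.edu", "abc@fn.edu", "abc@fn.edu"),
  SemesterYear = c("S17", "S16", "S14", "F17", "F17"),
  Course       = c("CHM212", "CHM112", "PHY222", "AST300", "MAT255"),
  stringsAsFactors = FALSE
)

# Hypothetical helper table: the roster course becomes CorrectedCourse,
# and `code` holds the leading subject letters used as a join key
roster_lookup <- Rosters %>%
  mutate(code = str_extract(Course, "^[A-Z]+")) %>%
  rename(CorrectedCourse = Course)

# Join on student, term and subject prefix, then drop the temporary key
result <- MiscodedVisits %>%
  mutate(code = str_extract(Course, "^[A-Z]+")) %>%
  left_join(roster_lookup, by = c("EMAIL", "SemesterYear", "code")) %>%
  select(-code)

print(result)
```

Note that if a student is enrolled in two courses with the same subject prefix in the same term, the left join returns one row per candidate, so the result may have more rows than MiscodedVisits and would need a further rule to pick one.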
    

    【Comments】:
