【Question Title】: Get the Matched String?
【Posted】: 2017-01-18 01:34:07
【Question】:

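I have two data frames: one holds product names and the other holds categories. I need to match each category against the product names and, where the category string occurs in a name, assign that category to the product.

The first data frame, with the product names (Product_Name.csv), is: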

           **Product.Name**
       Black Printed Blouse
Silver Embellished Crop Top
   Maroon Solid Strappy Top

The other data frame, with the categories (Category.csv), is:

**Category**
     Strappy
      Blouse
        Crop 

The final output should be:

       Black Printed Blouse       Blouse
Silver Embellished Crop Top         Crop
   Maroon Solid Strappy Top      Strappy

Right now I am using grepl, which only returns TRUE or FALSE:

product <- read.csv("Product_Name.csv", header = TRUE, sep = ",")
category <- read.csv("Category.csv", header = TRUE, sep = ",")

for (i in 1:nrow(product)) {
  # One logical column per category: TRUE if the category occurs in the name
  product[i, 2] <- grepl(category$Category[1], product$Product.Name[i], ignore.case = TRUE)
  product[i, 3] <- grepl(category$Category[2], product$Product.Name[i], ignore.case = TRUE)
  product[i, 4] <- grepl(category$Category[3], product$Product.Name[i], ignore.case = TRUE)
}

【Comments】:

Tags: r


【Solution 1】:

We can use str_extract from stringr, joining the categories into a single alternation pattern:

library(stringr)
product$Category <- str_extract(product$Product.Name, paste(category$Category, collapse="|"))
product
#                 Product.Name Category
#1        Black Printed Blouse   Blouse
#2 Silver Embellished Crop Top     Crop
#3    Maroon Solid Strappy Top  Strappy
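
If stringr is not available, or the match should ignore case as in the question's grepl call, roughly the same result can be sketched in base R with regexpr and regmatches. The inline data frames below are stand-ins for the two CSV files; the lower-case "crop" entry and the "Plain White Shirt" row are made up here to exercise case-insensitive matching and the no-match case.

```r
# Inline stand-ins for Product_Name.csv and Category.csv (assumed contents).
product <- data.frame(
  Product.Name = c("Black Printed Blouse",
                   "Silver Embellished Crop Top",
                   "Maroon Solid Strappy Top",
                   "Plain White Shirt"),   # hypothetical row with no category
  stringsAsFactors = FALSE
)
category <- data.frame(Category = c("Strappy", "Blouse", "crop"),
                       stringsAsFactors = FALSE)

# One alternation pattern: "Strappy|Blouse|crop".
pattern <- paste(category$Category, collapse = "|")

# regexpr() gives the position of the first match per string, or -1 if none.
m <- regexpr(pattern, product$Product.Name, ignore.case = TRUE)

# Start with NA everywhere, then fill in only the rows that matched.
product$Category <- NA_character_
product$Category[m != -1] <- regmatches(product$Product.Name, m)

print(product)
```

Note that this returns the text as it appears in the product name ("Crop"), not as it is spelled in the category table ("crop"). Category names containing regex metacharacters would need escaping before being pasted into the pattern.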

【Comments】:

    【Solution 2】:

    Using base R:

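    # For each category, find the row of the product whose name contains it.
    # as.character() makes sure sapply() names the result even if Category is a factor.
    # This assumes every category matches exactly one product.
    indices = sapply(as.character(category$Category), function(x) which(grepl(x, product$Product.Name)))
    
    product$new_col = NA_character_
    product$new_col[indices] = names(indices)
    #> product
    #                 Product.Name new_col
    #1        Black Printed Blouse  Blouse
    #2 Silver Embellished Crop Top    Crop
    #3    Maroon Solid Strappy Top Strappy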
    

    # Some categories may not match any product, and that case needs handling.
    # The more general version below copes with both matches and non-matches.
    
    category$Category[2] = "Bloiuse"   # deliberately misspelled so that nothing matches
    
    # which(...)[1] keeps the first matching row and yields NA when there is none
    indices = sapply(as.character(category$Category), function(x) which(grepl(x, product$Product.Name))[1])
    indices.loc <- as.numeric(indices)
    indices.name <- names(indices)
    
    product$new_col = NA_character_   # reset so that unmatched rows stay NA
    product$new_col[indices.loc[!is.na(indices.loc)]] = indices.name[!is.na(indices.loc)]
    
    #> product
    #                 Product.Name new_col
    #1        Black Printed Blouse    <NA>
    #2 Silver Embellished Crop Top    Crop
    #3    Maroon Solid Strappy Top Strappy
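
The no-match handling above can also be sketched per product rather than per category, which avoids the index bookkeeping and still works when one category matches several products. This is only an illustrative variant: `first_category` is a hypothetical helper name, and the inline vectors are sample data standing in for the CSV files.

```r
# Return, for each product name, the first category it contains (NA if none).
# first_category is a hypothetical helper, not part of the original answer.
first_category <- function(product_names, categories) {
  vapply(product_names, function(nm) {
    # fixed = TRUE treats each category as literal text, not as a regex;
    # tolower() on both sides makes the comparison case-insensitive.
    hit <- vapply(categories,
                  function(ct) grepl(tolower(ct), tolower(nm), fixed = TRUE),
                  logical(1))
    if (any(hit)) categories[which(hit)[1]] else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

products   <- c("Black Printed Blouse",
                "Silver Embellished Crop Top",
                "Maroon Solid Strappy Top")
categories <- c("Strappy", "Bloiuse", "Crop")   # "Bloiuse" is misspelled on purpose

result <- first_category(products, categories)
print(result)
```

Unlike str_extract, this returns the category as spelled in the category list, and when a product name contains more than one category it takes whichever comes first in that list.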
    

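    【Comments】:

    • Could you share your feedback on this answer? Please acknowledge the effort that went into writing it rather than ignoring it. If it does not answer your question well, please help me improve it. Also, please visit stackoverflow.com/help/someone-answers and consider accepting the answer. Thanks! :)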