【Question title】: Search for a word/phrase in a column in R
【Posted】: 2021-09-13 18:42:57
【Question】:

My data looks like this:

> head(df)
  ID                                                 Comment
1  1                                            I ate dinner.
2  2                              We had a three-course meal.
3  3                             Brad came to dinner with us.
4  4                                     He loves fish tacos.
5  5  In the end, we all felt like we ate too much. Code 5.16
6  6   We all agreed; it was a magnificent evening.72 points.

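I want to create two new columns, one named A and one named B. Column A should equal 1 if any of the following words/phrases appears: dinner, evening, we ate. Column B should equal 1 if any of the following words/phrases appears: in the end, all, Brad, 5.16.

How can I do this? Note that I need exact matches.

【Comments】:

    Tags: r nlp tm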


    【Solution 1】:

    We can use grepl in base R:

    # \\b marks word boundaries so only whole words/phrases match; unary + turns TRUE/FALSE into 1/0
    df$A <- +(grepl("\\b(dinner|evening|we ate)\\b", df$Comment))
    df$B <- +(grepl("\\b(in the end|all|Brad|5\\.16)\\b", df$Comment))
    

    Output:

    df
      ID                                                 Comment A B
    1  1                                           I ate dinner. 1 0
    2  2                             We had a three-course meal. 0 0
    3  3                            Brad came to dinner with us. 1 1
    4  4                                    He loves fish tacos. 0 0
    5  5 In the end, we all felt like we ate too much. Code 5.16 1 1
    6  6  We all agreed; it was a magnificent evening.72 points. 1 1
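
Note that grepl is case-sensitive by default, so the capitalized "In the end" in row 5 is not matched by the term "in the end" (row 5 gets B = 1 only because of "all"). A minimal sketch of the difference, using made-up sample comments rather than the data from the question, and assuming case-insensitive matching is acceptable for your use case:

```r
# Hypothetical sample comments, not taken from the original data
comments <- c("In the end it rained.", "in the end it rained.", "A small plate.")

pat <- "\\b(in the end|all)\\b"

# Case-sensitive (default): the capitalized "In the end" is missed
case_sensitive <- +grepl(pat, comments)

# Case-insensitive: both spellings match; "small" still does not match "all"
# because \\b requires a word boundary on each side
case_insensitive <- +grepl(pat, comments, ignore.case = TRUE)
```

Whether to pass ignore.case = TRUE depends on what "exact match" should mean for your data.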
    

    Note: the pattern can also be built with paste:

    v1 <- c("dinner", "evening", "we ate")
    # "." is a regex wildcard, so it is escaped to match a literal dot
    v2 <- c("in the end", "all", "Brad", "5\\.16")
    pat1 <- paste0("\\b(", paste(v1, collapse = "|"), ")\\b")
    pat2 <- paste0("\\b(", paste(v2, collapse = "|"), ")\\b")
    df$A <- +(grepl(pat1, df$Comment))
    df$B <- +(grepl(pat2, df$Comment))
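
When the search terms are plain text, any regex metacharacter in them (such as the "." in 5.16) has to be escaped before the terms are pasted into a pattern. A minimal sketch of one way to do this; escape_regex is a hypothetical helper written for this example, not a base R function:

```r
# Hypothetical helper (not part of base R): backslash-escape regex metacharacters
escape_regex <- function(x) {
  gsub("([.^$*+?()|{}\\[\\]\\\\])", "\\\\\\1", x, perl = TRUE)
}

# Terms written as plain text, with no manual escaping
terms <- c("in the end", "all", "Brad", "5.16")
pat <- paste0("\\b(", paste(escape_regex(terms), collapse = "|"), ")\\b")

hit_literal <- grepl(pat, "Code 5.16")  # the literal "5.16" is present
hit_other   <- grepl(pat, "Code 5x16")  # an unescaped "." would have matched here
```

This keeps the term vectors readable while still matching them literally.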
    

    Data

    df <- structure(list(ID = 1:6, Comment = c("I ate dinner.", "We had a three-course meal.", 
    "Brad came to dinner with us.", "He loves fish tacos.", "In the end, we all felt like we ate too much. Code 5.16", 
    "We all agreed; it was a magnificent evening.72 points.")),
     class = "data.frame", row.names = c("1", 
    "2", "3", "4", "5", "6"))
    

    【Comments】:

    • This solution does not handle 'in the end'
    • @user11015000 Just change end to in the end
    【Solution 2】:

    Does this work:

    library(dplyr)
    library(stringr)
    
    df %>% mutate(A = +str_detect(Comment, str_c(c('dinner','evening','we ate'), collapse = '|')),
                  B = +str_detect(Comment, str_c(c('in the end','all','Brad','5\\.16'), collapse = '|')))
    # A tibble: 6 x 4
         ID Comment                                                     A     B
      <dbl> <chr>                                                   <int> <int>
    1     1 I ate dinner.                                               1     0
    2     2 We had a three-course meal.                                 0     0
    3     3 Brad came to dinner with us.                                1     1
    4     4 He loves fish tacos.                                        0     0
    5     5 In the end, we all felt like we ate too much. Code 5.16     1     1
    6     6 We all agreed; it was a magnificent evening.72 points       1     1
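
The pattern above has no word boundaries, so a term such as all also matches inside longer words. A minimal sketch of a stricter variant, using made-up sample comments rather than the data from the question; wrapping the alternation in \\b and using regex(..., ignore_case = TRUE) are adjustments assumed here, not part of the original answer:

```r
library(stringr)

# Hypothetical sample comments, not taken from the original data
comments <- c("Brad came to dinner.", "A small plate.", "In the end we left.")

terms  <- c("in the end", "all", "Brad", "5\\.16")
loose  <- str_c(terms, collapse = "|")   # substring matching
strict <- str_c("\\b(", loose, ")\\b")   # whole words/phrases only

loose_hits  <- +str_detect(comments, loose)    # "small" counts as a hit for "all"
strict_hits <- +str_detect(comments, strict)   # "small" no longer matches
# Case-insensitive variant: also matches the capitalized "In the end"
nocase_hits <- +str_detect(comments, regex(strict, ignore_case = TRUE))
```

For the six rows in the question both patterns give the same result; the difference only shows up on comments containing words like "small" or "tall".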
    
