【Question title】: Filter strings that contain only certain letters in R
【Posted】: 2021-12-10 20:37:09
【Question description】:

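I want to filter the rows of a data frame (which contains words) to keep only the words made up of certain letters. For example, suppose I have a data frame such as: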

library(tidyverse)

df <- data.frame(words = c("acerbe", "malus", "as", "clade", "after", "sel", "moineau") )

   words
1 acerbe
2  malus
3     as
4  clade
5  after
6    sel
7 moineau

I want to keep only the rows (words) made up of the following letters, and of those letters only:

letters <-  c("a", "z", "e", "r", "q", "s", "d", "f", "w", "x", "c")

In other words, I want to exclude every word that contains a letter other than those listed above.

I tried stringr::str_detect(), but without success so far: every row is still returned.

letters <- "a|z|e|r|q|s|d|f|w|x|c"

df <- data.frame(words = c("acerbe", "malus", "as", "clade", "after", "sel", "moineau") )
df %>% filter(str_detect(string = words, pattern = letters, negate = FALSE) )

    words
1  acerbe
2   malus
3      as
4   clade
5   after
6     sel
7 moineau

【Question comments】:

    Tags: r string dplyr tidyverse


    【Solution 1】:

    A dplyr approach:

    df %>% 
    rowwise() %>% 
    filter(sum(str_count(words, letters))==nchar(words)) 
    
    # A tibble: 1 x 1
    # Rowwise: 
      words
      <chr>
    1 as
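
The filter above works because the summed letter counts equal the word length only when every character belongs to the allowed set. As a minimal sketch of the same idea without rowwise(), the letters can be collapsed into a single character class so that str_count() is applied to the whole column at once. The names `allowed`, `char_class` and `result` are hypothetical and chosen for illustration (`allowed` also avoids masking R's built-in `letters` constant):

```r
library(dplyr)
library(stringr)

df <- data.frame(words = c("acerbe", "malus", "as", "clade", "after", "sel", "moineau"))
allowed <- c("a", "z", "e", "r", "q", "s", "d", "f", "w", "x", "c")

# Collapse the vector into one character class: "[azerqsdfwxc]"
char_class <- paste0("[", paste(allowed, collapse = ""), "]")

# Keep a word only if the number of allowed characters equals its length
result <- df %>%
  filter(str_count(words, char_class) == nchar(words))

print(result)
```

This sketch assumes the allowed characters are plain letters; characters with a special meaning inside a character class (such as `]`, `^` or `-`) would need escaping first.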
    

    【Comments】:

      【Solution 2】:

      I would use a grepl approach here:

      letters <-  c("a", "z", "e", "r", "q", "s", "d", "f", "w", "x", "c")
      regex <- paste0("^[", paste(letters, collapse=""), "]+$")
      df$words[grepl(regex, df$words)]
      
      [1] "as"
      

      Note that the regular expression pattern used with grepl here is:

      ^[azerqsdfwxc]+$
      

      In your input data frame, the only word made up solely of these letters happens to be as.
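
For comparison with the attempt in the question, the same anchored pattern can also be passed to stringr::str_detect() inside a dplyr filter. The sketch below is an illustration rather than part of the original answer, and the names `any_letter`, `only_letters` and `result` are hypothetical. It also shows why the unanchored alternation `a|z|e|...` kept every row: it is TRUE as soon as a word contains at least one of the listed letters.

```r
library(dplyr)
library(stringr)

df <- data.frame(words = c("acerbe", "malus", "as", "clade", "after", "sel", "moineau"))

# Unanchored alternation: TRUE whenever at least one listed letter occurs
any_letter <- str_detect(df$words, "a|z|e|r|q|s|d|f|w|x|c")

# Anchored character class: TRUE only when every character is a listed letter
only_letters <- str_detect(df$words, "^[azerqsdfwxc]+$")

# Same filter written as a dplyr pipeline
result <- df %>%
  filter(str_detect(words, "^[azerqsdfwxc]+$"))

print(result)
```

The `^` and `$` anchors together with the `+` quantifier are what force the whole word, not just one character of it, to come from the allowed set.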

      【Comments】:
