【Question title】: how to use boundary with str_detect (tidyr package)
【Posted】: 2020-02-11 16:17:41
【Question】:

Here is some data.

library(stringr)
library(dplyr)

df <- tibble(sentences)

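I want to identify all sentences that contain the word "her". But of course a plain match also returns sentences with words like "there" and "here".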

df %>% filter(str_detect(sentences, "her"))
# A tibble: 43 x 1
   sentences                                    
   <chr>                                        
 1 The boy was there when the sun rose.         
 2 Help the woman get back to her feet.         
 3 What joy there is in living.                 
 4 There are more than two factors here.        
 5 Cats and dogs each hate the other.           
 6 The wharf could be seen at the farther shore.
 7 The tiny girl took off her hat.              
 8 Write a fond note to the friend you cherish. 
 9 There was a sound of dry leaves outside.     
10 Add the column and put the sum here. 

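The documentation for stringr::str_detect says: "Match character, word, line and sentence boundaries with boundary()." I can't figure out how to do this, and I can't find an example anywhere. All the documentation examples involve the str_split or str_count functions.

My question is related to this question, but I specifically want to understand how to use the stringr::boundary function.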

【Question comments】:

    Tags: r stringr


    【Solution 1】:

    We can specify a word boundary (\\b) at the start and end of the pattern to avoid any partial matches

    library(stringr)
    library(dplyr)
    df %>% 
        filter(str_detect(sentences, "\\bher\\b"))
    #                             sentences
    #1 Help the woman get back to her feet.
    #2      The tiny girl took off her hat.
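
As a small self-contained check of the word-boundary pattern, the sketch below uses a made-up vector x (an assumption for illustration, not part of the question's data), so it can be run without the full sentences data set. It also shows a case-insensitive variant, in case sentences that start with "Her" should count as well.

```r
library(stringr)

# Hypothetical example vector, chosen to include partial matches
x <- c("She took off her hat.", "He was there.", "Put it here.", "Her book is new.")

# Plain substring match: also hits "there" and "here"
plain <- str_detect(x, "her")
# TRUE TRUE TRUE FALSE

# Word-boundary match: only the standalone, lower-case word "her"
whole_word <- str_detect(x, "\\bher\\b")
# TRUE FALSE FALSE FALSE

# Case-insensitive variant, which also matches "Her"
any_case <- str_detect(x, regex("\\bher\\b", ignore_case = TRUE))
# TRUE FALSE FALSE TRUE
```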
    

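    Or use boundary("word") to split each sentence into words and check whether "her" is one of them (boundary() takes a boundary type, not the search string)

    # keep rows where "her" appears as a whole word
    df %>%
          filter(vapply(str_split(sentences, boundary("word")),
                        function(w) "her" %in% w, logical(1)))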
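
To see what boundary() matches on its own, here is a minimal sketch on a made-up vector x (an assumption for illustration, not the question's data). As far as I can tell, boundary("word") describes a kind of break rather than a particular word, so with str_detect() it only reports whether a string contains any word at all. Picking out a specific word seems to need a split first.

```r
library(stringr)

# Hypothetical example vector
x <- c("She took off her hat.", "He was there.", "Her book is new.")

# Number of words in each string
n_words <- str_count(x, boundary("word"))

# TRUE for every string that contains at least one word, whatever the word is
has_word <- str_detect(x, boundary("word"))

# Split into words, then test for the exact (case-sensitive) word "her"
words <- str_split(x, boundary("word"))
has_her <- vapply(words, function(w) "her" %in% w, logical(1))
```

Here only the first element of has_her is TRUE, while has_word is TRUE for all three strings.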
    

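    【Comments】:

    • Thanks! That is a very nice solution. Was I completely off base in thinking that stringr::boundary offered a way to solve this problem?
    • @JohnJ. In the documentation, it says to wrap with boundary: words <- c("These are some words."); str_count(words, boundary("word")); str_detect(words, boundary("word")) # [1] TRUE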