【Question Title】: Find all variants of word in R
【Posted】: 2016-04-22 01:15:59
【Question】:

I have the following words.

words <- c("hail(0.75)", "hail0.75", "hail0.88", "hail075", "hail1.00", "hail1.75", "hail100", "hail125", "hail1.75)", "hail150", "hail175", "hail200", "hail225", "hail275", "hail450", "hail088", "hail75", "hail80", "hail88")

     [1] "hail(0.75)" "hail0.75"   "hail0.88"   "hail075"    "hail1.00"   "hail1.75"  
     [7] "hail100"    "hail125"    "hail1.75)"  "hail150"    "hail175"    "hail200"   
    [13] "hail225"    "hail275"    "hail450"    "hail088"    "hail75"     "hail80"    
    [19] "hail88" 

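As you can see, hail(0.75) is repeated with various typos and formats (e.g. hail075, hail0.75).

How can I find all occurrences of hail(0.75), including the variants described above?

I tried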

grep("hail[0,7,5]", words, value = TRUE)
[1] "hail0.75" "hail0.88" "hail075"  "hail088"  "hail75"

to find instances of hail containing the digits 075.
However, it includes the unwanted hail088 and excludes the wanted hail(0.75).
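
For reference, here is a small sketch of why the attempt behaves this way. A bracket expression such as `[0,7,5]` matches exactly one character from the set `0`, `,`, `7`, `5`, so the pattern only checks the first character after `hail`. The shortened vector `w` is an illustration, not the full data.

```r
# A bracket expression matches ONE character from the set {0 , 7 5},
# not the sequence "075".
w <- c("hail(0.75)", "hail0.75", "hail088", "hail75", "hail80")

first_char_hit <- grepl("hail[0,7,5]", w)
names(first_char_hit) <- w
print(first_char_hit)
# "hail(0.75)" is FALSE because "(" follows "hail";
# "hail088" is TRUE because "0" follows "hail".
```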

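【Comments】:

  • How are hail75 and hail0.75 the same?
  • The values above refer to the size of the hail. I know it is a typo because hail values range from 0.25 to 5.00 inches. However, I will exclude hail75, since it could be a typo for either hail(1.75) or hail(0.75). Thanks for pointing that out.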

Tags: regex r character


【Solution 1】:

Another option is to remove all non-digit characters and use the result as an index:

idx <- gsub("[^[:digit:]]","",words)
words[idx=="075"]
[1] "hail(0.75)" "hail0.75"   "hail075"
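
The same idea can be wrapped in a small helper so that other sizes can be looked up too. This is only a sketch: `find_variants` is a hypothetical name, and it assumes that two words are variants of each other whenever their digit sequences are identical (the `hail` prefix is not checked).

```r
words <- c("hail(0.75)", "hail0.75", "hail0.88", "hail075", "hail1.00",
           "hail1.75", "hail100", "hail125", "hail1.75)", "hail150",
           "hail175", "hail200", "hail225", "hail275", "hail450",
           "hail088", "hail75", "hail80", "hail88")

# Hypothetical helper: strip everything except digits from both the
# words and the target, then keep the words whose digits match.
find_variants <- function(x, target) {
  digits_x      <- gsub("[^[:digit:]]", "", x)
  digits_target <- gsub("[^[:digit:]]", "", target)
  x[digits_x == digits_target]
}

hits_075 <- find_variants(words, "hail(0.75)")
hits_175 <- find_variants(words, "hail(1.75)")
print(hits_075)
print(hits_175)
```

Because the leading zero is part of the digit string, the ambiguous hail75 (digits "75") matches neither "075" nor "175", which is in line with the asker's decision to exclude it.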

【Comments】:

    【Solution 2】:

    Is this what you are looking for?

    > x <- c("hail(0.75)", "hail0.75", "hail0.88", "hail075", "hail1.00", "hail1.75", "hail100", "hail125", "hail1.75)", "hail150", "hail175", "hail200", "hail225", "hail275", "hail450", "hail088", "hail75", "hail80", "hail88")
    > x
     [1] "hail(0.75)" "hail0.75"   "hail0.88"   "hail075"    "hail1.00"
     [6] "hail1.75"   "hail100"    "hail125"    "hail1.75)"  "hail150"
    [11] "hail175"    "hail200"    "hail225"    "hail275"    "hail450"
    [16] "hail088"    "hail75"     "hail80"     "hail88"
    

    And then you grep:

    > x[grep("^hail[[:punct:]]*0[[:punct:]]*75.*", x)]
    [1] "hail(0.75)" "hail0.75"   "hail075"
    

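    This works on the assumption that the 7 and the 5 are always adjacent. Quick explanation: ^ marks the start of the string, [[:punct:]] is any punctuation character, and * means the preceding element (here [[:punct:]]) is repeated zero or more times.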
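
One possible tightening, offered as a sketch rather than as part of the original answer: the trailing `.*` accepts anything after `75`, so a string with extra digits would still match. Anchoring the end with `$` and allowing only punctuation after `75` avoids that. The string "hail0.751" below is made up for illustration.

```r
x <- c("hail(0.75)", "hail0.75", "hail075", "hail0.88", "hail75", "hail0.751")

# The pattern from the answer: anything may follow "75".
loose  <- "^hail[[:punct:]]*0[[:punct:]]*75.*"
# Assumed stricter variant: only punctuation may follow "75",
# and "$" requires the string to end there.
strict <- "^hail[[:punct:]]*0[[:punct:]]*75[[:punct:]]*$"

loose_hits  <- grep(loose, x, value = TRUE)
strict_hits <- grep(strict, x, value = TRUE)
print(loose_hits)   # includes "hail0.751"
print(strict_hits)  # excludes "hail0.751"
```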

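    【Comments】:

    • This is incorrect. For example, it would select hail0..775, i.e. grep("^hail.*0.*75.*", "hail0..775")
    • You're right, there could be another digit in between, so I adjusted the grep to include [[:punct:]]. I prefer a one-liner, though.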
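
The exchange above can be checked directly. This short sketch compares the earlier pattern quoted in the comment with the adjusted one on the commenter's counterexample:

```r
counterexample <- "hail0..775"

# Earlier pattern quoted in the comment: ".*" lets any characters,
# including an extra digit, sit between "0" and "75".
old_match <- grepl("^hail.*0.*75.*", counterexample)

# Adjusted pattern: only punctuation may sit between "0" and "75".
new_match <- grepl("^hail[[:punct:]]*0[[:punct:]]*75.*", counterexample)

print(c(old = old_match, new = new_match))
```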