【Question title】: How to remove a pattern of text ending with a colon in R?
【Posted】: 2019-07-12 13:34:44
【Question】:

I have the following sentence:

review <- c("1a. How long did it take for you to receive a personalized response to an internet or email inquiry made to THIS dealership?: Approx. It was very prompt however. 2f. Consideration of your time and responsiveness to your requests.: Were a little bit pushy but excellent otherwise 2g. Your satisfaction with the process of coming to an agreement on pricing.: Were willing to try to bring the price to a level that was acceptable to me. Please provide any additional comments regarding your recent sales experience.: Abel is awesome! Took care of everything from welcoming me into the dealership to making sure I got the car I wanted (even the color)! ")

I want to remove everything before each colon (:).

I tried the code below:

gsub("^[^:]+:","",review)

However, it removes only the first sentence that ends with a colon.

Expected result:

Approx. It was very prompt however. Were a little bit pushy but excellent otherwise Were willing to try to bring the price to a level that was acceptable to me. Abel is awesome! Took care of everything from welcoming me into the dealership to making sure I got the car I wanted (even the color)!

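Any help or suggestion would be appreciated. Thanks.

【Comments】:

  • Your question is unclear. "Everything before" could include every character. Is it one sentence?
  • So you only want to remove 1a., 2f., 2g. and :? Are these characters the same on every line?
  • Sorry, I meant that I want to get rid of all the questions in the sentence and keep only the responses. In my case the questions end with a colon, which is why I mentioned everything before the colon.
  • Try gsub("(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", "", review)
  • It would be great if you could explain the regex.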

Tags: r regex gsub


【Solution 1】:

If the sentences are not complex and contain no abbreviations, you can use

gsub("(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", "", review)

See the regex demo.

Note that you can generalize it further by changing \\d+[a-zA-Z] to [0-9a-zA-Z]+ or [[:alnum:]]+ so that the label matches 1 or more digits or letters.
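As a minimal sketch of that generalization, the snippet below compares the two label patterns on a made-up input. The string x and its "Q12." label are assumptions for illustration and do not come from the original data.

```r
# Hypothetical input whose label ("Q12.") is not "digits + one letter"
x <- "Q12. Was it fast?: Yes"

narrow  <- "(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*"
general <- "(?:[[:alnum:]]+\\.)?[^.?!:]*[?!.]:\\s*"

narrow_result  <- gsub(narrow, "", x)   # the "Q12." label is left behind
general_result <- gsub(general, "", x)  # label and question are both removed

print(narrow_result)
print(general_result)
```

The broader class also accepts labels made only of letters or only of digits, so it is worth checking that it does not swallow words you want to keep.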

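Details

  • (?:\d+[a-zA-Z]\.)? - an optional sequence of:
    • \d+ - 1 or more digits
    • [a-zA-Z] - an ASCII letter
    • \. - a dot
  • [^.?!:]* - 0 or more characters other than ., ?, ! and :
  • [?!.] - a ?, ! or .
  • : - a colon
  • \s* - 0 or more whitespace characters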
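To make the breakdown above easier to follow, here is a small sketch that assembles the same pattern from named pieces. The variable names and the sample string are illustrative assumptions, not part of the original answer.

```r
# Build the pattern piece by piece so each part can be read on its own
prefix   <- "(?:\\d+[a-zA-Z]\\.)?"  # optional label such as "1a."
body     <- "[^.?!:]*"              # question text: no ., ?, ! or :
ending   <- "[?!.]:"                # sentence punctuation followed by a colon
trailing <- "\\s*"                  # whitespace after the colon
pattern  <- paste0(prefix, body, ending, trailing)

# Hypothetical short sample, not taken from the original post
sample_text <- "1a. Was the reply quick?: Yes, very. 2b. Rate the staff.: Friendly"
result <- gsub(pattern, "", sample_text)
print(result)
# [1] "Yes, very. Friendly"
```

Because the body cannot cross a ., ? or !, a match can only span one sentence, which is why the answers that end in a plain period are left alone.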

R test:

> gsub("(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*", "", review)
[1] "Approx. It was very prompt however. Were a little bit pushy but excellent otherwise Were willing to try to bring the price to a level that was acceptable to me.Abel is awesome! Took care of everything from welcoming me into the dealership to making sure I got the car I wanted (even the color)! "
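In the output above, "me.Abel" has lost its space because an unlabelled question is matched together with the space in front of it, and a trailing space remains at the end. One possible way to tidy this, sketched here as an assumption and not as part of the original answer, is to replace each match with a single space and then normalize whitespace. The sample string is hypothetical.

```r
pattern <- "(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*"

# Hypothetical sample with one labelled and one unlabelled question
x <- "1a. Was it fast?: Yes. Any other comments.: None really. "

removed <- gsub(pattern, " ", x)               # replace each question with one space
tidy    <- trimws(gsub("\\s+", " ", removed))  # collapse runs of whitespace, trim the ends
print(tidy)
# [1] "Yes. None really."
```

Note that this also collapses any intentional double spaces inside the responses.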

Extending to handle abbreviations

If you add an alternation, you can enumerate the exceptions:

gsub("(?:\\d+[a-zA-Z]\\.)?(?:i\\.?e\\.|[^.?!:])*[?!.]:\\s*", "", review)     
                          ^^^^^^^^^^^^^^^^^^^^^^ 

Here, (?:i\.?e\.|[^.?!:])* matches 0 or more occurrences of either an ie. / i.e. substring or any character other than ., ?, ! and :.

See this demo.
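A short sketch of the difference the alternation makes, using a made-up question that contains "i.e." (the sample text is an assumption for illustration):

```r
basic    <- "(?:\\d+[a-zA-Z]\\.)?[^.?!:]*[?!.]:\\s*"
extended <- "(?:\\d+[a-zA-Z]\\.)?(?:i\\.?e\\.|[^.?!:])*[?!.]:\\s*"

# Hypothetical question containing the abbreviation "i.e."
x <- "4c. Please rate the condition of your vehicle (i.e. cleanliness, undamaged).: Thanks for the wash!"

basic_result    <- gsub(basic, "", x)     # stops at the dots of "i.e.", so part of the question survives
extended_result <- gsub(extended, "", x)  # treats "i.e." as part of the question text

print(basic_result)
print(extended_result)
```

Further abbreviations (for example "e.g.") would each need their own alternative in the group, so this approach only scales to a short, known list.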

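【Comments】:

  • The regex does not return the expected result for a sentence such as "4c. Please rate the condition of your vehicle when it was returned to you (i.e. cleanliness, undamaged).: Thank you very much for the wash!". What should I do?
  • @gamyanaidu I stated that at the very beginning: if there are no abbreviations. If there are, you can add them manually, e.g. (?:\d+[a-zA-Z]\.)?(?:i\.?e\.|[^.?!:])*[?!.]:\s*, see this demo
  • Perfect answer. Thank you very much.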