【Question Title】: Regular expression - How to include a single character also in this regexp?
【Posted】: 2019-11-27 14:07:31
【Question】:

This is the regular expression I am using on the text below:

(?![!',:;?\-\d])(\w[A-Za-z']+)

The regex flavor is ECMAScript (JavaScript).

Sample text:

This.Sentence.Has.Some.Funky.Stuff.U.S.S.R.Going.On.And.Contains.Some.   ABBREVIATIONS.Too.

This.Sentence.Has.Some.Funky.Stuff .U.S.S.R. Going.On.And.Contains.Some.   ABBREVIATIONS.Too.

A.S.A.P.?

Ctrl+Alt+Delete  

Mr.Smith bought google.com for 1.5 million dollars, i.e. he paid a lot for it. Did he mind? A.d.a.m Jones Jr. thinks he didn't. In any case, this isn't true... Well, with a probability of .9 it isn't. Mr. John Johnson Jr. was born in the U.S.A but earned his Ph.D. in Israel before joining Nike Inc. as an engineer! He also worked at craigslist.org as a b c d e F G H I J business analyst.

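It does everything I want, except that I cannot work out how to extend it so that it also matches the single letters a b c d e F G H I J, which in regex terms is [a-zA-Z].

I do not want text such as U.S.A to be matched, and that is where I run into trouble.

I tried the solution from "How to include character in regular expression", but my problem is more complex and I could not get it to work.

My task is to wrap the matched items in something.

Here is a link to the same regex with the sample text: https://regex101.com/r/Qdq4AY/4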

【Comments】:

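  • You could match everything you do not want and capture what you want to keep: \.?[a-zA-Z](?:\.[a-zA-Z])+\.?|\.[a-zA-Z]\.|(?!\d)(\w[A-Za-z']*) regex101.com/r/8O8GG6/1
  • I want to add single-letter words such as a b c d e F G H I J. I do not want to remove U.S.A, I just do not want it to be matched.
  • What is the regex flavor/tool/language? regex101.com/r/lYdw5i/1
  • I have updated the OP to include it. It is ECMAScript (JavaScript).
  • At the moment you are getting separate matches; I think you could also use the version with the capturing group: ideone.com/8ZnCvz

Tags: regex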


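【Solution 1】:

A few notes about the pattern you tried

  • The pattern (?![!',:;?\-\d])(\w[A-Za-z']+) cannot match a single character, because the part \w[A-Za-z']+ matches at least 2 characters due to the + quantifier.
  • The negative lookahead (?![!',:;?\-\d]) asserts that what is directly to the right is not any of [!',:;?\-\d], and then a word character \w is matched. Of the characters in that class, \w can only match a digit \d and none of the rest, so the lookahead is effectively (?!\d).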
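Both notes can be checked with a short script. This is only a sketch: the helper name words and the shortened sample string are made up for illustration, and the g flag is added so that matchAll can be used.

```javascript
// The pattern from the question (g flag added so matchAll can be used).
const original = /(?![!',:;?\-\d])(\w[A-Za-z']+)/g;
// The same pattern with the lookahead reduced to (?!\d).
const simplified = /(?!\d)(\w[A-Za-z']+)/g;

// Shortened sample, taken from phrases in the question's text.
const sample = "1.5 million dollars as a b c d e F G H I J business analyst";

// Hypothetical helper: collect the contents of group 1 for every match.
const words = (re, text) => Array.from(text.matchAll(re), (m) => m[1]);

// The single letters are missing because of the + quantifier.
console.log(words(original, sample));
// → [ 'million', 'dollars', 'as', 'business', 'analyst' ]

// The reduced lookahead gives the same result on this sample.
console.log(words(simplified, sample));
// → [ 'million', 'dollars', 'as', 'business', 'analyst' ]
```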

One option is to match what you do not want to keep, and capture what you do want to keep:

\.?[a-zA-Z](?:\.[a-zA-Z])+\.?|\.[a-zA-Z]\.|(?!\d)(\w[A-Za-z']*)

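In parts

  • \.? Match an optional dot
  • [a-zA-Z](?:\.[a-zA-Z])+\.? Match a single char a-zA-Z, then repeat 1+ times a dot followed by a single char, followed by an optional dot
  • | Or
  • \.[a-zA-Z]\. Match a single char a-zA-Z between two dots
  • | Or
  • (?!\d) Assert that what is directly to the right is not a digit
  • (\w[A-Za-z']*) Capture in group 1: match a single word char, then repeat 0+ times any of the chars listed in the character class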
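To see which alternative consumes which piece of text, the following sketch lists every match together with whether group 1 took part in it. The helper name describe and the short sample string are assumptions made for illustration only.

```javascript
const regex = /\.?[a-zA-Z](?:\.[a-zA-Z])+\.?|\.[a-zA-Z]\.|(?!\d)(\w[A-Za-z']*)/g;

// Hypothetical helper: report each match and whether group 1 captured it.
// Group 1 is undefined when one of the first two alternatives matched.
const describe = (text) =>
  Array.from(text.matchAll(regex), (m) => ({
    match: m[0],
    kept: m[1] !== undefined,
  }));

for (const { match, kept } of describe("the U.S.A, a Ph.D. as a b")) {
  console.log(`${match} -> ${kept ? "kept" : "skipped"}`);
}
// the -> kept
// U.S.A -> skipped   (first alternative)
// a -> kept
// Ph -> kept
// .D. -> skipped     (second alternative)
// as -> kept
// a -> kept
// b -> kept
```

The abbreviations are still matched, but only by the alternatives that have no capturing group, so checking group 1 is enough to tell the two cases apart.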

Regex demo

For example

const regex = /\.?[a-zA-Z](?:\.[a-zA-Z])+\.?|\.[a-zA-Z]\.|(?!\d)(\w[A-Za-z']*)/g;
const str = `This.Sentence.Has.Some.Funky.Stuff.U.S.S.R.Going.On.And.Contains.Some.   ABBREVIATIONS.Too.
 
This.Sentence.Has.Some.Funky.Stuff .U.S.S.R. Going.On.And.Contains.Some.   ABBREVIATIONS.Too.
 
A.S.A.P.?
 
Ctrl+Alt+Delete
 
Mr.Smith bought google.com for 1.5 million dollars, i.e. he paid a lot for it. Did he mind? A.d.a.m Jones Jr. thinks he didn't. In any case, this isn't true... Well, with a probability of .9 it isn't. Mr. John Johnson Jr. was born in the U.S.A but earned his Ph.D. in Israel before joining Nike Inc. as an engineer! He also worked at craigslist.org as a b c d e F G H I J business analyst.`;
let m;

while ((m = regex.exec(str)) !== null) {
  // This is necessary to avoid infinite loops with zero-width matches
  if (m.index === regex.lastIndex) {
    regex.lastIndex++;
  }
  if (undefined !== m[1]) {
    console.log(m[1]);
  }
}
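Since the stated task is to wrap the matched items, the same pattern can also be passed to String.prototype.replace with a callback. This is a minimal sketch and not part of the original answer; the function name wrapWords and the square-bracket wrappers are placeholders chosen for illustration.

```javascript
const regex = /\.?[a-zA-Z](?:\.[a-zA-Z])+\.?|\.[a-zA-Z]\.|(?!\d)(\w[A-Za-z']*)/g;

// Hypothetical helper: wrap every word captured in group 1 and leave
// everything matched by the other alternatives (e.g. U.S.A) unchanged.
function wrapWords(text, open = "[", close = "]") {
  return text.replace(regex, (match, word) =>
    // word is undefined when group 1 did not take part in the match
    word === undefined ? match : open + word + close
  );
}

console.log(wrapWords("born in the U.S.A as a b"));
// → [born] [in] [the] U.S.A [as] [a] [b]
```

Returning the full match unchanged for the abbreviation alternatives keeps that text in place, so only the captured words are altered.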

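【Comments】:

  • Perfect, thank you very much. I have been using regex for years, but this one had me stumped. I just added numbers so that it wraps them as well, like (\d?\.?\d+|\.?[a-zA-Z](?:\.[a-zA-Z])+\.?|\.[a-zA-Z]\.|(?!\d)(\w[A-Za-z']*)), but you have answered my original question. Thanks again.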