【Posted】: 2016-01-27 03:53:49
【Question】:
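I have to match a keyword, but only when it is not part of a URL or of certain compound words in a sentence. For example, for the keyword .NET, the string must not contain http://, and the characters right after .NET must not be work or flix, although they may be framework, any other word, or nothing at all. The regex must be case-insensitive.
I have these examples that must match: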
- The framework .NET
- That is .NETFramework
- Microsoft.NET
- .NETFramework (updated)
- .netframework (updated)
- .net (updated)
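These examples must not match:
- This is a URL http://www.my.net/ not matching
- The Network isn't matching because the missing point
- The .NETwork is up
- Microsoft.NetworkAndSharingCenter
- 4df9e0f8.netflix_mcm4njqhnhss8
- .network (updated)
- .Network (updated)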
I wrote this pattern:
(?i)(.*)(?!.*http\:\/\/.*)(\.net)(?!.*work)(?!.*flix)(.*)
I wrote the test cases below, but testMatch_02() and testNotMatch_01() both fail, and I cannot figure out why.
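A minimal sketch of two behaviours of java.util.regex lookaheads that are consistent with those two failures. It uses simplified fragments of the pattern rather than the full pattern, and the class and method names (LookaheadPlacementDemo, fullMatch) are hypothetical, introduced only for illustration. A negative lookahead is tested at the position where the preceding (.*) stops, so (?!.*http://) placed after a greedy (.*) can be evaluated just before ".net", where no "http://" remains ahead. Separately, (?!.*work) rejects "work" anywhere after ".net", which includes the tail of "Framework".

```java
import java.util.regex.Pattern;

// Illustrative only: the class and method names are hypothetical, not from the question.
final class LookaheadPlacementDemo {

    // True when the whole input matches the regex, like String.matches().
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String url = "This is a URL http://www.my.net";
        String framework = "That is .NETFramework";

        // Lookahead after a greedy (.*): it is tested wherever (.*) stops,
        // and right before ".net" there is no "http://" left ahead.
        String floating = "(?i)(.*)(?!.*http://)(\\.net)(.*)";
        // Lookahead at the start of the input: it scans the whole string.
        String anchored = "(?i)^(?!.*http://)(.*)(\\.net)(.*)";

        System.out.println(fullMatch(floating, url)); // true
        System.out.println(fullMatch(anchored, url)); // false

        // (?!.*work) rejects "work" anywhere after ".net", including "Framework".
        String wide = "(?i)(.*)(\\.net)(?!.*work)(.*)";
        // (?!work) rejects only "work" immediately after ".net".
        String narrow = "(?i)(.*)(\\.net)(?!work)(.*)";

        System.out.println(fullMatch(wide, framework));   // false
        System.out.println(fullMatch(narrow, framework)); // true
    }
}
```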
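Update 1
I added three more test cases: testNotMatch_03(), testNotMatch_04() and testNotMatch_05(). They work fine with the given regex, but testMatch_02() and testNotMatch_01() still fail as described above. I added these new test cases to make clear that there will not always be a space before .NET.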
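Update 2
I changed the pattern from (?i)(.*)(?!.*http\:\/\/.*)(\.net)(?!.*work)(?!.*flix)(.*) to (?i)(.*)(?!http\\:\\/\\/)(.*)(\\.net)(?!work|flix)(.*). With this change, all test cases pass except testNotMatch_01(). I have updated the test code in case anyone wants to run it with the new pattern.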
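Update 3
Please, I would be very grateful if anyone answering would run the test cases first and base their assumptions on the results. That way we can avoid turning this question into a chat conversation.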
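Update 4
It is important not only that the listed examples pass, but also that the regex be validated against the description in the original wording of the question. After talking with @Thomas, I included three new matching examples and two new non-matching examples in the code below, together with the regex @Thomas provided. I also replaced the test code with the version @Thomas provided, which is simpler and shorter, just like his regex.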
package com.regex;

public class TestRegex
{
    //private static final String regex = "(?i)(.*)(?!.*http\\:\\/\\/.*)(\\.net)(?!.*work)(?!.*flix)(.*)";
    //private static final String regex = "(?i)(.*)(?!http\\:\\/\\/)(.*)(\\.net)(?!work|flix)(.*)";
    private static final String regex = "(?i).*( |microsoft).net($|Framework)"; //@Thomas

    public static void main(String[] args)
    {
        String str = "The framework .NET";
        System.out.println("testMatch_01() must match: [" + str + "] => " + str.matches(regex));
        str = "That is .NETFramework";
        System.out.println("testMatch_02() must match: [" + str + "] => " + str.matches(regex));
        str = "Microsoft.NET";
        System.out.println("testMatch_03() must match: [" + str + "] => " + str.matches(regex));
        str = "That is .netframework";
        System.out.println("testMatch_04() must match: [" + str + "] => " + str.matches(regex));
        str = ".netframework";
        System.out.println("testMatch_05() must match: [" + str + "] => " + str.matches(regex));
        str = ".NETFramework";
        System.out.println("testMatch_06() must match: [" + str + "] => " + str.matches(regex));

        str = "This is a URL http://www.my.net";
        System.out.println("testNotMatch_01() must not match: [" + str + "] => " + str.matches(regex));
        str = "The Network isn't matching because the missing point";
        System.out.println("testNotMatch_02() must not match: [" + str + "] => " + str.matches(regex));
        str = "The .NETwork is up";
        System.out.println("testNotMatch_03() must not match: [" + str + "] => " + str.matches(regex));
        str = "Microsoft.NetworkAndSharingCenter";
        System.out.println("testNotMatch_04() must not match: [" + str + "] => " + str.matches(regex));
        str = "4df9e0f8.netflix_mcm4njqhnhss8";
        System.out.println("testNotMatch_05() must not match: [" + str + "] => " + str.matches(regex));
    }
}
The output of the code above is:
With the regex (?i)(.*)(?!http\\:\\/\\/)(.*)(\\.net)(?!work|flix)(.*), testNotMatch_01() fails:
testMatch_01() must match: [The framework .NET] => true
testMatch_02() must match: [That is .NETFramework] => true
testMatch_03() must match: [Microsoft.NET] => true
testMatch_04() must match: [That is .netframework] => true
testMatch_05() must match: [.netframework] => true
testMatch_06() must match: [.NETFramework] => true
testNotMatch_01() must not match: [This is a URL http://www.my.net] => true
testNotMatch_02() must not match: [The Network isn't matching because the missing point] => false
testNotMatch_03() must not match: [The .NETwork is up] => false
testNotMatch_04() must not match: [Microsoft.NetworkAndSharingCenter] => false
testNotMatch_05() must not match: [4df9e0f8.netflix_mcm4njqhnhss8] => false
With the regex (?i).*( |microsoft).net($|Framework), testMatch_05() and testMatch_06() fail:
testMatch_01() must match: [The framework .NET] => true
testMatch_02() must match: [That is .NETFramework] => true
testMatch_03() must match: [Microsoft.NET] => true
testMatch_04() must match: [That is .netframework] => true
testMatch_05() must match: [.netframework] => false
testMatch_06() must match: [.NETFramework] => false
testNotMatch_01() must not match: [This is a URL http://www.my.net] => false
testNotMatch_02() must not match: [The Network isn't matching because the missing point] => false
testNotMatch_03() must not match: [The .NETwork is up] => false
testNotMatch_04() must not match: [Microsoft.NetworkAndSharingCenter] => false
testNotMatch_05() must not match: [4df9e0f8.netflix_mcm4njqhnhss8] => false
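For comparison with the two runs above, here is a hedged sketch of one pattern that appears to satisfy the eleven test strings when used with a whole-string match. It is not taken from the question or its comments. The pattern, the class name DotNetKeywordSketch and the method accepts are assumptions made for illustration. The idea is to test for "http://" once from the start of the input, and to forbid only "work" or "flix" immediately after ".net".

```java
import java.util.regex.Pattern;

// Illustrative sketch: the pattern and all names here are hypothetical.
final class DotNetKeywordSketch {

    // ^(?!.*http://)   reject the input if "http://" occurs anywhere in it
    // .*\.net          a literal ".net" somewhere in the input (case-insensitive)
    // (?!work|flix)    not immediately followed by "work" or "flix"
    static final Pattern CANDIDATE =
            Pattern.compile("(?i)^(?!.*http://).*\\.net(?!work|flix).*");

    // Whole-string match, equivalent to String.matches() with the same regex.
    static boolean accepts(String input) {
        return CANDIDATE.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] mustMatch = {
            "The framework .NET", "That is .NETFramework", "Microsoft.NET",
            "That is .netframework", ".netframework", ".NETFramework"
        };
        String[] mustNotMatch = {
            "This is a URL http://www.my.net",
            "The Network isn't matching because the missing point",
            "The .NETwork is up",
            "Microsoft.NetworkAndSharingCenter",
            "4df9e0f8.netflix_mcm4njqhnhss8"
        };
        for (String s : mustMatch) {
            System.out.println("must match: [" + s + "] => " + accepts(s));
        }
        for (String s : mustNotMatch) {
            System.out.println("must not match: [" + s + "] => " + accepts(s));
        }
    }
}
```

Under this sketch, an input containing both a URL and a standalone .NET is rejected as a whole, which follows the requirement that the string must not contain http://. Since "." does not match line terminators by default, the sketch also assumes single-line input.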
【Comments】:
- How about (?i)\s\.net(?!work)? -> regex101.com/r/kI9rA9/1
- @JoshCrozier With this pattern all the "not match" test cases pass, but testMatch_01() and testMatch_02() fail. I have just edited the question to make this clearer.
- @JoshCrozier I see you used the PHP regex flavor. In Java it seems to behave a bit differently, because the test cases mentioned above fail.
- How about (?|(?|.*|Microsoft)\.net(?|Framework|$))
- @Thomas That causes an exception: java.util.regex.PatternSyntaxException: Unknown inline modifier near index 2 (?|(?|.*|Microsoft)\.net(?|Framework|$))
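The two differences reported in these comments can be reproduced with a short sketch. String.matches() requires the whole input to match, whereas a substring search (Matcher.find(), which is presumably closer to what an online tester reports) succeeds as soon as the pattern occurs anywhere. Also, (?| is PCRE's branch-reset group syntax, which java.util.regex rejects. If plain grouping was intended, the non-capturing group (?:...) is the Java spelling. The class name CommentPatternsDemo and its helper methods are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Illustrative only: the class and helper names are hypothetical.
final class CommentPatternsDemo {

    // True when the regex occurs anywhere in the input (substring search).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // True when java.util.regex accepts the regex syntax.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String str = "The framework .NET";
        String suggested = "(?i)\\s\\.net(?!work)";

        // matches() needs the pattern to cover the entire string.
        System.out.println(str.matches(suggested)); // false
        // find() only needs " .NET" to occur somewhere in the string.
        System.out.println(found(suggested, str));  // true

        // Branch-reset groups (?|...) are not supported by java.util.regex.
        System.out.println(compiles("(?|(?|.*|Microsoft)\\.net(?|Framework|$))")); // false
        // The same alternation written with non-capturing groups compiles.
        System.out.println(compiles("(?:(?:.*|Microsoft)\\.net(?:Framework|$))")); // true
    }
}
```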
Tags: java regex regex-negation case-insensitive