Answers: RegEx for matching URLs with 4-6 digits ID

【Question title】: RegEx for matching URLs with 4-6 digits ID
【Posted】: 2019-05-18 00:06:07
【Question description】:

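I am trying to match URLs that start with "example.com/", followed by 4-6 digits, where the next character (if there is one) is not a digit.

For example, "example.com/12345" should match.

"example.com/1234567" should not match.

"example.com/123456g7" should match.

I have tried "example.com/(\d{4,6}).*", but it matches when I give it "example.com/1234567", which is incorrect.

How can I fix this?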

【Question comments】:

Tags: regex regex-negation regex-lookarounds regex-group regex-greedy

【Solution 1】:

This expression adds extra boundaries, just to be safe, so that only your desired URLs pass:

^(https?:\/\/(www.)?)(example\.com)\/(?:[0-9]{4,6})?([a-z].*)?$

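You can reduce the boundaries if you wish. Here, we can also add several capturing groups so that the parts of the URL are easy to call.

The $ anchor is the key to failing the URLs you do not want: the whole input has to be consumed, so a seventh digit has nowhere to go.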
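
The effect of the trailing $ can be checked with a minimal Python sketch. It assumes the pattern above is applied with re.match; the helper name matches and the two sample URLs are chosen here for illustration only. Without $, the engine may stop after six digits and still report a match for a seven-digit ID.

```python
import re

# Pattern from the answer above, with and without the trailing "$" anchor.
ANCHORED = re.compile(
    r"^(https?:\/\/(www.)?)(example\.com)\/(?:[0-9]{4,6})?([a-z].*)?$"
)
UNANCHORED = re.compile(
    r"^(https?:\/\/(www.)?)(example\.com)\/(?:[0-9]{4,6})?([a-z].*)?"
)


def matches(pattern, url):
    """Return True if the pattern matches starting at the beginning of url."""
    return pattern.match(url) is not None


if __name__ == "__main__":
    for url in ("http://example.com/12345", "http://example.com/1234567"):
        # Prints the URL, then the anchored result, then the unanchored result.
        print(url, matches(ANCHORED, url), matches(UNANCHORED, url))
```

With the anchor, the seven-digit URL is rejected; without it, the same URL is accepted because the match may end after "123456".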

RegEx

If this is not the expression you wanted, you can modify or change it at regex101.com.

RegEx Circuit

You can also visualize your expressions at jex.im.

JavaScript Demo

const regex = /^(https?:\/\/(www.)?)(example\.com)\/(?:[0-9]{4,6})?([a-z].*)?$/gm;
const str = `http://example.com/12345
https://example.com/123456g7
http://www.example.com/12345
https://www.example.com/123456g7
http://www.example.com/12345
https://www.example.com/123456g7
http://www.example.com/123456adfasdfasdf98989898
https://www.example.com/123456g7adfadfa0909009
http://example.com/1234567
https://example.com/1234567`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}

Python Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"^(https?:\/\/(www.)?)(example\.com)\/(?:[0-9]{4,6})?([a-z].*)?$"

test_str = ("http://example.com/12345\n"
    "https://example.com/123456g7\n"
    "http://www.example.com/12345\n"
    "https://www.example.com/123456g7\n"
    "http://www.example.com/12345\n"
    "https://www.example.com/123456g7\n"
    "http://www.example.com/123456adfasdfasdf98989898\n"
    "https://www.example.com/123456g7adfadfa0909009\n"
    "http://example.com/1234567\n"
    "https://example.com/1234567")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):
    
    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    
    # Groups are numbered from 1; group 0 is the whole match printed above.
    for groupNum in range(1, len(match.groups()) + 1):
        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

【Comments】:

【Solution 2】:

Use a negative lookahead for a digit after matching 4 to 6 digits:

example.com\/\d{4,6}(?!\d).*

https://regex101.com/r/YWmhgY/1/
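
A minimal Python sketch comparing the question's original attempt with the lookahead version, run against the three URLs from the question. The helper name is_match is an assumption for illustration, and the dot in "example.com" is escaped here, which is a small deviation from the pattern as posted.

```python
import re

# The question's original attempt: ".*" happily consumes a seventh digit.
ORIGINAL = re.compile(r"example\.com/(\d{4,6}).*")
# Lookahead version: after 4-6 digits, the next character must not be a digit.
LOOKAHEAD = re.compile(r"example\.com/\d{4,6}(?!\d).*")


def is_match(pattern, url):
    """Return True if the pattern is found anywhere in url."""
    return pattern.search(url) is not None


if __name__ == "__main__":
    for url in ("example.com/12345", "example.com/1234567", "example.com/123456g7"):
        # Prints the URL, then the original result, then the lookahead result.
        print(url, is_match(ORIGINAL, url), is_match(LOOKAHEAD, url))
```

The lookahead consumes nothing, so backtracking to five or four digits does not help a seven-digit ID: the character right after the digits is still a digit at every attempt.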

【Comments】:

【Solution 3】:

Another approach.

^example\.com/(\d{4,6})(?:\D.*)?$
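
A minimal Python sketch of how this pattern might be used to pull out the ID. The function name extract_id is hypothetical, and the sketch assumes the input is a bare URL string with no scheme, as in the question.

```python
import re

# Pattern from this answer: 4-6 digits, then either the end of the string
# or a non-digit followed by anything.
ID_PATTERN = re.compile(r"^example\.com/(\d{4,6})(?:\D.*)?$")


def extract_id(url):
    """Return the 4-6 digit ID as a string, or None if url does not match."""
    m = ID_PATTERN.match(url)
    return m.group(1) if m else None


if __name__ == "__main__":
    for url in ("example.com/12345", "example.com/1234567", "example.com/123456g7"):
        print(url, extract_id(url))
```

Because the optional tail has to begin with \D, a seventh digit can be neither part of the ID nor part of the tail, so the match fails.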

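【Comments】:

What does the colon in (?:\D.*) mean?
@user101 - It is part of the three characters (?: that open a non-capturing group construct. In this case, the group is used to tell the engine that its contents, \D.*, are optional as a unit.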
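
The difference between the two kinds of group can be seen in a minimal Python sketch. The helper name group_values is an assumption for illustration; the second pattern is the same expression with the colon form removed so that the tail is captured.

```python
import re

# Same expression twice: the optional tail is non-capturing in the first
# pattern and capturing in the second.
NON_CAPTURING = re.compile(r"^example\.com/(\d{4,6})(?:\D.*)?$")
CAPTURING = re.compile(r"^example\.com/(\d{4,6})(\D.*)?$")


def group_values(pattern, url):
    """Return the tuple of captured groups, or None if url does not match."""
    m = pattern.match(url)
    return m.groups() if m else None


if __name__ == "__main__":
    url = "example.com/123456g7"
    print(group_values(NON_CAPTURING, url))  # ('123456',)
    print(group_values(CAPTURING, url))      # ('123456', 'g7')
```

Both patterns accept and reject the same inputs; the only difference is whether the tail shows up as an extra group in the result.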