Regular expression to find mp3 URLs without a specific word: answers

【Question title】: Regular Expression to find mp3 URLs without a specific word
【Posted】: 2019-06-19 21:20:23
【Question】:

I want to extract mp3 URLs that do not contain a specific word from a page's source.

This is the regex I use to search for mp3 URLs:

https?:\/\/.+\.mp3

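It works fine. Now I want to exclude the URLs that contain a specific word. In other words, I need the URLs that do not have that word in them.

How can I exclude a word between http and .mp3?

I will use it in Qt with C++, but as long as it works on https://regex101.com/ it is fine.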

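【Question comments】:

Possible duplicate of Regular expressions: Ensuring b doesn't come between a and c
@CertainPerformance - No, that is different. If you read the description, it says contains 123 somewhere in the middle. I, however, want the expression not to contain the word.
It is exactly the same. Look at the last part of that question: and there are no other instances of abc or xyz in the substring besides the start and the end. Just as the answers there prevent abc from appearing in the middle of the match, you only need to apply the same logic to your pattern.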
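
As a rough illustration of that logic (not taken from the linked answers), one common way to forbid a word between two delimiters is to test a negative lookahead before every consumed character. The sketch below is in Python; the function name `find_mp3_urls_without`, the sample page text and the assumption that URLs contain no whitespace are all hypothetical choices made for the example.

```python
import re

def find_mp3_urls_without(word, text):
    """Return mp3 URLs in text that do not contain word (case-insensitive)."""
    # (?:(?!word)\S)*? consumes one non-space character at a time, but only
    # at positions where the forbidden word does not start.
    pattern = re.compile(
        r"https?://(?:(?!" + re.escape(word) + r")\S)*?\.mp3",
        re.IGNORECASE,
    )
    return pattern.findall(text)

# Hypothetical page source with one allowed and one forbidden URL.
page = "a http://example.com/good.mp3 b https://example.com/bad_word/x.mp3 c"
print(find_mp3_urls_without("bad_word", page))  # ['http://example.com/good.mp3']
```

The same pattern syntax (a negative lookahead inside a non-capturing group) is also accepted by PCRE-style engines, so it should be possible to try it on regex101 before porting it to Qt.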

Tags: regex url

【Answer 1】:

I hope this is a useful answer.

Here is a regular expression with a use case in Python 3. If you want to exclude the "word" between http and .mp3, you can do it like this.

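import re

ref = "http://www.some_undesired_text_018/m102/1-225x338.mp3"

# Capture everything between the scheme ("http" or "https") and ".mp3".
_del = re.findall(r'https?(.+)\.mp3', ref)[0]

# Remove the captured text from the URL.
out = ref.replace(_del, "")

# _del now contains the undesired text; out is "http.mp3".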

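【Comments】:

I am not using Python.

【Answer 2】:

If you want to "exclude those URLs which do not have a specific word", that is, keep only the URLs that contain the word, you can use a positive lookahead for the word (preceded by some number of characters), e.g.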

(?=.*Sing)

In JavaScript:

const word = 'Sing';
const urls = ['http://I_like_to_sing.mp3', 'http://Another_song.mp3'];
// In a string literal the backslash must be doubled so the regex sees \. (a literal dot).
let regex = new RegExp('https?://(?=.*' + word + ').+\\.mp3', 'i');
console.log(urls.filter(v => v.match(regex)));

In PHP:

$word = 'Sing';
$urls = ['http://I_like_to_sing.mp3', 'http://Another_song.mp3'];
$regex = "/https?:\/\/(?=.*$word).+\.mp3/i";
print_r(array_filter($urls, function ($v) use ($regex) { return preg_match($regex, $v); }));

Output:

Array ( 
    [0] => http://I_like_to_sing.mp3 
)

Demo on 3v4l.org
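
For comparison, here is a minimal Python sketch of the same positive lookahead. It reuses the word and URLs from the examples above; the variable names `regex` and `matching` are only illustrative, and `re.escape` is added on the assumption that the word should be treated as literal text.

```python
import re

word = "Sing"
urls = ["http://I_like_to_sing.mp3", "http://Another_song.mp3"]

# (?=.*word) succeeds only if the word appears somewhere after "http://".
regex = re.compile(
    r"https?://(?=.*" + re.escape(word) + r").+\.mp3",
    re.IGNORECASE,
)

# Keep only the URLs that match, i.e. those containing the word.
matching = [u for u in urls if regex.match(u)]
print(matching)  # ['http://I_like_to_sing.mp3']
```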

Update

To exclude the URLs that contain a specific word, you can use a negative lookahead instead, e.g.

(?![^.]*Sing)

We use [^.] to make sure the word is looked for only before the .mp3 part. Here is a PHP demo:

$word = 'Song';
$string = "some words http://I_like_to_sing.mp3 and then some other words http://Another_song.mp3 and some words at the end...";
$regex = "/(https?:\/\/(?![^.]*$word).+?\.mp3)/i";
preg_match_all($regex, $string, $matches);
print_r($matches[1]);

Output:

Array ( 
    [0] => http://I_like_to_sing.mp3
)

Demo on 3v4l.org
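
The negative-lookahead pattern can be tried in Python as well. The sketch below is a hedged port of the PHP demo: the helper name `extract_without` and the `dotted` sample URL are hypothetical. The second call illustrates a consequence of using [^.]: the check stops at the first "." after the scheme, so a word that appears after a dotted host name is not seen.

```python
import re

def extract_without(word, text):
    # (?![^.]*word) rejects a URL when the word occurs before the first "."
    # that follows the scheme.
    pattern = r"https?://(?![^.]*" + re.escape(word) + r").+?\.mp3"
    return re.findall(pattern, text, flags=re.IGNORECASE)

text = ("some words http://I_like_to_sing.mp3 and then some other words "
        "http://Another_song.mp3 and some words at the end...")
print(extract_without("Song", text))  # ['http://I_like_to_sing.mp3']

# Hypothetical URL with a dotted host: the lookahead only inspects "www",
# so the word later in the path is not detected and the URL is returned.
dotted = "http://www.example.com/Another_song.mp3"
print(extract_without("Song", dotted))  # ['http://www.example.com/Another_song.mp3']
```

If URLs with dotted host names are expected, a lookahead that is tested at every character rather than only once after the scheme may be a better fit.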

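【Comments】:

Sorry, my question was wrong; I have corrected it.
@NESHOM You should not have marked this as accepted, since it does not answer your actual question. I had been meaning to revisit this question, and I have made an edit that I think will solve your problem.
You are right. It did help a bit, but it did not answer the question directly. So please post your updated answer. Thanks.

【Answer 3】:

A slight modification of Nick's answer. You can exclude the word by negating the value returned from match in the filter function, like this: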

urls.filter(v => !v.match(regex));

This is more efficient and simpler than the other solution, which produces unexpected results.

const word = 'Sing';
const urls = ['http://I_like_to_sing.mp3', 'http://Another_song.mp3'];
// The backslash is doubled so the regex receives \. and matches a literal dot.
let regex = new RegExp('https?://(?=.*' + word + ').+\\.mp3', 'i');
console.log(urls.filter(v => !v.match(regex)));
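
One thing to keep in mind with a negated match is that it keeps every string that fails the pattern, including strings that are not mp3 URLs at all. A possible variation, sketched in Python, checks the two conditions separately. The third entry in `urls` and the names `mp3_url`, `has_word` and `kept` are hypothetical additions for the example.

```python
import re

word = "Sing"
urls = [
    "http://I_like_to_sing.mp3",
    "http://Another_song.mp3",
    "http://not_audio.txt",  # hypothetical non-mp3 entry
]

mp3_url = re.compile(r"https?://.+\.mp3", re.IGNORECASE)
has_word = re.compile(re.escape(word), re.IGNORECASE)

# Keep URLs that look like mp3 links and do not contain the word.
kept = [u for u in urls if mp3_url.fullmatch(u) and not has_word.search(u)]
print(kept)  # ['http://Another_song.mp3']
```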

【Comments】: