Answers: Splitting a string in Python based on a regex pattern

【Question Title】: Splitting a string in Python based on a regex pattern
【Posted】: 2018-11-11 17:41:34
【Question Description】:

I have a bytes object that contains URLs:

>>> body.decode("utf-8")
'https://www.wired.com/story/car-news-roundup-tesla-model-3-sales/\r\n\r\nhttps://cleantechnica.com/2018/11/11/can-you-still-get-the-7500-tax-credit-on-a-tesla-model-3-maybe-its-complicated/\r\n'

I need to split it into a list with each URL as a separate element. This is what I tried:

import re
pattern = r'^(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$'

urls = re.compile(pattern).split(body.decode("utf-8"))

Instead, I get a list with a single element that contains all the URLs:

['https://www.wired.com/story/car-news-roundup-tesla-model-3-sales/\r\n\r\nhttps://cleantechnica.com/2018/11/11/can-you-still-get-the-7500-tax-credit-on-a-tesla-model-3-maybe-its-complicated/\r\n']

How can I get each URL as a separate element?

【Question Comments】:

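Why not split on \s+? That should give you the result you want.
@PushpeshKumarRajwanshi Could you give an example?
This is probably because your pattern doesn't match anything, so there is nothing to split on.
You'd be better off using something like findall() with this modified version of your pattern: (?m)^(?:https?:\/\/(?:www\.)?)?[a-z0-9]+(?:[\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(?::[0-9]{1,5})?(?:\/.*)?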
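The two comments above can be checked with a short sketch. It assumes `body` is the raw bytes payload from the question (reproduced here as sample data). Without the MULTILINE flag, `^` and `$` anchor to the whole string, and `.` cannot cross `\n`, so the original pattern never matches and `split` has nothing to split on. The modified pattern with `(?m)` does match each line, but because `.` also matches `\r`, each match keeps a trailing `\r`, which this sketch strips off.

```python
import re

# Sample data taken from the question; assumes `body` is the raw bytes payload.
body = (b"https://www.wired.com/story/car-news-roundup-tesla-model-3-sales/\r\n\r\n"
        b"https://cleantechnica.com/2018/11/11/can-you-still-get-the-7500-tax-credit-on-a-tesla-model-3-maybe-its-complicated/\r\n")
text = body.decode("utf-8")

# The question's pattern: ^ and $ anchor to the whole string (no MULTILINE),
# and `.*` stops at the first "\n", so no match is ever found.
original = r'^(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$'
print(re.search(original, text))        # None
print(len(re.split(original, text)))    # 1: the whole text, unsplit

# The commenter's modified pattern: (?m) lets ^ match at every line start,
# and findall returns each match instead of splitting on it.
modified = r'(?m)^(?:https?:\/\/(?:www\.)?)?[a-z0-9]+(?:[\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(?::[0-9]{1,5})?(?:\/.*)?'
# `.` matches "\r", so each raw match ends with "\r"; strip it off.
urls = [u.strip() for u in re.findall(modified, text)]
print(urls)
```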

Tags: regex string list python-3.6

【Solution 1】:

Try splitting it on \s+.

Try this sample Python code:

import re
s = 'https://www.wired.com/story/car-news-roundup-tesla-model-3-sales/\r\n\r\nhttps://cleantechnica.com/2018/11/11/can-you-still-get-the-7500-tax-credit-on-a-tesla-model-3-maybe-its-complicated/\r\n'
# Split on runs of whitespace; "\r\n\r\n" counts as a single separator
urls = re.compile(r'\s+').split(s)
print(urls)

This outputs:

['https://www.wired.com/story/car-news-roundup-tesla-model-3-sales/', 'https://cleantechnica.com/2018/11/11/can-you-still-get-the-7500-tax-credit-on-a-tesla-model-3-maybe-its-complicated/', '']

Does this result look right? If not, we can adjust it to what you need.
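If the trailing empty string is unwanted, a minimal alternative (not from the original answer) is `str.split()` with no argument: it splits on runs of any whitespace, including `\r` and `\n`, and never produces empty strings at either end. Stripping the string before a regex split gives the same result.

```python
import re

# Sample string from the answer above.
s = ('https://www.wired.com/story/car-news-roundup-tesla-model-3-sales/\r\n\r\n'
     'https://cleantechnica.com/2018/11/11/can-you-still-get-the-7500-tax-credit-on-a-tesla-model-3-maybe-its-complicated/\r\n')

# No separator argument: split on whitespace runs, drop empty leading/trailing pieces.
urls = s.split()
print(urls)

# Regex equivalent: remove the trailing "\r\n" first so no empty string is produced.
urls_re = re.split(r'\s+', s.strip())
print(urls_re == urls)  # True
```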

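If you don't want the empty string ('') in the resulting list (it comes from the trailing \r\n), you can use findall to find all the URLs in the string instead. Here is sample Python code for that: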

import re
s = 'https://www.wired.com/story/car-news-roundup-tesla-model-3-sales/\r\n\r\nhttps://cleantechnica.com/2018/11/11/can-you-still-get-the-7500-tax-credit-on-a-tesla-model-3-maybe-its-complicated/\r\n'
# Lazily match from "http" up to, but not including, the next whitespace
urls = re.findall(r'http.*?(?=\s+)', s)
print(urls)

This gives the following output:

['https://www.wired.com/story/car-news-roundup-tesla-model-3-sales/', 'https://cleantechnica.com/2018/11/11/can-you-still-get-the-7500-tax-credit-on-a-tesla-model-3-maybe-its-complicated/']
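One caveat with the lookahead pattern: it only matches a URL that is followed by whitespace, so if the text does not end with `\r\n`, the last URL is silently dropped. A sketch of an alternative that matches a run of non-whitespace characters instead; the input string and the `example.com` URLs are hypothetical, chosen only to show the difference.

```python
import re

# Hypothetical input whose last URL is NOT followed by whitespace.
s = 'https://example.com/a\r\nhttps://example.com/b'

# Lookahead version: requires whitespace after each URL, so the last one is lost.
with_lookahead = re.findall(r'http.*?(?=\s+)', s)
print(with_lookahead)  # ['https://example.com/a']

# Non-whitespace run: does not depend on what follows the URL.
with_nonspace = re.findall(r'https?://\S+', s)
print(with_nonspace)   # ['https://example.com/a', 'https://example.com/b']
```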

【Comments】:

You have all the URLs in the urls list, and you can use them however you like. I'm not sure what you mean by "put each URL into a separate element of the list".