Splitting a string using a word, expression, or pattern in Python: answers

【Question title】: Splitting string using word, expression, or pattern in python
【Posted】: 2021-12-11 18:48:51
【Question】:

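I'm parsing information from a GET request made by a web client. I have a concatenated string built from that data, and I want to split it on the pattern "\r\n". Essentially, I want each piece of header information on its own line. I'd also like to exclude the body.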

Here is part of a sample string I want to split:

'HTTP/1.1 400 Bad Request\\r\\nDate: Tue, 26 Oct 2021 11:26:46 GMT\\r\\nServer:

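I have a function that parses the information. I've tried both regular expressions and split, but I keep getting errors (I'm new to Python and networking). Here are some of the things I tried (webinformation is the string to split):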

header = webinformation.splitlines()

for x in range(len(header)):
    print(header[x])

Here is one of the regular expressions I tried:

print(re.split('\\r\\n', webinformation))

How can I print each piece of information on its own line? Could this be a problem with escape characters?
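A quick way to test the escape-character theory is to compare the two kinds of separator directly. This is a minimal sketch that assumes webinformation holds the sample string above (closed with a quote, as in the answers below); literal_sep and real_crlf are illustrative names, not part of the original code.

```python
# The sample string from the question, written as a Python literal.
# In a normal string literal, '\\r\\n' is FOUR characters: \ r \ n
webinformation = 'HTTP/1.1 400 Bad Request\\r\\nDate: Tue, 26 Oct 2021 11:26:46 GMT\\r\\nServer:'

literal_sep = '\\r\\n'  # backslash, 'r', backslash, 'n'
real_crlf = '\r\n'      # carriage return + line feed

print(len(literal_sep))  # 4
print(len(real_crlf))    # 2

# The string contains the literal sequence, not a real CRLF...
print(literal_sep in webinformation)  # True
print(real_crlf in webinformation)    # False

# ...so splitlines(), which only breaks on real line-boundary characters,
# returns the whole string as a single element.
print(len(webinformation.splitlines()))  # 1
```

So yes, it is an escaping issue: the text holds backslashes, and any tool that looks for real line breaks will not find them.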

【Comments】:

Or, more simply, use a raw string: re.split(r'\\r\\n', webinformation), which avoids having to double every backslash.
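The comment's point can be checked side by side. The sketch below, assuming the same sample string, shows that the non-raw pattern from the question matches nothing, while the raw pattern (or its fully doubled non-raw equivalent) splits on the literal text:

```python
import re

webinformation = 'HTTP/1.1 400 Bad Request\\r\\nDate: Tue, 26 Oct 2021 11:26:46 GMT\\r\\nServer:'

# Non-raw pattern: Python turns '\\r\\n' into the 4-char text \r\n, which the
# regex engine then reads as escapes for a real CR and LF. Nothing matches.
print(re.split('\\r\\n', webinformation))

# Raw pattern: r'\\r\\n' reaches the regex engine as \\r\\n, i.e. an escaped
# backslash, 'r', an escaped backslash, 'n', matching the literal sequence.
print(re.split(r'\\r\\n', webinformation))

# The non-raw spelling of the same regex needs every backslash doubled again.
print(re.split('\\\\r\\\\n', webinformation))
```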

Tags: python regex split

【Answer 1】:

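Your line separator is the four-character sequence \r\n (a literal backslash, r, backslash, n), not real carriage-return and line-feed characters.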

Because it is fixed text, you don't need a regex. Use str.split:

text = 'HTTP/1.1 400 Bad Request\\r\\nDate: Tue, 26 Oct 2021 11:26:46 GMT\\r\\nServer:'
for line in text.split(r'\r\n'):
    print(line)

See the Python demo.

Output:

HTTP/1.1 400 Bad Request
Date: Tue, 26 Oct 2021 11:26:46 GMT
Server:
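For the question's other requirement, excluding the body, one possible approach (assuming the full response keeps the same escaped form) relies on the fact that an HTTP header section ends at the first empty line, which in this string would appear as two literal \r\n sequences in a row. The Content-Type header and the HTML body below are invented for illustration, and LITERAL_CRLF is a hypothetical name:

```python
# Hypothetical full response in the same escaped form as the question's string.
# The Content-Type line and the HTML body are made up for this example.
raw = ('HTTP/1.1 400 Bad Request\\r\\n'
       'Date: Tue, 26 Oct 2021 11:26:46 GMT\\r\\n'
       'Content-Type: text/html\\r\\n'
       '\\r\\n'
       '<html><body>Bad Request</body></html>')

LITERAL_CRLF = r'\r\n'  # four characters: \ r \ n

# The header section ends at the first empty line, i.e. two separators in a row.
head, _, body = raw.partition(LITERAL_CRLF * 2)

# The first line is the status line; the rest are header fields.
status_line, *header_lines = head.split(LITERAL_CRLF)
print(status_line)
for line in header_lines:
    print(line)
```

If no empty line is present, str.partition returns the whole string as head and an empty body, so the loop still prints every header line.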

【Answer 2】:

Like this:

➜  ~ ipython
Python 3.8.10 (default, Jun  2 2021, 10:49:15)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.28.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: s = 'HTTP/1.1 400 Bad Request\\r\\nDate: Tue, 26 Oct 2021 11:26:46 GMT\\r\\nServer:'

In [2]: s.replace('\\r\\n', '\n').splitlines()
Out[2]: ['HTTP/1.1 400 Bad Request', 'Date: Tue, 26 Oct 2021 11:26:46 GMT', 'Server:']

【Answer 3】:

You can replace the literal \r\n sequences with real newlines (\n) without using a regex:

a = 'HTTP/1.1 400 Bad Request\\r\\nDate: Tue, 26 Oct 2021 11:26:46 GMT\\r\\nServer:'
print(a.replace('\\r\\n', '\n'))

Output:

HTTP/1.1 400 Bad Request
Date: Tue, 26 Oct 2021 11:26:46 GMT
Server:
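The question doesn't show how webinformation was built. One common way to end up with literal \r\n text is calling str() on the raw response bytes, which produces their repr. If that is what happened here (an assumption), decoding the bytes instead keeps real CR LF characters, and str.splitlines() then works as originally attempted. response_bytes is a hypothetical stand-in for whatever the client actually received; ISO-8859-1 is chosen because it can decode any byte sequence.

```python
# Hypothetical raw bytes as they might arrive from a socket.
response_bytes = b'HTTP/1.1 400 Bad Request\r\nDate: Tue, 26 Oct 2021 11:26:46 GMT\r\nServer:'

# str() on bytes gives the repr: a "b'...'" string where CR and LF
# appear as the escape text \r\n rather than as real characters.
as_repr = str(response_bytes)
print(as_repr)

# Decoding keeps the real CR LF characters, so splitlines() works directly.
text = response_bytes.decode('iso-8859-1')
for line in text.splitlines():
    print(line)
```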
