Python string: find a specific word, then copy the word after it

【Question title】: Python string, find specific word, then copy the word after it
【Posted】: 2013-08-03 01:31:23
【Question description】:

I am having trouble with this kind of operation. Say we have a string:

teststring = "This is a test of number, number: 525, number: 585, number2: 559"

I want to store 525 and 585 in a list. How can I do that?

I did it in a very clumsy way. It works, but there must be a better way:

templist = []
found = False
for word in teststring.split():
    if found:
        templist.append(word.rstrip(','))  # split() leaves the comma attached, e.g. "525,"
        found = False
    if word == "number:":  # compare strings with ==, not with is
        found = True

Is there a regular expression solution?

Follow-up: what if I want to store 525, 585 and 559?

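【Comments on the question】:

Why use a regex here? A simple list comprehension would be faster and more readable.
@tobyodavies Would that need several steps, such as splitting and then parsing the numbers?

Tags: python string parsing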

【Solution 1】:

I would suggest:

teststring = "This is a test of number, number: 525, number: 585, number2: 559"
# The following does: "This is a test of number, number: 525, number: 585, number2: 559" -> ["525, number", "585, number2", "559"]
a = teststring.split(': ')[1:]
# The following does: ["525, number", "585, number2", "559"] -> [["525", " number"], ["585", " number2"], ["559"]]
b = [i.split(',') for i in a]
# The following does: [["525", " number"], ["585", " number2"], ["559"]] -> ["525", "585", "559"]
c = [i[0] for i in b]
>>> c
['525', '585', '559']

【Comments】:

What does the [1:] do?
@Guagua If a = [a, b, c, d], then a[1:] is [b, c, d].
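
As a small illustration of the slice asked about here (the names `parts`, `head` and `rest` are made up for this sketch), `[1:]` drops the first element, which in the answer above is the text before the first `': '`:

```python
teststring = "This is a test of number, number: 525, number: 585, number2: 559"

parts = teststring.split(': ')
# parts[0] is the text before the first ': ' and holds no number we want
head = parts[0]
# [1:] keeps everything from index 1 onward
rest = parts[1:]

print(head)  # This is a test of number, number
print(rest)  # ['525, number', '585, number2', '559']
```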

【Solution 2】:

It is not the most efficient code in the world, but it is still probably better than a regex:

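tokens = teststring.split()
# split() leaves trailing commas on tokens such as '525,', so strip them from the values
numlist = [val.rstrip(',') for key, val in zip(tokens, tokens[1:]) if key == 'number:']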

For your follow-up and for more general queries:

def find_next_tokens(teststring, test):
    # Pair each token with the one after it and keep the successor when test(key) is true
    tokens = teststring.split()
    return [val.rstrip(',') for key, val in zip(tokens, tokens[1:]) if test(key)]

It can be called like this:

find_next_tokens(teststring, lambda s: s.startswith('number') and s.endswith(':'))

This helps if the keys to search for come from user input:

find_next_tokens(teststring, lambda s: s in valid_keys)
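
A self-contained sketch of the two calls above. `valid_keys` is not defined in the answer, so the set used here is an assumption, and the function strips trailing commas because `split()` leaves them attached to tokens such as `'525,'`:

```python
def find_next_tokens(teststring, test):
    # Pair every token with its successor; keep the successor when test(key) holds
    tokens = teststring.split()
    return [val.rstrip(',') for key, val in zip(tokens, tokens[1:]) if test(key)]

teststring = "This is a test of number, number: 525, number: 585, number2: 559"

# Hypothetical set of keys, e.g. collected from user input
valid_keys = {'number:', 'number2:'}

only_number = find_next_tokens(teststring, lambda s: s == 'number:')
all_keys = find_next_tokens(teststring, lambda s: s in valid_keys)

print(only_number)  # ['525', '585']
print(all_keys)     # ['525', '585', '559']
```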

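【Comments】:

Could you explain why this solution is better than a regex? Is it efficiency or something else?
Regexes are hard to debug, so I suggest avoiding them when you don't need them. I find it easier to tell what is going on here, and the alternative would just be a regex plus code to interpret the match data. If you want to take a list of keys from the user and work them into a regex, good luck.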

【Solution 3】:

Use the re module:

>>> import re
>>> re.findall(r'number\d*: (\d+)', teststring)
['525', '585', '559']

\d is any digit [0-9]
* means zero or more times
() marks what to capture
+ means one or more times
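
To make the effect of `\d*` concrete, here is a short sketch comparing a pattern for the original question (only `number:`) with the pattern for the follow-up. The variable names are chosen for this example only:

```python
import re

teststring = "This is a test of number, number: 525, number: 585, number2: 559"

# Without \d*, only the literal key "number:" matches
only_number = re.findall(r'number: (\d+)', teststring)

# With \d*, keys such as "number2:" match as well
with_suffix = re.findall(r'number\d*: (\d+)', teststring)

print(only_number)  # ['525', '585']
print(with_suffix)  # ['525', '585', '559']
```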

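If you need to convert the resulting strings to ints, use map (in Python 3, wrap it in list(), because map returns an iterator there):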

>>> list(map(int, ['525', '585', '559']))
[525, 585, 559]

or a list comprehension:

>>> [int(s) for s in ['525', '585', '559']]
[525, 585, 559]

【Comments】:

Hi ovgolovin, thanks for your answer. It is very clear. \d+ means it captures any number of digits, right?
@Guagua Yes, + means one or more. If you need exactly 3, use {3} instead of +.
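
A brief sketch of the difference between `+` and `{3}`. The `sample` string is a made-up input, not from the question. Note that `{3}` on its own does not stop a longer number from matching on its first three digits; adding `\b` is one way to require that the number ends there:

```python
import re

# Hypothetical input with 2-, 3- and 4-digit numbers
sample = "number: 52, number: 525, number: 5259"

one_or_more = re.findall(r'number: (\d+)', sample)
exactly_three = re.findall(r'number: (\d{3})', sample)
three_then_boundary = re.findall(r'number: (\d{3})\b', sample)

print(one_or_more)          # ['52', '525', '5259']
print(exactly_three)        # ['525', '525']
print(three_then_boundary)  # ['525']
```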
I know this is a silly question, but what is the r in re.findall(r' for?
@Guagua See the second and third paragraphs at the top of docs.python.org/2/library/re.html for what raw strings are and why they are needed.
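
A minimal sketch of why the `r` prefix matters, using `\b` as the example escape (this choice is mine, the thread only uses `\d`). In a normal string literal Python turns `\b` into a backspace character before the re module ever sees it, while a raw string passes the backslash through:

```python
import re

teststring = "This is a test of number, number: 525, number: 585, number2: 559"

plain = '\bnumber:'  # '\b' becomes a single backspace character
raw = r'\bnumber:'   # backslash and 'b' are kept, so re sees a word boundary

print(len(plain), len(raw))           # 8 9
print(re.findall(plain, teststring))  # []
print(re.findall(raw, teststring))    # ['number:', 'number:']
```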
Yes, yes, I read that a while ago and forgot it. Very helpful, thanks.

【Solution 4】:

You can try this:

import re
[int(x) for x in re.findall(r' \d+', teststring)]

This gives you:

[525, 585, 559]

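【Comments】:

This only answers half of the question.
Unless he searches for all 3-digit numbers followed by a comma; then it answers the original question.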
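
The idea in the last comment can be sketched as follows. This is one possible reading of it, not code from the thread, and it relies on the coincidence that in this particular string only 525 and 585 are followed by a comma:

```python
import re

teststring = "This is a test of number, number: 525, number: 585, number2: 559"

# Capture exactly three digits that start at a word boundary and are followed by a comma
three_digits_then_comma = [int(x) for x in re.findall(r'\b(\d{3}),', teststring)]

print(three_digits_then_comma)  # [525, 585]
```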

【Solution 5】:

You can do this with regex groups. Here is some sample code:

import re
teststring = "This is a test of number, number: 525, number: 585, number2: 559"
groups = re.findall(r"number2?: (\d{3})", teststring)

groups then contains the captured numbers as a list of strings: ['525', '585', '559'].
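
As a hedged extension of the same idea, a pattern with two groups makes `re.findall` return (key, number) tuples, so the key can be checked afterwards. The pattern and the names `pairs` and `only_number` are assumptions made for this sketch:

```python
import re

teststring = "This is a test of number, number: 525, number: 585, number2: 559"

# Two groups: the key (number, number2, ...) and the digits after ": "
pairs = re.findall(r'(number\d*): (\d+)', teststring)
print(pairs)  # [('number', '525'), ('number', '585'), ('number2', '559')]

# Keep only the values whose key is exactly "number"
only_number = [int(value) for key, value in pairs if key == 'number']
print(only_number)  # [525, 585]
```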

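【Comments】:

But only 525 and 585 are wanted.
Thanks, Vacation. Could you explain findall(r"number2?: (\d{3})"?
@Guagua number is self-explanatory, 2? matches zero or one occurrence of 2, the colon and space match a literal colon and space, and (\d{3}) matches and captures three digits.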