Using a keyword to print a sentence in Python: answers

【Question title】: Using a keyword to print a sentence in Python
【Posted】: 2019-04-06 20:58:46
【Question】:

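Hello, I am writing a Python program that reads a given .txt file and looks for a keyword. Once I have found my keyword (for example 'data'), I want to print the entire sentence that the word belongs to.

I have read in my input file and used the split() method to strip out spaces, tabs, and newlines, putting all the words into an array.

Here is the code I have so far.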

text_file = open("file.txt", "r")
lines = text_file.read().split()
text_file.close()
keyword = 'data'

for token in lines:
    if token == keyword:
        # I have found my keyword. What methods can I use to
        # print out the words before and after the keyword?
        # I have a feeling I want to use '.' as a marker for sentences.
        # print(sentence)  # goal: print the entire sentence
        pass

file.txt contains the following:

Welcome to SOF! This website securely stores data for the user.

Desired output:

This website securely stores data for the user.

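【Comments】:

You can, if you use enumerate to store/loop over the indices and look at the previous and next index. But the bigger problem is splitting the text into sentences first.
If the token appears twice in one sentence, should the sentence be printed more than once?
@MelvinYellow Yes, the word is guaranteed to be found in the text file.
@Jean-FrançoisFabre Thanks for the enumerate suggestion! That makes iterating easier. As for separating sentences, I will use the period ('.') as a marker. I just need to figure out how to detect a period in the array, since it is attached to a word.
For example, use word.endswith("."), or a regular expression to detect punctuation.

Tags: python arrays file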
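
The enumerate and endswith suggestions in the comments above can be combined into a rough sketch. The helper name sentence_around and its terminators parameter are hypothetical, introduced only for illustration. It assumes that a sentence ends at any token whose last character is '.', '!' or '?'.

```python
def sentence_around(tokens, keyword, terminators=(".", "!", "?")):
    """Return the sentence containing the first token equal to keyword, or None."""
    for index, token in enumerate(tokens):
        # Ignore trailing punctuation so that "data." still counts as "data".
        if token.rstrip("".join(terminators)) != keyword:
            continue
        # Walk left until the previous token ends a sentence.
        start = index
        while start > 0 and not tokens[start - 1].endswith(terminators):
            start -= 1
        # Walk right until the current token ends this sentence.
        end = index
        while end < len(tokens) - 1 and not tokens[end].endswith(terminators):
            end += 1
        return " ".join(tokens[start:end + 1])
    return None


text = "Welcome to SOF! This website securely stores data for the user."
print(sentence_around(text.split(), "data"))
# This website securely stores data for the user.
```

Because the tokens come from split(), the original spacing and line breaks are lost and the sentence is rebuilt with single spaces.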

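【Solution 1】:

We can split the text on the characters that mark the end of a line (here, the end of a sentence), then loop over the resulting lines and print the ones that contain our keyword.

To split text on several characters, for example when a line can end with !, ? or ., we can use a regular expression: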

import re

keyword = "data"
line_end_chars = "!", "?", "."
example = "Welcome to SOF! This website securely stores data for the user?"
regexPattern = '|'.join(map(re.escape, line_end_chars))
line_list = re.split(regexPattern, example)

# line_list looks like this:
# ['Welcome to SOF', ' This website securely stores data for the user', '']

# Now we just need to see which lines have our keyword
for line in line_list:
    if keyword in line:
        print(line)

But keep in mind: if keyword in line: matches a sequence of characters, not necessarily a whole word. For example, 'data' in 'datamine' is True. If you want to match whole words only, you should use a regular expression.


【Comments】:

Be careful: if keyword in line also matches substrings, not just whole words.
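
One way to address the substring caveat is the regular-expression word boundary \b. This is a minimal sketch. The helper name contains_word is an assumption made for illustration, not part of the original answer.

```python
import re


def contains_word(keyword, line):
    """Return True only if keyword appears in line as a whole word."""
    # re.escape protects any special characters in the keyword;
    # \b matches the boundary between a word character and a non-word character.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, line) is not None


print("data" in "a datamine of facts")                     # True (substring match)
print(contains_word("data", "a datamine of facts"))        # False
print(contains_word("data", "stores data for the user"))   # True
```

Replacing the `if keyword in line:` test with a call like this keeps the rest of the loop unchanged.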

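【Solution 2】:

My approach is similar to Alberto Poljak's, but a bit more explicit.

The motivation is the realization that splitting on words is unnecessary: Python's in operator will happily find a word in a sentence. What is necessary is splitting into sentences. Unfortunately, sentences can end with ., ? or !, and Python's split function does not accept multiple delimiters. So we have to get a little fancy and use re.

re requires us to put a | between each delimiter and to escape some of them, because . and ? both have special meanings by default. Alberto's solution uses re itself to do all of this, which is definitely the way to go. But if you are new to re, my hard-coded version may be clearer.

The other addition I made is to put each sentence's trailing delimiter back onto the sentence it belongs to. To do this, I wrapped the delimiters in (), which captures them in the output. Then I used zip to reattach them to the sentences they came from. The 0::2 and 1::2 slices take every even index (the sentences) and join each with the following odd index (its delimiter). Uncomment the print statement to see what is going on.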

import re

lines = "Welcome to SOF! This website securely stores data for the user. Another sentence."
keyword = "data"

sentences = re.split(r'(\.|!|\?)', lines)  # raw string so the backslashes reach re unchanged

sentences_terminated = [a + b for a,b in zip(sentences[0::2], sentences[1::2])]

# print(sentences_terminated)

for sentence in sentences_terminated:
    if keyword in sentence:
        print(sentence)
        break

Output:

 This website securely stores data for the user.
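
To make the intermediate values visible, here is a small variation on the answer's code. It assumes the same sample text, uses a character class [.!?] in place of the | alternation, and strips the leading space that appears in the output above. The variable names parts and matches are introduced for illustration only.

```python
import re

lines = "Welcome to SOF! This website securely stores data for the user. Another sentence."

# The capturing group makes re.split keep each delimiter as its own list item.
parts = re.split(r"([.!?])", lines)
# parts == ['Welcome to SOF', '!',
#           ' This website securely stores data for the user', '.',
#           ' Another sentence', '.', '']

# Even indices hold sentence bodies, odd indices hold their terminators.
sentences = [(body + mark).strip() for body, mark in zip(parts[0::2], parts[1::2])]

matches = [s for s in sentences if "data" in s]
print(matches)
# ['This website securely stores data for the user.']
```

Note that zip stops at the shorter slice, so any trailing text without a terminator (here the final empty string) is dropped.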

【Comments】:

Upvoted, because you explained part of my answer better than I did when I just pasted the source.

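【Solution 3】:

This solution uses a fairly simple regular expression to find the keyword within a sentence, with words that may or may not appear before and after it, and a final period character. It handles whitespace, and it takes just a single call to re.search().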

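import re

with open("file.txt", "r") as text_file:
    text = text_file.read()

keyword = 'data'

# Raw strings keep the backslashes intact; re.escape guards special characters
# in the keyword; \. matches a literal period rather than any character.
pattern = r"(\w+\s)*" + re.escape(keyword) + r"(\s\w+)*\."
match = re.search(pattern, text)
if match:  # re.search returns None when nothing matches
    print(match.group().strip())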
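
If more than one sentence may contain the keyword, the single-regex idea can be extended with re.finditer. The following is only a sketch under stated assumptions: the function name sentences_with is hypothetical, a sentence is taken to be any run of characters ending in '.', '!' or '?', and the keyword must appear as a whole word.

```python
import re


def sentences_with(keyword, text):
    """Return every sentence in text that contains keyword as a whole word."""
    # [^.!?]* consumes the rest of the sentence on each side of the keyword,
    # and the final [.!?] keeps the terminator attached.
    pattern = r"[^.!?]*\b" + re.escape(keyword) + r"\b[^.!?]*[.!?]"
    return [m.group().strip() for m in re.finditer(pattern, text)]


text = "Welcome to SOF! This website securely stores data for the user. Is the data safe?"
print(sentences_with("data", text))
# ['This website securely stores data for the user.', 'Is the data safe?']
```

Unlike the \w-based pattern, this one also tolerates commas and other punctuation inside a sentence, because it only excludes the three terminator characters.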

【Comments】:

【Solution 4】:

Another solution:

def check_for_stop_punctuation(token):
    # True if the token contains a sentence terminator, e.g. "user." or "SOF!"
    return any(mark in token for mark in ('.', '?', '!'))

with open("file.txt", "r") as text_file:
    lines = text_file.read().split()
keyword = 'data'

sentence = []

i = 0
while i < len(lines):
    token = lines[i]
    sentence.append(token)
    # Strip trailing punctuation so "data." at the end of a sentence still matches.
    if token.rstrip('.?!') == keyword:
        # Keyword found: keep collecting tokens until the sentence ends
        # or the input runs out.
        found_stop_punctuation = check_for_stop_punctuation(token)
        while not found_stop_punctuation and i + 1 < len(lines):
            i += 1
            token = lines[i]
            sentence.append(token)
            found_stop_punctuation = check_for_stop_punctuation(token)
        print(" ".join(sentence))  # print a string rather than a list
        sentence = []
    elif check_for_stop_punctuation(token):
        # The sentence ended without the keyword: discard it.
        sentence = []
    i += 1

【Comments】: