Searching words from a file in another file: answers

【Question title】: Searching words from a file in another file
【Posted】: 2013-05-13 12:13:58
【Question】:

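I have 2 files:

access.log.13: a simple access log from a web server.
bots.txt: contains the names of spiders and crawlers, each on its own line, for example: googlebot mj12bot baidu etc etc

I want to create a third file, "hits.txt", containing every line of "access.log.13" that contains any of the words in the file "bots.txt".

Here is my little Frankenstein: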

file_working = file("hits.txt", "wt")

file_1_logs = open("access.log.13", "r")
file_2_bots = open("bots.txt", "r")
file_3_hits = open("hits.txt", "a")

list_1 = file_1_logs.readlines()
list_2 = file_2_bots.readlines()

file_3_hits.write("Lines with bots: \n \n")

for i in list_2:
    for j in list_1:
        if i in j:
            file_3_hits.write(j)

file_1_logs.close()
file_2_bots.close()

It doesn't work: I only get a hit when a line in bots.txt is exactly the same as a line in access.log.13. Thanks.
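
A likely cause of this symptom, sketched under the assumption that bots.txt holds one name per line: readlines() keeps the trailing newline on every entry, so the test is effectively "googlebot\n" in line, which can only succeed when the name sits at the very end of a log line. The sample log line below is made up for illustration, since the real files are not shown.

```python
# Hypothetical sample data; the real files are not shown in the question.
bot_entries = ["googlebot\n", "mj12bot\n"]  # what readlines() would return for bots.txt
log_line = '1.2.3.4 - - "GET / HTTP/1.1" 200 "Mozilla/5.0 (compatible; googlebot/2.1)"\n'

# Raw entries still carry "\n", so the substring test fails in the middle of a line.
raw_hits = [name for name in bot_entries if name in log_line]

# Stripping the whitespace first lets the plain substring test succeed.
clean_hits = [name.strip() for name in bot_entries if name.strip() in log_line]

print(raw_hits)
print(clean_hits)
```

Under that assumption, calling strip() on each entry of list_2 before the comparison should be enough to make the original nested loop produce hits.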

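【Comments on the question】:

This immediately strikes me as a task better suited to a shell script. So, in case you're interested: grep -Fe "$(cat bots.txt|tr " " \\n| tr -d \\r| tr -s \\n)" access.log.13 > hits.txt does the job. To explain the command line: it uses tr to turn spaces into newlines, delete carriage returns, and finally squeeze any run of multiple newlines into a single newline. The result is handed as the expression to grep -F (which expects one pattern per line). TL;DR: "grep -F can do this already"
Thank you very much! I'll add this grep example to my Evernote. But I want to learn how to do the same thing in Python. Thanks again :)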
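
For readers who want to see the steps of that pipeline in Python, here is a rough sketch. It mirrors the three tr stages and then applies a literal substring test, which is how grep -F treats its patterns. The function names patterns_from_text and fixed_string_filter are hypothetical, and the sample strings are invented for illustration.

```python
import re

def patterns_from_text(text):
    """Mimic the tr steps: spaces -> newlines, drop CRs, squeeze newline runs."""
    text = text.replace(" ", "\n").replace("\r", "")
    text = re.sub(r"\n+", "\n", text)
    # Drop empty entries: an empty pattern would match every line.
    return [p for p in text.split("\n") if p]

def fixed_string_filter(patterns, lines):
    """Keep lines containing any pattern as a literal substring, like grep -F."""
    return [line for line in lines if any(p in line for p in patterns)]

# Made-up input covering spaces, a CRLF line ending, and a blank line.
patterns = patterns_from_text("googlebot mj12bot\r\nbaidu\n\n")
sample = ["GET /a googlebot/2.1\n", "GET /b Firefox\n", "GET /c baidu spider\n"]
matched = fixed_string_filter(patterns, sample)
```

In everyday Python the whole normalization collapses to text.split(), which splits on any whitespace and discards empty strings.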

Tags: python string search

【Solution 1】:

You can do it in a more Pythonic way:

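with open('bots.txt') as fh:  # the file of bot names from the question
    # split() breaks on any whitespace and drops empty strings; an empty
    # string would be "in" every line and turn every line into a hit
    words = set(fh.read().split())  # set of searched words

with open('access.log.13') as file_in, \
     open('hits.txt', 'w') as file_out:
    for line in file_in:
        if any(word in line for word in words):  # look for any of the words
            file_out.write(line)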

Or, more neatly, you can use a generator expression:

with open(...) as file_in, open(...) as file_out:  # same files as previously
    good_lines = (line for line in file_in if any(word in line for word in words))
    for good_line in good_lines:
        file_out.write(good_line)
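
As a self-contained variant of the approach above, the following sketch wraps the filtering in a function. The name write_bot_hits and its parameters are hypothetical. The case-insensitive comparison is an addition that is not part of the answer, on the assumption that logs may spell a crawler as "Googlebot" while bots.txt lists "googlebot".

```python
def write_bot_hits(log_path, bots_path, hits_path):
    """Copy every log line that mentions a bot name; return the number written."""
    with open(bots_path) as fh:
        # split() handles spaces, newlines and CRs, and never yields empty strings.
        names = {name.lower() for name in fh.read().split()}

    count = 0
    with open(log_path) as file_in, open(hits_path, "w") as file_out:
        for line in file_in:
            lowered = line.lower()  # assumption: matching should ignore case
            if any(name in lowered for name in names):
                file_out.write(line)  # write the original, unmodified line
                count += 1
    return count
```

With the files from the question, the call would look like write_bot_hits("access.log.13", "bots.txt", "hits.txt"). The log is read line by line, so it is never held in memory as a whole.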

【Discussion】:

Jakub M., your first option works great, and I understand the code (nice). Now I'm working on the second option :)

【Solution 2】:

Replace the if with:

if j.find(i) != -1

【讨论】：

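I replaced the line "if i in j:" with your suggestion, "if j.find(i) != -1", but the terminal reports "SyntaxError: invalid syntax", pointing at the digit "1"!
@Quendi The ':' is missing at the end of the line.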
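
A small sketch of the statement with its colon in place, using made-up values. Note that str.find(...) != -1 and the in operator test the same thing, so this change on its own would probably not alter which lines match. The bot name here is assumed to have been stripped of its trailing newline already.

```python
bot_name = "googlebot"  # hypothetical entry, already stripped of its newline
log_line = "GET /robots.txt HTTP/1.1 googlebot/2.1\n"  # made-up log line

# str.find returns the index of the first match, or -1 when there is none.
if log_line.find(bot_name) != -1:  # the colon ends the if header
    found_with_find = True
else:
    found_with_find = False

# The `in` operator expresses the same test more directly.
found_with_in = bot_name in log_line
```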