Random searching and modification of a file in Python: answers

【Question title】: Random searching and modification of a file in Python
【Posted】: 2011-05-25 12:48:20
【Question】:

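I feel like this should be simple, but I don't know Python well enough to do it properly.

I have two files:

A file whose lines list ID numbers and whether each ID is used. The format is 'id, isUsed'.
A rules file containing one rule per id.

What I want to do is parse the file of id/isUsed pairs, then use that information to find the corresponding rule in the second file and comment or uncomment that rule depending on whether it is used.

Is there a simple way to find the rule I'm looking for in the second file without scanning it line by line every time? Also, do I have to rewrite the whole file every time I change it?

Here is what I have so far; I really don't know the best way to implement modifyRulesFile():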

def editRulesFile(pairFile, ruleFile):
    # 'r+' opens for reading and writing; 'rw' is not a valid mode.
    with open(pairFile, 'r') as pairFd, open(ruleFile, 'r+') as ruleFd:
        for line in pairFd:
            id, isUsed = line.strip().split(',')

            modifyRulesFile(ruleFd, id, isUsed)

def modifyRulesFile(fd, id, isUsed):
    for line in fd:
        # Find line with id in it and add a comment or remove comment based on isUsed
        pass

【Question comments】:

Tags: python string replace find

【Solution 1】:

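I suggest reading the rules file into a dictionary (id -> rule). Then, as you read the pairs file, write out the corresponding rule (commented out if necessary).

Some pseudocode: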

rules = {}
for id, rule in read_rules_file():
    rules[id] = rule              # index every rule by its id
for id, isUsed in read_pairs_file():
    if isUsed:                    # isUsed must be a real boolean here, not the raw string
        write_rule(id, rules[id])
    else:
        write_commented_rule(id, rules[id])

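This way you make only one pass through each file. If the rules file gets very large you could run out of memory, but it usually takes a lot of data before that happens!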
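A minimal runnable sketch of the pseudocode above, under stated assumptions. The rule format (`<id> <rule text>`, with a leading `#` marking a commented-out rule), the convention that `1` means "used", and the helper names `parse_rule_id` and `apply_pairs` are all hypothetical and do not come from the original post.

```python
def parse_rule_id(line):
    # Hypothetical format: "<id> <rule text>", optionally prefixed with "#".
    return line.lstrip("#").split()[0]

def apply_pairs(rule_lines, pair_lines):
    # Pass 1: index every rule by its id (id -> uncommented rule text).
    rules = {}
    for line in rule_lines:
        line = line.strip()
        if line:
            rules[parse_rule_id(line)] = line.lstrip("#").strip()

    # Pass 2: emit each rule, commented out unless its pair says "used".
    out = []
    for line in pair_lines:
        line = line.strip()
        if not line:
            continue
        rule_id, is_used = (field.strip() for field in line.split(","))
        rule = rules[rule_id]
        # Assumption: the flag "1" means the rule is used.
        out.append(rule if is_used == "1" else "# " + rule)
    return out

rule_lines = ["100 allow tcp any", "# 200 deny udp any", "300 allow icmp any"]
pair_lines = ["100, 0", "200, 1", "300, 1"]
result = apply_pairs(rule_lines, pair_lines)
print("\n".join(result))
```

Each lookup in `rules` is a dictionary access, so no rule is ever searched for line by line. Note that a pair whose id has no matching rule raises `KeyError` in this sketch.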

You can use a generator to avoid holding all the pairs in memory at once:

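def read_pairs_file(pairFile):
    # Lazily yield one (id, isUsed) pair per line of the pairs file.
    # Both values are still strings at this point.
    with open(pairFile, 'r') as pairFd:
        for line in pairFd:
            id, isUsed = line.strip().split(',')
            yield (id.strip(), isUsed.strip())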
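To illustrate the laziness of the generator approach, here is a small sketch that reads from an in-memory `io.StringIO` instead of a real file. The name `iter_pairs` and the rule that the flag `1` means "used" are assumptions made for the example only.

```python
import io

def iter_pairs(lines):
    # Yield (id, is_used) tuples one at a time from any iterable of lines.
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rule_id, flag = line.split(",", 1)
        # Assumption: "1" means used; anything else means unused.
        yield rule_id.strip(), flag.strip() == "1"

fake_file = io.StringIO("100, 1\n\n200, 0\n")
pairs = iter_pairs(fake_file)
first = next(pairs)       # nothing beyond the first pair has been parsed yet
remaining = list(pairs)   # consume the rest
print(first, remaining)
```

Because the flag is converted to a real boolean here, the result can be used directly in a test such as `if isUsed:`.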

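【Comments】:

+1: Don't mess with file-level I/O for something this small. Just pull it all into memory and work with the objects as simply and directly as possible.
The rules file is about 14MB. Is that too big to read entirely into memory, or does it matter? I just saw your note about generators, thank you very much!
No. You can easily slurp 14MB into memory. Don't give it a second thought :)
Alternatively, you can flip the logic here: store the pairs in a dictionary (id -> isUsed) and loop over the rules. Write the output to a temporary file and, when you are done, replace the original file with the temporary one.
@Kevin S. "14MB"? How much memory does your computer have? Is it anywhere near 14MB? Is it 100 times that (1.4GB)? Or more?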

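【Solution 2】:

I don't know why I didn't think of this before, but there is another way to do it.

First, read into memory which rules should (or should not) be used; I store this in a dictionary.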

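def readRulesIntoMemory(fileName):
    rules = {}

    # Open csv file with rule id, isUsed pairs
    with open(fileName, 'r') as fd:
        for line in fd:
            id, isUsed = line.strip().split(',')
            # Assumption: isUsed is written as '1' or 'true' when the rule is
            # in use; adjust this test to match the real file format.
            rules[id.strip()] = isUsed.strip().lower() in ('1', 'true')

    return rules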

Then, while reading the current list of rules from the other file, write your changes to a temporary file.

def createTemporaryRulesFile(temporaryFileName, rulesFileName, rules):
    # open() raises an exception on failure, so the handles need no explicit check.
    with open(rulesFileName, 'r') as rulesFd, open(temporaryFileName, 'w') as tempFd:
        # Iterate through each current rule.
        for line in rulesFd:
            # getIdFromLine and the two write helpers are not shown in this answer.
            id = getIdFromLine(line)

            isCommented = True # Default to commenting out rule
            # If the rule's id was in the csv file from earlier, comment the
            # line only when the rule is not used (rules maps id -> isUsed,
            # stored as a boolean).
            if id in rules:
                isCommented = not rules[id]

            if isCommented:
                writeCommentedLine(tempFd, line)
            else:
                writeUncommentedLine(tempFd, line)

    return True

Now we can copy the new temporary file over the original whenever we need to.
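One possible way to do that final swap is sketched below. It is not part of the original answer: the names `rewrite_in_place` and `comment_out` are hypothetical, and `os.replace` requires Python 3.3 or later (on older versions `os.rename` is the closest equivalent on POSIX systems). The temporary file is created in the same directory as the target so that the replacement stays on one filesystem.

```python
import os
import tempfile

def rewrite_in_place(path, transform):
    # Write transformed lines to a temp file next to `path`, then swap it in.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as tmp, open(path, "r") as src:
            for line in src:
                tmp.write(transform(line))
        # Both files are closed here; the rename is atomic on the same filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # do not leave a stray temp file behind
        raise

def comment_out(line):
    # Hypothetical transform: prefix every non-comment line with "# ".
    return line if line.startswith("#") else "# " + line

demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "rules.txt")
with open(demo_path, "w") as f:
    f.write("100 allow tcp any\n# 200 deny udp any\n")

rewrite_in_place(demo_path, comment_out)

with open(demo_path) as f:
    content = f.read()
print(content)
```

If anything fails while the temporary file is being written, the original rules file is left untouched, which is the main reason for writing to a temporary file first.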

【Comments】: