Python - Writing to file using seek

【Question title】: Python - Writing to file using seek
【Posted】: 2017-07-25 21:57:09
【Question】:

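I am a beginner with Python, and to learn some string functions I am trying out various ways to accomplish the simple task of reverse-complementing a DNA or RNA sequence. My latest method almost works, apart from a minor irritation I can't find an answer to, probably because I am using something I don't properly understand. My function is meant to write a blank file (this works!), then open a file containing a sequence and loop through it one character at a time, writing the reverse complement to the new file. Here is the code: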

def func_rev_seq(in_path,out_path):
    """
    Read file one character at a time and return the reverse complement of each nucleotide to a new file
    """
    #  Write a blank file (out_path)
    fb = open(out_path,"w")
    fb.write("")
    fb.close()
    #  Dictionary where the key is the nucleotide and the value is its reverse complement
    base = {"A":"T", "C":"G", "G":"C", "T":"A", "a":"t", "c":"g", "g":"c", "t":"a", "k":"m", "m":"k", "y":"r", "r":"y", "b":"v", "v":"b", "d":"h", "h":"d", "K":"M", "M":"K", "Y":"R", "R":"Y", "B":"V", "V":"B", "D":"H", "H":"D", "U":"A", "u":"a"}
    #  Open the source file (in_path) as fi
    fi=open(in_path,"r")
    i = fi.read(1)
    #  Loop through the source file one character at a time and write the reverse complement to the output file
    while i != "":
        i = fi.read(1)
        if i in base:
            b = base[i]
        else:
            b = i
        with open(out_path, 'r+') as fo:
            body = fo.read()
            fo.seek(0, 0)
            fo.write(b + body)
    fi.close()
    fo.close()

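The problem is that when I run the function, the string in the output file is, firstly, truncated by a single character and, secondly, sits below a blank line that I don't want. (Screenshot of input and output file examples.) As I understand it, seek with (0, 0) should refer to the start of the file, but I may have misunderstood. Any help is greatly appreciated, thanks!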

【Comments】:

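By the way, my code is indented correctly, but it isn't rendering properly here. Maybe I'm doing that wrong too!
What is the character below the blank line?
It is the reverse complement of the last nucleotide in the original sequence. So, for example, if the original sequence is "AACCTCAGC", it would be "G".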
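
The two symptoms described in the question can be reproduced without touching the disk. The sketch below uses io.StringIO as a stand-in for the input file (an assumption made purely for illustration) and records which characters each loop style actually processes; read_ahead_loop and every_char_loop are hypothetical helper names.

```python
import io


def read_ahead_loop(text):
    """Mimic the question's loop and return the characters it processes."""
    f = io.StringIO(text)
    seen = []
    ch = f.read(1)           # the first character is read here...
    while ch != "":
        ch = f.read(1)       # ...and overwritten before it is ever used
        seen.append(ch)      # the final "" (end of file) is processed too
    return seen


def every_char_loop(text):
    """Visit each character exactly once by calling read(1) until it returns ""."""
    f = io.StringIO(text)
    return list(iter(lambda: f.read(1), ""))


print(read_ahead_loop("ACG\n"))   # ['C', 'G', '\n', '']
print(every_char_loop("ACG\n"))   # ['A', 'C', 'G', '\n']
```

The first loop never sees "A", which accounts for the missing character. Because the question's code prepends each character to the output, the trailing "\n" of the input ends up at the very start of the output file, which accounts for the blank line.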

Tags: python string seek

【Solution 1】:

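When you call i = fi.read(1), i is set to the first character in the file, but at the start of the while loop you use the same statement to assign the second character to i without ever doing anything with the first one. If you want to go through every character in the file without that problem, a for loop is a better choice. Iterating character by character in reverse is a little more challenging, but it can be done: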

def nucleo_complement(ifilename, ofilename):
    """Reads a file one character at a time and returns the reverse
    complement of each nucleotide."""
    complements = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    ifile = open(ifilename)
    ofile = open(ofilename, 'w')
    for pos in range(ifile.seek(0, 2) + 1, 0, -1):
        ifile.seek(pos - 1)
        char = ifile.read(1)
        ofile.write(complements.get(char.upper(), char))
    ifile.close()
    ofile.close()

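seek returns the new file position, and seek(0, 2) goes to the last character in the file (strictly speaking, it lands at the end of the file, just past the last character, so the first read(1) in this loop returns an empty string). Every call to read(1) advances the position in the file by one character, so I had to make pos start at the position of the last character plus one and end the loop at the second position rather than the first. On each iteration I step back one character with ifile.seek(pos - 1) and then read the next (original) character. This example may be a bit much for a beginner, so feel free to ask if you have any questions. Really, all you need to think about are the first two statements in the for loop and the fact that I have two files open at the same time.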
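
To make the offsets concrete, here is a minimal sketch of what seek(0, 2) and read(1) report on a small file. It opens the file in binary mode so that offsets are plain byte counts, which is an assumption of this sketch rather than what the answer's code does; probe_end_of_file is a hypothetical helper name.

```python
import os
import tempfile


def probe_end_of_file(data):
    """Write `data` (non-empty bytes) to a temp file and inspect positions near its end."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        with open(path, "rb") as f:
            size = f.seek(0, 2)    # jump to the end; the returned offset equals the file size
            at_end = f.read(1)     # nothing left to read, so this is b""
            f.seek(size - 1)       # step back onto the last byte
            last = f.read(1)       # the last byte itself
            after = f.tell()       # read(1) moved the position forward by one
    finally:
        os.remove(path)
    return size, at_end, last, after


print(probe_end_of_file(b"AACCTCAGC\n"))   # (10, b'', b'\n', 10)
```

For a nine-letter sequence followed by a newline, the last byte is the newline rather than a nucleotide, so a reverse walk that starts at the very end emits the newline first.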

【Comments】:

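Thanks Issac, that is a very helpful explanation. I'll give it a try and let you know how I get on. Thank you.
Thanks Issac, I tried your suggestion and it works with a small modification. Run as is, it did fix the truncation problem, but there was still an extra line at the top of the output file. I tried changing 'for pos in range(ifile.seek(0, 2) + 1, 0, -1):' to 'for pos in range(ifile.seek(0, 2), 0, -1):' but that didn't seem to do anything, so I went one step further and changed it to 'for pos in range(ifile.seek(0, 2) - 1, 0, -1):'. Voilà! It worked.
Glad to help. That makes sense; the last character must be a newline. I hadn't accounted for that.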
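
As the last comment notes, the extra line comes from the input file's trailing newline. One way to sidestep the offset arithmetic altogether is to read the whole sequence into memory, strip the trailing newline, and reverse it with str.translate. This is a different technique from the seek-based answer above, sketched under the assumptions that the file holds a single-line sequence small enough to fit in memory and that the output should end with one newline; the function names are illustrative.

```python
# Translation table built from the same pairs as the question's dictionary
_UPPER_FROM = "ACGTUKMYRBVDH"
_UPPER_TO = "TGCAAMKRYVBHD"
COMPLEMENT = str.maketrans(_UPPER_FROM + _UPPER_FROM.lower(),
                           _UPPER_TO + _UPPER_TO.lower())


def reverse_complement(seq):
    """Complement every base, then reverse; unknown characters pass through unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def reverse_complement_file(in_path, out_path):
    """Read a single-line sequence, drop trailing newlines, write its reverse complement."""
    with open(in_path) as fi:
        seq = fi.read().rstrip("\r\n")
    with open(out_path, "w") as fo:
        fo.write(reverse_complement(seq) + "\n")


print(reverse_complement("AACCTCAGC"))   # GCTGAGGTT
```

The result starts with "G", the complement of the final "C", which matches the example given in the question comments.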

【Solution 2】:

Thanks to Issac, here is the code that works. It solved both of the problems I was having.

def func_rev_seq(in_path,out_path):
    """Read file one character at a time and write the reverse complement of each nucleotide to a new file"""

    #  Dictionary where the key is the nucleotide and the value is its complement
    base = {"A":"T", "C":"G", "G":"C", "T":"A", "a":"t", "c":"g", "g":"c", "t":"a", "k":"m", "m":"k", "y":"r", "r":"y", "b":"v", "v":"b", "d":"h", "h":"d", "K":"M", "M":"K", "Y":"R", "R":"Y", "B":"V", "V":"B", "D":"H", "H":"D", "U":"A", "u":"a"}
    #  Opening out_path with 'w' creates or empties the file, so no separate blank write is needed
    with open(in_path) as fi, open(out_path, 'w') as fo:
        #  fi.seek(0, 2) returns the file size; starting at size - 1 skips the trailing newline
        for pos in range(fi.seek(0, 2) - 1,  0, -1):
            fi.seek(pos - 1)
            b = fi.read(1)
            #  Characters that are not in the dictionary are written unchanged
            fo.write(base.get(b, b))
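
The loop in Solution 2 relies on the input ending in exactly one newline character. A hedged variant that does not depend on this is sketched below: it walks the file backwards in binary mode, where every offset is a plain byte count, and skips newline bytes wherever they occur. It assumes single-byte (ASCII) sequence data, and it joins multi-line input into one line, which may or may not be what you want; reverse_complement_seek is a hypothetical name.

```python
import os

# Byte-level translation table with the same pairs as the dictionary in Solution 2
_FROM = b"ACGTUKMYRBVDH"
_TO = b"TGCAAMKRYVBHD"
COMPLEMENT = bytes.maketrans(_FROM + _FROM.lower(), _TO + _TO.lower())


def reverse_complement_seek(in_path, out_path):
    """Walk in_path backwards one byte at a time, writing complemented bases to out_path."""
    with open(in_path, "rb") as fi, open(out_path, "wb") as fo:
        size = fi.seek(0, os.SEEK_END)          # offset of the end == file size
        for offset in range(size - 1, -1, -1):  # last byte down to the first
            fi.seek(offset)
            byte = fi.read(1)
            if byte in (b"\n", b"\r"):          # ignore line endings anywhere in the file
                continue
            fo.write(byte.translate(COMPLEMENT))
```

Seeking before every one-byte read is slow on large files, so this is mainly useful as an illustration of how the offsets line up.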

【Comments】: