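【Question title】:How to print the first four lines into a file, and the following four lines into a second file, and so on?
【Posted】:2020-09-10 18:46:16
【Question】:

I have a fastq file that contains all of my sequences, the result of paired-end sequencing. I need to split them into two files, with all the reverse sequences in one file and the forward sequences in the other. So I need to read the first four lines and write them to file "R", then read the next four lines and write them to file "F". After that, I need to read and save the following lines in the same way. I thought of something like this (below), but it did not work. Any help, please?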

R = open("R.fastq","w+")
F = open("F.fastq","w+")

x = raw_input('type the name of the file you wanna split: ')   
with open (x, 'rt') as myfile:   
    for line in myfile:
        R.write (line)
        R.write (line)
        R.write (line)
        R.write (line)
        F.write (line)
        F.write (line)
        F.write (line)
        F.write (line)

R.close()
F.close()

【Comments】:

    Tags: python printing fastq write


    【Solution 1】:

    This should do it:

    r = [] # List for the lines to be written into R
    f = [] # List for the lines to be written into F
    
    with open('text.txt','r') as myfile: # Open the original file 
        lines = myfile.readlines() # and store each line inside a list called lines
    
    index = 0 # Index of the line
    
    while index <= len(lines)-1:
    
        for n in range(4):
            if index <= len(lines)-1:
                r.append(lines[index]) # Append line to r
                index+=1
    
        for n in range(4):
            if index <= len(lines)-1:
                f.append(lines[index]) # Append line to f
                index+=1
    
    
    with open('file1.txt','w') as R:
        for line in r:
            R.write(line) # Write each line from r into R
    
    with open('file2.txt','w') as F:
        for line in f:
            F.write(line) # Write each line from f into F
    

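    【Comments】:

      【Solution 2】:

      I think this will do what you need; at least it seems to on the test file I created myself.

      It uses a generator function I named grouper() to split the lines of the input file into groups of 4, then writes each group to one of the 2 output files. It decides which output file to use by numbering the groups with the built-in enumerate() function and taking that counter modulo 2 (% 2) to pick one file or the other.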

      from itertools import zip_longest
      
      
      def grouper(n, iterable):
          """ s -> (s0,s1,...sn-1), (sn,sn+1,...s2n-1), (s2n,s2n+1,...s3n-1), ... """
          FILLER = object()  # Value that couldn't be in data.
          for result in zip_longest(*[iter(iterable)]*n, fillvalue=FILLER):
              yield tuple(v for v in result if v is not FILLER)
      
      
      input_filename = 'sequences.txt'
      output_filename1 = 'R.fastq'
      output_filename2 = 'F.fastq'
      
      with open(input_filename) as inp, \
           open(output_filename1, 'w') as outp1, \
           open(output_filename2, 'w') as outp2:
      
          output_files = outp1, outp2
          for i, group in enumerate(grouper(4, inp)):
              outp = output_files[i % 2]
              for line in group:
                  outp.write(line)
      
      print('done')
      

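      【Comments】:

        【Solution 3】:

        Your problem is that on every iteration of the loop you write the same line four times to each of the two files, so the program has no way of deciding which line belongs in which file. Try this code; I can't test it without the file, but in theory it should work.

        This keeps track of the line it is on. Each time the line count reaches a multiple of 4, q is incremented; while q is even, lines are written to file R, and while q is odd, they are written to file F.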

        R = open("R.fastq","w+") # open file R with write permissions
        F = open("F.fastq","w+") # open file F with write permissions
        
        x = raw_input('type the name of the file you wanna split: ')   #input file name
        p = 0 #variable to increment, tracking which line you're at
        q = 0 #variable to track when to switch files
        with open (x, 'rt') as myfile:   #open input file with read permissions
            for line in myfile: # loop through file
                if q%2 == 0: #if q is even
                    R.write (line) #write to file R
                elif q%2 == 1: #if q is odd
                    F.write (line) #write to file F
                p+=1 #increment tracker to next line
                if p%4 == 0: # if line is a multiple of 4
                    q+=1 #increment q to switch files
        
        R.close() #close file R
        F.close() #close file F
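
The same counter idea can be written more compactly in Python 3 (the code above uses raw_input, which is Python 2 only). This is a minimal sketch, not the answerer's code: the function name split_by_line_index and the in-memory demo data are made up for illustration. It derives the group number from the line index instead of keeping two counters.

```python
import io


def split_by_line_index(lines, first_out, second_out, group_size=4):
    """Write alternating groups of `group_size` lines to two outputs."""
    for i, line in enumerate(lines):
        # i // group_size is the group number; its parity picks the output.
        target = first_out if (i // group_size) % 2 == 0 else second_out
        target.write(line)


# Demo on in-memory "files" (hypothetical data, 12 numbered lines).
src = io.StringIO("".join(f"line{i}\n" for i in range(1, 13)))
first, second = io.StringIO(), io.StringIO()
split_by_line_index(src, first, second)
```

With real files, the three arguments would be file objects opened in a single with statement, so they are closed even if an error occurs.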
        

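        【Comments】:

          【Solution 4】:

          This is called "deinterleaving" an interleaved FASTQ. If you search for that term, you will find any number of ready-made solutions, including the reformat command of the BBmap/BBtools package. http://seqanswers.com/forums/showthread.php?t=46174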
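
For readers who want to see what deinterleaving amounts to, here is a rough Python 3 sketch. It is not how BBtools does it; the function name deinterleave and the two-read sample are assumptions made for illustration, and it assumes every record is exactly four lines. It reads one record at a time with itertools.islice and refuses truncated or unpaired input.

```python
import io
from itertools import islice


def deinterleave(src, out_a, out_b, lines_per_record=4):
    """Send records alternately to out_a and out_b; return the record count."""
    outputs = (out_a, out_b)
    count = 0
    while True:
        # Pull the next record (up to 4 lines) from the input stream.
        record = list(islice(src, lines_per_record))
        if not record:
            break
        if len(record) != lines_per_record:
            raise ValueError(f"truncated record after {count} complete records")
        outputs[count % 2].writelines(record)
        count += 1
    if count % 2:
        raise ValueError("odd number of records; input is not cleanly paired")
    return count


# Demo on a made-up interleaved pair.
sample = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r1/2\nTGCA\n+\nIIII\n")
a, b = io.StringIO(), io.StringIO()
n = deinterleave(sample, a, b)
```

Because only one record is held in memory at a time, this approach should cope with large files, though a dedicated tool is likely the safer choice for real data.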

          【Comments】:
