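【Question title】: Parsing xml file in python which contains multifasta BLAST result
【Posted】: 2016-04-15 10:02:16
【Question description】:

I am trying to parse an xml file that contains a multifasta BLAST result - here is the link - it is about 400 kB in size. The program should return four sequence names. Each result (the best alignment for that query) should come after the line "< Iteration_iter-num >n< /Iteration_iter-num >", where n = 1, 2, 3, ...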

Like this:

< Iteration_iter-num >1< /Iteration_iter-num >

****Alignment****
sequence: gi|171864|gb|AAC04946.1| Yal011wp [Saccharomyces cerevisiae]

< Iteration_iter-num >2< /Iteration_iter-num >

****Alignment****
sequence: gi|330443384|ref|NP_009392.2| 

< Iteration_iter-num >3< /Iteration_iter-num >

****Alignment****
sequence: gi|6319310|ref|NP_009393.1|

< Iteration_iter-num >4< /Iteration_iter-num >

****Alignment****
sequence: gi|6319312|ref|NP_009395.1|

But my program returns this instead:

<Iteration_iter-num>1</Iteration_iter-num>
****Alignment****
sequence: gi|171864|gb|AAC04946.1| Yal011wp [Saccharomyces cerevisiae]

<Iteration_iter-num>2</Iteration_iter-num>
****Alignment****
sequence: gi|171864|gb|AAC04946.1| Yal011wp [Saccharomyces cerevisiae]

<Iteration_iter-num>3</Iteration_iter-num>
****Alignment****
sequence: gi|171864|gb|AAC04946.1| Yal011wp [Saccharomyces cerevisiae]

<Iteration_iter-num>4</Iteration_iter-num>
****Alignment****
sequence: gi|171864|gb|AAC04946.1| Yal011wp [Saccharomyces cerevisiae]

How can I get the other BLAST results from this xml file?

Here is my code:

from Bio.Blast import NCBIXML
from bs4 import BeautifulSoup

result = open ("BLAST_left.xml", "r")
records = NCBIXML.parse(result)
item = next(records)

file = open("BLAST_left.xml")
page = file.read()
soup = BeautifulSoup(page, "xml")
num_xml_array = soup.find_all('Iteration_iter-num')
i = 0
for records in records:
    for itemm in num_xml_array:
        print (itemm)
        for alignment in item.alignments:
            for hsp in alignment.hsps:
                print("\n\n****Alignment****")
                print("sequence:", alignment.title)
            break
        itemm = num_xml_array[i+1]
    break

// I know my English is not perfect, but I really don't know what to do and have no one else to ask, so I chose you :)

【Question comments】:

    Tags: python xml bioinformatics biopython blast


    【Solution 1】:

    I think Biopython on its own is the better choice for parsing this XML; there is no need for BeautifulSoup:

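    from Bio.Blast import NCBIXML

    # NCBIXML.parse yields one record per query (one per <Iteration>).
    with open("BLAST_left.xml", "r") as result:
        # start=1 so the counter matches the Iteration_iter-num values.
        for i, record in enumerate(NCBIXML.parse(result), start=1):
            for align in record.alignments:
                print("Iteration {}".format(i))
                print(align.hit_id)
                break  # Stop after the first alignment, i.e. the top hit for this query.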
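To see why the code in the question prints the same sequence four times, here is a minimal sketch of its control flow. It assumes a plain iterator of strings as a stand-in for `NCBIXML.parse`; the names `buggy_pairs` and `fixed_pairs` and the record values are hypothetical and only serve the illustration. The question's code calls `next(records)` once, stores the result in `item`, and then keeps reading from `item` inside the loop, so every iteration label is paired with the first record.

```python
def buggy_pairs(records, labels):
    """Mimic the question's loop: 'item' is fetched once and never updated."""
    item = next(records)            # consumes only the first record
    pairs = []
    for _record in records:         # the loop variable is never used
        for label in labels:
            pairs.append((label, item))   # always the first record
        break                       # the outer loop runs a single time
    return pairs


def fixed_pairs(records):
    """Pair each record with its own 1-based iteration number."""
    return [(i, record) for i, record in enumerate(records, start=1)]


if __name__ == "__main__":
    # Hypothetical stand-ins for the four parsed BLAST records.
    names = ["rec1", "rec2", "rec3", "rec4"]
    print(buggy_pairs(iter(names), [1, 2, 3, 4]))
    print(fixed_pairs(iter(names)))
```

Iterating over the parser output directly, as in the answer above, avoids the stale `item` variable altogether.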
    

    【Comments】:

      【Solution 2】:

      You can use Python's built-in module xml.etree.ElementTree to parse the xml

      import xml.etree.ElementTree as ET

      tree = ET.parse('BLAST_left.xml')
      doc = tree.getroot()
      # Each child of BlastOutput_iterations is one <Iteration>, i.e. one query.
      for item in doc.find('BlastOutput_iterations'):
          print('< Iteration_iter-num >{0}< /Iteration_iter-num >'.format(item.find('Iteration_iter-num').text))
          print('** ** Alignment ** **')
          # find() returns the first matching <Hit> of this iteration.
          print('sequence:{0}|{1}'.format(item.find('Iteration_hits/Hit/Hit_id').text, item.find('Iteration_hits/Hit/Hit_def').text))
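The loop above raises an `AttributeError` if an iteration has no hits, because `find()` then returns `None`. A minimal sketch of a more defensive variant follows. It assumes the element names used in this answer; the inline `SAMPLE_XML` is a trimmed, hypothetical stand-in for `BLAST_left.xml` (its second, empty iteration is invented for the illustration), and `best_hits` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for the real BLAST XML file.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_hits>
        <Hit>
          <Hit_id>gi|171864|gb|AAC04946.1|</Hit_id>
          <Hit_def>Yal011wp [Saccharomyces cerevisiae]</Hit_def>
        </Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_hits></Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def best_hits(root):
    """Return (iteration number, first hit title or None) for every iteration."""
    results = []
    for iteration in root.iter("Iteration"):
        num = iteration.findtext("Iteration_iter-num")
        hit = iteration.find("Iteration_hits/Hit")   # first <Hit>, or None
        if hit is None:
            results.append((num, None))              # query without any hit
            continue
        title = "{0} {1}".format(hit.findtext("Hit_id"), hit.findtext("Hit_def"))
        results.append((num, title))
    return results


if __name__ == "__main__":
    for num, title in best_hits(ET.fromstring(SAMPLE_XML)):
        print("< Iteration_iter-num >{0}< /Iteration_iter-num >".format(num))
        print("****Alignment****")
        print("sequence:", title if title is not None else "(no hits)")
```

For the real file, `ET.parse('BLAST_left.xml').getroot()` can be passed to `best_hits` in place of the parsed sample string.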
      

      【Comments】:
