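【Posted】: 2016-04-15 10:02:16
【Question】:
I am trying to parse an XML file that contains BLAST results for a multi-FASTA query (here is the link). The file is about 400 kB. The program should return four sequence names. Each result (the best alignment for that query) should be printed after its iteration number n, where n = 1, 2, 3, ...
It should look like this: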
<Iteration_iter-num>1</Iteration_iter-num>
****Alignment****
sequence: gi|171864|gb|AAC04946.1| Yal011wp [Saccharomyces cerevisiae]
<Iteration_iter-num>2</Iteration_iter-num>
****Alignment****
sequence: gi|330443384|ref|NP_009392.2|
<Iteration_iter-num>3</Iteration_iter-num>
****Alignment****
sequence: gi|6319310|ref|NP_009393.1|
<Iteration_iter-num>4</Iteration_iter-num>
****Alignment****
sequence: gi|6319312|ref|NP_009395.1|
But my program returns this instead:
<Iteration_iter-num>1</Iteration_iter-num>
****Alignment****
sequence: gi|171864|gb|AAC04946.1| Yal011wp [Saccharomyces cerevisiae]
<Iteration_iter-num>2</Iteration_iter-num>
****Alignment****
sequence: gi|171864|gb|AAC04946.1| Yal011wp [Saccharomyces cerevisiae]
<Iteration_iter-num>3</Iteration_iter-num>
****Alignment****
sequence: gi|171864|gb|AAC04946.1| Yal011wp [Saccharomyces cerevisiae]
<Iteration_iter-num>4</Iteration_iter-num>
****Alignment****
sequence: gi|171864|gb|AAC04946.1| Yal011wp [Saccharomyces cerevisiae]
How can I get the other BLAST results from this XML file?
Here is my code:
from Bio.Blast import NCBIXML
from bs4 import BeautifulSoup

result = open("BLAST_left.xml", "r")
records = NCBIXML.parse(result)
item = next(records)

file = open("BLAST_left.xml")
page = file.read()
soup = BeautifulSoup(page, "xml")
num_xml_array = soup.find_all('Iteration_iter-num')

i = 0
for records in records:
    for itemm in num_xml_array:
        print(itemm)
        for alignment in item.alignments:
            for hsp in alignment.hsps:
                print("\n\n****Alignment****")
                print("sequence:", alignment.title)
                break
            itemm = num_xml_array[i+1]
            break
    break
P.S. I know my English is not perfect, but I really don't know what to do and I have no one else to ask, so I chose you :)
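The intended behaviour described above (one best hit per iteration) can be sketched as follows. This is only a minimal sketch under stated assumptions, not a confirmed fix: it uses the standard-library `xml.etree.ElementTree` rather than Biopython, and `SAMPLE_XML`, `best_hits`, and the second hit of iteration 1 are made up for illustration. It assumes the usual BLAST XML layout, in which every `Iteration` element carries its own `Iteration_iter-num` and a list of `Hit` elements with the best hit first.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for the real BLAST XML file.
# The second hit of iteration 1 is invented for illustration only.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>1</Iteration_iter-num>
      <Iteration_hits>
        <Hit>
          <Hit_id>gi|171864|gb|AAC04946.1|</Hit_id>
          <Hit_def>Yal011wp [Saccharomyces cerevisiae]</Hit_def>
        </Hit>
        <Hit>
          <Hit_id>gi|000000|ref|XX_000000.1|</Hit_id>
          <Hit_def>invented lower-ranked hit</Hit_def>
        </Hit>
      </Iteration_hits>
    </Iteration>
    <Iteration>
      <Iteration_iter-num>2</Iteration_iter-num>
      <Iteration_hits>
        <Hit>
          <Hit_id>gi|330443384|ref|NP_009392.2|</Hit_id>
          <Hit_def></Hit_def>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def best_hits(root):
    """Return (iteration number, title of the first hit) for every Iteration."""
    results = []
    for iteration in root.iter("Iteration"):
        number = iteration.findtext("Iteration_iter-num")
        # The first Hit inside this Iteration is assumed to be the best one.
        first_hit = iteration.find("Iteration_hits/Hit")
        if first_hit is None:
            title = None  # this query produced no hits
        else:
            hit_id = first_hit.findtext("Hit_id", default="")
            hit_def = first_hit.findtext("Hit_def", default="")
            title = f"{hit_id} {hit_def}".strip()
        results.append((number, title))
    return results


root = ET.fromstring(SAMPLE_XML)
for number, title in best_hits(root):
    print(f"<Iteration_iter-num>{number}</Iteration_iter-num>")
    print("****Alignment****")
    print("sequence:", title)
```

For the real file, `ET.parse("BLAST_left.xml").getroot()` would replace `ET.fromstring(SAMPLE_XML)`. The point of the sketch is that the hit is looked up inside each iteration, so the iteration number and the sequence always come from the same place. With Biopython the same idea would be to read `record.alignments[0].title` from every `record` yielded by `NCBIXML.parse`, instead of reusing the first record for all four iterations.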
【Discussion】:
Tags: python xml bioinformatics biopython blast