【Question Title】: How do I find the nucleotide sequence of a protein using Biopython?
【Posted】: 2020-05-01 09:41:11
【Question】:

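I have some proteins and want to find their corresponding nucleotide sequences. I also have the genome in which the proteins were found, and in that genome I have identified the Gene ID of the gene encoding each protein. However, I'm having trouble getting the nucleotide sequence from the Gene ID. I tried using Entrez Efetch: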

from Bio import Entrez

Entrez.email = "dddd@gmail.com"
with open("genome.gb", "w") as out_handle:
    # Fetch the record for Gene ID 2703488 from the NCBI Gene database
    request = Entrez.efetch(db="gene", id="2703488", rettype="gb", retmode="text")
    out_handle.write(request.read())
    request.close()

But this only returns the following:

1. G
tail component [Escherichia virus Lambda]
Other Aliases: lambdap14
Other Designations: tail component
Annotation:  NC_001416.1 (9711..10133)
ID: 2703488

Is it possible to get the actual nucleotide sequence with Efetch? Thanks in advance!

【Question Comments】:

    Tags: python bioinformatics biopython ncbi rentrez


    【Solution 1】:

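    You can use the accession and coordinates from the `Annotation:` line to fetch the sequence from the NCBI Nucleotide database (`nuccore`):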
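Rather than copying the accession and coordinates out of the `Annotation:` line by hand, you can parse them from the text that the gene efetch call returns. This is a minimal sketch: `parse_annotation` is a hypothetical helper, and the `, complement` suffix it handles for minus-strand genes is an assumption about the Gene summary format, not something shown in the output above. It returns keyword arguments for `Entrez.efetch`; `strand=2` is the E-utilities efetch parameter for requesting the minus strand.

```python
import re

# Matches e.g. "NC_001416.1 (9711..10133)". The optional ", complement"
# suffix for minus-strand genes is an ASSUMED format.
ANNOTATION_RE = re.compile(
    r"(?P<acc>\S+)\s*\((?P<start>\d+)\.\.(?P<stop>\d+)(?P<comp>,\s*complement)?\)"
)


def parse_annotation(text):
    """Build Entrez.efetch keyword arguments from an 'ACCESSION (start..stop)' location."""
    match = ANNOTATION_RE.search(text)
    if match is None:
        raise ValueError("no 'ACCESSION (start..stop)' location found")
    kwargs = {
        "db": "nuccore",
        "id": match.group("acc"),
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": int(match.group("start")),
        "seq_stop": int(match.group("stop")),
    }
    if match.group("comp"):
        # E-utilities efetch: strand=2 returns the minus (reverse-complement) strand
        kwargs["strand"] = 2
    return kwargs
```

With network access, this could be used as `Entrez.efetch(**parse_annotation(text))`, where `text` is the string read from the gene efetch request in the question.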

    >>> from Bio import Entrez, SeqIO
    >>> Entrez.email = 'your.email@example.com'  # NCBI asks for a contact address
    >>> request = Entrez.efetch(db="nuccore", id="NC_001416.1", rettype="fasta", seq_start="9711", seq_stop="10133")
    >>> seq_record = SeqIO.read(request, "fasta")
    >>> seq_record
    SeqRecord(seq=Seq('ATGTTCCTGAAAACCGAATCATTTGAACATAACGGTGTGACCGTCACGCTTTCT...TGA', SingleLetterAlphabet()), id='NC_001416.1:9711-10133', name='NC_001416.1:9711-10133', description='NC_001416.1:9711-10133 Enterobacteria phage lambda, complete genome', dbxrefs=[])
    >>> print(seq_record.seq)
    ATGTTCCTGAAAACCGAATCATTTGAACATAACGGTGTGACCGTCACGCTTTCTGAACTGTCAGCCCTGCAGCGCATTGAGCATCTCGCCCTGATGAAACGGCAGGCAGAACAGGCGGAGTCAGACAGCAACCGGAAGTTTACTGTGGAAGACGCCATCAGAACCGGCGCGTTTCTGGTGGCGATGTCCCTGTGGCATAACCATCCGCAGAAGACGCAGATGCCGTCCATGAATGAAGCCGTTAAACAGATTGAGCAGGAAGTGCTTACCACCTGGCCCACGGAGGCAATTTCTCATGCTGAAAACGTGGTGTACCGGCTGTCTGGTATGTATGAGTTTGTGGTGAATAATGCCCCTGAACAGACAGAGGACGCCGGGCCCGCAGAGCCTGTTTCTGCGGGAAAGTGTTCGACGGTGAGCTGA
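Since the goal is the nucleotide sequence that encodes a particular protein, it can be worth checking that the fetched region really translates to it. This is a minimal sketch, assuming translation table 11 (the bacterial code, a reasonable choice for a phage that infects E. coli) and that you have the protein as a plain string; `encodes_protein` is a hypothetical helper, not part of Biopython. With `cds=True`, `Seq.translate` rejects sequences without a valid start codon, a final stop codon, or a length divisible by three.

```python
from Bio.Data.CodonTable import TranslationError
from Bio.Seq import Seq


def encodes_protein(nucleotides, protein, table=11):
    """Return True if the nucleotides form a complete CDS translating to protein.

    table=11 (bacterial, archaeal and plant plastid code) is an assumption.
    """
    try:
        # cds=True: translate the start codon as M and strip the final stop codon
        translated = Seq(str(nucleotides)).translate(table=table, cds=True)
    except TranslationError:
        return False  # not a complete, valid coding sequence
    return str(translated) == protein.rstrip("*")
```

For example, `encodes_protein(seq_record.seq, my_protein)`, where `my_protein` is a placeholder for your protein sequence.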
    

    【Comments】:
