【Question title】: Retrieve DNA sequence using a gene identifier of a protein
【Posted】: 2014-11-04 16:18:17
【Question】:

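I'm using Biopython to try to retrieve the DNA sequence corresponding to a protein for which I have the GI (71743840). On the NCBI web page this is easy: I only need to look for the RefSeq. My problem arises when coding it in Python with the NCBI fetch utilities: I can't find any way to retrieve a field that would lead me to the DNA.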

handle = Entrez.efetch(db="nucleotide", id=blast_record.alignments[0].hit_id, rettype="gb", retmode="text")
seq_record=SeqIO.read(handle,"gb")

There is a lot of information in seq_record.features, but there must be an easier, more obvious way to do this. Any help would be appreciated. Thanks!

【Comments】:

  • What is the value of blast_record.alignments[0].hit_id?
  • Ah, sorry, in this case it's the GI of the protein, "71743840".

Tags: python bioinformatics biopython ncbi


【Answer 1】:

You can try accessing the annotations of the SeqRecord:

seq_record=SeqIO.read(handle,"gb")
nucleotide_accession = seq_record.annotations["db_source"]

In your case, nucleotide_accession is "REFSEQ: accession NM_000673.4".

Now see whether you can parse these annotations. With only this test case:

nucl_id = nucleotide_accession.split()[-1]

handle = Entrez.efetch(db="nucleotide",
                       id=nucl_id,
                       rettype="gb",
                       retmode="text")
seq_record = SeqIO.read(handle, "gb")
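Splitting on whitespace and taking the last token works for this one test case, but it depends on the accession being the final word of the string. A slightly more defensive sketch is shown below. The helper name `extract_accession` is hypothetical, and the regular expression assumes the accession always follows the word "accession", which has only been checked against the string quoted above.

```python
import re

# Matches the token that follows the word "accession" in a db_source
# string such as "REFSEQ: accession NM_000673.4" (assumed format).
_ACCESSION_RE = re.compile(r"accession\s+([A-Za-z0-9_]+(?:\.\d+)?)")


def extract_accession(db_source):
    """Return the accession named in a db_source annotation, or None."""
    match = _ACCESSION_RE.search(db_source)
    return match.group(1) if match else None


print(extract_accession("REFSEQ: accession NM_000673.4"))  # NM_000673.4
```

Returning None when nothing matches lets the caller decide what to do with records whose db_source has a different layout, instead of silently passing a wrong token to efetch.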

【Comments】:

    【Answer 2】:

    You can take advantage of Entrez.elink, requesting the UID of the protein sequence that corresponds to the UID of a nucleotide sequence:

    from Bio import Entrez
    from Bio import SeqIO
    email = 'seb@free.fr'
    term = 'NM_207618.2'  # for example, an accession.version
    
    ### first step, we search for the nucleotide sequence of interest
    h_search = Entrez.esearch(
            db='nucleotide', email=email, term=term)
    record = Entrez.read(h_search)
    h_search.close()
    
    ### second step, we fetch that nt sequence (FASTA) using its UID
    handle_nt = Entrez.efetch(
            db='nucleotide', email=email,
            id=record['IdList'][0], rettype='fasta') # here is the UID
    
    ### third and most important, we 'link' the UID of the nucleotide
    # sequence to the corresponding protein from the appropriate database
    results = Entrez.read(Entrez.elink(
            dbfrom='nucleotide', linkname='nucleotide_protein',
            email=email, id=record['IdList'][0]))
    
    ### last, we fetch the amino acid sequence
    handle_aa = Entrez.efetch(
            db='protein', email=email, 
            id=results[0]['LinkSetDb'][0]['Link'][0]['Id'], # here is the key...
            rettype='fasta')
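The code above goes from a nucleotide UID to a protein, whereas the question asks for the opposite direction. A minimal sketch of the reverse lookup follows. The function names are hypothetical, and it assumes the link name `protein_nuccore` is the one that applies to the protein in question; it has not been run against GI 71743840. Biopython is imported inside the network function only so that the small parsing helper can be used on its own.

```python
def linked_ids(link_results, linkname):
    """Collect the target UIDs for one link name from a parsed elink result."""
    ids = []
    for linkset in link_results:
        for linkset_db in linkset.get("LinkSetDb", []):
            if linkset_db["LinkName"] == linkname:
                ids.extend(link["Id"] for link in linkset_db["Link"])
    return ids


def fetch_linked_nucleotides(protein_id, email, linkname="protein_nuccore"):
    """Fetch the nucleotide records linked to a protein UID (needs network)."""
    # Imported here so linked_ids() can be used without Biopython installed.
    from Bio import Entrez, SeqIO

    Entrez.email = email
    handle = Entrez.elink(dbfrom="protein", db="nuccore",
                          linkname=linkname, id=protein_id)
    results = Entrez.read(handle)
    handle.close()

    nucleotide_ids = linked_ids(results, linkname)
    if not nucleotide_ids:
        return []  # no link of that name was reported for this protein

    handle = Entrez.efetch(db="nuccore", id=",".join(nucleotide_ids),
                           rettype="gb", retmode="text")
    records = list(SeqIO.parse(handle, "gb"))
    handle.close()
    return records
```

A call such as `fetch_linked_nucleotides("71743840", "you@example.com")` would return a list of SeqRecord objects, possibly empty. Filtering on the link name rather than indexing `[0]['LinkSetDb'][0]` avoids an IndexError when no link is returned.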
    

    【Comments】:
