【Question title】: How do I calculate the percentage amino acid composition of sequences contained in a large FASTA file?
【Posted】: 2019-12-08 00:50:50
【Question description】:

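I want to calculate the amino acid composition of each sequence in a FASTA file separately, but I am struggling to do so. I know I can use the code below, but it requires me to paste in each sequence individually rather than reading the FASTA file as a whole and computing the values from it.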

from Bio.SeqUtils.ProtParam import ProteinAnalysis 
X = ProteinAnalysis("MAEGEITTFTALTEKFNLPPGNYKKPKLLYCSNGGHFLRILPDGTVDGT" 
                "RDRSDQHIQLQLSAESVGEVYIKSTETGQYLAMDTSGLLYGSQTPSEEC" 
                "LFLERLEENHYNTYTSKKHAEKNWFVGLKKNGSCKRGPRTHYGQKAILF" 
                "LPLPV") 
print(X.count_amino_acids()['A']) 
print(X.count_amino_acids()['E']) 
print("%0.2f" % X.get_amino_acids_percent()['K']) 
print("%0.2f" % X.get_amino_acids_percent()['L']) 
print("%0.2f" % X.molecular_weight()) 
print("%0.2f" % X.aromaticity()) 
print("%0.2f" % X.instability_index()) 
print("%0.2f" % X.isoelectric_point()) 
sec_struc = X.secondary_structure_fraction() 
print("%0.2f" % sec_struc[0]) 
epsilon_prot = X.molar_extinction_coefficient()  
print(epsilon_prot[0])   
print(epsilon_prot[1])  

【Question discussion】:

    Tags: python bioinformatics biopython fasta


    【Solution 1】:

    You just need to read the sequences from the FASTA file with SeqIO.parse():

    from Bio import SeqIO
    from Bio.SeqUtils.ProtParam import ProteinAnalysis
    
    for record in SeqIO.parse('myfasta.fa', 'fasta'):
        X = ProteinAnalysis(str(record.seq))
        print('\n### Results for record: {} ###'.format(record.id))
        print(X.count_amino_acids()['A']) 
        print(X.count_amino_acids()['E']) 
        print("%0.2f" % X.get_amino_acids_percent()['K']) 
        print("%0.2f" % X.get_amino_acids_percent()['L']) 
        print("%0.2f" % X.molecular_weight()) 
        print("%0.2f" % X.aromaticity()) 
        print("%0.2f" % X.instability_index()) 
        print("%0.2f" % X.isoelectric_point()) 
        sec_struc = X.secondary_structure_fraction() 
        print("%0.2f" % sec_struc[0]) 
        epsilon_prot = X.molar_extinction_coefficient()  
        print(epsilon_prot[0])   
        print(epsilon_prot[1]) 
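
The loop above prints only a few selected residues. Since the question asks for the full percentage composition of every sequence, here is a minimal sketch of that calculation. It is an assumption-laden illustration, not the answerer's code: to stay runnable without Biopython it swaps in a small hand-written FASTA reader (`read_fasta`) and `collections.Counter` in place of `SeqIO.parse` and `ProteinAnalysis`. The names `read_fasta`, `percent_composition`, and the two demo records are hypothetical.

```python
from collections import Counter
import io

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def percent_composition(sequence):
    """Return {residue: percentage on a 0-100 scale} for the 20 standard residues.

    Percentages are relative to the full sequence length, so any
    non-standard characters (for example X) lower the reported values.
    """
    sequence = sequence.upper()
    total = len(sequence)
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    counts = Counter(sequence)
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


# Hypothetical two-record FASTA text standing in for a real file on disk.
demo = io.StringIO(">seq1\nMAEGEITT\nFTAL\n>seq2\nKKLL\n")
results = {name: percent_composition(seq) for name, seq in read_fasta(demo)}

for name, comp in results.items():
    present = " ".join(f"{aa}={pct:.2f}" for aa, pct in comp.items() if pct > 0)
    print(name, present)
```

With a real file, the `io.StringIO` object would be replaced by `open('myfasta.fa')`, and `percent_composition(str(record.seq))` could equally be called inside the `SeqIO.parse` loop shown above.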
    

    【Discussion】:

      【Solution 2】:

      I think you want something from the FastaIO module, for example:

      from Bio.SeqUtils.ProtParam import ProteinAnalysis 
      from Bio.SeqIO import FastaIO
      
      with open('myfile.fasta') as fd:
          for name, sequence in FastaIO.SimpleFastaParser(fd):
              X = ProteinAnalysis(sequence)
              print(name, X.count_amino_acids()['A'])
      

      along with whatever else you want to calculate.
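
For a large file, printing to the terminal quickly becomes unwieldy, so one option is to stream one row per sequence into a CSV table. The sketch below assumes an iterable of `(name, sequence)` pairs, which is the shape `SimpleFastaParser` yields. The function name `write_composition_csv`, the column layout, and the demo records are hypothetical, and the percentages are computed with `collections.Counter` rather than `ProteinAnalysis` so that the example runs without Biopython installed.

```python
import csv
import io
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def write_composition_csv(pairs, out_handle):
    """Write one CSV row per (name, sequence) pair and return the row count.

    Rows are written as they are produced, so only one sequence is held
    in memory at a time.
    """
    writer = csv.writer(out_handle, lineterminator="\n")
    writer.writerow(["id", "length"] + list(AMINO_ACIDS))
    n_rows = 0
    for name, sequence in pairs:
        sequence = sequence.upper()
        total = len(sequence)
        counts = Counter(sequence)
        # Percentage of each standard residue, relative to the full length.
        percents = [
            "%.2f" % (100.0 * counts[aa] / total if total else 0.0)
            for aa in AMINO_ACIDS
        ]
        writer.writerow([name, total] + percents)
        n_rows += 1
    return n_rows


# Hypothetical records in the (name, sequence) shape of the loop above.
pairs = [("seq1 demo protein", "MAEGEITTFTAL"), ("seq2", "KKLL")]
buffer = io.StringIO()
n_rows = write_composition_csv(pairs, buffer)
lines = buffer.getvalue().splitlines()
print(buffer.getvalue())
```

In practice `pairs` could be `FastaIO.SimpleFastaParser(fd)` and `buffer` a file opened with `open('composition.csv', 'w', newline='')`; both file names are placeholders.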

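      【Discussion】:

      • As the page you linked says, "You are expected to use this module via the Bio.SeqIO functions." That said, this approach does have the advantage of being faster than SeqIO.parse, although it is less flexible.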