【Question title】: Re-numbering residues in PDB file with biopython
【Posted】: 2017-06-15 14:12:41
【Question】:

I have a sequence alignment:

RefSeq     :MXKQRSLPLXQKRTKQAISFSASHRIYLQRKFSH .....

Templatepdb:-----------------ISFSASHR------FSHAQADFAG 

I'm trying to write code that renumbers the residues in the PDB file according to this alignment:

Original PDB: RES ID = 1 1 1 1 1 1 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 5 ...

New PDB: RES ID = 18 18 18 19 19 19 19 19 20 20 20 21 21 22 23 24 25 31 31 31 31 32 32 33 34 35 36 ...

If the alignment only has gaps at the start, this is easy: just count the gaps ("-") and add that total to each residue number (residue.id).
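The leading-gap-only case described above reduces to one constant offset. A minimal sketch (the function names are hypothetical), which also shows where it goes wrong for this alignment:

```python
def leading_gap_offset(template_row):
    """Count the '-' characters before the first aligned residue."""
    return len(template_row) - len(template_row.lstrip("-"))


def shift_by_leading_gaps(template_row, residue_numbers):
    """Add the leading-gap count to every residue number (one entry per atom)."""
    offset = leading_gap_offset(template_row)
    return [number + offset for number in residue_numbers]


template = "-----------------ISFSASHR------FSHAQADFAG"
print(shift_by_leading_gaps(template, [1, 1, 8, 9]))
# [18, 18, 25, 26]
```

Residues 1 to 8 come out right, but residue 9 (F) gets 26 instead of 32, its column in the alignment, because the six internal gaps are never counted.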

However, I can't find a way to handle gaps in the middle of the sequence.

Do you have any suggestions?

【Comments】:

    Tags: bioinformatics biopython pdb pdb-files sequence-alignment


    【Answer 1】:

    If I understand correctly,

    your input is the alignment:

    '-----------------ISFSASHR------FSHAQADFAG'
    

    and the list of residue numbers (one entry per atom, so each number repeats):

    [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 17, 18, 18, 18, 18]
    

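    and your output is the same residue numbers, each shifted by the number of gaps that precede that residue in the alignment: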

    [18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 25, 25, 25, 25, 32, 32, 32, 33, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 40, 41, 41, 41, 41]
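A short check of why shifting by the preceding gaps gives the alignment column. Write k for the residue's position in the template sequence, g_k for the number of gaps before it, and c_k for its alignment column (symbols introduced here only for illustration). Exactly k - 1 residues and g_k gaps precede column c_k, so c_k = (k - 1) + g_k + 1:

```latex
n'_k = k + g_k = c_k, \qquad
n'_1 = 1 + 17 = 18, \qquad
n'_9 = 9 + (17 + 6) = 32
```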
    

    Below is code that demonstrates this. There are many ways to compute the output.

    My approach is to keep a dictionary, shift_dict, whose keys are the original residue numbers and whose values are the shifted numbers.

    import itertools
    import random
    
    
    def random_residue_number(sequence):
        # Simulate a PDB residue-number column: residue i (1-based) is repeated
        # once per atom, with a random atom count between 1 and 10.
        nested = [[i + 1] * random.randint(1, 10) for i in range(len(sequence))]
        merged = list(itertools.chain.from_iterable(nested))
        return merged
    
    
    def aligned_residue_number(alignment, original_number):
        # Map each original residue number (1, 2, 3, ...) to its alignment
        # column: the residue's own count plus the gaps seen before it.
        gap_shift = 0
        residue_count = 0
        shift_dict = {}
        for residue in alignment:
            if residue == '-':
                gap_shift += 1
            else:
                residue_count += 1
                shift_dict[residue_count] = gap_shift + residue_count
        # Look up every entry, so repeated per-atom numbers stay repeated.
        return [shift_dict[number] for number in original_number]
    
    
    sequence = 'ISFSASHRFSHAQADFAG'
    alignment = '-----------------ISFSASHR------FSHAQADFAG'
    original_number = random_residue_number(sequence)
    print(original_number)
    # One possible output (atom counts are random, so this changes on every run):
    # [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10, 11, 11, 11, 11, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 17, 18, 18, 18, 18]
    new_number = aligned_residue_number(alignment, original_number)
    print(new_number)
    # The matching shifted numbers:
    # [18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 25, 25, 25, 25, 32, 32, 32, 33, 34, 34, 34, 34, 35, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 38, 38, 38, 38, 39, 39, 39, 39, 39, 39, 39, 39, 39, 39, 40, 41, 41, 41, 41]
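The mapping above yields alignment column numbers, which coincide with RefSeq residue numbers only while the RefSeq row itself has no gaps (true for the rows shown in the question). If the reference row can contain gaps too, one option is to count non-gap characters in both rows while walking the columns. A minimal sketch under that assumption; the function name is hypothetical, and mapping template residues that face a reference gap to None is just one possible choice:

```python
def template_to_reference_numbers(ref_row, template_row):
    """Map each template residue (1-based, in order) to a reference residue number.

    Template residues aligned against a gap in ref_row map to None.
    """
    if len(ref_row) != len(template_row):
        raise ValueError("aligned rows must have equal length")
    mapping = {}
    ref_number = 0
    template_number = 0
    for ref_char, tmpl_char in zip(ref_row, template_row):
        if ref_char != "-":
            ref_number += 1  # advance only on real reference residues
        if tmpl_char != "-":
            template_number += 1
            mapping[template_number] = ref_number if ref_char != "-" else None
    return mapping


print(template_to_reference_numbers("MK-QRS", "-KAQ-S"))
# {1: 2, 2: None, 3: 3, 4: 5}
```

With a gap-free reference row this gives the same numbers as shift_dict.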
    

    【Discussion】:

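    • Thank you very much. Once I have this new_number list, how would you suggest replacing the residue numbers in the .pdb file with it?
    • Take a look at Biopython's Bio.PDB module; it may help. You could also write your own parser for the ATOM records following the PDB specification.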
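A minimal Bio.PDB sketch of the follow-up, assuming a single chain whose standard residues (hetero flag " ") appear in the same order as the non-gap letters of the template row, one residue per letter. The helper names (alignment_positions, renumber_chain, renumber_pdb), the default chain ID "A", and the temporary-number trick are illustrative choices, not part of Biopython. New numbers are the alignment columns, as in the answer above. Recent Biopython versions raise a ValueError when a residue is given an id that a sibling already uses, so every residue is first parked on a temporary number:

```python
from Bio.PDB import PDBIO, PDBParser


def alignment_positions(template_row):
    """1-based alignment column of each non-gap character, in order."""
    return [col for col, char in enumerate(template_row, start=1) if char != "-"]


def renumber_chain(chain, template_row):
    """Renumber the standard residues of a Bio.PDB Chain in place.

    Assumes the chain's standard residues appear in the same order as the
    non-gap letters of template_row, one residue per letter.
    """
    residues = [res for res in chain if res.id[0] == " "]  # skip waters/ligands
    positions = alignment_positions(template_row)
    if len(residues) != len(positions):
        raise ValueError(
            f"{len(residues)} residues but {len(positions)} aligned letters"
        )
    # Pass 1: park every residue on a temporary number that cannot clash with
    # a real one (the PDB resSeq field is only 4 characters wide).
    for offset, res in enumerate(residues):
        hetflag, _, icode = res.id
        res.id = (hetflag, 100000 + offset, icode)
    # Pass 2: give each residue its alignment column as the new number.
    for res, new_number in zip(residues, positions):
        res.id = (" ", new_number, " ")


def renumber_pdb(in_file, out_file, template_row, chain_id="A"):
    """Read a PDB file (path or handle), renumber one chain, write it out."""
    structure = PDBParser(QUIET=True).get_structure("template", in_file)
    renumber_chain(structure[0][chain_id], template_row)
    writer = PDBIO()
    writer.set_structure(structure)
    writer.save(out_file)
```

For example, renumber_pdb("template.pdb", "renumbered.pdb", "-----------------ISFSASHR------FSHAQADFAG") (hypothetical file names) would number the first template residue 18 and the ninth 32. Insertion codes are reset to blank, and waters or ligands are left untouched.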