【Question title】: How to replace pdb atom entries with an altered pdb file that just contains atom entries
【Posted】: 2021-12-08 17:07:52
【Question description】:

I previously changed the chain_id of the pdb file 6gch, which produced output like this:

ATOM      1  N   CYS G   1      54.142  90.734  71.584  1.00  8.30           N
ATOM      2  CA  CYS G   1      53.264  90.010  72.541  1.00  6.56           C
ATOM      3  C   CYS G   1      53.418  90.566  73.962  1.00  7.21           C

using this code:

from Bio.PDB import PDBList, PDBIO, PDBParser

pdbl = PDBList()

io = PDBIO()
parser = PDBParser()
pdbl.retrieve_pdb_file('6gch', pdir='.', file_format="pdb")

# pdb6gch.ent is the filename when retrieved by PDBList
structure = parser.get_structure('6gch', 'pdb6gch.ent')

renames = {
    "E": "A",
    "F": "B",
    "G": "C"
}

for model in structure:
    for chain in model:
        old_name = chain.get_id()
        new_name = renames.get(old_name)
        if new_name:
            print(f"renaming chain {old_name} to {new_name}")
            chain.id = new_name
        else:
            print(f"keeping chain name {old_name}")

io.set_structure(structure)
io.save('6gch_renamed.pdb')

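I would like to know whether it is possible to replace ATOM entries 1, 2 and 3 of the edited chain (shown at the top) with the corresponding entries from the original 6gch pdb file.

I am still learning how to code, so any help would be appreciated.

6gch pdb file - https://files.rcsb.org/download/6GCH.pdb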

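【Comments】:

  • I don't understand what you are trying to accomplish. Changing these 3 entries back to their original chain would mean you have a discontinuous chain. These three backbone atoms would appear to be covalently bonded to a different chain. How does that help?
  • I am trying to change pdb chain atom entries to expand my understanding of coding and the pdb file format. This has no functional/scientific use; I am just trying to get better at coding and see what is and isn't possible.
  • In your parsing loop, go down model - chain - residue - atom and test atom.serial_number == 1 (or 2 or 3).
  • Ouch, I could not find any way to change the chain ID of an atom! Moving them to chain E will break the atom serial-number sequence (if E exists, or if there are chains after E).

Tags: python biopython pdb-files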


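【Solution 1】:

As I explained in the comments, I am a bit puzzled by this request. The reason is that I cannot see any chemistry-related motivation for changing the chain assignment. You replied:

"I am trying to change pdb chain atom entries to expand my understanding of coding and the pdb file format. This has no functional/scientific use; I am just trying to get better at coding and see what is and isn't possible."

My answer: I suggest you separate the goals of a) understanding PDB and b) understanding coding. To a very good approximation, a PDB file is just a list of atom coordinates with some hints about covalent bonds between atoms. What matters more is a decent understanding of the chemistry behind it all: which types of bonds are possible and which chemical properties can be derived. For this part, Biopython will not help you much. More useful are tools like VMD or PyMOL, which let you visualize and play with the protein described by the PDB file.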
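To make the "list of atom coordinates" point concrete, here is a minimal sketch that splits one ATOM record into its fixed-width columns using plain Python. The column positions follow the standard ATOM record layout of the PDB format; the function name `parse_atom_record` and the dictionary keys are illustrative choices, not part of any library.

```python
def parse_atom_record(line: str) -> dict:
    """Split one fixed-width PDB ATOM/HETATM record into named fields."""
    return {
        "record": line[0:6].strip(),     # columns 1-6
        "serial": int(line[6:11]),       # columns 7-11
        "name": line[12:16].strip(),     # columns 13-16
        "resname": line[17:20].strip(),  # columns 18-20
        "chain": line[21],               # column 22
        "resseq": int(line[22:26]),      # columns 23-26
        "x": float(line[30:38]),         # columns 31-38
        "y": float(line[38:46]),         # columns 39-46
        "z": float(line[46:54]),         # columns 47-54
        "occupancy": float(line[54:60]), # columns 55-60
        "bfactor": float(line[60:66]),   # columns 61-66
        "element": line[76:78].strip(),  # columns 77-78
    }


# First atom quoted in the question, laid out in the standard columns.
example = "ATOM      1  N   CYS G   1      54.142  90.734  71.584  1.00  8.30           N"
atom = parse_atom_record(example)
print(atom["serial"], atom["name"], atom["resname"], atom["chain"], atom["x"])
```

Because every field lives at a fixed position, the chain ID of an atom is simply the single character in column 22 of its record.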

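Experiments like "change the first 3 atoms" are best done by hand. Biopython tries to help with modifications that are relatively easy and driven by the structure.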
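"By hand" can also mean a few lines of plain text handling. The sketch below copies the ATOM records with chosen serial numbers from the lines of the original file into the lines of the edited file, matching on the serial-number columns (7-11). The function name `restore_atom_records` is hypothetical, and the short record lists are toy data: the atoms are the ones quoted in the question, with chain letters E (original) and A (renamed) chosen for illustration.

```python
def restore_atom_records(edited_lines, original_lines, serials):
    """Return edited_lines with the ATOM/HETATM records whose serial number
    is in `serials` replaced by the matching records from original_lines."""
    wanted = set(serials)

    def serial_of(line):
        # Only ATOM/HETATM records carry an atom serial in columns 7-11.
        if line.startswith(("ATOM", "HETATM")):
            try:
                return int(line[6:11])
            except ValueError:
                return None
        return None

    # Index the original records we want to bring back, keyed by serial.
    originals = {}
    for line in original_lines:
        serial = serial_of(line)
        if serial in wanted:
            originals[serial] = line

    # Swap in the original record where one exists, keep the line otherwise.
    return [originals.get(serial_of(line), line) for line in edited_lines]


original = [
    "ATOM      1  N   CYS E   1      54.142  90.734  71.584  1.00  8.30           N",
    "ATOM      2  CA  CYS E   1      53.264  90.010  72.541  1.00  6.56           C",
    "ATOM      3  C   CYS E   1      53.418  90.566  73.962  1.00  7.21           C",
    "END",
]
edited = [line.replace("CYS E", "CYS A") for line in original]

restored = restore_atom_records(edited, original, [1, 2])
for line in restored:
    print(line)
```

With the files from the question, the two line lists could come from `6gch_renamed.pdb` and `pdb6gch.ent` read with `splitlines()`. Matching by serial number assumes that both files number these atoms identically, which is worth checking because Biopython renumbers atoms when it saves.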

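When you run into a problem that you need to solve several times (say, more than 5), using python + biopython is a possible (and good) way to solve it. For that part the answer is: if you can think of it, it is possible.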

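【Comments】:

    【Solution 2】:

    I have some code that answers your question; I am sure it is neither the correct way nor the fastest. I had to work out how Biopython manages PDB structure objects, and I am not sure I have understood it. I believe every part of the structure is an object, but I am not sure how they relate to one another. My code writes a pdb file that is correct (with respect to the question rather than the format: the 3 atoms are no longer part of the Cys residue, I hope), but the Biopython structure object it leaves behind is wrong: the 3 atoms from Cys 1-A that you wanted moved have no parent. I guess this is because of the way I copy Cys 1-A and detach the atoms only after moving them. In my overly long algorithm, look at the output for the atoms and at the print statements. I need more time to understand it properly, but I feel I am getting the hang of it, and maybe you will get there faster than I did:

    PS I started from the answer to your recent question: How do I change the chain name of a pdb file?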

    
    from Bio.PDB import PDBList, PDBIO, PDBParser
    
    from Bio.PDB.Chain import Chain
    
    import warnings
    warnings.filterwarnings('ignore')
    
    
    
    
    def atom_id_total(struct):
        # print(struct)
        id_t = 0
        for model in struct:
            # print(model)
            for chain in model:
                # print(chain)
                for resi in chain:
                    for atom in resi:
                        id_t +=1
                        # print(resi)
        return id_t
    
    
    
    pdbl = PDBList()
    
    io = PDBIO()
    parser = PDBParser()
    pdbl.retrieve_pdb_file('6gch', pdir='.', file_format="pdb")
    
    # pdb6gch.ent is the filename when retrieved by PDBList
    structure = parser.get_structure('6gch', 'pdb6gch.ent')
    
    renames = {
        "E": "A",
        "F": "B",
        "G": "C"
    }
    
    for model in structure:
        for chain in model:
            old_name = chain.get_id()
            new_name = renames.get(old_name)
            if new_name:
                print(f"renaming chain {old_name} to {new_name}")
                chain.id = new_name
            else:
                print(f"keeping chain name {old_name}")
    
    io.set_structure(structure)
    io.save('6gch_renamed.pdb')
    
    structure2 = parser.get_structure('6gch_renamed', '6gch_renamed.pdb')
    
    for model in structure2:
        print('model :',model, model.id, model.full_id)
    
    
    # for model in structure2:
    #     for chain in model:
    #         if chain.id == 'A' :
    #           for residue in chain:
    #                 for ii in residue:
                
    #                       print(atom,atom.serial_number,atom.get_id(),atom.fullname,atom.get_parent())
    
    my_chain = Chain("E")
    
    print('my_chain : ', my_chain, my_chain.id, my_chain.get_parent())
    
    
    
    model_list=[]
    for model in structure2.get_models():
        print(model)
        model_list.append(model)
        
    model_list[0].add(my_chain)
    
    
    print('my_chain : ', my_chain, type(my_chain), my_chain.get_full_id(), my_chain.get_parent())
    
    
    
    print('structure2 :',structure2)
    
    
    list_resi =[]
    list_atom=[]
    for model in structure2:
        for chain in model:
            # if chain.id == 'E':
            print(chain.id)
            # print(dir(chain))
            for residue in chain:
                for atom in residue:
                    if atom.serial_number in [1,2,3]:
                        print(atom,atom.serial_number,atom.get_id(),atom.fullname,atom.get_parent())
                        list_resi.append(residue)
                        list_atom.append(atom)
    
        
    
    list_resi = set(list_resi)                   
    print('list_resi : ',list_resi)
    print('list_atom : ',list_atom)
    
    cnt_resi=0
    for i in list_resi:
        print(i.id, i.full_id)
        copi = i.copy()
        print(copi.id, copi.full_id)
    
        print('copy', copi.id,copi.get_parent())
        setattr(copi, 'id',(copi.id[0], 1 ,(copi.id[2])))
        
        copi.set_parent(my_chain)
        print('copy parent :' , copi.get_parent())
        print('copy', copi.id, copi.get_parent(),'full_id :', copi.get_full_id())
        cnt_resi += 1
        
        
        copi_child = []
        for i in copi.get_list():
            print(i)
            copi_child.append(i.id)
            
        print('copi child : ',copi_child)
        
        for i in copi_child:
            copi.__delitem__(i)
            
            
        print(copi)
        
        
        for i in copi:
            print(i)
    
    cnt_atm = 0
    for i in list_atom:
        # setattr(i, 'serial_number', (atom_id_total(structure2))+300+cnt_atm) # changes nothing: io.set_structure(structure2) / io.save renumbers the atoms
        setattr(i, 'serial_number', 1) 
        i.set_parent(copi)
        print('atom i :',i.fullname, i.serial_number, i.id, 'parent :',i.get_parent(), 'atom full_id :',i.get_full_id())    
        copi.add(i)
        cnt_atm += 1
        
    print('copi :',copi)
    for i in copi:
        print(i, i.serial_number)
    
    
    
    
    my_chain.add(copi)
    
    for model in structure2:
        for chain in model:
            if chain.id == 'A' :
              for residue in chain:
                    for ii in residue:
                        print(ii.serial_number,ii, ii.id, ii.serial_number,chain.id, type(ii), (ii.get_parent()).get_parent().id)
            if chain.id == 'E' :
              for residue in chain:
                    for ii in residue:
                        print(ii.serial_number,ii, ii.id, ii.serial_number,chain.id, type(ii), (ii.get_parent()))
    
    
    del_atom = []
    for model in structure2:
        for chain in model:
            if chain.id == 'A':
                print(chain.id)
                for residue in chain:
                    for atom in residue:
                          if atom in list_atom:
                            del_atom.append(atom)
    
    
    print('del_atom : ',del_atom)
    for i in del_atom:
        print(i, i.serial_number, ((i.get_parent()).get_parent()).id)
    
    
    
    for i in list_resi:
        del_i = i
        print('del_i :',del_i)
        for ii in i.get_list():
            if ii in del_atom:
                i.detach_child(ii.id)
                print(ii.serial_number)
                if ii in del_atom:
                    print('ok')
               
    
                             
    print('____________')
        
    # 
    
    for model in structure2:
        for chain in model:
            if chain.id == 'A' :
              for residue in chain:
                    for ii in residue:
                        print(ii.serial_number,ii, ii.id, ii.serial_number,chain.id, type(ii), (ii.get_parent()).get_parent().id)
            if chain.id == 'E' :
              for residue in chain:
                    for ii in residue:
                        print(ii.serial_number,ii, ii.id, ii.serial_number,chain.id, type(ii), (ii.get_parent()))
    
    
    print(structure2.child_dict)
    
    for model in structure2:
        print(model.child_dict)
    
    
    
               
    io.set_structure(structure2)
    io.save('6gch_re-renamed.pdb')              
    

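    The image of the output pdb below shows Cys 1 split in two: half in chain A (red) and half in chain E (blue).

    It may be better to look here first: https://biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ

    In particular, read the section "What is the overall layout of a Structure object?" before rushing to type code.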
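As a rough illustration of that layout (Structure, then Model, Chain, Residue, Atom), the sketch below parses a tiny in-memory PDB fragment and walks the hierarchy to find atoms by serial number, as suggested in the comments on the question. It assumes Biopython is installed. The helper `find_atoms_by_serial` is hypothetical, and the fragment is toy data: the three records quoted in the question, with the chain letter set to A for illustration.

```python
from io import StringIO

from Bio.PDB import PDBParser

# Toy fragment in standard PDB columns (chain letter chosen for illustration).
PDB_TEXT = """\
ATOM      1  N   CYS A   1      54.142  90.734  71.584  1.00  8.30           N
ATOM      2  CA  CYS A   1      53.264  90.010  72.541  1.00  6.56           C
ATOM      3  C   CYS A   1      53.418  90.566  73.962  1.00  7.21           C
END
"""


def find_atoms_by_serial(structure, serials):
    """Walk Structure -> Model -> Chain -> Residue -> Atom and collect
    the atoms whose serial number is in `serials`."""
    wanted = set(serials)
    found = []
    for model in structure:
        for chain in model:
            for residue in chain:
                for atom in residue:
                    if atom.get_serial_number() in wanted:
                        found.append(atom)
    return found


parser = PDBParser(QUIET=True)  # QUIET suppresses construction warnings
structure = parser.get_structure("toy", StringIO(PDB_TEXT))

hits = find_atoms_by_serial(structure, [1, 3])
for atom in hits:
    # The full id lists every level: structure id, model id, chain id,
    # residue id, and finally (atom name, altloc).
    print(atom.get_serial_number(), atom.get_full_id())
```

Each object only knows its parent and its children, so an atom's chain is reached through its residue (`atom.get_parent().get_parent()`), which is why there is no chain ID attribute to set on the atom itself.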

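    【Comments】:

      【Solution 3】:

      I found a shorter way, which I also appended to the end of the code in my earlier answer (How do I change the chain name of a pdb file?). The code is long because of all the print statements that trace the changes in the structure object. I am not sure whether this is the shortest, fastest or most orthodox approach. It uses

      Chain and Residue (from Bio.PDB.Chain import Chain & from Bio.PDB.Residue import Residue).

      The only thing I was missing was how to renumber the atoms in my structure object without having to save it to a pdb file ---> edited: they are now renumbered before saving. Have a look and let me know whether it fits your needs: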

      from Bio.PDB import PDBList, PDBIO, PDBParser
      
      from Bio.PDB.Chain import Chain
      
      from Bio.PDB.Residue import Residue
      
      import warnings
      warnings.filterwarnings('ignore')
      
      
      
      pdbl = PDBList()
      
      io = PDBIO()
      parser = PDBParser()
      pdbl.retrieve_pdb_file('6gch', pdir='.', file_format="pdb")
      
      # pdb6gch.ent is the filename when retrieved by PDBList
      structure = parser.get_structure('6gch', 'pdb6gch.ent')
      
      renames = {
          "E": "A",
          "F": "B",
          "G": "C"
      }
      
      for model in structure:
          for chain in model:
              old_name = chain.get_id()
              new_name = renames.get(old_name)
              if new_name:
                  print(f"renaming chain {old_name} to {new_name}")
                  chain.id = new_name
              else:
                  print(f"keeping chain name {old_name}")
      
      io.set_structure(structure)
      io.save('6gch_renamed.pdb')
      
      structure2 = parser.get_structure('6gch_renamed', '6gch_renamed.pdb')
      
      # for model in structure2:
      #     for chain in model:
      #         if chain.id =='A' or chain.id =='E':
      #             for residue in chain:
      #                 print(residue, residue.get_parent())
      #                 for atom in residue:
      #                     print(atom, atom.get_parent())
      
      
      x = Residue((' ',999,' '), 'POP', "") ##see https://biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ#what-is-a-residue-id
      
      print('new residue X :',x)
                          
      str2_atom = structure2.get_atoms()
      
      atoms = []
      for i in str2_atom:
          if i.serial_number in [1,2,3]:
              print('1st loop : ',i.serial_number, i.get_full_id())
              atoms.append(i)
              
      for i in atoms:
          (i.get_parent()).detach_child(i.id)
          print('2nd loop : ',i.serial_number, i.get_full_id())
      
      print('atoms : ', atoms)
      
      
      print('detached : __________________________________________')
      for model in structure2:
          for chain in model:
              if chain.id =='A' or chain.id =='E':
                  for residue in chain:
                      print(residue, residue.get_parent())
                      for atom in residue:
                          print(atom.serial_number, atom, atom.id, atom.get_parent())
      
      
      print('before add to new chain : ___________')
      for i in atoms:
          # i.set_parent(x)
          print(i.serial_number, i.get_full_id())# i.get_parent(), (i.get_parent()).get_parent())
          
          x.add(i) ## adds atom to residue X ; sets X as i parent
      
      for i in atoms:
          # i.set_parent(x)
          print(i.serial_number, i.get_full_id(), i.get_parent(), (i.get_parent()).get_parent())
      my_chain = Chain("E")
      
      print('created new chain : ', my_chain)
      
      my_chain.add(x)
      
      print('after add to new chain : ___________')
      for i in atoms:
          #i.set_parent(x)
          print(i.serial_number, i.get_full_id(), i.get_parent(), (i.get_parent()).get_parent())
      
      
      print('chains of structure model [0] : _______')
      print(structure2.child_dict)
      for model in structure2:
          print(model.child_dict)
      
      print('add chain E to structure model [0] : _______')
      structure2[0].add(my_chain)
      
      print(structure2.child_dict)
      
      for model in structure2:
          print(model.child_dict)
      
      for model in structure2:
          for chain in model:
              if chain.id =='A' or chain.id =='E':
                  for residue in chain:
                      print(residue, residue.get_parent())
                      for atom in residue:
                          print(atom.serial_number, atom, atom.id, atom.get_parent())
      
      # renumber atoms in new structure
      atom_N = 1
      for model in structure2:
          for chain in model:
              # if chain.id =='A' or chain.id =='E':
                  for residue in chain:
                      # print(residue, residue.get_parent())
                      for atom in residue:
                          # print(atom.serial_number, atom, atom.id, atom.get_parent())
                          setattr(atom, 'serial_number', atom_N)
                          #setattr(copi, 'id',(copi.id[0], 1 ,(copi.id[2])))
                          # print(atom.serial_number, atom, atom.id, atom.get_parent())
                          atom_N += 1
                          
      print('\n structure with renumbered atoms : \n___________________________________')                  
      for model in structure2:
          for chain in model:
              if chain.id =='A' or chain.id =='E':
                  for residue in chain:
                      print(residue, residue.get_parent())
                      for atom in residue:
                          print(atom.serial_number, atom, atom.id, atom.get_parent())
              
      io.set_structure(structure2)
      io.save('6gch_re-renamed.pdb')  
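
The same idea condensed into one function, offered as a sketch rather than a reference implementation: collect the atoms first, create the new chain and residue, then detach each atom from its old residue and add it to the new one. It assumes Biopython is installed and runs on a toy in-memory fragment (the three records quoted in the question, chain letter set to A for illustration), so no download is needed. The name `move_atoms_to_new_chain` and the default residue name and number of the new residue are assumptions made for this example.

```python
from io import StringIO

from Bio.PDB import PDBIO, PDBParser
from Bio.PDB.Chain import Chain
from Bio.PDB.Residue import Residue

# Toy fragment in standard PDB columns (chain letter chosen for illustration).
PDB_TEXT = """\
ATOM      1  N   CYS A   1      54.142  90.734  71.584  1.00  8.30           N
ATOM      2  CA  CYS A   1      53.264  90.010  72.541  1.00  6.56           C
ATOM      3  C   CYS A   1      53.418  90.566  73.962  1.00  7.21           C
END
"""


def move_atoms_to_new_chain(model, serials, new_chain_id, resname="CYS", resseq=1):
    """Detach the atoms with the given serial numbers and re-attach them to
    a fresh residue inside a newly created chain of `model`."""
    wanted = set(serials)
    # Collect first: detaching while iterating would modify the containers.
    atoms = [a for a in model.get_atoms() if a.get_serial_number() in wanted]

    # Build the new branch top-down so every parent exists before atoms arrive.
    new_chain = Chain(new_chain_id)
    model.add(new_chain)
    new_residue = Residue((" ", resseq, " "), resname, "")
    new_chain.add(new_residue)

    for atom in atoms:
        atom.get_parent().detach_child(atom.get_id())  # remove from old residue
        new_residue.add(atom)  # add() also sets new_residue as the parent
    return new_chain


parser = PDBParser(QUIET=True)
structure = parser.get_structure("toy", StringIO(PDB_TEXT))
move_atoms_to_new_chain(structure[0], [1, 2], "E")

# By default PDBIO renumbers atom serials when saving; passing
# preserve_atom_numbering=True to save() keeps the stored serial numbers.
out = StringIO()
io = PDBIO()
io.set_structure(structure)
io.save(out)
print(out.getvalue())
```

Since the default save renumbers the atoms anyway, the explicit renumbering loop above is only needed if the serial numbers must be correct inside the structure object before writing.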
      

      【Comments】:
