【Question title】:Parsing restraints with bash and awk
【Posted】:2014-11-02 06:11:00
【Question】:

My restraints look like this:

G6N-D5C-?: (116.663, 177.052, 29.149) K87CD/E85CB/E94CB/H32CB/Q21CB
L12N-T11C-?: (128.977, 175.109, 174.412) K158C/H60C/A152C/N127C/Y159C(notH60C)
K14N-E13C-?: (117.377, 176.474, 29.823) E187CB/V78CB
A75N-Q74C-?: (123.129, 177.253, 23.513) V131CG1/V135CG1/V78CG1

I need to convert them into this output:

assign (resid 5 and name C ) (resid 87 and name CD or resid 85 and name CB or resid 94 and name CB or resid 32 and name CB or resid 21 and name CB ) 3.5 2.5 8.5 ! G6N-D5C-?: (116.663, 177.052, 29.149) K87CD/E85CB/E94CB/H32CB/Q21CB
assign (resid 11 and name C ) (resid 158 and name C or resid 60 and name C or resid 152 and name C or resid 127 and name C or resid 159 and name C ) 3.5 2.5 8.5 ! L12N-T11C-?: (128.977, 175.109, 174.412) K158C/H60C/A152C/N127C/Y159C(notH60C)
assign (resid 13 and name C ) (resid 187 and name CB or resid 78 and name CB ) 3.5 2.5 8.5 ! K14N-E13C-?: (117.377, 176.474, 29.823) E187CB/V78CB
assign (resid 74 and name C ) (resid 131 and name CG1 or resid 135 and name CG1 or resid 78 and name CG1 ) 3.5 2.5 8.5 ! A75N-Q74C-?: (123.129, 177.253, 23.513) V131CG1/V135CG1/V78CG1

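I tried awk, but I could not work out how to split the fields into an array. Please help me convert these; doing it by hand is brutal.

【Comments】:

  • This looks like it boils down to a regular expression. If you have already tried something, you should include it in your question. Either way, you should state explicitly how the input maps to the output.

Tags: regex bash perl awk sed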


【Solution 1】:

Here is one way to do it with perl:

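#!/usr/bin/perl

use strict;
use warnings;
use autodie;    # open() dies with a message on failure

open my $fh, '<', 'restraints.file';

while (<$fh>) {
    # Last whitespace-separated field, split on '/', flattened to (number, atom, number, atom, ...).
    # Tokens that do not match contribute nothing, so stale captures are never reused.
    my @values = map { /.(\d+)(\w+)/ ? ( $1, $2 ) : () } split '/', (split)[-1];
    # Residue number and atom name from the second part of the first field, e.g. "D5C".
    my ( $resid, $name ) = /^[^-]+-.(\d+)(\w+)-/;
    print "assign (resid $resid and name $name ) (";
    print join ( " or ",
        map  { "resid $values[$_] and name $values[$_ + 1]" }
        grep { not $_ % 2 } 0 .. $#values    # even indices: start of each pair
    );
    print " ) 3.5 2.5 8.5 ! $_";
}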

Output:

assign (resid 5 and name C ) (resid 87 and name CD or resid 85 and name CB or resid 94 and name CB or resid 32 and name CB or resid 21 and name CB ) 3.5 2.5 8.5 ! G6N-D5C-?: (116.663, 177.052, 29.149) K87CD/E85CB/E94CB/H32CB/Q21CB
assign (resid 11 and name C ) (resid 158 and name C or resid 60 and name C or resid 152 and name C or resid 127 and name C or resid 159 and name C ) 3.5 2.5 8.5 ! L12N-T11C-?: (128.977, 175.109, 174.412) K158C/H60C/A152C/N127C/Y159C(notH60C)
assign (resid 13 and name C ) (resid 187 and name CB or resid 78 and name CB ) 3.5 2.5 8.5 ! K14N-E13C-?: (117.377, 176.474, 29.823) E187CB/V78CB
assign (resid 74 and name C ) (resid 131 and name CG1 or resid 135 and name CG1 or resid 78 and name CG1 ) 3.5 2.5 8.5 ! A75N-Q74C-?: (123.129, 177.253, 23.513) V131CG1/V135CG1/V78CG1

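【Comments】:

    【Solution 2】:

    You need to split the first word on - and examine the second element.
    Then split the last word on / and examine each element.

    Assuming GNU awk, read up on split() and match() in http://www.gnu.org/software/gawk/manual/html_node/String-Functions.html#String-Functions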
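To make the two splits concrete, here is a small sketch that runs them on a single line taken from the question. It relies only on split(), which plain POSIX awk also provides, so it should not need gawk. The helper name show_splits is made up for illustration.

```shell
# show_splits is a hypothetical helper: it prints the pieces produced by the
# two splits for one restraint line passed as its first argument.
show_splits() {
  printf '%s\n' "$1" | awk '{
    # Split the first word on "-"; the second element names the assigned atom.
    split($1, parts, "-")
    print "first word, element 2:", parts[2]

    # Split the last word on "/"; each element is one partner atom.
    n = split($NF, atoms, "/")
    for (i = 1; i <= n; i++) print "last word, element " i ":", atoms[i]
  }'
}

show_splits 'K14N-E13C-?: (117.377, 176.474, 29.823) E187CB/V78CB'
```

Each printed piece still carries its leading residue letter (for example E13C), which is what a regular expression then has to strip while separating the number from the atom name.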


    Feeling generous:

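    gawk '
      # Print the residue number and atom name of a token such as "K87CD" using fmt.
      function extract(str, fmt,      m) {
        # [[:alnum:]]+ stops before trailing notes such as "(notH60C)".
        if (match(str, /^.([0-9]+)([[:alnum:]]+)/, m)) printf fmt, m[1], m[2]
      }
      {
        # First word, second element: the assigned atom.
        split($1, a, /-/)
        extract(a[2], "assign (resid %d and name %s ) (")
        # Last word: one element per partner atom, joined with "or".
        n = split($NF, a, /\//)
        sep = ""
        for (i=1; i<=n; i++) {
          extract(a[i], sep "resid %d and name %s ")
          sep = "or "
        }
        print ") 3.5 2.5 8.5 !", $0
      }
    ' restraints.file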
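The three-argument form of match() is a gawk extension. If gawk is not available, roughly the same conversion can be sketched with sub() alone, which POSIX awk provides. This is only a sketch under that assumption, not a drop-in replacement; the function names convert_restraints and term are invented here.

```shell
# convert_restraints is a hypothetical helper: it reads restraint lines from
# stdin (or from the files given as arguments) and prints assign statements.
convert_restraints() {
  awk '
    # Turn a token such as "K87CD" or "Y159C(notH60C)" into "resid 87 and name CD ".
    function term(tok,    num, name) {
      sub(/^[^0-9]*/, "", tok)             # drop the leading residue letter
      num = tok
      sub(/[^0-9].*$/, "", num)            # keep only the leading digits
      name = tok
      sub(/^[0-9]+/, "", name)             # keep what follows the digits
      sub(/[^A-Za-z0-9].*$/, "", name)     # cut trailing notes in parentheses
      return "resid " num " and name " name " "
    }
    NF {
      split($1, a, "-")
      out = "assign (" term(a[2]) ") ("
      n = split($NF, a, "/")
      sep = ""
      for (i = 1; i <= n; i++) {
        out = out sep term(a[i])
        sep = "or "
      }
      print out ") 3.5 2.5 8.5 ! " $0
    }
  ' "$@"
}

printf '%s\n' 'K14N-E13C-?: (117.377, 176.474, 29.823) E187CB/V78CB' | convert_restraints
```

With a file of restraints the call would be convert_restraints restraints.file. Blank lines are skipped by the NF pattern.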
    

    【Comments】:
