【Question title】: How can you compare entries between two columns in Linux?
【Posted】: 2019-06-13 01:24:52
【Question】:

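I am trying to figure out whether the first letter of an amino acid's name is the same as its one-letter code.

For example, glycine starts with G and its one-letter code is also (G). Arginine, on the other hand, starts with A, but its one-letter code is (R).

I want to print the amino acids whose one-letter code matches the first letter of their name.

I have a CSV data file whose columns are separated by ','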

Name,One letter code,Three letter code,Hydropathy,Charge,Abundance,DNA codon(s)
Arginine,R,Arg,hydrophilic,+,0.0514,CGT-CGC-CGA-CGG-AGA-AGG
Asparagine,N,Asn,hydrophilic,N,0.0447,AAT-AAC
Aspartate,D,Asp,hydrophilic,-,0.0528,GAT-GAC
Glutamate,E,Glu,hydrophilic,-,0.0635,GAA-GAG
Glutamine,Q,Gln,hydrophilic,N,0.0399,CAA-CAG
Lysine,K,Lys,hydrophilic,+,0.0593,AAA-AAG
Serine,S,Ser,hydrophilic,N,0.0715,TCT-TCC-TCA-TCG-AGT-AGC
Threonine,T,Thr,hydrophilic,N,0.0569,ACT-ACC-ACA-ACG

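I believe the code below is one way to compare columns, but I would like to know how to extract the first letter of the first column and compare it with the letter in the second column.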

awk '{ if ($1 == $2) { print $1; } }' < foo.txt

【Comments on the question】:

Tags: linux unix awk


【Answer 1】:

Could you please try the following.

awk 'BEGIN{FS=","} substr($1,1,1) == $2' Input_file

The output is as follows.

Serine,S,Ser,hydrophilic,N,0.0715,TCT-TCC-TCA-TCG-AGT-AGC
Threonine,T,Thr,hydrophilic,N,0.0569,ACT-ACC-ACA-ACG

Explanation: the same code, annotated line by line.

awk '                     ## Start the awk program.
BEGIN{                    ## BEGIN block: runs once, before any input is read.
 FS=","                   ## Set the field separator (FS) to a comma.
}                         ## End of the BEGIN block.
substr($1,1,1) == $2      ## substr(string, start, length) returns a substring; substr($1,1,1) is the first character of field 1. The line is printed (the default action) when that character equals field 2.
' Input_file              ## Name of the input file.
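
The question asks for the amino acid names rather than whole lines. A minimal sketch of that variation, assuming the same comma-separated layout with a header row; the temporary file and the variable names `sample` and `matching_names` are made up for illustration:

```shell
# Hypothetical sample file with the header and three rows from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Name,One letter code,Three letter code,Hydropathy,Charge,Abundance,DNA codon(s)
Arginine,R,Arg,hydrophilic,+,0.0514,CGT-CGC-CGA-CGG-AGA-AGG
Serine,S,Ser,hydrophilic,N,0.0715,TCT-TCC-TCA-TCG-AGT-AGC
Threonine,T,Thr,hydrophilic,N,0.0569,ACT-ACC-ACA-ACG
EOF

# NR > 1 skips the header row; { print $1 } prints only the Name column
# instead of relying on the default action of printing the whole line.
matching_names=$(awk -F, 'NR > 1 && substr($1, 1, 1) == $2 { print $1 }' "$sample")
printf '%s\n' "$matching_names"

rm -f "$sample"
```

With this sample the command prints Serine and Threonine, one per line.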

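【Comments】:

  • Could you explain how you extracted the first letter of the first column? Does substr($1,1,1) refer to the first letter of the first column? How would you reference the second letter of the third column?
  • @VAnon, I have now added a full explanation of the code; let me know here if anything is unclear.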
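
The thread does not answer the last part of that comment directly. Going by the substr(string, start, length) signature, the second letter of the third column would be substr($3,2,1). A small sketch, assuming a trimmed-down hypothetical sample file (the names `sample` and `second_of_third` are illustrative only):

```shell
# Hypothetical sample file: header plus two rows, first three columns only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Name,One letter code,Three letter code
Arginine,R,Arg
Serine,S,Ser
EOF

# substr($3, 2, 1): start at character 2 of field 3 and take 1 character.
# Positions are 1-based, and the third argument is a length, not an end index.
second_of_third=$(awk -F, 'NR > 1 { print substr($3, 2, 1) }' "$sample")
printf '%s\n' "$second_of_third"

rm -f "$sample"
```

For these two rows the result is r (from Arg) and e (from Ser).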
【Answer 2】:

A simpler way, using grep:

$ grep -E '^(.)[^,]*,\1' input.csv
Serine,S,Ser,hydrophilic,N,0.0715,TCT-TCC-TCA-TCG-AGT-AGC
Threonine,T,Thr,hydrophilic,N,0.0569,ACT-ACC-ACA-ACG
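
One caveat with the pattern above: it only checks that the second column starts with the captured letter. That is enough here because the one-letter code is always a single character, but adding a trailing comma makes the comparison exact. The sketch below is written in basic-regex syntax (plain grep with \( \)), where backreferences are part of POSIX; support for \1 under -E is an extension that GNU grep provides. The row `Example,Ex,Exa` is invented purely to show the difference, and the variable names are illustrative:

```shell
# Hypothetical sample file; the last row is made up to show the difference.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Arginine,R,Arg
Serine,S,Ser
Example,Ex,Exa
EOF

# Loose: the second column only has to begin with the first letter of column 1.
loose=$(grep '^\(.\)[^,]*,\1' "$sample")

# Strict: the second column must be exactly that one letter (note the final comma).
strict=$(grep '^\(.\)[^,]*,\1,' "$sample")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -f "$sample"
```

The loose pattern keeps both the Serine and Example rows, while the strict one keeps only Serine.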

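【Comments】:

【Answer 3】:

The same expression as RavinderSingh's, but with the field separator set through the -F option instead of a BEGIN block.

awk -F "," 'substr($1,1,1) == $2' InFile

Serine,S,Ser,hydrophilic,N,0.0715,TCT-TCC-TCA-TCG-AGT-AGC
Threonine,T,Thr,hydrophilic,N,0.0569,ACT-ACC-ACA-ACG

【Comments】: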
