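【Question title】: Compare multiple columns and only replace if matching
【Posted】: 2018-04-18 01:21:24
【Question】:
  • I have two files (File1 and File2).
  • I am trying to compare the strings in columns 1 and 2 of File1 with columns 4 and 5 of File2. In addition to this match, column 6 of File2 must also match a specific string, either SO or CO (because columns 3 and 4 of File1 are SO and CO respectively). When everything matches, column 7 of File2 should be replaced with the corresponding File1 value (column 3 for SO, column 4 for CO); otherwise the line should be left unchanged.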

  • I tried adapting solutions that the forum offers for similar problems, but without success.

    FILE1
    type  code     SO  CO other
    
    7757    1       6941.958        138.922 149.17
    7757    2       8666.123        198.908 225.67
    7757    4       2795.885        334.875 378.68
    7759    GT3     222.104    13.5    734.62
    7768    CT2     0       0       0
    7805    6       3796.677        75.175  79.09 
    
    FILE2
    "US","01073",,"7757","1","SO","10","299"
    "US","01073",,"7758","1","SO","10","299"
    "US","01073",,"7757","1","NO","10","299"
    "US","01073",,"7757","1","CO","10","299"
    "US","01073",,"7757","4","MO","10","299"
    "US","01073",,"7757","1","GO","10","299"
    "US","01073",,"7805","6","CO","10","299"
    
    Required output:
    "US","01073",,"7757","1","SO","6941.958","299"
    "US","01073",,"7758","1","SO","10","299"
    "US","01073",,"7757","1","NO","10","299"
    "US","01073",,"7757","1","CO","138.922","299"
    "US","01073",,"7757","4","MO","10","299"
    "US","01073",,"7757","1","GO","10","299"
    "US","01073",,"7805","6","CO","75.175","299"
    

    The solution I tried (handles CO only):

    tr -d '"' < FILE2 > temp  # to remove double quote
    awk 'NR==FNR{A[$1,$2]=$3;next} A[$4,$5] && $6=="CO" {$7=A[$1,$2]; print}' FS=" " OFS="," FILE1 temp > out
    

【Comments】:

  • Thank you very much for helping to edit my code!

Tags: awk


【Solution 1】:

A more involved awk solution:

awk 'function unquote(f){ 
         return substr(f, 2, length(f)-2) 
     }
     NR==FNR{ 
         if (NR==1){ f3=$3; f4=$4 }
         else if (NF){ a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 }
         next; 
     }
     { k=unquote($4) SUBSEP unquote($5) SUBSEP unquote($6) }
     k in a{ $7=a[k] }1' file1 FS=',' OFS=',' file2
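  • function unquote(f) { ... } - unquotes a field, i.e. extracts the value between the double quotes (strictly speaking, everything between the first and last characters of the string)

  • a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 - builds the lookup keys: each FILE1 row is stored twice, once under (type, code, "SO") with its SO value and once under (type, code, "CO") with its CO value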
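
A minimal sketch (not part of the original answer) of the two ideas above, run on hard-coded values instead of the files. `unquote` strips the surrounding quotes, and a comma-separated subscript such as `a["7757", "1", "SO"]` is stored under a single key joined with `SUBSEP`, so `k in a` tests all three columns at once. The literal values are copied from the sample data; the names `demo`, `k` and `k2` are illustrative only.

```shell
demo=$(awk 'function unquote(f) { return substr(f, 2, length(f) - 2) }
BEGIN {
    # One FILE1 entry, keyed by type, code and the column name SO.
    a["7757", "1", "SO"] = "6941.958"

    # Build the same kind of key from quoted FILE2-style fields.
    k  = unquote("\"7757\"") SUBSEP unquote("\"1\"") SUBSEP unquote("\"SO\"")
    k2 = unquote("\"7757\"") SUBSEP unquote("\"1\"") SUBSEP unquote("\"NO\"")

    print unquote("\"7757\"")               # quotes removed
    print ((k  in a) ? a[k]  : "no match")  # all three parts match
    print ((k2 in a) ? a[k2] : "no match")  # third part differs
}')
printf '%s\n' "$demo"
```

This should print 7757, 6941.958 and no match on three lines. The output shown next in the answer is that of the full command, not of this demo.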


Output for the sample files:

"US","01073",,"7757","1","SO",6941.958,"299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO",138.922,"299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO",75.175,"299"

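Note that the replaced values above are printed without double quotes, whereas the required output in the question keeps them. A possible variant, sketched here as an assumption rather than as part of the original answer, wraps the replacement in quotes before assigning it to `$7`. The script recreates the question's sample files in a scratch directory; the names `tmp` and `quoted` are illustrative.

```shell
# Recreate the sample inputs from the question in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
type  code     SO  CO other

7757    1       6941.958        138.922 149.17
7757    2       8666.123        198.908 225.67
7757    4       2795.885        334.875 378.68
7759    GT3     222.104    13.5    734.62
7768    CT2     0       0       0
7805    6       3796.677        75.175  79.09
EOF
cat > "$tmp/file2" <<'EOF'
"US","01073",,"7757","1","SO","10","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO","10","299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO","10","299"
EOF

# Same logic as the answer; only the assignment to $7 differs.
quoted=$(awk 'function unquote(f){ return substr(f, 2, length(f)-2) }
     NR==FNR{
         if (NR==1){ f3=$3; f4=$4 }          # header row: remember SO and CO
         else if (NF){ a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 }
         next
     }
     { k=unquote($4) SUBSEP unquote($5) SUBSEP unquote($6) }
     k in a{ $7="\"" a[k] "\"" }             # re-add the double quotes
     1' "$tmp/file1" FS=',' OFS=',' "$tmp/file2")
printf '%s\n' "$quoted"
rm -rf "$tmp"
```

On the sample data this should reproduce the required output from the question, with "6941.958", "138.922" and "75.175" quoted in column 7 and every other line unchanged.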
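【Comments】:

  • Hi RomanPerekhrest, thanks for your help. Your script looks great to me. However, I keep getting output identical to "file2", meaning nothing in column 7 is replaced. Any hints?
  • @kelly, hint: make sure you have posted your actual input samples, because they were copied and tested exactly as posted. The solution works on the samples currently shown.
  • RomanPerekhrest, the problem was on my side; your code works fine. Thank you very much for your help and time.
  • @kelly, no problem
  • @RomanPerekhrest's solution works perfectly with the test data. However, the actual data in FILE2 poses a problem: column 2 looks like "abc,45" or "abc23", i.e. some values contain a comma inside the double quotes and some do not. Since I can't use the double quote as the delimiter for this problem, how can this be handled? Thanks for your help.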
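
The thread does not answer this last comment. One possible approach, sketched here under stated assumptions, is to stop relying on `FS=','` for FILE2 and split each line with a small quote-aware function instead. `csvsplit` is a hypothetical helper written for this sketch, not a built-in. It assumes fields never contain an escaped double quote or a newline. The sample FILE2 rows with "abc,45" and "abc23" are made up to mirror the comment. GNU awk users could likely achieve the same with its `FPAT` variable, but the version below sticks to POSIX awk features.

```shell
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
type  code     SO  CO other
7757    1       6941.958        138.922 149.17
EOF
cat > "$tmp/file2" <<'EOF'
"US","abc,45",,"7757","1","SO","10","299"
"US","abc23",,"7757","1","CO","10","299"
"US","abc,45",,"7758","1","SO","10","299"
EOF

fixed=$(awk '
function unquote(f) { return substr(f, 2, length(f) - 2) }

# Hypothetical helper: split one CSV line into arr[1..n].
# A field wrapped in double quotes may contain commas.
function csvsplit(line, arr,    n, p, field) {
    split("", arr)                          # clear leftovers from the previous line
    n = 0
    while (1) {
        if (match(line, /^"[^"]*"/)) {      # quoted field
            field = substr(line, 1, RLENGTH)
        } else {                            # unquoted, possibly empty field
            p = index(line, ",")
            field = (p ? substr(line, 1, p - 1) : line)
        }
        arr[++n] = field
        line = substr(line, length(field) + 1)
        if (line == "") break
        line = substr(line, 2)              # drop the separating comma
    }
    return n
}

NR == FNR {                                 # first file: whitespace-separated FILE1
    if (NR == 1) { f3 = $3; f4 = $4 }       # header row: SO and CO
    else if (NF) { a[$1, $2, f3] = $3; a[$1, $2, f4] = $4 }
    next
}
{                                           # second file: quote-aware split
    n = csvsplit($0, f)
    k = unquote(f[4]) SUBSEP unquote(f[5]) SUBSEP unquote(f[6])
    if (k in a) f[7] = "\"" a[k] "\""
    out = f[1]
    for (i = 2; i <= n; i++) out = out "," f[i]
    print out
}' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$fixed"
rm -rf "$tmp"
```

With these made-up rows, the first two lines should get "6941.958" and "138.922" in column 7 while "abc,45" stays intact as a single field, and the third line (7758, which is absent from FILE1) should pass through unchanged.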