【Posted】: 2021-07-24 01:26:33
【Question】:
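I am trying to build a bash script that uses awk to walk through a sorted, tab-delimited file line by line and determine whether:
- field 1 (molecule) of the line is the same as on the next line,
- field 5 (strand) of the line is the string "minus", and
- field 5 of the next line is the string "plus".
If all of this is true, I want to write the values of fields 1 and 3 from the current line, followed by field 4 from the next line, to a file. For context, after sorting, the input file looks like this: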
molecule gene start end strand
ERR2661861.3269 JN051170.1 11330 10778 minus
ERR2661861.3269 JN051170.1 11904 11348 minus
ERR2661861.3269 JN051170.1 12418 11916 minus
ERR2661861.3269 JN051170.1 13000 12469 minus
ERR2661861.3269 JN051170.1 13382 13932 plus
ERR2661861.3269 JN051170.1 13977 14480 plus
ERR2661861.3269 JN051170.1 14491 15054 plus
ERR2661861.3269 JN051170.1 15068 15624 plus
ERR2661861.3269 JN051170.1 15635 16181 plus
So in this example, the script should find the condition to be true when comparing lines 4 and 5, and append the following line to the file:
ERR2661861.3269 13000 13382
My script so far is:
# test input file
file=Eg2.1.txt.out
#sort the file by 'molecule' field, then 'start' field
sort -k1,1 -k3n $file > sorted_file
# create output file and add 'molecule' 'start' and 'end' headers
echo molecule$'\t'start$'\t'end >> Test_file.txt
# for each line of the input file, do this
for i in $sorted_file
do
# check to see if field 1 on current line is the same as field 1 on next line AND if field 5 on current line is "minus" AND if field 5 on next line is "plus"
if [awk '{if(NR==i) print $1}' == awk '{if(NR==i+1) print $1}'] && [awk '{if(NR==i) print $5}' == "minus"] && [awk '{if(NR==i+1) print $5}' == "plus"];
# if this is true, then get the 1st and 3rd fields from current line and 4th field from next line and add this to the output file
then
mol=awk '{if(NR==i) print $1}'
start=awk '{if(NR==i) print $3}'
end=awk '{if(NR==i+1) print $4}'
new_line=$mol$'\t'$start$'\t'$end
echo new_line >> Test_file.txt
fi
done
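The first part of the bash script works as I want, but the for loop does not seem to find any matches in the sorted file. Does anyone have any insight or suggestions as to why this might not be working as intended?
Many thanks!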
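A few shell details in the loop are likely why nothing matches. `$sorted_file` is never assigned (the sort output goes to a file named `sorted_file`), so `for i in $sorted_file` has nothing to iterate over. Beyond that, `[` needs spaces around it, a command's output has to be captured with `$(...)`, awk only sees a shell variable when it is passed with `-v`, and `echo new_line` prints the literal word. The following is only a sketch of the same line-by-line idea with those points addressed; `field_at` is a hypothetical helper and the two demo rows are copied from the sample input. It rereads the file for every field, so it would be slow on large inputs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print field number $3 of line number $2
# from the tab-delimited file $1.
field_at() {
    awk -F'\t' -v n="$2" -v f="$3" 'NR == n { print $f }' "$1"
}

# Tiny stand-in for sorted_file (two rows taken from the question)
sorted_file=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\n' \
    ERR2661861.3269 JN051170.1 13000 12469 minus \
    ERR2661861.3269 JN051170.1 13382 13932 plus > "$sorted_file"

total=$(wc -l < "$sorted_file")      # number of lines to walk over
for ((i = 1; i < total; i++)); do    # loop over line numbers, not file contents
    next=$((i + 1))
    # $(...) captures command output; [ needs spaces around its operands
    if [ "$(field_at "$sorted_file" "$i" 1)" = "$(field_at "$sorted_file" "$next" 1)" ] &&
       [ "$(field_at "$sorted_file" "$i" 5)" = "minus" ] &&
       [ "$(field_at "$sorted_file" "$next" 5)" = "plus" ]; then
        mol=$(field_at "$sorted_file" "$i" 1)
        start=$(field_at "$sorted_file" "$i" 3)
        end=$(field_at "$sorted_file" "$next" 4)
        # expand the variables; a bare word would be printed literally
        printf '%s\t%s\t%s\n' "$mol" "$start" "$end"
    fi
done
rm -f "$sorted_file"
```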
【Discussion】:
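One possible approach, offered as a sketch rather than a definitive answer, is to let awk do the whole comparison in a single pass by remembering the previous line. The function name `find_strand_switches` is hypothetical, and the sketch assumes the input is tab-delimited with one header row. It follows the written description and prints field 4 of the next line; the expected output shown in the question (13382) is actually field 3 of the next line, so `$4` would need to become `$3` if that is what is wanted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every adjacent pair of rows that share a molecule
# and go from "minus" to "plus", print molecule, current start, next-line end.
find_strand_switches() {
    # $1: tab-delimited input file with a header row
    # Skip the header, then sort by molecule and numerically by start.
    tail -n +2 "$1" | sort -t $'\t' -k1,1 -k3,3n |
    awk -F'\t' -v OFS='\t' '
        # prev_* hold the previous row; the current row plays "next line"
        NR > 1 && $1 == prev_mol && prev_strand == "minus" && $5 == "plus" {
            print prev_mol, prev_start, $4
        }
        { prev_mol = $1; prev_start = $3; prev_strand = $5 }
    '
}

# Small demo using rows from the question (deliberately unsorted)
demo=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\n' \
    molecule gene start end strand \
    ERR2661861.3269 JN051170.1 13382 13932 plus \
    ERR2661861.3269 JN051170.1 13000 12469 minus \
    ERR2661861.3269 JN051170.1 12418 11916 minus \
    ERR2661861.3269 JN051170.1 13977 14480 plus > "$demo"

printf 'molecule\tstart\tend\n'
find_strand_switches "$demo"
rm -f "$demo"
```

Because the sort puts rows of the same molecule next to each other, only one previous row has to be remembered, and the file is read once instead of once per field.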