【Posted】: 2021-10-10 07:46:36
【Question】:
I am working with a file (.gff3) in which the following pattern occurs (where each # stands for a digit):
TRINITY_DN###_c0_g1~~
Example:
BAN_TRINITY_DN0_c0_g1_i1 transdecoder gene 1 580 . + . ID=TRINITY_DN0_c0_g1~~TRINITY_DN0_c0_g1_i1.p1;Name=ORF%20type%3A5prime_partial%20len%3A190%20%28%2B%29%2Cscore%3D182.16
BAN_TRINITY_DN0_c0_g1_i1 transdecoder mRNA 1 580 . + . ID=TRINITY_DN0_c0_g1_i1.p1;Parent=TRINITY_DN0_c0_g1~~TRINITY_DN0_c0_g1_i1.p1;Name=ORF%20type%3A5prime_partial%20len%3A190%20%28%2B%29%2Cscore%3D182.16
BAN_TRINITY_DN0_c0_g1_i1 transdecoder exon 1 580 . + . ID=TRINITY_DN0_c0_g1_i1.p1.exon1;Parent=TRINITY_DN0_c0_g1_i1.p1
BAN_TRINITY_DN0_c0_g1_i1 transdecoder CDS 1 570 . + 0 ID=cds.TRINITY_DN0_c0_g1_i1.p1;Parent=TRINITY_DN0_c0_g1_i1.p1
BAN_TRINITY_DN0_c0_g1_i1 transdecoder three_prime_UTR 571 580 . + . ID=TRINITY_DN0_c0_g1_i1.p1.utr3p1;Parent=TRINITY_DN0_c0_g1_i1.p1
BAN_TRINITY_DN101_c0_g1_i1 transdecoder gene 1 230 . - . ID=TRINITY_DN101_c0_g1~~TRINITY_DN101_c0_g1_i1.p1;Name=ORF%20type%3Ainternal%20len%3A77%20%28-%29%2Cscore%3D24.09
BAN_TRINITY_DN101_c0_g1_i1 transdecoder mRNA 1 230 . - . ID=TRINITY_DN101_c0_g1_i1.p1;Parent=TRINITY_DN101_c0_g1~~TRINITY_DN101_c0_g1_i1.p1;Name=ORF%20type%3Ainternal%20len%3A77%20%28-%29%2Cscore%3D24.09
BAN_TRINITY_DN101_c0_g1_i1 transdecoder exon 1 230 . - . ID=TRINITY_DN101_c0_g1_i1.p1.exon1;Parent=TRINITY_DN101_c0_g1_i1.p1
BAN_TRINITY_DN101_c0_g1_i1 transdecoder CDS 3 230 . - 0 ID=cds.TRINITY_DN101_c0_g1_i1.p1;Parent=TRINITY_DN101_c0_g1_i1.p1
I simply want to delete the pattern, so that the output looks like this:
BAN_TRINITY_DN0_c0_g1_i1 transdecoder gene 1 580 . + . ID=TRINITY_DN0_c0_g1_i1.p1;Name=ORF%20type%3A5prime_partial%20len%3A190%20%28%2B%29%2Cscore%3D182.16
BAN_TRINITY_DN0_c0_g1_i1 transdecoder mRNA 1 580 . + . ID=TRINITY_DN0_c0_g1_i1.p1;Parent=TRINITY_DN0_c0_g1_i1.p1;Name=ORF%20type%3A5prime_partial%20len%3A190%20%28%2B%29%2Cscore%3D182.16
BAN_TRINITY_DN0_c0_g1_i1 transdecoder exon 1 580 . + . ID=TRINITY_DN0_c0_g1_i1.p1.exon1;Parent=TRINITY_DN0_c0_g1_i1.p1
BAN_TRINITY_DN0_c0_g1_i1 transdecoder CDS 1 570 . + 0 ID=cds.TRINITY_DN0_c0_g1_i1.p1;Parent=TRINITY_DN0_c0_g1_i1.p1
BAN_TRINITY_DN0_c0_g1_i1 transdecoder three_prime_UTR 571 580 . + . ID=TRINITY_DN0_c0_g1_i1.p1.utr3p1;Parent=TRINITY_DN0_c0_g1_i1.p1
BAN_TRINITY_DN101_c0_g1_i1 transdecoder gene 1 230 . - . ID=TRINITY_DN101_c0_g1_i1.p1;Name=ORF%20type%3Ainternal%20len%3A77%20%28-%29%2Cscore%3D24.09
BAN_TRINITY_DN101_c0_g1_i1 transdecoder mRNA 1 230 . - . ID=TRINITY_DN101_c0_g1_i1.p1;Parent=TRINITY_DN101_c0_g1_i1.p1;Name=ORF%20type%3Ainternal%20len%3A77%20%28-%29%2Cscore%3D24.09
BAN_TRINITY_DN101_c0_g1_i1 transdecoder exon 1 230 . - . ID=TRINITY_DN101_c0_g1_i1.p1.exon1;Parent=TRINITY_DN101_c0_g1_i1.p1
BAN_TRINITY_DN101_c0_g1_i1 transdecoder CDS 3 230 . - 0 ID=cds.TRINITY_DN101_c0_g1_i1.p1;Parent=TRINITY_DN101_c0_g1_i1.p1
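I tried to do this with sed, but the pattern varies in length and composition, and I don't know how to delete characters in a way that takes this into account (I am new to bash).
Does anyone know how to do this?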
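【Comments】:
- What sed command did you use?
- @PierreFrançois I tried something like sed 's/TRINITY.*~~//' but it didn't work. I think that may be because "TRINITY" appears several times on each line.
- What do you mean by "it didn't work"? Do you mean that it deleted too much? My guess is that it deleted nothing, because the regex * operator is greedy: it consumes the rest of the line by itself, and then ~~ can no longer match anything. Try my solution below.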
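For reference, here is a minimal sketch of how the attempted command behaves on one of the sample lines, next to a narrower pattern. In POSIX regular expressions a greedy .* backtracks until the rest of the pattern can match, so on lines that contain ~~ the expression TRINITY.*~~ should match from the first TRINITY on the line (the one in column 1) through the last ~~, which removes too much rather than nothing. The function name strip_gene_prefix is hypothetical, the sample line is shortened (Name=x), and the pattern assumes the prefix always has the form TRINITY_DN<digits>_c<digits>_g<digits>~~, as in the sample data.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads GFF3 lines on stdin, writes cleaned lines to stdout.
# Assumption: the unwanted prefix is always TRINITY_DN<digits>_c<digits>_g<digits>~~.
strip_gene_prefix() {
  # [0-9]* can only span digits, so the match cannot run across other fields.
  sed 's/TRINITY_DN[0-9]*_c[0-9]*_g[0-9]*~~//g'
}

# Shortened version of the first sample line (Name attribute abbreviated).
line='BAN_TRINITY_DN0_c0_g1_i1 transdecoder gene 1 580 . + . ID=TRINITY_DN0_c0_g1~~TRINITY_DN0_c0_g1_i1.p1;Name=x'

# Original attempt: the match starts at the first TRINITY (column 1)
# and extends to the last ~~ on the line.
printf '%s\n' "$line" | sed 's/TRINITY.*~~//'
# prints: BAN_TRINITY_DN0_c0_g1_i1.p1;Name=x

# Narrower pattern: only the gene-level prefix before ~~ is removed.
printf '%s\n' "$line" | strip_gene_prefix
# prints: BAN_TRINITY_DN0_c0_g1_i1 transdecoder gene 1 580 . + . ID=TRINITY_DN0_c0_g1_i1.p1;Name=x
```

To process a whole file, something like strip_gene_prefix < input.gff3 > output.gff3 should work; both file names are placeholders.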