Making a gawk command that splits FASTA files work in Perl

【Question title】: Making a gawk command that splits FASTA files work in Perl
【Posted】: 2014-11-29 11:40:31
【Question】:

Hello, I am using this gawk command to split a FASTA file:

gawk '/^>c/ {OUT=substr($0,2) ".fa";print " ">OUT}; OUT{print >OUT}' your_input

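It works perfectly in the terminal. I just want to use it in a Perl script via system, with the input file name held in a string, but I don't know how to do that.

I tried this: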

my $string = "secuence.fa"; #this is the file I wanna split .

my $cmd= (gawk '/^>c/ {OUT=substr($0,2) ".fa";print " ">OUT}; OUT{print >OUT}' $string);
system $command;

When I run the script, it says there is a syntax error in $cmd, but I can't find it.

Thanks.

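【Comments】:

Is there a good reason you need to run an awk script from a Perl script? I mean, wouldn't it be better to do the same simple thing directly in Perl?
Because my teacher wants it that way :(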
system("gawk", '/^>c/ {OUT=substr($0,2) ".fa";print " ">OUT}; OUT{print >OUT}', $string)
That makes the script run, but I don't get any output. It should produce one or more FASTA files.
It should produce output. It works for me on Ubuntu 14.04 with Perl 5.18.
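
The list form of system suggested in the comment above can be sketched as follows. This is a minimal sketch, not the asker's final script: the sub name split_with_gawk is hypothetical, and it assumes gawk is on the PATH. Keeping the awk program in a single-quoted Perl string stops Perl from interpolating $0, and passing separate arguments to system avoids shell quoting altogether.

```perl
use strict;
use warnings;

# The awk program from the question, kept in single quotes so that Perl
# does not try to interpolate $0.
my $awk = '/^>c/ {OUT=substr($0,2) ".fa"; print " " >OUT}; OUT{print >OUT}';

# Hypothetical helper: run gawk on $file and return true on success.
sub split_with_gawk {
    my ($file) = @_;

    # List form of system: gawk gets the program text and the file name
    # as separate arguments, so no shell is involved.
    my $status = system("gawk", $awk, $file);

    if ($status == -1) {
        warn "could not start gawk: $!\n";
        return 0;
    }
    if ($status != 0) {
        warn "gawk exited with status ", $status >> 8, "\n";
        return 0;
    }
    return 1;
}

# Example call with the file name from the question:
# split_with_gawk("secuence.fa");
```

Note that the pattern /^>c/ only matches header lines that begin with ">c". If the headers in the input start with anything else, OUT is never set and gawk writes no files, which is one possible reason for a run that finishes without output.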

Tags: regex perl awk system

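【Solution 1】:

Splitting FASTA in Perl is very simple. Perl lets you change the input record separator ($/) used when reading a file. If you set it to "\n>", Perl does all the work for you.

Here is an example: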

use strict;
use warnings;

# Set the input record separator to the FASTA record separator
local $/ = "\n>";

while (<DATA>) {
    print "---- New sequence ---\n";
    # perl will put the separator at the end of the record,
    # so we need to remove the separator from the end,
    # and add it back at the beginning
    s/[\n>]+$//;
    s/^(?!>)/>/;
    print $_, "\n";
}

__DATA__
>seq1
ACGTACCTA
>seq2
TTCACTTAC
>seq3
ACCTTATTA
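
The example above only prints each record. To get one file per sequence, as the gawk command in the question does, the same record-separator approach can be extended roughly as follows. This is a sketch under assumptions: the sub name split_fasta is hypothetical, each output file is named after the first word of its header line, and that word is assumed to be safe to use as a file name (no "/" and no duplicates).

```perl
use strict;
use warnings;

# Hypothetical helper: split the FASTA file $path into one file per record
# and return the names of the files written.
sub split_fasta {
    my ($path) = @_;
    my @written;

    # Read records separated by "\n>" instead of by line.
    local $/ = "\n>";

    open my $in, '<', $path or die "cannot open $path: $!";
    while (my $record = <$in>) {
        # Drop the trailing separator, and the leading ">" that only the
        # first record still carries.
        $record =~ s/[\n>]+\z//;
        $record =~ s/\A>//;

        # Assumption: the first word of the header is the sequence id.
        my ($id) = $record =~ /\A(\S+)/;
        next unless defined $id;

        my $out_name = "$id.fa";
        open my $out, '>', $out_name or die "cannot write $out_name: $!";
        print {$out} ">$record\n";
        close $out or die "cannot close $out_name: $!";
        push @written, $out_name;
    }
    close $in;
    return @written;
}

# Example call with the file name from the question:
# my @files = split_fasta("secuence.fa");
```

Unlike the gawk one-liner, this version writes a file for every record rather than only for headers starting with ">c", and it does not write the extra " " line at the top of each file.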

【Comments】: