Add dipeptide frequency in this perl script based on sequence length

【Question title】: Add dipeptide frequency in this perl script based on sequence length
【Posted】: 2019-09-18 17:28:31
【Question】:

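I have a perl script that gets dipeptide counts from sequences in fasta format (there are 400 combinations, e.g. AA, AC, AD, AE...). But I would like to add the frequency, based on the sequence length. My input (myfile.fasta) contains multiple sequences.

I tried to do this myself, but I get wrong results. I'm not very familiar with perl.

My script: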

use strict;
use warnings;
use Bio::SeqIO;

my @amino=qw/A C D E F G H I K L M N P Q R S T V W Y/;
my @comb=();

foreach my $a (@amino){
    foreach my $b (@amino){
        push (@comb,$a.$b)
    }
}
my $in  = Bio::SeqIO->new(-file => "myfile.fasta" , '-format' => 'Fasta');
while ( my $seq= $in->next_seq ) {
    my @dipeps=($seq->seq()=~/(?=(.{2}))/g);
    my %di_count=();
    $di_count{$_}++ for @dipeps;
    print $seq->id();
    map{exists $di_count{$_}?print " ",$di_count{$_}:print " ",0}sort @comb;
    print "\n";
}

I tried:

map{exists $di_count{$_}?print " ",$di_count{$_}:print " ",0}sort @comb/length;

map{exists $di_count{$_}?print " ",$di_count{$_}:print " ",0/length}sort @comb;

I also tried defining the length, like:

my $seq_len = length($seq);

Also, I don't want to hard-code the input file in the script; I would like to run it like "perl script.pl input.fasta > result.txt". For this, should I use:

open (S, "$ARGV[0]") || die "cannot open FASTA file to read: $!";

【Question comments】:

Tags: perl

【Solution 1】:

This is very ugly code (it should be rewritten completely), but I think you want:

my $length = @dipeps;
map{exists $di_count{$_}?print " ",$di_count{$_}/$length:print " ",0}sort @comb;
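
As a rough idea of what a rewrite might look like, here is a minimal sketch. It assumes "frequency" means the count divided by the number of overlapping dipeptides (sequence length minus one), as in the snippet above. The helper name dipeptide_frequencies is made up for illustration, and the sequence is passed in as a plain string so the example runs without BioPerl. Note also that length($seq) on a Bio::Seq object measures the stringified object rather than the residues; $seq->length or length($seq->seq) is probably what was intended.

```perl
use strict;
use warnings;

my @amino = qw/A C D E F G H I K L M N P Q R S T V W Y/;

# All 400 dipeptides, already in sorted order (AA, AC, ..., YY).
my @comb;
for my $first (@amino) {
    for my $second (@amino) {
        push @comb, $first . $second;
    }
}

# Hypothetical helper (not part of BioPerl): returns a hash reference that
# maps each of the 400 dipeptides to its count divided by the total number
# of overlapping dipeptides found in $sequence.
sub dipeptide_frequencies {
    my ($sequence) = @_;

    # The lookahead makes the matches overlap: "AACAA" gives AA, AC, CA, AA.
    my @dipeps = $sequence =~ /(?=(.{2}))/g;
    my $total  = @dipeps;    # sequence length minus one

    my %count;
    $count{$_}++ for @dipeps;

    my %freq;
    for my $dipep (@comb) {
        # Report 0 for absent dipeptides and avoid dividing by zero
        # when the sequence is shorter than two residues.
        $freq{$dipep} = $total ? ($count{$dipep} // 0) / $total : 0;
    }
    return \%freq;
}

# Usage example on a toy sequence.
my $freq = dipeptide_frequencies('AACAA');
printf "%s %.3f\n", $_, $freq->{$_} for qw/AA AC CA/;
```

For the toy sequence this prints AA 0.500, AC 0.250 and CA 0.250, one per line. Inside the original while loop the same helper could be called as dipeptide_frequencies($seq->seq), printing $freq->{$_} for each entry of @comb. To divide by the full sequence length instead, replace $total with length($sequence).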

【Comments】:
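
On the second part of the question (taking the input file from the command line): with Bio::SeqIO there is no need to call open yourself, since the file name can be passed straight through, e.g. Bio::SeqIO->new(-file => $ARGV[0], -format => 'fasta'). If you do want to open the file by hand, the three-argument open with a lexical filehandle is generally preferred over open(S, "$ARGV[0]"). The sketch below shows that form together with a deliberately simple FASTA reader in plain Perl. read_fasta is a hypothetical helper, not a BioPerl function, and it stands in for Bio::SeqIO only so that the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from an open filehandle and
# return a list of [id, sequence] pairs. The id is the first word after '>'.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;          # tolerate Windows line endings
        next if $line eq '';
        if ($line =~ /^>(\S*)/) {
            push @records, [$1, ''];        # start a new record
        }
        elsif (@records) {
            $records[-1][1] .= $line;       # append a sequence line
        }
    }
    return @records;
}

# Run as: perl script.pl input.fasta > result.txt
if (@ARGV) {
    my $file = $ARGV[0];
    open(my $fh, '<', $file)
        or die "cannot open FASTA file '$file' to read: $!";
    for my $record (read_fasta($fh)) {
        my ($id, $sequence) = @$record;
        print "$id ", length($sequence), "\n";   # id and residue count
    }
    close $fh;
}
```

Because the output goes to STDOUT, redirecting with > result.txt works without any change to the script.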