Sort a bash array following an ordered array pattern

【Question title】: Sort bash array following an ordered array pattern
【Posted】: 2019-01-17 17:12:52
【Question description】:

I have an array, let's call it ensembldb, that contains the following lines:

rs2799070   ENST00000379389 ENSG00000187608 ISG15   inframe_insertion   NA  NA  protein_coding  ISG15   ubiquitin-like  modifier    [Source:HGNC    Symbol;Acc:HGNC:4053]NM_005101.3    NP_005092
rs2799070   ENST00000458555 ENSG00000224969 AL645608.2  missense_variant    NA  NA  antisense   NA  NULL    NULL
rs2799070   ENST00000624652 ENSG00000187608 ISG15   inframe_deletion    NA  NA  protein_coding  ISG15   ubiquitin-like  modifier    [Source:HGNC    Symbol;Acc:HGNC:4053]NULL   NULL
rs2799070   ENST00000624697 ENSG00000187608 ISG15   frameshift_variant  NA  NA  protein_coding  ISG15   ubiquitin-like  modifier    [Source:HGNC    Symbol;Acc:HGNC:4053]NULL   NULL

I also have an ordered array, let's call it ordered_array:

frameshift_variant
missense_variant
inframe_insertion
inframe_deletion

I want to sort my array ensembldb so that its rows follow the order given in ordered_array. The expected output is:

rs2799070   ENST00000624697 ENSG00000187608 ISG15   frameshift_variant  NA  NA  protein_coding  ISG15   ubiquitin-like  modifier    [Source:HGNC    Symbol;Acc:HGNC:4053]NULL   NULL
rs2799070   ENST00000458555 ENSG00000224969 AL645608.2  missense_variant    NA  NA  antisense   NA  NULL    NULL
rs2799070   ENST00000379389 ENSG00000187608 ISG15   inframe_insertion   NA  NA  protein_coding  ISG15   ubiquitin-like  modifier    [Source:HGNC    Symbol;Acc:HGNC:4053]NM_005101.3    NP_005092
rs2799070   ENST00000624652 ENSG00000187608 ISG15   inframe_deletion    NA  NA  protein_coding  ISG15   ubiquitin-like  modifier    [Source:HGNC    Symbol;Acc:HGNC:4053]NULL   NULL

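I checked this question, but it does not answer mine because it deals with a multidimensional array. How can I sort the array ensembldb according to the ordered array ordered_array?

Thanks.

Edit 1: Added code as requested by @anubhava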

declare -a ordered_array  # indexed array; with -A (associative) the element order is not guaranteed
ordered_array[0]="frameshift_variant"
ordered_array[1]="missense_variant"
ordered_array[2]="inframe_insertion"
ordered_array[3]="inframe_deletion"

while read -r LINE; do
    chrom=$(echo -e "$LINE" | cut -f1 -d$'\t' | sed 's/^chr//g')
    pos=$(echo -e "$LINE" | cut -f2 -d$'\t')
    ref=$(echo -e "$LINE" | cut -f3 -d$'\t')
    alt=$(echo -e "$LINE" | cut -f4 -d$'\t')
    LINE=$(echo -e "$LINE" | sed 's/^chr//g')
    ensembldb=$(echo "PREPARE stmt1 FROM 'SELECT Annotated_ID, Transcript, Gene_ID, Gene_name, Consequence, Swissprot_ID, AA_change, Biotype, Gene_description, RefSeq_mRNA, RefSeq_peptide FROM SNP_annot.37_annot_ensembl_89_full_descr where chrom = \"$chrom\" and Start = \"$pos\" and Local_alleles = \"$ref/$alt\"'; execute stmt1;" | mariadb -A -N)
    readarray -t array <<< "$ensembldb"
    pos19=$(echo "PREPARE stmt2 FROM 'select hg19_pos from SNP_annot.mut_convert_pos where chrom = \"$chrom\" and hg38_pos = \"$pos\"'; execute stmt2;" | mariadb -A -N)
    hits=$(echo -e "$ensembldb" | wc -l)
    [ ! -z "$pos19" ] && awk -v line="$LINE" -v pos="$pos19" -v ensembl="$ensembldb" -v hit="$hits" 'BEGIN {print line"\t"ensembl"\t"hit"\t"pos}'
done

1. The variable LINE holds lines like these:

CHROM   POS REF ALT QUAL    DP  Genotype
chr1    16495   G   C   1722.77 252 G/C
chr1    16719   T   A   145.77  189 T/A
chr1    16841   G   T   701.77  521 G/T
chr1    17626   G   A   154.77  124 G/A

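2. The variable ensembldb holds the result of a MySQL query that returns multiple rows, which is then converted into an array. These are the rows I want to sort according to ordered_array, so that I can select the first row that matches ordered_array.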
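
The requirement in point 2 (take the row whose consequence ranks highest in ordered_array) can be sketched in plain bash as below. This is only an illustration under stated assumptions: the rows are made-up placeholders rather than real query output, the consequence is assumed to be the fifth tab-separated field, and best_row, consequence, row and fields are hypothetical names that do not appear in the original script.

```bash
#!/usr/bin/env bash
# Placeholder rows standing in for the query result held in "array"
# (tab-separated, consequence assumed to be in field 5).
array=(
  $'rs1\tENST01\tENSG01\tGENE1\tinframe_insertion'
  $'rs1\tENST02\tENSG02\tGENE2\tmissense_variant'
  $'rs1\tENST03\tENSG01\tGENE1\tinframe_deletion'
)

# Indexed array, so the priority order is the order of definition.
declare -a ordered_array=(frameshift_variant missense_variant inframe_insertion inframe_deletion)

best_row=""
# Walk the patterns from highest to lowest priority and stop at the first hit.
for consequence in "${ordered_array[@]}"; do
  for row in "${array[@]}"; do
    # Note: tab is a whitespace IFS character, so empty fields would collapse here.
    IFS=$'\t' read -r -a fields <<< "$row"
    if [[ "${fields[4]}" == "$consequence" ]]; then
      best_row=$row
      break 2   # leave both loops once the best-ranked row is found
    fi
  done
done

printf '%s\n' "$best_row"
```

With these placeholder rows no frameshift_variant is present, so the loop falls through to the missense_variant row.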

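【Question comments】:

@anubhava I added some code. I hope it is clear.
@Law Some feedback on my answer would be nice. Doesn't it do what you want? :)
@mickp I'm trying it out and will let you know as soon as possible.

Tags: arrays bash sorting

【Solution 1】:

This awk command may work for you: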

awk 'FNR==NR{a[$5]=$0;next}{print a[$1]}' file_a file_b
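
To make the one-liner easier to follow, here is a self-contained sketch that runs it on a cut-down version of the question's data. The temporary directory and the shortened rows (first five columns only) are assumptions made for the demonstration; file_a holds the rows to sort and file_b holds the desired order, as in the command above.

```bash
#!/usr/bin/env bash
# Hypothetical scratch files standing in for file_a (rows) and file_b (order).
tmpdir=$(mktemp -d)
file_a="$tmpdir/file_a"
file_b="$tmpdir/file_b"

# Rows to sort: the fifth whitespace-separated field is the consequence.
printf '%s\n' \
  'rs2799070 ENST00000379389 ENSG00000187608 ISG15 inframe_insertion' \
  'rs2799070 ENST00000458555 ENSG00000224969 AL645608.2 missense_variant' \
  'rs2799070 ENST00000624652 ENSG00000187608 ISG15 inframe_deletion' \
  'rs2799070 ENST00000624697 ENSG00000187608 ISG15 frameshift_variant' > "$file_a"

# Desired order, one consequence per line.
printf '%s\n' frameshift_variant missense_variant inframe_insertion inframe_deletion > "$file_b"

# FNR==NR holds only while awk reads the first file: store each row under its field 5.
# For the second file, print the stored row for each pattern, in pattern order.
sorted=$(awk 'FNR==NR { a[$5] = $0; next } { print a[$1] }' "$file_a" "$file_b")

printf '%s\n' "$sorted"
rm -rf "$tmpdir"
```

Because a[$5] is overwritten on every assignment, only the last row for each consequence survives, and a pattern with no matching row prints an empty line.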

If a and b really are arrays:

readarray -t a < <(awk 'FNR==NR{a[$5]=$0;next}{print a[$1]}' <(printf '%s\n' "${a[@]}") <(printf '%s\n' "${b[@]}"))

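【Comments】:

Could you explain the command? Thanks in advance.
Also, shouldn't you pass the arrays to awk as awk variables with the -v option?
First of all, does the solution work for you? :) There is no point in explaining it if it doesn't work.
No, the solution doesn't work for me, sorry. I tried readarray -t a < <(awk 'FNR==NR{ensembldb[$5]=$0;next}{print ensembldb[$1]}' <(printf '%s\n' "${ensembldb[@]}") <(printf '%s\n' "${ordered_array[@]}")) and echo "$a" returned nothing.
Please show your input properly in the question, for example what your a variable contains. By the way, from what I can see now, it is not an array.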