Posted: 2019-01-17 17:12:52
Question:
I have an array, let's call it ensembldb, containing the following rows:
rs2799070 ENST00000379389 ENSG00000187608 ISG15 inframe_insertion NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NM_005101.3 NP_005092
rs2799070 ENST00000458555 ENSG00000224969 AL645608.2 missense_variant NA NA antisense NA NULL NULL
rs2799070 ENST00000624652 ENSG00000187608 ISG15 inframe_deletion NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NULL NULL
rs2799070 ENST00000624697 ENSG00000187608 ISG15 frameshift_variant NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NULL NULL
I also have an ordered array, let's call it ordered_array:
frameshift_variant
missense_variant
inframe_insertion
inframe_deletion
I want to sort ensembldb so that its rows follow the order given in ordered_array (matching on the consequence field). The expected output is:
rs2799070 ENST00000624697 ENSG00000187608 ISG15 frameshift_variant NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NULL NULL
rs2799070 ENST00000458555 ENSG00000224969 AL645608.2 missense_variant NA NA antisense NA NULL NULL
rs2799070 ENST00000379389 ENSG00000187608 ISG15 inframe_insertion NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NM_005101.3 NP_005092
rs2799070 ENST00000624652 ENSG00000187608 ISG15 inframe_deletion NA NA protein_coding ISG15 ubiquitin-like modifier [Source:HGNC Symbol;Acc:HGNC:4053]NULL NULL
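I checked this question, but it doesn't answer mine, because my array is multidimensional (each row has several fields). How can I sort ensembldb according to the order in ordered_array?
Thanks.
Edit 1: added code, as requested by @anubhava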
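A minimal sketch of one way to do this in bash, assuming bash 4+ (for associative arrays and readarray), tab-separated rows, and that the consequence is always the 5th field. It builds a hypothetical rank lookup from ordered_array, prefixes each row with its rank, sorts numerically with a stable sort, and then strips the prefix again. The sample rows are shortened, hypothetical versions of the data above (first five fields only).

```shell
#!/usr/bin/env bash
# Sketch: sort tab-separated rows by where their 5th field (Consequence)
# appears in ordered_array. Requires bash 4+ (associative arrays, readarray).

ordered_array=(frameshift_variant missense_variant inframe_insertion inframe_deletion)

# Hypothetical sample rows: the first five fields of the question's data
array=(
    $'rs2799070\tENST00000379389\tENSG00000187608\tISG15\tinframe_insertion'
    $'rs2799070\tENST00000458555\tENSG00000224969\tAL645608.2\tmissense_variant'
    $'rs2799070\tENST00000624652\tENSG00000187608\tISG15\tinframe_deletion'
    $'rs2799070\tENST00000624697\tENSG00000187608\tISG15\tframeshift_variant'
)

# rank[consequence] = position of that consequence in ordered_array
declare -A rank
for i in "${!ordered_array[@]}"; do
    rank[${ordered_array[i]}]=$i
done

# Prefix each row with its rank, sort numerically (stable), drop the prefix
readarray -t sorted < <(
    for row in "${array[@]}"; do
        [[ -z $row ]] && continue              # skip blank rows (empty query result)
        IFS=$'\t' read -r _ _ _ _ consequence _ <<< "$row"
        r=${#ordered_array[@]}                  # unknown consequences sort last
        if [[ -n $consequence ]] && [[ -n ${rank[$consequence]+x} ]]; then
            r=${rank[$consequence]}
        fi
        printf '%s\t%s\n' "$r" "$row"
    done | sort -s -t $'\t' -k1,1n | cut -f2-
)

printf '%s\n' "${sorted[@]}"
```

Inside the question's while loop, the array filled by readarray -t array <<< "$ensembldb" could go through the same rank-and-sort step. The stable sort (-s) keeps rows that share a consequence in their original query order.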
# Indexed array (declare -a): elements keep their numeric order, whereas an
# associative array (declare -A) does not guarantee any iteration order
declare -a ordered_array=(
    "frameshift_variant"
    "missense_variant"
    "inframe_insertion"
    "inframe_deletion"
)
while IFS= read -r LINE; do
    # Strip a leading "chr", then split the first four tab-separated fields
    LINE=${LINE#chr}
    IFS=$'\t' read -r chrom pos ref alt _ <<< "$LINE"
    # All annotations for this variant (one row per transcript)
    ensembldb=$(echo "PREPARE stmt1 FROM 'SELECT Annotated_ID, Transcript, Gene_ID, Gene_name, Consequence, Swissprot_ID, AA_change, Biotype, Gene_description, RefSeq_mRNA, RefSeq_peptide FROM SNP_annot.37_annot_ensembl_89_full_descr where chrom = \"$chrom\" and Start = \"$pos\" and Local_alleles = \"$ref/$alt\"'; execute stmt1;" | mariadb -A -N)
    readarray -t array <<< "$ensembldb"
    # hg19 position corresponding to this hg38 position
    pos19=$(echo "PREPARE stmt2 FROM 'select hg19_pos from SNP_annot.mut_convert_pos where chrom = \"$chrom\" and hg38_pos = \"$pos\"'; execute stmt2;" | mariadb -A -N)
    hits=${#array[@]}
    [ -n "$pos19" ] && awk -v line="$LINE" -v pos="$pos19" -v ensembl="$ensembldb" -v hit="$hits" 'BEGIN {print line"\t"ensembl"\t"hit"\t"pos}'
done
1. The variable LINE contains lines like these:
CHROM POS REF ALT QUAL DP Genotype
chr1 16495 G C 1722.77 252 G/C
chr1 16719 T A 145.77 189 T/A
chr1 16841 G T 701.77 521 G/T
chr1 17626 G A 154.77 124 G/A
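2. The variable ensembldb holds the output of a MySQL query that returns several rows, which I convert into an array. These are the rows I want to sort according to ordered_array; I then want to select the first row that matches ordered_array.
Comments: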
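If only the best-ranked row is needed (point 2 above), sorting every row is not strictly necessary: scanning ordered_array in order and stopping at the first row whose consequence matches gives the same result. A minimal sketch, assuming the same tab-separated layout with the consequence as the 5th field; first_by_priority is a hypothetical helper name, and the two sample rows stand in for the array built by readarray in the loop.

```shell
#!/usr/bin/env bash
# Sketch: print the row whose 5th field (Consequence) appears earliest in
# ordered_array. Assumes tab-separated rows.

ordered_array=(frameshift_variant missense_variant inframe_insertion inframe_deletion)

# Hypothetical stand-in for: readarray -t array <<< "$ensembldb"
array=(
    $'rs2799070\tENST00000458555\tENSG00000224969\tAL645608.2\tmissense_variant'
    $'rs2799070\tENST00000624652\tENSG00000187608\tISG15\tinframe_deletion'
)

# Hypothetical helper: print the highest-priority row and return 0,
# or return 1 if no row matches any entry of ordered_array
first_by_priority() {
    local wanted row consequence
    for wanted in "${ordered_array[@]}"; do
        for row in "$@"; do
            IFS=$'\t' read -r _ _ _ _ consequence _ <<< "$row"
            if [[ $consequence == "$wanted" ]]; then
                printf '%s\n' "$row"
                return 0
            fi
        done
    done
    return 1
}

best=$(first_by_priority "${array[@]}") && printf '%s\n' "$best"
```

The outer loop runs over the priority list, so the first hit is by construction the best one; with a handful of transcripts per variant the nested loop costs nothing noticeable.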
-
@anubhava I've added some code. I hope it's clear.
-
@Law Some feedback on my answer would be nice. Doesn't it do what you want? :)
-
@mickp I'm trying it now; I'll let you know as soon as I can.