[Posted]: 2017-12-10 11:36:20
[Question]:
I have a file that looks like this:
/translation="MDGVTQQNAALVQEATTAAASLEEQARNLTAAVAAFDLGDKQTV
LITPRAAVPALKRPALKASLPASSSHGNWETF"
/product="Methyl-accepting chemotaxis protein I (serine
chemoreceptor protein)"
CDS complement(471..590)
/db_xref="SEED:fig|1240086.14.peg.2"
/translation="MHQYQSAILAKICRYGGIEKPEITPASVYKLDSHWRYVI"
/product="hypothetical protein"
CDS 717..2354
/db_xref="SEED:fig|1240086.14.peg.3"
/translation="MGFFVVLWGGASGFSLYSLKQVTTLLHDNSTQGRTYTYLVYGND
QYFRSVTRMARVMDYSQFSDAAIASLEEQAQQLTKAVEVFHLGSEYQTAAS
RTRPAGNMALKRPALSGMAPALPPARTASDEGSWEKF"
/product="Methyl-accepting chemotaxis protein I (serine
chemoreceptor protein)"
/product="macromolecule metabolism; macromolecule
degradation; degradation of proteins, peptides,
glycopeptides"
I need to extract the text between the quotes that follow "/product=", so the output should be:
Methyl-accepting chemotaxis protein I (serine chemoreceptor protein)
hypothetical protein
Methyl-accepting chemotaxis protein I (serine chemoreceptor protein)
macromolecule metabolism; macromolecule degradation; degradation of proteins, peptides, glycopeptides
I have to use awk, so I wrote this:
awk '/\/product/ {split($0, a, "\""); printf a[2] "\n"}'
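But this only returns the part of the value that sits on the same line as "/product", and sometimes the value spans two or three lines. I don't know how to get the whole text between the quotes. Can anyone help?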
[Discussion]:
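One possible approach, as a rough sketch rather than a definitive answer: remember when a /product=" line opens a value, keep appending the following lines, and print once the closing quote appears. This assumes a product value never contains an embedded double quote and that every qualifier starts on its own line. The function name extract_products and the file name annotations.gbk are placeholders, not taken from the question.

```shell
# Hypothetical helper: prints one /product value per line,
# joining values that wrap across several input lines.
extract_products() {
  awk '
    # A /product qualifier opens: keep what follows the opening quote.
    /\/product="/ {
        buf = $0
        sub(/^.*\/product="/, "", buf)
        inside = 1
    }
    # Continuation line of a value whose closing quote has not been seen yet.
    inside && !/\/product="/ {
        line = $0
        sub(/^[ \t]+/, "", line)   # drop leading indentation, if any
        buf = buf " " line
    }
    # Closing quote reached: cut it off, print, and reset the state.
    inside && buf ~ /"/ {
        sub(/".*$/, "", buf)
        print buf
        inside = 0
    }
  ' "$@"
}
```

Called as extract_products annotations.gbk (or with the file piped on standard input), this should print each product on a single line, with wrapped lines joined by one space. Using print instead of printf with the value as the format string also avoids trouble if a product name ever contains a % character.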