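【Posted】:2019-12-21 06:44:30
【Question】:
I have a directory of daily files; each file has about 600,000 lines, and each line has 500+ fields.
I want to look at one file and find every line that contains B2 in field #351.
Then I want to search all of the files for any line whose fields 282, 341, 314, and 348 match the values from the first file's output.
Right now I have the following, but it produces blank output: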
ARCHIVEDIR=/appl/dir/archive
file1_tmp=$$.tmp
zcat ${ARCHIVEDIR}/FILE_12162019.gz | awk 'BEGIN{FS=OFS="|"} $351 == "B2"{gsub(/ /,""); print $282,$341,$314,$348}' > "$file1_tmp"
for fname in ${ARCHIVEDIR}/FILE_*; do
zcat "$fname" | awk -v fname="$fname" '
BEGIN { FS=OFS=SUBSEP="|" }
NR==FNR { tgts[$0]; next }
($282,$341,$314,$348) in tgts { print fname, $0 }
' "$file1_tmp" -
done
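Two things can make this pattern print nothing, and both are guesses here because the real data is not shown. First, the key-building step strips spaces with `gsub` while the lookup step compares the raw fields, so any padded field will never match. Second, `NR==FNR` stays true for the piped data whenever the key file is empty, so every input line is swallowed as a "key". The sketch below is a minimal illustration under those assumptions. It uses 10-field toy records in plain text, with fields 1, 3, 6 standing in for 282, 341, 314, 348 and field 10 standing in for 351. The names `loading`, `keys`, and `data` are made up for the example.

```shell
#!/bin/sh
# Toy stand-in for the real data (hypothetical): 10 fields, some padded
# with trailing spaces. Key fields 1,3,6; selector field 10.
keys=$(mktemp)
data=$(mktemp)
cat > "$data" <<'EOF'
SYSTEM1 |SPACER|1435 |SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|B2
SYSTEM2|SPACER|88083|SPACER|SPACER|FLORIDA|SPACER|SPACER|SPACER|A1
SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY |SPACER|SPACER|SPACER|A1
EOF

# Step 1: build the key list, stripping spaces from copies of the key fields.
awk 'BEGIN { FS = OFS = "|" }
     $10 == "B2" {
       k1 = $1; k3 = $3; k6 = $6
       gsub(/ /, "", k1); gsub(/ /, "", k3); gsub(/ /, "", k6)
       print k1, k3, k6
     }' "$data" > "$keys"

# Step 2: load the keys, then scan the data arriving on stdin ("-").
# The command-line assignments loading=1 / loading=0 mark which input is
# being read, so an empty key file cannot be confused with the data.
# With the real files the input would come from: zcat "$fname" | awk ...
result=$(
  awk -v fname="data" '
    BEGIN { FS = OFS = "|" }
    loading { tgts[$0]; next }
    {
      k1 = $1; k3 = $3; k6 = $6              # normalize exactly as in step 1
      gsub(/ /, "", k1); gsub(/ /, "", k3); gsub(/ /, "", k6)
      key = k1 OFS k3 OFS k6
      if (key in tgts) print fname, $0       # $0 itself is left unmodified
    }
  ' loading=1 "$keys" loading=0 - < "$data"
)

printf '%s\n' "$result"
rm -f "$keys" "$data"
```

Both the first and third toy records are reported, because their keys are identical once the padding is removed. If the real fields carry no padding, the normalization is harmless and the empty-key-file guard is the part worth keeping.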
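For example, file1 has 130,000 records with B2 in field 351. I want to find every record, in all of the files (including the original records in file1), whose fields 282, 341, 314, and 348 match.
Original post below, reposted to try to clear up some confusion
I gave up on that attempt and ended up with the following inside a for loop: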
echo -e "$FILENAME|\c"
zcat $FILENAME | grep "$SYSTEM" | grep "$RECORDNUM" | grep "$LOCATION" | grep "$PENGUINS"
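The output is:
FILENAME|{each line that matches all 4 search variables}
I'm looking for an awk command that produces the same output more cleanly and efficiently.
I tried: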
zcat $FILENAME | awk -v FILENAME=$FILENAME -v SYSTEM=$SYSTEM -v RECORDNUM=$RECORDNUM -v LOCATION=$LOCATION -v PENGUINS=$PENGUINS -v FS="|" -v OFS='|' '/SYSTEM/ && /RECORDNUM/ && /LOCATION/ && /PENGUINS/ {print FILENAME,$0}'
Because the values are always in the same field positions, I even tried the following:
zcat $FILENAME | awk -v FILENAME=$FILENAME -v SYSTEM=$SYSTEM -v RECORDNUM=$RECORDNUM -v LOCATION=$LOCATION -v PENGUINS=$PENGUINS -v FS="|" -v OFS='|' '($282 == SYSTEM) && ($341 == RECORDNUM) && ($314 == LOCATION) && ($348 == PENGUINS) {print FILENAME,$0}'
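In awk, `/SYSTEM/` is a regular-expression literal: it looks for the text "SYSTEM" in the line, not for the contents of the variable named SYSTEM. A pattern such as `/RECORDNUM/` therefore never matches the data, and the `&&` chain prints nothing. `FILENAME` is also a built-in variable that awk manages itself, so a differently named variable is safer for the prefix. The sketch below shows the difference on one line of the 10-field sample; the variable names `fname`, `sys`, `rec`, `loc`, and `pen` are made up for the example.

```shell
#!/bin/sh
line='SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED'

# /RECORDNUM/ searches for the literal text "RECORDNUM": no match, no output.
literal=$(printf '%s\n' "$line" | awk -v RECORDNUM=1435 '/RECORDNUM/')

# Comparing fields against the variables matches the record. fname carries
# the file name, leaving the awk built-in FILENAME alone.
by_field=$(printf '%s\n' "$line" |
  awk -v fname=file1.gz -v sys=SYSTEM1 -v rec=1435 -v loc=PHILLY -v pen=STUFFED '
    BEGIN { FS = OFS = "|" }
    $1 == sys && $3 == rec && $6 == loc && $10 == pen { print fname, $0 }')

printf 'literal:  [%s]\n' "$literal"
printf 'by field: [%s]\n' "$by_field"
```

A dynamic regex match would be written `$0 ~ rec`, but comparing specific fields avoids accidental hits elsewhere in a 500-field line. Field comparison is exact, so it only works if the stored values and the fields are normalized the same way (for example, if spaces are stripped from both or from neither).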
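Sample input file (for testing, I made 4 copies of the file below and compressed them):
sh-4.2$ zcat file1
SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
SYSTEM2|SPACER|88083|SPACER|SPACER|FLORIDA|SPACER|SPACER|SPACER|INSTALLED
SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
SYSTEM2|SPACER|123141|SPACER|SPACER|NOCAL|SPACER|SPACER|SPACER|INSTALLED
SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
SYSTEM2|SPACER|90391|SPACER|SPACER|TEXAS|SPACER|SPACER|SPACER|INSTALLED
SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
SYSTEM2|SPACER|354295|SPACER|SPACER|FLORIDA|SPACER|SPACER|SPACER|INSTALLED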
sh-4.2$ ls -ls
total 32
4 -rw-rw-rw- 1 host pdx 170 Dec 20 06:10 file1.gz
4 -rw-rw-rw- 1 host pdx 170 Dec 20 06:10 file2.gz
4 -rw-rw-rw- 1 host pdx 170 Dec 20 06:10 file3.gz
4 -rw-rw-rw- 1 host pdx 170 Dec 20 06:10 file4.gz
4 -rwxrwxrwx 1 host pdx 727 Dec 20 06:15 testawk
4 -rwxrwxrwx 1 host pdx 626 Dec 20 06:16 testgrep
Then I created 2 scripts: testawk
for FILENAME in `ls file1.gz`
do
zcat $FILENAME | awk -v FS='|' -v OFS='|' '{if ($10 == "STUFFED") print $1,$3,$6,$10}' | tr -d " " >> $$.tmp
done
for TMPR in `cat $$.tmp`
do
SYSTEM=`echo $TMPR | awk -v FS='|' '{print $1}'`; export SYSTEM
RECORDNUM=`echo $TMPR | awk -v FS='|' '{print $2}'`; export RECORDNUM
LOCATION=`echo $TMPR | awk -v FS='|' '{print $3}'`; export LOCATION
PENGUINS=`echo $TMPR | awk -v FS='|' '{print $4}'`; export PENGUINS
for FILENAME in `ls fil*`
do
export FILENAME
zcat $FILENAME | awk -v FILENAME=$FILENAME -v SYSTEM=$SYSTEM -v RECORDNUM=$RECORDNUM -v LOCATION=$LOCATION -v PENGUINS=$PENGUINS -v FS="|" '/SYSTEM/ && /RECORDNUM/ && /LOCATION/ && /PENGUINS/'
done
done
and
testgrep
for FILENAME in `ls file1.gz`
do
zcat $FILENAME | awk -v FS='|' -v OFS='|' '{if ($10 == "STUFFED") print $1,$3,$6,$10}' | tr -d " " >> $$.tmp
done
for TMPR in `cat $$.tmp`
do
SYSTEM=`echo $TMPR | awk -v FS='|' '{print $1}'`; export SYSTEM
RECORDNUM=`echo $TMPR | awk -v FS='|' '{print $2}'`; export RECORDNUM
LOCATION=`echo $TMPR | awk -v FS='|' '{print $3}'`; export LOCATION
PENGUINS=`echo $TMPR | awk -v FS='|' '{print $4}'`; export PENGUINS
for FILENAME in `ls fil*`
do
echo -e "$FILENAME|\c"; zcat $FILENAME | grep "$SYSTEM" | grep "$RECORDNUM" | grep "$LOCATION" | grep "$PENGUINS"
done
done
When I run testawk, the output is blank.
When I run testgrep, the output contains every line where $PENGUINS=STUFFED, with the file name at the start of each line.
sh-4.2$ ./testgrep
file1.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|1435|SPACER|SPACER|PHILLY|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
file1.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file2.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file3.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file4.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
file.gz|SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
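One caveat about the grep chain, independent of the output above: each grep matches its value anywhere in the line, so a short value such as record number 12 also matches inside 124910 and 80128312. With the sample data the other three greps happen to filter those lines out, but on 500-field records that is not guaranteed. A small sketch of the difference, using three of the sample records (the variable names are made up for the example):

```shell
#!/bin/sh
sample=$(mktemp)
cat > "$sample" <<'EOF'
SYSTEM1|SPACER|12|SPACER|SPACER|GEORGIA|SPACER|SPACER|SPACER|STUFFED
SYSTEM1|SPACER|124910|SPACER|SPACER|VIRGINIA|SPACER|SPACER|SPACER|STUFFED
SYSTEM1|SPACER|80128312|SPACER|SPACER|SOCAL|SPACER|SPACER|SPACER|STUFFED
EOF

# Substring match: "12" also occurs inside 124910 and 80128312.
grep_hits=$(grep -c '12' "$sample")

# Match on field 3 only; n + 0 prints 0 rather than an empty string
# when nothing matches.
field_hits=$(awk -F'|' -v rec=12 '$3 == rec { n++ } END { print n + 0 }' "$sample")

echo "grep: $grep_hits line(s), awk field match: $field_hits line(s)"
rm -f "$sample"
```

Here grep counts all three records, while the field comparison counts only the one whose third field is 12.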
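A breakdown of what I'm doing and trying to do: The following part is the same in both scripts. For every line in file1.gz that has "STUFFED" in field 10, it writes the values of fields 1, 3, 6, and 10 to a file named $$.tmp. (This file is used in the next part of the script, and this part currently works.)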
for FILENAME in `ls file1.gz`
do
zcat $FILENAME | awk -v FS='|' -v OFS='|' '{if ($10 == "STUFFED") print $1,$3,$6,$10}' | tr -d " " >> $$.tmp
done
The next part of the script assigns each of the 4 fields to a variable and exports the variables for use in awk (I'm not sure the export is needed).
for TMPR in `cat $$.tmp`
do
SYSTEM=`echo $TMPR | awk -v FS='|' '{print $1}'`; export SYSTEM
RECORDNUM=`echo $TMPR | awk -v FS='|' '{print $2}'`; export RECORDNUM
LOCATION=`echo $TMPR | awk -v FS='|' '{print $3}'`; export LOCATION
PENGUINS=`echo $TMPR | awk -v FS='|' '{print $4}'`; export PENGUINS
This part of the script starts my for loop, which checks every file beginning with fil for matches (I've included both the awk and grep commands, commented out):
for FILENAME in `ls fil*`
do
export FILENAME
# zcat $FILENAME | awk -v FILENAME=$FILENAME -v SYSTEM=$SYSTEM -v RECORDNUM=$RECORDNUM -v LOCATION=$LOCATION -v PENGUINS=$PENGUINS -v FS="|" '/SYSTEM/ && /RECORDNUM/ && /LOCATION/ && /PENGUINS/'
# echo -e "$FILENAME|\c"; zcat $FILENAME | grep "$SYSTEM" | grep "$RECORDNUM" | grep "$LOCATION" | grep "$PENGUINS"
done
Then I close the original for loop: done
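On the export question: `awk -v SYSTEM=$SYSTEM` receives the value on its command line after the shell has expanded it, so export, which only affects the environment of child processes, is not needed for that. The four `echo | awk` calls per line can also be replaced by letting `read` split on `|`. A minimal sketch, assuming a key file in the same 4-field layout as $$.tmp (the file contents and the `summary` variable are made up for the example):

```shell
#!/bin/sh
tmp=$(mktemp)
printf '%s\n' 'SYSTEM1|1435|PHILLY|STUFFED' 'SYSTEM1|12|GEORGIA|STUFFED' > "$tmp"

summary=""
# IFS='|' applies only to this read: each line is split straight into the
# four variables, with no export and no extra awk processes.
while IFS='|' read -r SYSTEM RECORDNUM LOCATION PENGUINS; do
  summary="${summary}${SYSTEM}/${RECORDNUM}/${LOCATION}/${PENGUINS};"
done < "$tmp"

echo "$summary"
rm -f "$tmp"
```

Reading with `while read` also avoids the word splitting and globbing that `for TMPR in` with a backquoted `cat` applies to each line of the file.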
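【Comments】:
-
What doesn't work?
-
It doesn't find the lines that match all 4. The 4 greps work. I'll try to put together a sample file if needed.
-
@Unix_pharmacy, it's always recommended to post samples of the input and the expected output, so please add them to your question and let us know.
-
I've added the sample scripts and input to the post. Hopefully this helps.
-
By the way, your shell script has a lot of problems. First, run it through shellcheck.net and fix everything it warns you about. Also see stackoverflow.com/questions/673055/…, unix.stackexchange.com/questions/169716/…, mywiki.wooledge.org/Quotes, unix.stackexchange.com/questions/321697/…, and mywiki.wooledge.org/BashFAQ/082.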