How to use grep commands to find a specific value in a text file

【Question Title】: How to use grep commands to find a specific value in a text file
【Posted】: 2019-11-08 04:43:29
【Question】:

I need to grep a file named daily_fail_counts.csv, but only to find the number of failures. Here is a short excerpt of what is in that file:

January,1,0,0
January,1,1,0
January,1,2,0
January,1,3,0
January,1,4,0
January,1,5,0
January,1,6,0
January,1,7,0
January,1,8,0

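The format is "month, day, hour, failures", and it runs through all the months. The last value is the number of failures found in that hour. I know they all say 0 here, but that is because no failures were found in that stretch; other dates do have failures.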

I'm not very good with grep in Linux scripts, so my question is: how do I use grep to find the last number on each line of the file?

I'm writing this script in a file named make_accum_fail_counts.sh, and I would run it like this:

bash make_accum_fail_counts.sh daily_fail_counts.csv > accum_fail_counts.csv

So daily_fail_counts.csv is the input to the new script. This is my script so far:

#!/bin/bash

if [ $# == 1 ]
then
    logFile=$1
fi

cat $logFile > tmpFile

hour=0
failure=0

while [ $hour -le 23 ]
do
    if [ $hour -le 23 ]
    then
        failure=`grep "*,*,*,^[0-10]" tmpFile | wc -l`
    fi
    echo "$hour,$failure"
    hour=$((hour+1))
    failure=0
done
rm -rf tmpFile

I only need help with the grep command:

failure=`grep "*,*,*,^[0-10]" tmpFile | wc -l`

It just has to find the failures hour by hour across all days. So the output would be:

0,1000
1,1040
2,2888

That is, 1000 failures between 0:00 and 1:00, 1040 failures between 1:00 and 2:00, and so on. Thanks in advance.

【Question Comments】:

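If there is no whitespace after the last number, grep -ohE '[[:digit:]]$' YOURFILE works. But what does the last digit tell you? If the error count is 10, the last digit is 0, the same as for no errors. If all lines have exactly the same structure, cut -d ',' -f 4 YOURFILE gives you just the last number of each line, which is probably more useful. Or use the grep pattern [[:digit:]]+$, which also returns the whole last number.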
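To make the difference between the three suggestions in this comment concrete, here is a minimal sketch on a made-up three-line sample. The sample rows and variable names are hypothetical, and -o is assumed to be available (it is in GNU and BSD grep, but it is not required by POSIX).

```shell
# Hypothetical sample in the "month,day,hour,failures" format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
January,1,0,0
January,1,1,10
January,1,2,7
EOF

# Only the final digit of each line: a count of 10 looks the same as 0.
last_digit=$(grep -oE '[[:digit:]]$' "$sample")

# The whole trailing number of each line.
last_number=$(grep -oE '[[:digit:]]+$' "$sample")

# The fourth comma-separated field of each line.
fourth_field=$(cut -d ',' -f 4 "$sample")

printf 'last digit:\n%s\n' "$last_digit"
printf 'last number:\n%s\n' "$last_number"
printf 'fourth field:\n%s\n' "$fourth_field"

rm -f "$sample"
```

On this sample the first pattern prints 0, 0, 7, while the other two commands both print 0, 10, 7.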

Tags: linux bash shell awk grep

【Solution 1】:

cat yourfile.csv | cut -d',' -f 4 | paste -s -d+ - | bc

This sums up all the failures. cut -d',' -f 4 yourfile.csv splits each line on commas and takes the fourth value, which gives you a list of numbers; after that, use a shell command to sum a list of numbers.
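The stages of that pipeline can be looked at one by one. This is only a sketch on hypothetical sample rows; the final step uses shell arithmetic in place of bc so that it has no external dependency, and piping the same expression to bc, as in the answer, should give the same number. It assumes the counts are plain decimal integers without leading zeros.

```shell
# Hypothetical sample in the "month,day,hour,failures" format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
January,1,0,3
January,1,1,0
January,1,2,12
EOF

# cut prints 3, 0 and 12 on separate lines; paste -s joins them with "+".
expression=$(cut -d ',' -f 4 "$sample" | paste -s -d+ -)

# Evaluate the joined expression (the answer pipes it to bc instead).
total=$(( ${expression:-0} ))

echo "$expression = $total"

rm -f "$sample"
```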

You can use grep to filter that down to a single hour, for example

cat yourfile.csv | cut -d',' -f 3,4 | grep ^0, | cut -d',' -f 2

gets all the failure counts for hour 0.

for hour in {0..23}; do
    cat yourfile.csv | cut -d',' -f 3,4 | grep ^$hour, | cut -d',' -f 2 | paste -s -d+ - | bc
done

gets the total for each hour.
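The question asks for output of the form hour,total, so a variant of that loop that also prints the hour might look like the sketch below. The function name hourly_totals, its arguments and the sample rows are hypothetical. A while loop and shell arithmetic stand in for {0..23} and bc so the sketch does not depend on bash-only features or on bc being installed, and an hour with no rows is reported as 0.

```shell
# Hypothetical sample in the "month,day,hour,failures" format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
January,1,0,2
January,1,1,0
January,1,2,5
January,2,0,4
January,2,1,1
January,2,2,0
EOF

# Usage: hourly_totals FILE LAST_HOUR
# Prints one "hour,total" line for every hour from 0 to LAST_HOUR.
hourly_totals() {
    hour=0
    while [ "$hour" -le "$2" ]; do
        # Keep "hour,failures", select this hour, keep the counts, join with "+".
        expression=$(cut -d ',' -f 3,4 "$1" | grep "^$hour," | cut -d ',' -f 2 | paste -s -d+ -)
        # An empty expression (no rows for this hour) counts as 0.
        echo "$hour,$(( ${expression:-0} ))"
        hour=$((hour + 1))
    done
}

result=$(hourly_totals "$sample" 3)
echo "$result"

rm -f "$sample"
```

For the real file the second argument would be 23. The sample only has hours 0 to 2, so hour 3 shows how a missing hour is reported.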

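If you want them grouped by day, you can read up on the date command to see how to make it output a string like January,1, and then add an outer for loop around the command above that passes each line through grep with the output of the date command.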
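As a rough sketch of the per-day grouping, the outer loop below takes the "month,day" prefixes from the file itself instead of generating them with date, which is a swapped-in simplification and not what the answer describes. It assumes the rows for each day are adjacent in the file; the function name daily_totals and the sample rows are hypothetical.

```shell
# Hypothetical sample in the "month,day,hour,failures" format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
January,1,0,2
January,1,1,5
January,2,0,4
January,2,1,1
February,1,0,3
EOF

# Usage: daily_totals FILE
# Prints one "month,day,total" line per day found in FILE.
daily_totals() {
    # uniq collapses the repeated "month,day" prefixes of adjacent rows.
    cut -d ',' -f 1,2 "$1" | uniq | while IFS= read -r day; do
        # The trailing comma keeps "January,1" from also matching "January,10".
        expression=$(grep "^$day," "$1" | cut -d ',' -f 4 | paste -s -d+ -)
        echo "$day,$(( ${expression:-0} ))"
    done
}

result=$(daily_totals "$sample")
echo "$result"

rm -f "$sample"
```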

Personally, at this point I would start writing Python instead of bash. The pandas library is a better fit for this kind of task.

【Comments】:

【Solution 2】:

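If I have understood your question correctly, could you please try the following. It gives the total number of failures (last field / 4th field) per hour value, regardless of month.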

awk '
BEGIN{
  FS=OFS=","
}
!b[$3]++{
  c[++count]=$3
}
{
  a[$3]+=$4
}
END{
  for(i=1;i<=count;i++){
    print c[i],a[c[i]]
  }
}
'  Input_file

One more thing: this approach prints the output in the same order in which the $3 values first appear in Input_file.
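If the hours do not first appear in ascending order in the file, the output will not be ascending either. One possible way to get numeric hour order, sketched here on hypothetical out-of-order sample rows, is to pipe the awk output through sort. The wrapper function sum_by_hour is only an illustration; its awk program is the one from this answer written on one line.

```shell
# Hypothetical sample whose hours first appear in the order 2, 0, 1.
sample=$(mktemp)
cat > "$sample" <<'EOF'
January,1,2,5
January,1,0,2
January,1,1,0
January,2,0,4
January,2,2,1
EOF

# Usage: sum_by_hour FILE
# Same awk program as in the answer, condensed onto one line.
sum_by_hour() {
    awk 'BEGIN{FS=OFS=","} !b[$3]++{c[++count]=$3} {a[$3]+=$4} END{for(i=1;i<=count;i++) print c[i],a[c[i]]}' "$1"
}

# Order in which the hours first appear in the file.
unsorted=$(sum_by_hour "$sample")

# Numeric sort on the first comma-separated field (the hour).
sorted=$(sum_by_hour "$sample" | sort -t ',' -k 1,1n)

printf 'first-seen order:\n%s\n' "$unsorted"
printf 'sorted by hour:\n%s\n' "$sorted"

rm -f "$sample"
```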

Explanation: adding an explanation of the code above.

awk '                          ## Start the awk program.
BEGIN{                         ## BEGIN block: runs once before any input is read.
  FS=OFS=","                   ## Set both the input and output field separators to a comma.
}                              ## End of the BEGIN block.
!b[$3]++{                      ## True only the first time a given $3 (hour) is seen; b counts occurrences.
  c[++count]=$3                ## Record that hour in c under the next index so first-seen order is kept.
}                              ## End of the first-seen block.
{                              ## This block runs for every line.
  a[$3]+=$4                    ## Add the failure count $4 to the running total for hour $3.
}                              ## End of the per-line block.
END{                           ## END block: runs once after all input is read.
  for(i=1;i<=count;i++){       ## Loop over the hours in first-seen order.
    print c[i],a[c[i]]         ## Print the hour and its total, separated by OFS.
  }                            ## End of the for loop.
}                              ## End of the END block.
'  Input_file                  ## The input file name.

【Comments】:

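I'm not familiar with awk. Is there a way to count the failures simply using grep?
@Tristan, trust me, awk will be much easier, and it is present on servers by default. With grep this would be a hard task, and you would probably need other tools as well. Let me know if you have any questions.
@Tristan, I have now also added an explanation to help you understand it.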
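On the follow-up about using grep alone: grep can count matching lines but cannot add numbers, so by itself it can only report how many rows for a given hour recorded at least one failure, which is a different quantity from the summed totals the question asks for. A minimal sketch of that, with a hypothetical function name and sample rows:

```shell
# Hypothetical sample in the "month,day,hour,failures" format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
January,1,0,0
January,1,1,3
January,1,10,4
January,2,0,12
January,2,1,7
January,2,10,0
EOF

# Usage: rows_with_failures FILE HOUR
# Counts the rows for HOUR whose failure field is a non-zero number.
# In the pattern, "[^,]*," skips one whole field (month, then day),
# the hour must fill the third field exactly, and "[1-9][0-9]*$"
# requires the last field to be a number that does not start with 0.
# In the question's pattern "*,*,*,^[0-10]", "*" is a repetition
# operator rather than a wildcard, "^" anchors the start of the line,
# and "[0-10]" is a character class that matches a single 0 or 1.
rows_with_failures() {
    # grep exits non-zero when nothing matches but still prints the count.
    grep -cE "^[^,]*,[^,]*,$2,[1-9][0-9]*\$" "$1" || true
}

hour0=$(rows_with_failures "$sample" 0)
hour1=$(rows_with_failures "$sample" 1)
hour10=$(rows_with_failures "$sample" 10)
hour5=$(rows_with_failures "$sample" 5)

echo "0,$hour0"
echo "1,$hour1"
echo "10,$hour10"
echo "5,$hour5"

rm -f "$sample"
```

Summing the counts themselves still needs another tool, such as the awk program above or the cut and paste pipeline from Solution 1.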