【Question title】: Bash script that prints statistics about text files [closed]
【Posted】: 2018-04-26 10:34:37
【Question】:

I am trying to write a Linux bash script that helps me generate some statistics I need from a text file. Assume the text files I work with have the following format:

"string : pathname1 : pathname2 : pathname3 : … pathnameN"

where pathname "i" is the full path of a file in which I found the particular string. For example, such a file might look like this:

logile.txt

string : "version" pathname1: /home/Desktop/myfile.txt pathname2 : /usr/lib/tmp/sample.txt 

string : "user" pathname1 : temp1/tmpfiles/user.txt  pathname2 : newfile.txt pathname3 : /Downloads/myfiles/old/credentials.txt

string : "admin" pathname1 : 

string: "build" pathname1 : Documents/projects/myproject/readme.txt pathname2 
 : Desktop/readmetoo.txt

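In this example, I would like my bash script to tell me that I searched for a total of 4 words (version, user, admin, build) and that the word found in the most files was "user", which was found in 3 files. Is using the "awk" command a good approach? I am not familiar with bash scripting, so any help would be useful! Thanks!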

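【Comments】:

  • The format of logile.txt is inconsistent. Are the lines in logile.txt formatted as string : pathname1 :pathname2 ..., as string: pathname1: pathname2:, or as string: pathname1: path pathname2: path (note the colons and the spaces between words)?
  • @KamilCuk Thanks for letting me know, I have edited the format!
  • Stack Overflow is not a code-writing service. Please show your code. Since Stack Overflow hides the close reason from you: questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error, and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example
  • While saying "use something else" is a bit lame, I will do it anyway: although you can do a lot with bash (as Kamil demonstrates below), doing the same thing in Perl, Python, or Ruby would be much easier.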

Tags: linux bash shell unix scripting


【Solution 1】:

This is not an easy task, and it could probably be done in awk alone, but you asked about a bash script. The following bash script:

#!/bin/bash
set -euo pipefail

wordcount=0
tmp=""
# for each input line, read three fields: the literal "string", the word, and the rest
while read -r string name rest; do
    # the first field of every record should be the literal word "string"
    if [ "$string" != string ]; then
        # if it is not, skip to the next line
        continue
    fi
    # remove '"' from name
    name=${name//\"}
    # the rest has the format: pathname<i> <path> ...
    # replace every space in the rest with a newline
    # and drop the lines containing pathname<i>
    # so that only the paths remain
    # (note: runs of spaces become empty lines, which survive the grep)
    rest=$(echo "$rest" | tr ' ' '\n' | grep -v "pathname" ||:)
    # count the lines in rest; each path should be on its own line
    # (any empty lines left over from the previous step are counted too)
    restcnt=$(wc -l <<<"$rest")
    # save restcnt and name on one line of the tmp string to parse later
    tmp+="$restcnt $name"$'\n'
    # increment wordcount
    wordcount=$((wordcount + 1))

# feed the while read loop through process substitution
done < <(
    # cat stdin or the files given as script arguments
    # and replace every colon ':' with a space
    cat "$@" | sed 's/:/ /g'
)

# sort the tmp string numerically, lowest to highest,
# and take the last line (i.e. the highest count)
tmp=$(echo "$tmp" | sort -n | tail -n1)
# split tmp into two variables: the count and the word
read -r mostcnt mostfile <<<"$tmp"

# and print the summary for the user
echo "You searched for a total of $wordcount words."
echo "The word that was found in most files was: $mostfile"
echo " which was found in $mostcnt files."

...run with the following logile.txt as input...

string : "version" pathname1:/home/Desktop/myfile.txt pathname2 : /usr/lib/tmp/sample.txt

string : "user" pathname1: temp1/tmpfiles/user.txt pathname2: newfile.txt pathname3: /Downloads/myfiles/old/credentials.txt

string : "admin" pathname1 :

string: "build" pathname1: Documents/projects/myproject/readme.txt pathname2:Desktop/readmetoo.txt

...produces the following result:

$ /tmp/1.sh <./logile.txt 
You searched for a total of 4 words.
The word that was found in most files was: user
 which was found in 6 files.
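Since the answer mentions that this could probably be done in awk alone, here is a minimal sketch of what that might look like. It is not the answerer's code: the function name summarize is hypothetical, and it assumes one record per line, paths without spaces or colons, and labels of the form pathname1, pathname2, and so on. It counts only the fields that are not labels.

```shell
#!/bin/bash
# Sketch of an awk-only take on the same statistics (hypothetical helper).
# Assumptions: one record per line, paths contain no spaces or colons,
# and labels look like pathname1, pathname2, ...
summarize() {
    awk '
        {
            gsub(/:/, " ")                # turn every colon into a space; awk re-splits $0
            if ($1 != "string") next      # skip blank or unrelated lines
            word = $2
            gsub(/"/, "", word)           # strip the quotes around the word
            n = 0
            for (i = 3; i <= NF; i++)
                if ($i !~ /^pathname[0-9]+$/) n++   # count fields that are not labels
            words++
            if (n > max) { max = n; best = word }
        }
        END {
            printf "You searched for a total of %d words.\n", words
            printf "The word found in most files was: %s (%d files).\n", best, max
        }
    ' "$@"
}

# Demo with the sample records from this answer
summarize <<'EOF'
string : "version" pathname1:/home/Desktop/myfile.txt pathname2 : /usr/lib/tmp/sample.txt
string : "user" pathname1: temp1/tmpfiles/user.txt pathname2: newfile.txt pathname3: /Downloads/myfiles/old/credentials.txt
string : "admin" pathname1 :
string: "build" pathname1: Documents/projects/myproject/readme.txt pathname2:Desktop/readmetoo.txt
EOF
```

Because awk splits on runs of whitespace, the uneven spacing around the colons does not produce empty fields here. On ties, this sketch keeps the first word that reached the highest count.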

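【Comments】:

  • Thank you very much for your time and your reply! One small correction, though: in this example user is found 3 times, as the log file shows, i.e. pathname1, pathname2 and pathname3.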
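The count of 6 in the answer's output appears to come from the field splitting: tr ' ' '\n' turns every single space into a newline, so the double spaces left behind after the colons are replaced become empty lines, and wc -l counts those as well. One possible way to count only the paths is sketched below. The helper count_paths is hypothetical, and the sketch assumes that paths contain no spaces or colons and that labels always look like pathname1, pathname2, and so on.

```shell
#!/bin/bash
# Sketch: count the paths on one log line without counting empty fields.
# count_paths is a hypothetical helper, not part of the original answer.
count_paths() {
    local line=${1//:/ }    # replace every ':' with a space, as the sed step does
    local -a fields
    local field count=0
    # read -a splits on runs of whitespace, so no empty fields are produced
    read -r -a fields <<<"$line"
    # fields[0] is the literal "string", fields[1] is the quoted word;
    # everything after that is either a pathnameN label or a path
    for field in "${fields[@]:2}"; do
        [[ $field =~ ^pathname[0-9]+$ ]] || count=$((count + 1))
    done
    echo "$count"
}

count_paths 'string : "user" pathname1: temp1/tmpfiles/user.txt pathname2: newfile.txt pathname3: /Downloads/myfiles/old/credentials.txt'   # prints 3
count_paths 'string : "admin" pathname1 :'   # prints 0
```

Inside the answer's loop, something like restcnt=$(count_paths "string $name $rest") could stand in for the tr, grep and wc steps, though that wiring is untested against the full script.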