Count occurrence of character [closed]: answers

[Question title]: count occurrence of character [closed]
[Posted]: 2018-02-14 14:19:17
[Question description]:

I have a .txt file:

ID Number        Name                         Fed Sex Tit  Wtit
4564             A B M Yusop, Tapan           BAN M
59841212         A Rafiq                      IND F   WFM  WFM
19892            Aadel F , Arvin              IND M 
.
.
.

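I have to count, on the Linux command line, how many females (F) and males (M) there are in this file. I am new to the Linux shell, so the only thing I could think of was the grep command, but the "Name" column can also contain "M" and "F".

Any suggestions?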

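[Question comments]:

Stack Overflow is not a code-writing service. Please show your code. Since Stack Overflow hides the close reason from you: Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error, and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example.

Tags: linux shell command-line

[Solution 1]:

I would use awk to do this (both to find the column and to do the counting):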

$ awk '
# first line
NR == 1 { 
    if (col = index($0, "Sex")) {
        next # skip rest of script for this line
    }

    print "Could not find the required header"
    exit
} 

# all lines
{ 
    # increment counts of each `M` or `F`
    ++count[substr($0, col, 1)]
} 

END { 
    # loop through count array and print
    for (i in count) print i, count[i] 
}' file

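[Comments]:

You probably want index() rather than match(), but +1
@glennjackman Absolutely, there is no need for match here. Will edit, thanks!
@TomFenech Thank you for your answer. One small question, though: what does $0 mean in this index function? I only know that $something is a positional parameter and that $0 is the command name.
@Martina This is inside an awk script, so $0 means the whole line: gnu.org/software/gawk/manual/html_node/Fields.html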
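
To illustrate the distinction raised in the comments above: in a shell script `$0` is the script or command name, but inside an awk program `$0` is the current input line and `$1`, `$2`, ... are its whitespace-separated fields. A minimal sketch, using a made-up sample line (the names `line`, `whole`, `first`, and `pos` are only for illustration):

```shell
# Hypothetical one-line sample; the values are made up for illustration.
line='19892 Aadel IND M'

# Inside awk, $0 is the whole input line and $1 is its first field.
whole=$(printf '%s\n' "$line" | awk '{ print $0 }')
first=$(printf '%s\n' "$line" | awk '{ print $1 }')

# index(s, t) returns the 1-based position of t in s, or 0 if t is absent.
pos=$(printf '%s\n' "$line" | awk '{ print index($0, "IND") }')

printf 'whole=%s\nfirst=%s\npos=%s\n' "$whole" "$first" "$pos"
```

The awk programs are wrapped in single quotes so that the shell does not expand `$0` and `$1` itself before awk sees them.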

[Solution 2]:

First use cut to get just the one column. For example:

cut -c40 < file.txt # gets the 40th character on each line

Then count the distinct values:

cut -c40 < file.txt | sort | uniq -c

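[Comments]:

Thanks. And how do I find out which position "Sex" is at, so I know what to use instead of -c40?
@MartinaZapletalová: I would open the file in a text editor such as Emacs and ask the editor which column it is. Or just count with your keyboard. Or guess and check. Or....
So there is no way to do this on the command line?
I have to do everything on the command line. But OK, I tried counting with the keyboard, and it is the 81st character. But when I try your command cut -c81 < myfilename.txt | sort | uniq -c, it does nothing. When I use only `cut -c81
@MartinaZapletalová: If cut -c81 < myfilename.txt gives you a bunch of lines with characters like S\nM\nF\nM, then piping to sort will sort them, and uniq -c will print the counts. For example, run (echo S; echo M; echo F; echo M) | sort | uniq -c and see that it prints the count of each character. It does not "do nothing".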
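
The column can also be found without leaving the command line. The sketch below is one possible approach, not part of the original answer: it borrows awk's `index()` from Solution 1 to locate the header, then passes the result to `cut -c`. The sample file, its rows, and the names `sample`, `col`, and `counts` are assumptions made for illustration.

```shell
# Build a small fixed-width sample file (made-up rows laid out like the
# table in the question) in a temporary directory.
tmp=$(mktemp -d)
sample="$tmp/players.txt"
printf '%-17s%-29s%-4s%-4s%-5s%s\n' \
    'ID Number' 'Name' 'Fed' 'Sex' 'Tit' 'Wtit' \
    '4564' 'A B M Yusop, Tapan' 'BAN' 'M' '' '' \
    '59841212' 'A Rafiq' 'IND' 'F' 'WFM' 'WFM' \
    '19892' 'Aadel F , Arvin' 'IND' 'M' '' '' > "$sample"

# 1-based column at which "Sex" starts in the header line.
col=$(head -n 1 "$sample" | awk '{ print index($0, "Sex") }')

# Skip the header, keep only that column, then count each distinct value.
counts=$(tail -n +2 "$sample" | cut -c"$col" | sort | uniq -c)

echo "Sex column: $col"
echo "$counts"
rm -r "$tmp"
```

`tail -n +2` drops the header line so that the `S` of `Sex` is not counted. This assumes the columns are padded with spaces rather than tabs; `cut -c` and awk's `index()` may also disagree on positions if names contain multibyte characters.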

[Solution 3]:

In bash with GNU grep, you can write:

IFS= read -r header < file          # read the first line of the file
prefix=${header%%Sex *}             # remove "Sex " and everything after it
skip_regex=${prefix//?/.}           # replace all chars with "."

# then find the letters and count them
grep -oP "^$skip_regex\\K[MF]" file | sort | uniq -c

Output

  1 F
  2 M
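
On systems where grep lacks `-P`, the same idea can be sketched with POSIX tools only. This is a substitute technique, not the answer's own code: `sed` builds the run of dots instead of `${prefix//?/.}`, and a `sed` substitution stands in for `grep -oP` with `\K`. The sample file and the names `sample` and `counts` are assumptions for illustration.

```shell
# Build a small fixed-width sample file (made-up rows laid out like the
# table in the question) in a temporary directory.
tmp=$(mktemp -d)
sample="$tmp/players.txt"
printf '%-17s%-29s%-4s%-4s%-5s%s\n' \
    'ID Number' 'Name' 'Fed' 'Sex' 'Tit' 'Wtit' \
    '4564' 'A B M Yusop, Tapan' 'BAN' 'M' '' '' \
    '59841212' 'A Rafiq' 'IND' 'F' 'WFM' 'WFM' \
    '19892' 'Aadel F , Arvin' 'IND' 'M' '' '' > "$sample"

IFS= read -r header < "$sample"     # read the first line of the file
prefix=${header%%Sex *}             # everything before "Sex "
echo "prefix length: ${#prefix}"

# One "." per character of the prefix, so the regex skips exactly that width.
skip_regex=$(printf '%s' "$prefix" | sed 's/././g')

# Print only the M or F that sits immediately after the skipped prefix.
counts=$(sed -n "s/^$skip_regex\([MF]\).*/\1/p" "$sample" | sort | uniq -c)
echo "$counts"
rm -r "$tmp"
```

The header line is skipped automatically here, because the character in that column is `S`, which `[MF]` does not match.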

[Comments]: