Number of non repeating lines - unique count

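【Question title】: Number of non repeating lines - unique count
【Posted】: 2013-05-01 22:34:37
【Question description】:

Here is my problem: an arbitrary number of lines of text is given on standard input. Output: the number of non-repeating lines.

Input: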

She is wearing black shoes.
My name is Johny.
I hate mondays.
My name is Johny.
I don't understand you.
She is wearing black shoes.

Output:

【Question discussion】:

Tags: bash shell line unique

【Solution 1】:

You can try uniq (see man uniq) and do the following:

sort file | uniq -u | wc -l
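To make the pipeline concrete, here is a minimal sketch that feeds it the sample input from the question on standard input instead of a file. The helper name sample_input and the LC_ALL=C setting are assumptions added for illustration (the latter only to keep the sort order predictable); neither is part of the original answer.

```shell
#!/bin/sh
# Sketch: run the answer's pipeline on the sample input from the question.
# sample_input is a hypothetical helper; LC_ALL=C is an added assumption
# that forces plain byte-wise comparison.

sample_input() {
    cat <<'EOF'
She is wearing black shoes.
My name is Johny.
I hate mondays.
My name is Johny.
I don't understand you.
She is wearing black shoes.
EOF
}

# sort puts identical lines next to each other, uniq -u keeps only the
# lines that occur exactly once, and wc -l counts what is left.
count=$(sample_input | LC_ALL=C sort | LC_ALL=C uniq -u | wc -l)
echo "non-repeating lines: $count"
```

For this input the count is 2: "I hate mondays." and "I don't understand you." are the only lines that appear once.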

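【Discussion】:

I added the sort command to the mix. Nice catch... I had messed that up.
The man page states: Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'. That works too....
In my case, running sort file | uniq -u and sort -u file on the same file gave different output. sort -u file gave the correct output.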
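The difference reported in the last comment is expected, because the two commands count different things. The sketch below, using the question's sample input, compares three variants side by side. The helper name sample_input and the use of LC_ALL=C are assumptions made for this illustration.

```shell
#!/bin/sh
# Sketch comparing three pipelines on the question's sample input.
# sample_input is a hypothetical helper, not part of the original thread.

sample_input() {
    cat <<'EOF'
She is wearing black shoes.
My name is Johny.
I hate mondays.
My name is Johny.
I don't understand you.
She is wearing black shoes.
EOF
}

# uniq -u without sort: only adjacent duplicates are detected, and this
# input has none, so every line is passed through.
unsorted=$(sample_input | LC_ALL=C uniq -u | wc -l)

# sort | uniq -u: lines that occur exactly once in the whole input.
once_only=$(sample_input | LC_ALL=C sort | LC_ALL=C uniq -u | wc -l)

# sort -u: one copy of every distinct line, whether it repeats or not.
distinct=$(sample_input | LC_ALL=C sort -u | wc -l)

echo "uniq -u alone:  $unsorted"
echo "sort | uniq -u: $once_only"
echo "sort -u:        $distinct"
```

On this input the three counts are 6, 2 and 4. Which one is "correct" depends on whether the goal is to count lines that never repeat (sort | uniq -u) or to count distinct lines (sort -u).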

【Solution 2】:

Here is how I would solve the problem:

... | awk '{n[$0]++} END {for (line in n) if (n[line]==1) num++; print num+0}'
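For readers who find the one-liner dense, here is the same idea spelled out as a sketch with comments. The function name count_once and the array name seen are hypothetical and chosen for this illustration. The sketch prints num + 0 so that an input with no once-only lines yields 0 rather than an empty line.

```shell
#!/bin/sh
# Sketch: the awk approach, expanded and commented.
# count_once is a hypothetical helper name, not from the original answer.

count_once() {
    awk '
        { seen[$0]++ }                 # tally how often each whole line occurs
        END {
            for (line in seen)
                if (seen[line] == 1)   # keep lines seen exactly once
                    num++
            print num + 0              # force a numeric 0 when nothing matched
        }
    '
}

# Usage: any command can feed the function through a pipe.
printf '%s\n' "My name is Johny." "I hate mondays." "My name is Johny." | count_once
```

Unlike the sort-based pipeline, this reads the input once and does not need it to be sorted, at the cost of holding every distinct line in memory.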

But that is pretty opaque. Here is a (slightly) clearer way of looking at it (requires bash version 4):

... | {
    declare -A count    # count is an associative array

    # iterate over each line of the input
    # accumulate the number of times we've seen this line
    #
    # the construct "IFS= read -r line" ensures we capture the line exactly

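    while IFS= read -r line; do
        # plain assignment avoids re-evaluating "$line" inside (( ... )),
        # which can misbehave for lines containing $, quotes, or brackets
        count[$line]=$(( ${count[$line]:-0} + 1 ))
    done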

    # now add up the number of lines whose count is exactly 1
    num=0
    for c in "${count[@]}"; do
        if (( c == 1 )); then
            (( num++ ))
        fi
    done

    echo "$num"
}

【Discussion】:

On my '99 machine, the awk solution runs seamlessly.
@sfiore, what is a "'99 machine"?