给定:
$ cat file
apple
banana
apple
cherry
cherry
delta
epsilon
delta
epsilon
apple pie
delta
delta
您可以使用 Ruby 的段落模式命令行开关让空白行成为每个记录的分隔符,并将每个字段的字段分隔符设置为 \n。然后将每个块唯一化:
$ ruby -00 -F'\n' -lane '$><<$F.uniq.join("\n")<<"\n\n"' file
apple
banana
cherry
delta
epsilon
apple pie
delta
解释:
$ ruby -00 -F'\n' -lane '$><<$F.uniq.join("\n")<<"\n\n"'
^ # ruby 1.9+ only I think
^ # split records by \n\n
^ # split fields by \n
^ # options to:
-l loop over input
a auto split
n don't auto print
e compile command line
^ # to STDOUT
^ # append
^ # the split fields
^ # made uniq
^ # join back to a string
^ # add back the record separator
或者,您可以使用 Ruby 哈希来计算字段,然后只打印哈希的键:
$ ruby -00 -F'\n' -lane 'h=Hash.new(0)
$F.each {|f| h[f]+=1 }
p h
puts h.keys.join("\n")<<"\n\n"
' file
{"apple"=>2, "banana"=>1, "cherry"=>2}
apple
banana
cherry
{"delta"=>2, "epsilon"=>2}
delta
epsilon
{"apple pie"=>1, "delta"=>2}
apple pie
delta
(在 ruby 1.9+ 中,哈希保持插入顺序——这将按文件顺序打印单词。)
如果你想在潜在的字段分隔符中添加,,你可以这样做:
$ ruby -00 -F'\n|,' -lane '$><<$F.uniq.join("\n")<<"\n\n"' file