egrep: find a line that has at least two times the same word (answers)

【Question title】: egrep find a line that has at least two times the same word
【Posted】: 2016-02-12 22:40:55
【Question description】:

How can I use a regular expression to find lines that contain the same word at least twice?

I tried:

egrep '\w{2,}\1' file

But the terminal gave me this error:

egrep: Invalid back reference

【Question discussion】:

Check my edit; that should do it.

Tags: regex shell grep

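【Solution 1】:

Your current regex has a few issues.

Use a capturing group to capture the word, and a backreference to match that same word again.
Add \b word boundaries to delimit the word on its left and right side.
Add .* to match any amount of any characters in between.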

echo "ABC foo ABC bar" | egrep '\b(\w{2,})\b.*\b\1\b'

ABC foo ABC bar

echo "ABC foo ABCD bar" | egrep '\b(\w{2,})\b.*\b\1\b'

(no output: ABCD is a different word, so the line does not match)

See demo at regex101. If needed, use egrep -o (--only-matching) to extract just the matching part.
You can additionally use a lazy dot .*? with grep -P (--perl-regexp) to match as few characters as possible.
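
The difference between the greedy and the lazy dot can be seen with -o. The following is a minimal sketch, assuming GNU grep built with PCRE support for -P; the sample line and the variable names are made up for illustration, and grep -E is used as the equivalent of egrep.

```shell
# Hypothetical sample line in which the word ABC occurs three times.
line='ABC foo ABC bar ABC baz'

# Greedy .* (ERE): the match runs up to the last repetition of the word.
greedy=$(printf '%s\n' "$line" | grep -oE '\b(\w{2,})\b.*\b\1\b')

# Lazy .*? (PCRE, needs -P): the match stops at the first repetition.
lazy=$(printf '%s\n' "$line" | grep -oP '\b(\w{2,})\b.*?\b\1\b')

printf 'greedy: %s\n' "$greedy"
printf 'lazy:   %s\n' "$lazy"
```

Both commands select the same lines; the choice only matters when -o is used to extract the matched text.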

【Discussion】:

【Solution 2】:

Try this:

egrep '(\w{2,}).*\1' file

Without a capturing group ((...)), there is nothing for the backreference to refer to.

Here is an example:

$ cat file
this line has the same word twice word
this line does not
this is this and that is that

$ egrep '(\w{2,}).*\1' file
this line has the same word twice word
this is this and that is that
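
The point about the missing capturing group can be checked through grep's exit status. This is a small sketch, assuming GNU grep, which exits with status 2 when the pattern is invalid; the input line and the variable names are illustrative only, and grep -E stands in for egrep.

```shell
# No capturing group: \1 refers to nothing, so grep rejects the pattern
# (the "Invalid back reference" message is discarded here).
printf 'word and word\n' | grep -E '\w{2,}\1' 2>/dev/null
no_group_status=$?

# With a group, \1 refers back to whatever (\w{2,}) matched.
printf 'word and word\n' | grep -qE '(\w{2,}).*\1'
with_group_status=$?

printf 'without group: exit %s\n' "$no_group_status"
printf 'with group:    exit %s\n' "$with_group_status"
```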

【Discussion】:

Thanks, but I think the answer above solves the problem better, because it adds \b word boundaries on both sides.
I agree :) No problem.
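
The difference these comments point at can be reproduced by running both patterns on the same line. A minimal sketch, assuming GNU grep (\w and \b are GNU extensions); the sample line is the one from Solution 1, and the variable names are hypothetical.

```shell
sample='ABC foo ABCD bar'

# Solution 2: no boundaries, so the "ABC" inside "ABCD" counts as a repeat.
if printf '%s\n' "$sample" | grep -qE '(\w{2,}).*\1'; then
  loose='match'
else
  loose='no match'
fi

# Solution 1: \b on both sides requires the repeat to be a whole word.
if printf '%s\n' "$sample" | grep -qE '\b(\w{2,})\b.*\b\1\b'; then
  strict='match'
else
  strict='no match'
fi

printf 'without \\b: %s\n' "$loose"
printf 'with \\b:    %s\n' "$strict"
```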