Regular expressions, grep, egrep

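1. grep:

　　The three classic text-processing tools on Linux
　　　　grep: text filtering tool (filters by a pattern)
　　　　　　grep, egrep, fgrep
　　　　sed: stream editor, a line-oriented text editing tool
　　　　awk: text report generator; the implementation on Linux is gawk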

　　grep:

　　　　Global search REgular expression and Print out the line.

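　　　　Purpose: a text search tool that checks the target text line by line against a user-specified "pattern" and prints the lines that match;
　　　　Pattern: a filter condition written with regular expression characters and literal text characters

　　　　REGEXP: a pattern written with a class of special characters plus literal text characters; some of these characters do not stand for their literal meaning but act as control or wildcard characters:

　　　　　　Two kinds:

　　　　　　　　Basic regular expressions: BRE

　　　　　　　　Extended regular expressions: ERE

　　　　　　　　　　Note: grep uses basic regular expressions by default; grep -E and egrep use extended regular expressions; fgrep does not support regular expressions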

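　　　Regular expression engine

　　　　　　grep [OPTION] PATTERN [file...]
　　　　　　　　Options:
　　　　　　　　　　--color=auto: highlight the matched text in color

　　　　　　　　　　-v: show the lines that the pattern does not match

　　　　　　　　　　-i: ignore case

　　　　　　　　　　-o: show only the matched string

　　　　　　　　　　-q: quiet mode, print nothing; the result is reported only through the exit status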

[root@localhost ~]# grep --color=auto "root" /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

[root@localhost ~]# grep -v "abc" /etc/issue
CentOS release 6.5 (Final)
Kernel \r on an \m

[root@localhost ~]# grep -i "centos" /etc/issue
CentOS release 6.5 (Final)

[root@localhost ~]# grep -o "release" /etc/issue
release

[root@localhost ~]# grep -q "release" /etc/issue
[root@localhost ~]# echo $?
0
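
Because -q reports its result only through the exit status, it is mostly used inside conditionals. A minimal sketch, assuming GNU grep; the helper name has_text and the sample line are made up for illustration:

```shell
#!/bin/bash
# has_text is a hypothetical helper: it reads text on stdin and reports
# whether the pattern given as $1 occurs in it.
has_text() {
    # -q suppresses all output; grep exits 0 on a match and 1 otherwise.
    if grep -q -- "$1"; then
        echo "found"
    else
        echo "not found"
    fi
}

# Made-up sample line in the style of /etc/issue.
sample='CentOS release 6.5 (Final)'

printf '%s\n' "$sample" | has_text "release"   # prints: found
printf '%s\n' "$sample" | has_text "Ubuntu"    # prints: not found
```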

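　　　　　　　　　　-A #: show each matching line plus the # lines after it (after); if fewer lines follow, only those that exist are shown

　　　　　　　　　　-B #: show each matching line plus the # lines before it (before); if fewer lines precede, only those that exist are shown

　　　　　　　　　　-C #: show each matching line plus # lines both before and after it (context); if fewer lines exist, only those are shown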

[root@localhost ~]# grep -A 2 "root" /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
gopher:x:13:30:gopher:/var/gopher:/sbin/nologin


[root@localhost ~]# grep -B 2 "root" /etc/passwd
root:x:0:0:root:/root:/bin/bash
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin


[root@localhost ~]# grep -C 2 "root" /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
uucp:x:10:14:uucp:/var/spool/uucp:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
gopher:x:13:30:gopher:/var/gopher:/sbin/nologin
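
The context options can also be tried on a small made-up input instead of /etc/passwd. The sketch below assumes GNU grep, which prints a line containing -- between groups of context lines that are not adjacent in the input; the sample words are invented:

```shell
#!/bin/bash
# Made-up eight-line input with two matching lines ("root").
sample=$(printf '%s\n' one two root three four five root six)

# One line of trailing, leading, and surrounding context respectively.
after=$(printf '%s\n' "$sample" | grep -A 1 "root")
before=$(printf '%s\n' "$sample" | grep -B 1 "root")
context=$(printf '%s\n' "$sample" | grep -C 1 "root")

printf '%s\n' "== -A 1 ==" "$after"
printf '%s\n' "== -B 1 ==" "$before"
printf '%s\n' "== -C 1 ==" "$context"
```

With -A 1 the two groups are root/three and root/six, with a -- line between them because the lines four and five were skipped.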

　　　　　　　　　　-E: use extended regular expressions (ERE)

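　　　Basic regular expression metacharacters

　　　　Character matching:

　　　　　　　　.: matches any single character

　　　　　　　　[]: matches any single character in the specified set

　　　　　　　　[^]: matches any single character not in the specified set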

[root@localhost ~]# grep "." /tmp/abc
a
b
c

[root@localhost ~]# grep "[abc]" /tmp/abc
a
b
c

[root@localhost ~]# grep "[^abc]" /tmp/abc
d
e

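　　　　　　　　[:digit:]: matches any single digit

　　　　　　　　[:lower:]: matches any single lowercase letter

　　　　　　　　[:upper:]: matches any single uppercase letter

　　　　　　　　[:alpha:]: matches any single letter, upper- or lowercase

　　　　　　　　[:alnum:]: matches any single letter or digit

　　　　　　　　[:space:]: matches any single whitespace character (space, tab, and so on)

　　　　　　　　[:punct:]: matches any single punctuation character

　　　　　　　　Note: these class names are written inside a bracket expression, for example [[:digit:]]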

[root@localhost ~]# grep "[[:punct:]]" /tmp/abc
,
.

[root@localhost ~]# grep 's..n' /etc/passwd
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin

Here the two dots between s and n match any two characters; in this output s..n matches sbin
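
The character classes can be checked quickly on a made-up input. This is only a sketch: the five sample lines are invented, and each class name sits inside an outer pair of brackets:

```shell
#!/bin/bash
# Made-up sample lines.
sample=$(printf '%s\n' 'abc' 'ABC' '123' 'a1' 'x,y')

# Lines containing at least one digit.
digits=$(printf '%s\n' "$sample" | grep "[[:digit:]]")
# Lines containing at least one punctuation character.
punct=$(printf '%s\n' "$sample" | grep "[[:punct:]]")
# Lines containing no lowercase letter at all (-v inverts the match).
no_lower=$(printf '%s\n' "$sample" | grep -v "[[:lower:]]")

printf '%s\n' "digits:" "$digits"       # 123 and a1
printf '%s\n' "punct:" "$punct"         # x,y
printf '%s\n' "no_lower:" "$no_lower"   # ABC and 123
```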

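　　　　Repetition:

　　　　　　Written after the character to be repeated; specifies how many times the preceding character must occur

　　　　　　*: matches the preceding character any number of times (including zero), as many as possible (greedy); * expresses only a count
　　　　　　　　Example: grep "x*y" matches a y preceded by any number of x characters

　　　　　　　　　　　　abxy matches

　　　　　　　　　　　　xay matches (zero x immediately before y)

　　　　　　　　　　　　x does not match (there is no y)

　　　　　　.*: any characters, any length

　　　　　　\?: matches the preceding character 0 or 1 times, that is, the preceding character is optional

　　　　　　\+: matches the preceding character at least once;

　　　　　　\{m\}: matches the preceding character exactly m times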

[root@localhost ~]# grep "a.*" /tmp/x/a
ab
tuaergea
qwertyuiopasdfghjklzxcvbnm
asdfasdfasgeauhtdht

[root@localhost ~]# grep "a\?b" /tmp/x/a
ab *there is one a before b*
bc *there is no character before b*
aaaab *here only the single a immediately before b is part of the match*

[root@localhost ~]# grep "a\+b" /tmp/x/a
ab
aaaab *at least one a before b*

[root@localhost ~]# grep "a\{2\}b" /tmp/x/a
aaaab
aaaaab
aaab
aab *a is matched exactly twice; the longer lines match because they contain aab*

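　　　　　　\{m,n\}: matches the preceding character at least m times and at most n times

　　　　　　　　\{0,n\}: matches the preceding character at most n times

　　　　　　　　\{m,\}: matches the preceding character at least m times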

[root@localhost ~]# grep "a\{1,3\}b" /tmp/abc
ab
aab
aaab

[root@localhost ~]# grep "a\{0,3\}b" /tmp/abc
b
b
ab
aab
aaab

[root@localhost ~]# grep "a\{2,\}b" /tmp/abc
aab
aaab
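
Since grep prints the whole line, it is not always obvious which part a quantifier actually matched. One way to see it is to combine the pattern with -o. A small sketch on made-up words, reusing the x*y pattern from the explanation above and assuming GNU grep:

```shell
#!/bin/bash
# Made-up sample words, one per line.
words=$(printf '%s\n' abxy xay x y xxxy)

# Whole lines that contain a match for x*y (every line with a y).
lines=$(printf '%s\n' "$words" | grep "x*y")
# Only the matched text: the y plus all x characters directly before it.
parts=$(printf '%s\n' "$words" | grep -o "x*y")
# At least two x characters directly before the y.
two_or_more=$(printf '%s\n' "$words" | grep -o "x\{2,\}y")

printf '%s\n' "lines:" "$lines"
printf '%s\n' "parts:" "$parts"
printf '%s\n' "two_or_more:" "$two_or_more"
```

For xay the matched part is just y, which shows that x*y is satisfied with zero x characters.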

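　　　　Position anchors:

　　　　　　^: anchors the start of a line; placed at the far left of the pattern

　　　　　　$: anchors the end of a line; placed at the far right of the pattern

　　　　　　^PATTERN$: used to make the pattern match a whole line

　　　　　　　　^$: matches an empty line

　　　　　　　　^[[:space:]]*$: matches an empty line or a line containing only whitespace

　　　　　　\< or \b: anchors the start of a word; placed on the left of a word pattern

　　　　　　\> or \b: anchors the end of a word; placed on the right of a word pattern

　　　　　　\<pattern\>: matches a whole word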

[root@localhost ~]# grep "^root" /etc/passwd
root:x:0:0:root:/root:/bin/bash

[root@localhost ~]# grep "bash$" /etc/passwd
root:x:0:0:root:/root:/bin/bash
mary:x:503:503:I am mary.:/home/mary:/bin/bash
centos:x:504:504::/tmp/centos:/bin/bash
test:x:505:505::/tmp/test:/bin/bash
rocket:x:507:507::/home/rocket:/bin/bash

[root@localhost ~]# grep "^ab$" /tmp/abc
ab

[root@localhost ~]# grep "\<root" /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

[root@localhost ~]# grep "bash\>" /etc/passwd
root:x:0:0:root:/root:/bin/bash
mary:x:503:503:I am mary.:/home/mary:/bin/bash
centos:x:504:504::/tmp/centos:/bin/bash
test:x:505:505::/tmp/test:/bin/bash
rocket:x:507:507::/home/rocket:/bin/bash

[root@localhost ~]# grep "\<aaab\>" /tmp/abc
aaab
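
The difference between the line anchors and the word anchors is easier to see on one small input. A sketch with invented sample lines, assuming GNU grep (the \< and \> anchors are GNU extensions):

```shell
#!/bin/bash
# Made-up sample: one empty line and one line of three spaces are included.
sample=$(printf '%s\n' 'root' '' '   ' 'chroot jail' 'rooter' 'the root user')

# Count truly empty lines, then lines that are empty or whitespace-only.
empty=$(printf '%s\n' "$sample" | grep -c "^$")
blank=$(printf '%s\n' "$sample" | grep -c "^[[:space:]]*$")
# Lines that start with root (rooter qualifies too).
starts=$(printf '%s\n' "$sample" | grep "^root")
# Lines that contain root as a whole word (chroot and rooter do not).
word=$(printf '%s\n' "$sample" | grep "\<root\>")

echo "empty=$empty blank=$blank"
printf '%s\n' "starts:" "$starts"
printf '%s\n' "word:" "$word"
```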

　　　　Grouping:

　　　　　　\(\): binds one or more characters together so that they are handled as a single unit

[root@localhost ~]# grep "\(xy\)*ab" /tmp/abc
ab
aab
aaab
xyab
xyxyab
xyxyxyab

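　　　　　　Note: the text matched by the pattern inside grouping parentheses is recorded by the regular expression engine in internal variables named \1, \2, \3...
　　　　　　　　\1: the characters matched by the pattern between the first left parenthesis, counting from the left, and its matching right parenthesis
　　　　　　　　　　\(ab\+\(xy\)*\)
　　　　　　　　　　　　\1: ab\+\(xy\)*
　　　　　　　　　　　　\2: xy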

[root@localhost ~]# grep '^\([[:alnum:]]\+\>\).*\1$' /etc/passwd
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt

　　　　　　Back-reference: refers to the characters matched by the pattern in an earlier group (not to the pattern itself)

[root@localhost ~]# grep --color=auto "\(xy\)*ab\1\+" /tmp/abc
xyabxy
xyxyabxyxy
xyxyxyabxyxyxy
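
That a back-reference repeats the matched text rather than the pattern can be shown by comparing \1 with simply writing the pattern twice. A sketch on four made-up sentences, assuming GNU grep:

```shell
#!/bin/bash
# Made-up sample sentences.
sample=$(printf '%s\n' \
    'he loves his lover' \
    'he likes his lover' \
    'she likes her liker' \
    'she loves her liker')

# Pattern repeated: any l..e followed later by any l..e, so all four lines match.
repeated=$(printf '%s\n' "$sample" | grep -c "l..e.*l..e")
# Back-reference: the second occurrence must be the same text the group matched
# (love ... love or like ... like).
same_text=$(printf '%s\n' "$sample" | grep "\(l..e\).*\1")

echo "repeated pattern matches $repeated lines"
printf '%s\n' "back-reference matches:" "$same_text"
```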

Exercises:
1. Show the lines in /proc/meminfo that begin with an upper- or lowercase s (requirement: use two methods)
grep "^[sS]" /proc/meminfo
grep -i "^s" /proc/meminfo

2. Show the lines in /etc/passwd that do not end with /bin/bash;
grep -v "/bin/bash$" /etc/passwd

3. Show the username of the user with the largest ID in /etc/passwd;
cat /etc/passwd | sort -t: -k3 -n | tail -1 | cut -d: -f1

sort -t: -k3 -n /etc/passwd | tail -1 | cut -d: -f1

4. If the user root exists, show its default shell;
id root &> /dev/null && grep "^root\>" /etc/passwd | cut -d: -f7
grep "^root\>" /etc/passwd &> /dev/null && grep "^root\>" /etc/passwd | cut -d: -f7

5. Find the two- or three-digit numbers in /etc/passwd;
grep '\<[0-9]\{2,3\}\>' /etc/passwd

6. Show the lines in /etc/grub2.cfg that begin with at least one whitespace character followed by a non-whitespace character;
grep "^[[:space:]]\+[^[:space:]]" /etc/grub2.cfg

7. In the output of "netstat -tan", find the lines that end with "LISTEN" followed by zero, one, or more whitespace characters;
netstat -tan | grep "LISTEN[[:space:]]*$"

netstat -tan | grep 'LISTEN\>[[:space:]]\{0,\}$'

netstat -tan | grep 'LISTEN[[:space:]]\{0,\}$'

8. Add the users bash, testbash, basher, and nologin (whose shell is /sbin/nologin); then find the lines in /etc/passwd where the username is the same as the shell name;

useradd bash
useradd testbash
useradd basher
useradd -s /sbin/nologin nologin

[root@localhost ~]# tail -4 /etc/passwd
bash:x:601:601::/home/bash:/bin/bash
testbash:x:602:602::/home/testbash:/bin/bash
basher:x:603:603::/home/basher:/bin/bash
nologin:x:604:604::/home/nologin:/sbin/nologin

grep "\([[:alnum:]]\+\).*\1\?" /etc/passwd    # wrong approach
grep "^\([[:alnum:]]\{1,\}\>\).*\1$" /etc/passwd
grep "^\([[:alnum:]]\{1,\}\)\>.*\1$" /etc/passwd
grep "\(\<[[:alnum:]]\{1,\}\>\).*\1$" /etc/passwd
grep "\(\<[[:alnum:]]\+\>\).*\1$" /etc/passwd

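Exercises:
1. Write a script that does the following:
If the user user1 exists, report that it exists; otherwise add it;
then show the id information of the user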

[root@localhost test]# nano userinfo.sh
Write the following into the script:
#!/bin/bash
id user1 &> /dev/null && echo "user1 exists." || useradd user1
id user1
[root@localhost test]# chmod +x userinfo.sh
[root@localhost test]# ./userinfo.sh
user1 exists.
uid=1003(user1) gid=1003(user1) groups=1003(user1)

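2. Write a script that does the following:
If the user root is logged in to the current system, report that root is online; otherwise report that root is not logged in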

[root@localhost test]# cat /tmp/test/b.sh
#!/bin/bash
who | grep "^root\>" &> /dev/null && echo "user online." || echo "user no login."
[root@localhost test]# bash /tmp/test/b.sh
user online.
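
One design note on the A && B || C form used in these scripts: C also runs when A succeeds but B fails, so an explicit if/else can be easier to reason about. The sketch below is one possible variant, not the original solution; the function name report_login and the sample `who` output are made up for illustration:

```shell
#!/bin/bash
# report_login is a hypothetical helper.
# $1: user name; stdin: lines in the format printed by `who`.
report_login() {
    # -q keeps grep silent; only its exit status selects the branch.
    if grep -q "^$1\>"; then
        echo "$1 is online."
    else
        echo "$1 is not logged in."
    fi
}

# Made-up `who` output used for illustration.
sample_who='root     tty1         2024-01-01 10:00
mary     pts/0        2024-01-01 10:05'

printf '%s\n' "$sample_who" | report_login root     # prints: root is online.
printf '%s\n' "$sample_who" | report_login centos   # prints: centos is not logged in.

# On a live system the input would come from who itself:
#   who | report_login root
```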

2. egrep and extended regular expressions

　　egrep = grep -E

　　egrep [OPTIONS] PATTERN [FILE..]

　　Extended regular expression metacharacters

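　　　Character matching:

　　　　　　.: matches any single character

　　　　　　[]: matches a single character in the specified set

　　　　　　[^]: matches a single character not in the specified set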

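　　　Repetition:

　　　　　　*: matches the preceding character any number of times

　　　　　　?: matches the preceding character 0 or 1 times

　　　　　　+: matches the preceding character 1 or more times

　　　　　　{m}: matches the preceding character exactly m times

　　　　　　{m,n}: matches the preceding character at least m times and at most n times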

　　　Anchors:

　　　　　　^: anchors the start of a line

　　　　　　$: anchors the end of a line

　　　　　　\< or \b: anchors the start of a word

　　　　　　\> or \b: anchors the end of a word

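　　　Grouping:

　　　　　　(): the string inside the parentheses forms a group and can be back-referenced with \1, \2, ...

　　　Alternation:

　　　　　　a|b: for example, C|cat means C or cat, whereas (C|c)at means Cat or cat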
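
The practical difference between the two syntaxes is mostly where the backslashes go. The sketch below compares a BRE and an ERE form of the same pattern on made-up words; it assumes GNU grep, where \| is available in basic regular expressions as an extension:

```shell
#!/bin/bash
# Made-up sample words, one per line.
sample=$(printf '%s\n' cat Cat C dog)

# BRE: grouping and alternation need backslashes (\| is a GNU extension).
bre=$(printf '%s\n' "$sample" | grep "\(C\|c\)at")
# ERE: the same pattern without backslashes.
ere=$(printf '%s\n' "$sample" | grep -E "(C|c)at")
# Without the group, the alternatives are "C" and "cat" as whole branches;
# Cat still matches because it contains C.
either=$(printf '%s\n' "$sample" | grep -E "C|cat")
# Anchoring the group restricts the match to lines that are exactly C or cat.
exact=$(printf '%s\n' "$sample" | grep -E "^(C|cat)$")

printf '%s\n' "bre:" "$bre" "ere:" "$ere" "either:" "$either" "exact:" "$exact"
```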

Exercises:
1. Show the default shell and UID of the users root, centos, and user1 on the current system
[root@localhost ~]# grep -E "^(root|centos|user1)\>" /etc/passwd | cut -d: -f3,7
0:/bin/bash
707:/bin/bash
708:/bin/bash

2. In /etc/rc.d/init.d/functions (CentOS), find the lines where a word is followed by a pair of parentheses
[root@localhost ~]# grep -E -o "[_[:alpha:]]+\(\)" /etc/rc.d/init.d/functions

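3. Use echo to output an absolute path and use egrep to extract its base name; also use egrep to extract the directory part of the path, similar to the result of the dirname command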
[root@localhost ~]# echo "/etc/passwd" | grep -E -o "[^/]+/?$" | cut -d"/" -f1
[root@localhost ~]# echo "/etc/passwd/" | grep -E -o "[^/]+/?$" | cut -d"/" -f1
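
The two commands above cover only the base name. For the directory part, one possible approach (a sketch, not the original author's answer) is to chain three grep -E -o steps: strip trailing slashes, keep everything up to the last slash, then strip that slash again. It assumes GNU grep and a path with at least two components; the sample path is chosen only for illustration:

```shell
#!/bin/bash
# Sample absolute path with a trailing slash.
path="/etc/rc.d/init.d/functions/"

# Base name: the last run of non-slash characters, optional trailing slash removed by cut.
base=$(printf '%s\n' "$path" | grep -E -o "[^/]+/?$" | cut -d"/" -f1)

# Directory part, in three greedy steps:
#   1. "^.*[^/]"  drops trailing slashes
#   2. "^.*/"     keeps everything through the last remaining slash
#   3. "^.*[^/]"  drops that final slash
dir=$(printf '%s\n' "$path" \
    | grep -E -o "^.*[^/]" \
    | grep -E -o "^.*/" \
    | grep -E -o "^.*[^/]")

echo "base: $base"   # functions
echo "dir:  $dir"    # /etc/rc.d/init.d
```

For a single-component path such as /etc the last step finds nothing to match, so this sketch prints an empty directory where dirname would print /.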

4. Find the numbers between 1 and 255 in the output of ifconfig
ifconfig | grep -E --color=auto "\<([1-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\>"

5. Find the IP addresses in the output of ifconfig
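
No answer is given for this exercise. One possible solution, offered as a sketch rather than the author's answer, builds an IPv4 pattern from an octet sub-pattern covering 0 to 255. The variable name octet and the sample line are made up; on a real system the input would come from ifconfig instead. GNU grep is assumed:

```shell
#!/bin/bash
# One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'

# Made-up line in the style of older ifconfig output.
sample='inet addr:192.168.1.10  Bcast:192.168.1.255  Mask:255.255.255.0'

# Three "octet." groups followed by a final octet, bounded by word anchors.
ips=$(printf '%s\n' "$sample" | grep -E -o "\<(${octet}\.){3}${octet}\>")

printf '%s\n' "$ips"

# On a live system: ifconfig | grep -E -o "\<(${octet}\.){3}${octet}\>"
```

Note that this also prints broadcast addresses and netmasks, since they have the same dotted form.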

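　　　fgrep

　　　　　　Does not support regular expression searches; it looks for exactly the string given, which makes it a fast search (fgrep = grep -F)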
　　　　　　[root@localhost ~]# fgrep --color=auto "root" /etc/passwd
　　　　　　root:x:0:0:root:/root:/bin/bash
　　　　　　operator:x:11:0:operator:/root:/sbin/nologin
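
The difference from regular grep only shows up when the search string contains metacharacters. A small sketch on made-up lines, using grep -F, the option form of fgrep (recent GNU grep releases describe the egrep and fgrep commands as obsolescent in favor of grep -E and grep -F):

```shell
#!/bin/bash
# Made-up sample lines containing regex metacharacters.
sample=$(printf '%s\n' 'a.c' 'abc' 'a*c')

# As a regular expression, the dot matches any character: all three lines match.
regex=$(printf '%s\n' "$sample" | grep "a.c")
# As a fixed string, only the literal text a.c matches.
fixed_dot=$(printf '%s\n' "$sample" | grep -F "a.c")
# The asterisk is literal as well.
fixed_star=$(printf '%s\n' "$sample" | grep -F "a*c")

printf '%s\n' "regex:" "$regex" "fixed a.c:" "$fixed_dot" "fixed a*c:" "$fixed_star"
```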