BEGIN和END关键字是来用来读取数据流之前或之后执行命令的特殊模式
1、内建变量
FIELDWIDTHS:由空格分隔开的的定义每个数据字段确切宽度的一列数字
FS:输入字段分隔符
RS:输入数据行分隔符
OFS:输出字段分隔符
ORS:输出数据行分隔符
|
1
2
3
4
|
[[email protected] conf]# cat data2data1,data2,data3,data4,data5data6,data7,data8,data9,data10data11,data12,data13,data14,data15 |
|
1
2
3
4
|
data1-data2-data3data6-data7-data8data11-data12-data13 |
数据流占多行情况下
把RS变量设置成空白字符串,然后在数据行间留一个空白行。gawk会把每个空白行当作一个数据行分隔符
|
1
2
3
4
5
6
7
8
9
|
[[email protected] conf]# cat data3Tom Mullen123 Main Street
Chicago.IL 6001
(312)555-1234
Frank Williams456 Oak street
Indianpolis.IN 46201
(317)55-9786
|
|
1
2
3
|
Tom Mullen (312)555-1234
Frank Williams (317)55-9786
|
2、数据变量
FNR:当前数据文件中的数据行数
NF:数据文件中的字段总数
NR:已处理的输入数据行数目
|
1
2
3
4
5
6
|
root;/bin/bashbin;/sbin/nologindaemon;/sbin/nologinadm;/sbin/nologinlp;/sbin/nologin |
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
|
[[email protected] conf]# cat data2data1,data2,data3,data4,data5data6,data7,data8,data9,data10data11,data12,data13,data14,data15[[email protected] conf]# gawk 'BEGIN{FS=","}
{print $1,"FNR="FNR,"NR="NR}
END{print "there were",NR,"records processed"}' data2 data2
data1 FNR=1 NR=1
data6 FNR=2 NR=2
data11 FNR=3 NR=3
data1 FNR=1 NR=4
data6 FNR=2 NR=5
data11 FNR=3 NR=6
there were 6 records processed
|
3、自定义变量
变量名不能以数字开头,
|
1
2
3
4
5
6
|
[[email protected] conf]# gawk '> BEGIN{> testing="this is a test"
> print testing> }'this is a test
|
|
1
2
|
[[email protected] conf]# gawk 'BEGIN{X=4;X=X*2+3;print X}'
11 |
在命令行上给变量赋值
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
|
[[email protected] tmp]# cat script1#!/bin/bash#BEGIN{FS=","}
{print $n}[[email protected] tmp]# cat data3data1,data2,data3,data4,data5data6,data7,data8,data9,data10data11,data12,data13,data14,data15data2data7data12 |
在设置变量后,这个值在代码的BEGINU部分不可用
|
1
2
3
4
5
|
[[email protected] tmp]# vim script1#!/bin/bash#BEGIN{print "the starting value is ",n;FS=","}
{print $n} |
|
1
2
3
4
5
|
加上-v,允许你指定BEGIN代码部分之前设定的变量
|
1
2
3
4
5
|
4、遍历数组
for (var in array)
{
statements
}
for语句在每次将关联数组array的下一个索引值赋给变量var时,执行一遍statements。
记住这个变量是索引值而不是数组元素的值
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
|
[[email protected] tmp]# gawk 'BEGIN{var["a"]=1
var["g"]=2
var["m"]=3
var["u"]=4
for (test in var)
{print "Index:",test," - Valuse:",var[test]
}}'Index: u - Valuse: 4
Index: m - Valuse: 3
Index: a - Valuse: 1
Index: g - Valuse: 2
|
删除数组的变量
从关联数组中删除数组索引要用一个特别命令
delete array[index]
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
|
[[email protected] tmp]# gawk 'BEGIN{var["a"]=1
var["g"]=2
for (test in var)
{ print "Index:",test,"- Value",var[test]
}delete var["g"]
print "------------"
for (test in var)
{ print "Index:",test,"- Value",var[test]
}> }'Index: a - Value 1
Index: g - Value 2
------------Index: a - Value 1
|
5、匹配操作符
匹配操作符(~),允许将正则表达式限定在数据行中的特定数据字段。
$1 ~/expression/
用正则表达式/^data2/来匹配第二个数据段,过滤第二个字段以文本data2为开头的行
|
1
2
3
4
5
6
7
8
9
|
[[email protected] tmp]# cat data3data1,data2,data3,data4,data5data6,data7,data8,data9,data10data11,data12,data13,data14,data15data21,data22,data23,data24,data25data6,data7,data8,data9,data10data11,data12,data13,data14,data15 |
gawk程序脚本常用在数据文件查找特定数据元素的强大工具
|
1
2
|
root /bin/bash |
用!符号来排除正则表达式的匹配
$1 !~/expression/
|
1
|
gawk -F: '$1 !~/root/{print $1,$NF}' /etc/passwd
|
6、数学表达式
x == y
x <= y
x < y
x >= y
x > y
如显示所有属于root用户组的(组ID为0)的系统用户
|
1
2
3
4
5
6
|
对文本数据使用表达式,必须要小心,跟正则表达式不同,表达式必须完全匹配
|
1
2
|
data1 |
6、if语句
if (condition)
statement1
或者 if (condition) statement1
|
1
2
3
4
5
6
7
8
|
|
1
2
3
4
5
6
7
8
|
gawk的if语句也支持else子句,允许在if语句条件不成立的情况下执行一条或多条语句。
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
|
[[email protected] tmp]# gawk '{
> if ($1>100)
> {> x=$1 * 2
> print x> } else
> {> x=$1 / 2
> print x> }}' data452.51.56.527.5 |
也可以这么写
if (condition) statement1:else statement2
|
1
2
3
4
5
6
|
7、while语句
while (condition)
{
statement1
}
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
|
[[email protected] tmp]# cat data5120 130 135
160 113 140
145 170 215
[[email protected] tmp]# gawk '{total=0
i=1
while (i<4)
{ total+=$i i++
}avg=total/3
print "Averaget:",avg
}' data5Averaget: 128.333
Averaget: 137.667
Averaget: 176.667
Averaget: 0
|
8、do-while语句
do
{
statement1
} while (condition)
|
1
2
3
4
5
6
7
8
9
10
11
12
|
[[email protected] tmp]# gawk '{total=0
i=1
do{ total+=$i
i++
}while (total<150)
print total}' data5250160315 |
9、for语句
for (variable assignment;condition;interation process)
|
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
|
[[email protected] tmp]# gawk '{> tatal=0
> for (i=1;i<4;i++)
> {> total+=$i> }> avg=total/3
> print "Average:",avg
> }' data5Average: 128.333
Average: 266
Average: 442.667
Average: 442.667
[[email protected] tmp]# gawk '{for (i=1;i<4;i++) total+=$i;avg=total/3;print "Average:",avg}' data5
Average: 128.333
Average: 266
Average: 442.667
Average: 442.667
[[email protected] tmp]# cat data5120 130 135
160 113 140
145 170 215
|
10、格式化打印
在printf命令末尾手动添加换行符来生成新行
如果需要几个单独的printf命令来在同一行上打印多个输出
|
1
2
3
4
5
6
7
8
|
[[email protected] tmp]# cat data3data1,data2,data3,data4,data5data6,data7,data8,data9,data10data11,data12,data13,data14,data15data21,data22,data23,data24,data25data1 data6 data11 data21 |
|
1
2
|
data1 data6 data11 data21 [[email protected] tmp]# |