【Posted】: 2020-08-01 21:02:42
【Question】:
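This is not a question. It is also my first post. I'm not a newbie, but I'm only a beginner with awk.
Recently I needed to generate some .xml configuration files from two sets of data that were not originally stored as XML.
I searched a lot for help with AWK, but I realized that 99% of the scripts on offer use advanced AWK techniques, which makes them hard for beginners to follow. I believe this reduces interest and steepens the learning curve.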
E.g. awk '/ERROR/' /log/messages
For someone who doesn't write much awk, it isn't easy to tell what is going on there, yet it does a lot.
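For readers new to this kind of one-liner, here is a minimal sketch of what a pattern-only rule expands to. The log lines are made up for illustration; only the shape of the rules matters. A pattern with no action prints each matching line, whereas the same regex wrapped in braces becomes an action whose result is thrown away, so nothing is printed.

```shell
# Made-up sample log lines (not from the post).
log='INFO boot ok
ERROR disk full
WARN temp high
ERROR net down'

# Terse form: a pattern with no action prints every matching line.
short=$(printf '%s\n' "$log" | awk '/ERROR/')

# Spelled out: the implied test and the implied action written explicitly.
long=$(printf '%s\n' "$log" | awk '$0 ~ /ERROR/ { print $0 }')

# Regex inside braces: evaluated as an expression statement, nothing is printed.
braced=$(printf '%s\n' "$log" | awk '{ /ERROR/ }')

printf 'short:\n%s\nlong:\n%s\nbraced: [%s]\n' "$short" "$long" "$braced"
```

The first two forms select the same two ERROR lines; the braced form leaves the variable empty.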
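So here I'll offer a beginner's way of doing such a task. In return,
I'd like suggestions for:
- A more optimized beginner version.
- An optimized advanced version, with proper explanations to help with the transition.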
$ ./test1.awk Samfig2.cfg user1.tsv
$ ls cfg*
cfg2ZR6ZS29XXOF.xml cfg42IXEIGOQ0FG.xml cfg759YUZKTS368.xml cfgNTQALYCPLE06.xml cfgYDMWJVLO6YWS.xml
test1.awk
#!/usr/bin/awk -f
BEGIN {
    configfile = ARGV[1]
    Userfile = ARGV[2]
    if (ARGV[2] == "") {
        print "ERROR: Need two files. Usage: " ENVIRON["_"] " Config.cfg Users.tsv" > "/dev/stderr"
        exit 1
    }
    ARGV[1] = ""    # We want to control the reading of the files ourselves
    ARGV[2] = ""
    FS = "="        # Used by the plain getline on configfile below; setting it once here cut execution time by almost 90%
    getline Header < Userfile       # Read the first line of Userfile to get the headers
    gsub("\r", "", Header)          # My production version doesn't need this, but the sample data seems to have \r on the last field
    HeaderN = split(Header, Headarray, "\t")
    # Expand begin block to include {} below to prevent pause for input }
    #{
    while ((getline User < Userfile) > 0) {     # Read one row into User; everything below runs once per record in Userfile
        gsub("\r", "", User)                    # Same \r cleanup as for the header
        n = split(User, Detailsarray, "\t")     # Split the row on tabs into Detailsarray; n is the number of fields
        filetostore = ("cfg" Detailsarray[HeaderN] ".xml")    # Each file is named after the value in the last header column
        # To reduce file I/O, append to a string and write it out once at the end.
        # \42 is the octal escape for the double quote ". Result: <?xml version="1.0" encoding="utf-8"?>
        # Without the "" after \42 you would get <?xml version=.0" encoding="utf-8"?>, as it would be read as \421
        Recordtmp = "<?xml version=\42""1.0\42 encoding=\42utf-8\42?>"
        Recordtmp = Recordtmp "\n<users_provision version=\42""1\42>"
        Recordtmp = Recordtmp "\n<config version=\42""1\42>"
        for (i = 1; i <= HeaderN; i++)          # n would also work, but just in case I loop on the initial header count
            Recordtmp = Recordtmp "\n <" Headarray[i] ">" Detailsarray[i] "</" Headarray[i] ">"
        while ((getline < configfile) > 0) {    # Plain getline sets $0 and splits it on FS ("=")
            Recordtmp = Recordtmp "\n <" $1 ">" $2 "</" $1 ">"
        }
        Recordtmp = Recordtmp "\n</config>"
        Recordtmp = Recordtmp "\n</users_provision>\n"
        close(configfile)                       # So the next row reads the config file from the top again
        print (Recordtmp) > filetostore
        close(filetostore)
    }
    #}
    # END { # Had to expand begin block to avoid pause issue
    close(Userfile)
}
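As one possible direction for the "more optimized beginner version" requested above, here is a sketch that lets awk's normal record loop read both files instead of driving everything with getline. It is not the original author's method. The wrapper name gen_xml, the four-space indent, and the use of \" instead of \42 are choices made for this example. It assumes the config file is non-empty (otherwise the NR == FNR test also matches the second file) and relies on command-line FS=... assignments, which awk applies just before opening the file operand that follows them.

```shell
# gen_xml CONFIG USERS: hypothetical wrapper name, not from the post.
gen_xml() {
    awk '
        { sub(/\r$/, "") }              # drop a trailing CR; editing $0 re-splits the fields

        NR == FNR {                     # first file (key=value config): cache it once
            cfg = cfg "    <" $1 ">" $2 "</" $1 ">\n"
            next
        }

        FNR == 1 {                      # second file, first line: column headers
            ncols = split($0, head, "\t")
            next
        }

        {                               # second file, one user per line
            out = "cfg" $ncols ".xml"   # file name comes from the last header column
            printf "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" > out
            printf "<users_provision version=\"1\">\n<config version=\"1\">\n" > out
            for (i = 1; i <= ncols; i++)
                printf "    <%s>%s</%s>\n", head[i], $i, head[i] > out
            printf "%s</config>\n</users_provision>\n", cfg > out
            close(out)
        }
    ' FS="=" "$1" FS="\t" "$2"
}
```

It would be called as gen_xml Samfig.cfg user1.tsv from the directory where the .xml files should land. The config file is read only once and kept in the cfg string, rather than being reopened for every user row.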
Samfig.cfg
URL=msn.com
Dealer=RealtorSales
SQRFT=3600
Taxes=6000
Asking=1,800,000
Built=July/2019
Listed=07/12/2109
MSRP=2,000,000
Kitchen=5
Baths=2.5
floors=3
Rooms=5
user1.tsv
Name	StreeNum	StreetName	City	State	ZIP	IDcard
Ashanti Simmons	138	Jockey Hollow Avenue	Phillipsburg	NJ	08865	2ZR6ZS29XXOF
Bobby Marshall	7985	E. Beech Road	Flemington	NJ	08822	YDMWJVLO6YWS
Marianna Quinn	8950	Main St.	Moses Lake	WA	98837	42IXEIGOQ0FG
Jaslyn Fuentes	9581	Lafayette Dr.	Hummelstown	PA	17036	NTQALYCPLE06
Cory Jordan	26	Randall Mill Street	Bay City	MI	48706	759YUZKTS368
Contents of cfg2ZR6ZS29XXOF.xml
<?xml version="1.0" encoding="utf-8"?>
<users_provision version="1">
<config version="1">
<Name>Ashanti Simmons</Name>
<StreeNum>138</StreeNum>
<StreetName>Jockey Hollow Avenue</StreetName>
<City>Phillipsburg</City>
<State>NJ</State>
<ZIP>08865</ZIP>
<IDcard>2ZR6ZS29XXOF</IDcard>
<URL>msn.com</URL>
<Dealer>RealtorSales</Dealer>
<SQRFT>3600</SQRFT>
<Taxes>6000</Taxes>
<Asking>1,800,000</Asking>
<Built>July/2019</Built>
<Listed>07/12/2109</Listed>
<MSRP>2,000,000</MSRP>
<Kitchen>5</Kitchen>
<Baths>2.5</Baths>
<floors>3</floors>
<Rooms>5</Rooms>
</config>
</users_provision>
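Improvements that could be made:
- Read the FS/split values into variables from the command line.
- Replace a default value from the config file only when the data file has a non-empty value for the same key.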
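The two improvements listed above could be sketched roughly as follows. This is an illustration under assumptions, not part of the original script: the function name merge_defaults and the variables cfgsep and colsep are invented here, and the output is plain key/value lines rather than XML so the override rule is easy to see. Separators arrive through -v, where awk expands escapes such as \t.

```shell
# merge_defaults CFGSEP COLSEP CONFIG USERS: hypothetical helper, names are assumptions.
merge_defaults() {
    awk -v cfgsep="$1" -v colsep="$2" '
        { sub(/\r$/, "") }                      # tolerate DOS line endings

        NR == FNR {                             # config file: remember defaults in file order
            n = index($0, cfgsep)
            if (n == 0) next                    # skip lines without a separator
            key = substr($0, 1, n - 1)
            order[++nkeys] = key
            dflt[key] = substr($0, n + length(cfgsep))
            next
        }

        FNR == 1 {                              # data file header: column names
            ncols = split($0, head, colsep)
            next
        }

        {                                       # data rows
            split($0, val, colsep)
            for (i = 1; i <= ncols; i++)
                row[head[i]] = val[i]
            for (k = 1; k <= nkeys; k++) {
                key = order[k]
                # a non-empty value in the data file wins; otherwise keep the default
                v = (row[key] != "") ? row[key] : dflt[key]
                print key cfgsep v
            }
        }
    ' "$3" "$4"
}
```

For example, merge_defaults '=' '\t' Samfig.cfg user1.tsv would print every config key once per user row, taking the value from a same-named column in the data file when that column exists and is not empty.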
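【Comments】:
- (Hmm, I can't get the formatting of the initial "calling script" code to clean up, i.e. $./test1.awk etc.) Otherwise you seem to have it working. I don't understand your comment # Expand begin block to include {} below to prevent pause for input. Sometimes the best solution is to do all the processing inside BEGIN{} or END{}. If your output XML validates with xmllint, declare victory! Getting more advanced XML features out of awk is a real "learning opportunity"; see xmlawk (will try to find a link). Good luck.
- sourceforge.net/projects/gawkextlib may help (at the cost of a steeper learning curve ;-)). Good luck.
- Because I blanked the ARGV values, awk paused and waited for user input whenever the script had a main (middle) block. Hence the comment.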
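The pause described in that last comment can be reproduced with a small sketch. The operand users.tsv is only a placeholder name and is never opened. A program made up solely of BEGIN exits without reading any input, while a program with a main rule whose file operands have all been blanked falls back to standard input; /dev/null stands in for the terminal here so the example does not hang.

```shell
# BEGIN-only program: awk never opens its input, so nothing can block.
out1=$(awk 'BEGIN { ARGV[1] = ""; print "begin only" }' users.tsv)

# Main rule present and the only operand blanked: awk reads standard input instead.
# Run from a terminal without the redirect, this is where it would sit waiting.
out2=$(awk 'BEGIN { ARGV[1] = "" }
            { n++ }
            END { print "lines read from stdin:", n + 0 }' users.tsv < /dev/null)

printf '%s\n%s\n' "$out1" "$out2"
```

Keeping all of the work inside BEGIN, as the posted script does, avoids the wait; leaving the operands in ARGV and letting the main rules read them is the other way out.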