【Posted】: 2019-10-14 18:25:45
【Question】:
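I'm currently using grep to extract a specific piece of text from each line of a file. It extracts the matches successfully, but I would like it to keep every line that has no match, leaving it as an empty line.
This is what I have tried so far (to get the city name from each line):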
grep -o -P '(?<="city":").*?(?=")' input.txt
Sample input:
email":"addictedtotlick7@gmail.com","last_name":"THOMPSON","first_name":"ERIN",,"__v":0,,,,"state":"NY","city":"north tonawanda"}
first_name":"chris","last_name":"caul",,"email":"dawgzn@mail.com",,,,"__v":0}
email":"lesliebo993@hotmail.com",,"first_name":"LESLIE","last_name":"RAMBO",,"city":"DOTHAN","state":"AL",,,"__v":0,
email":"malala@yahoo.com",,,"state":"GA","city":"NORCROSS",,"last_name":"KEO","first_name":"CATHY",,"__v":0,
email":"kdela@gmail.com",,"state":"FL","city":"HOLLYWOOD",,"last_name":"DE LA CRUZ","first_name":"KIDA",,"__v":0,
Expected output:
north tonawanda

DOTHAN
NORCROSS
HOLLYWOOD
I'm happy to try something in sed if that's easier, but I'd rather avoid awk: I have to process large files and I'm not sure I have enough RAM.
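One possible sed approach, as a minimal sketch: blank out every line that lacks the `"city":"` key, then reduce the remaining lines to the captured value. The function name `extract_city` is a hypothetical wrapper, not something from the question. sed works through the input one line at a time, so memory use should not grow with file size.

```shell
# extract_city is a hypothetical helper name used only for illustration.
# Lines without "city":" are replaced by an empty line; lines with the key
# are reduced to the text between "city":" and the next double quote.
extract_city() {
  sed -e '/"city":"/!s/.*//' \
      -e 's/.*"city":"\([^"]*\).*/\1/' "$@"
}
```

It could be called as `extract_city input.txt > cities.txt`; with no file argument it reads standard input.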
【Comments】:
-
grep seems to discard empty matches. -
Do you have GNU awk? Try
gawk '{print index($0, "\"city\":\"") == 0 ? "" : gensub(/.*\"city\":\"([^\"]*).*/, "\\1", $0);}' file > newfile -
@WiktorStribiżew - That seems to produce the correct output when I run it, but I get this in the console for every line: gawk: cmd. line:1: (FILENAME=db1.txt FNR=100000) warning: gensub: third argument `email":"uccelds@cox.net",,"__v":0,,,"state":"CT"," city":"Rocky Hill","last_name":"Uccello","first_name":"Sebastiano"}' treated as 1
-
OK, got it. Posting.
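For reference, the warning quoted above comes from the argument order: gawk's `gensub` takes `(regexp, replacement, how, target)`, so `$0` was being read as `how`. Below is a sketch of the same one-liner with `how` given explicitly as `1` and `$0` moved to the fourth position. The wrapper name `city_gawk` is hypothetical. Like sed, awk reads one record at a time, so memory use should stay flat regardless of file size.

```shell
# city_gawk is a hypothetical wrapper name; requires GNU awk for gensub().
# gensub(regexp, replacement, how, target): how=1 replaces the first match,
# and $0 is passed as the fourth argument (the target), not the third.
city_gawk() {
  gawk '{
    print (index($0, "\"city\":\"") == 0 ? "" : gensub(/.*"city":"([^"]*).*/, "\\1", 1, $0))
  }' "$@"
}
```

It could be called as `city_gawk file > newfile`.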