【Question title】: Insertion of divider between JSON lines after extraction of data
【Posted】: 2018-12-22 06:58:27
【Question description】:

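I have a file containing 811 lines of JSON that I need to parse. Right now I use the following command to extract the data I am interested in (awk is needed because the JSON I am working with does not provide the data in a proper array):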

sed 's/},/},\n/g' 1st_run.json |awk '/"characater"/ { gsub("\"characater\"", "\"char" ++n "\"", $0) } 1'| jq -r '.frames.frame.lps.lp|.characters[]|[.code_ascii,.confidence]|@tsv'

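This works fine, but I get a mass of data that is not separated in any way. How can I insert a divider at least after each line of the JSON, so that I end up with a somewhat parseable result?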

Input

My JSON input looks like this:

...
{"response":{"container":{"id":"80d996a1-c267-4fa4-b3f8-f61ff9fda198","timestamp":"2018-Jul-10 17:00:50.829709"},"id":"00000002-0000-0000-0000-000000000002"},"frames":{"frame":{"id":"398","timestamp":"2016-Nov-30 12:56:47.900000","lps":{"lp":{"licenseplate":"FRJ724","text":"FRJ724","wtext":"FRJ724","confidence":"67","bkcolor":"16777215","color":"16777215","type":"540122","ntip":"6","cct_country_short":"USA","cct_state_short":"NY","tips":{"tip":{"poly":{"p":{"x":"1553","y":"249"},"p":{"x":"1559","y":"249"},"p":{"x":"1559","y":"267"},"p":{"x":"1553","y":"267"}},"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"88"},"tip":{"poly":{"p":{"x":"1561","y":"248"},"p":{"x":"1568","y":"248"},"p":{"x":"1568","y":"267"},"p":{"x":"1561","y":"267"}},"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"96"},"tip":{"poly":{"p":{"x":"1569","y":"248"},"p":{"x":"1575","y":"248"},"p":{"x":"1576","y":"267"},"p":{"x":"1569","y":"267"}},"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},"tip":{"poly":{"p":{"x":"1585","y":"248"},"p":{"x":"1591","y":"248"},"p":{"x":"1591","y":"267"},"p":{"x":"1585","y":"267"}},"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"94"},"tip":{"poly":{"p":{"x":"1593","y":"248"},"p":{"x":"1600","y":"248"},"p":{"x":"1600","y":"267"},"p":{"x":"1593","y":"267"}},"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"88"},"tip":{"poly":{"p":{"x":"1602","y":"248"},"p":{"x":"1607","y":"248"},"p":{"x":"1607","y":"266"},"p":{"x":"1602","y":"266"}},"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"99"}},"ncharacter":"6","characters":{"characater":{"poly":{"p":{"x":"1553","y":"249"},"p":{"x":"1559","y":"249"},"p":{"x":"1559","y":"267"},"p":{"x":"1553","y":"267"}},"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"88"},"characater":{"poly":{"p":{"x":"1561","y":"248"},"p":{"x":"1568","y":"248"},"p":{"x"
:"1568","y":"267"},"p":{"x":"1561","y":"267"}},"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"96"},"characater":{"poly":{"p":{"x":"1569","y":"248"},"p":{"x":"1575","y":"248"},"p":{"x":"1576","y":"267"},"p":{"x":"1569","y":"267"}},"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},"characater":{"poly":{"p":{"x":"1585","y":"248"},"p":{"x":"1591","y":"248"},"p":{"x":"1591","y":"267"},"p":{"x":"1585","y":"267"}},"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"94"},"characater":{"poly":{"p":{"x":"1593","y":"248"},"p":{"x":"1600","y":"248"},"p":{"x":"1600","y":"267"},"p":{"x":"1593","y":"267"}},"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"88"},"characater":{"poly":{"p":{"x":"1602","y":"248"},"p":{"x":"1607","y":"248"},"p":{"x":"1607","y":"266"},"p":{"x":"1602","y":"266"}},"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"99"}},"det_time_us":"776874","poly":{"p":{"x":"1543","y":"237"},"p":{"x":"1618","y":"237"},"p":{"x":"1618","y":"274"},"p":{"x":"1543","y":"274"}}}},"det_time_us":"1883017"}}}
{"response":{"container":{"id":"fa75e8f8-1b44-4f2f-a09b-6fe3b801ca1b","timestamp":"2018-Jul-10 17:00:55.863641"},"id":"00000002-0000-0000-0000-000000000002"},"frames":{"frame":{"id":"399","timestamp":"2016-Nov-30 12:56:48","lps":{"lp":{"licenseplate":"FRJ724","text":"FRJ724","wtext":"FRJ724","confidence":"47","bkcolor":"16777215","color":"16777215","type":"540122","ntip":"6","cct_country_short":"USA","cct_state_short":"NY","tips":{"tip":{"poly":{"p":{"x":"1553","y":"248"},"p":{"x":"1560","y":"248"},"p":{"x":"1560","y":"266"},"p":{"x":"1554","y":"266"}},"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"96"},"tip":{"poly":{"p":{"x":"1561","y":"248"},"p":{"x":"1568","y":"248"},"p":{"x":"1568","y":"267"},"p":{"x":"1561","y":"267"}},"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"98"},"tip":{"poly":{"p":{"x":"1569","y":"247"},"p":{"x":"1576","y":"247"},"p":{"x":"1576","y":"267"},"p":{"x":"1569","y":"267"}},"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},"tip":{"poly":{"p":{"x":"1586","y":"248"},"p":{"x":"1592","y":"248"},"p":{"x":"1592","y":"267"},"p":{"x":"1586","y":"267"}},"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"95"},"tip":{"poly":{"p":{"x":"1593","y":"248"},"p":{"x":"1600","y":"248"},"p":{"x":"1600","y":"267"},"p":{"x":"1593","y":"267"}},"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"86"},"tip":{"poly":{"p":{"x":"1601","y":"249"},"p":{"x":"1608","y":"249"},"p":{"x":"1608","y":"265"},"p":{"x":"1601","y":"265"}},"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"63"}},"ncharacter":"6","characters":{"characater":{"poly":{"p":{"x":"1553","y":"248"},"p":{"x":"1560","y":"248"},"p":{"x":"1560","y":"266"},"p":{"x":"1554","y":"266"}},"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"96"},"characater":{"poly":{"p":{"x":"1561","y":"248"},"p":{"x":"1568","y":"248"},"p":{"x":"1568"
,"y":"267"},"p":{"x":"1561","y":"267"}},"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"98"},"characater":{"poly":{"p":{"x":"1569","y":"247"},"p":{"x":"1576","y":"247"},"p":{"x":"1576","y":"267"},"p":{"x":"1569","y":"267"}},"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},"characater":{"poly":{"p":{"x":"1586","y":"248"},"p":{"x":"1592","y":"248"},"p":{"x":"1592","y":"267"},"p":{"x":"1586","y":"267"}},"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"95"},"characater":{"poly":{"p":{"x":"1593","y":"248"},"p":{"x":"1600","y":"248"},"p":{"x":"1600","y":"267"},"p":{"x":"1593","y":"267"}},"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"86"},"characater":{"poly":{"p":{"x":"1601","y":"249"},"p":{"x":"1608","y":"249"},"p":{"x":"1608","y":"265"},"p":{"x":"1601","y":"265"}},"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"63"}},"det_time_us":"600136","poly":{"p":{"x":"1543","y":"238"},"p":{"x":"1618","y":"239"},"p":{"x":"1619","y":"274"},"p":{"x":"1543","y":"273"}}}},"det_time_us":"1495308"}}}
{"response":{"container":{"id":"5c9c773c-a72a-488f-bc49-148dcd6cfa0a","timestamp":"2018-Jul-10 17:01:01.756522"},"id":"00000002-0000-0000-0000-000000000002"},"frames":{"frame":{"id":"400","timestamp":"2016-Nov-30 12:56:48.100000","lps":{"lp":{"licenseplate":"FRJ724","text":"FRJ724","wtext":"FRJ724","confidence":"47","bkcolor":"16777215","color":"16777215","type":"540122","ntip":"6","cct_country_short":"USA","cct_state_short":"NY","tips":{"tip":{"poly":{"p":{"x":"1553","y":"248"},"p":{"x":"1560","y":"248"},"p":{"x":"1560","y":"266"},"p":{"x":"1554","y":"266"}},"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"96"},"tip":{"poly":{"p":{"x":"1561","y":"248"},"p":{"x":"1568","y":"248"},"p":{"x":"1568","y":"267"},"p":{"x":"1561","y":"267"}},"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"98"},"tip":{"poly":{"p":{"x":"1569","y":"247"},"p":{"x":"1576","y":"247"},"p":{"x":"1576","y":"267"},"p":{"x":"1569","y":"267"}},"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},"tip":{"poly":{"p":{"x":"1586","y":"248"},"p":{"x":"1592","y":"248"},"p":{"x":"1592","y":"267"},"p":{"x":"1586","y":"267"}},"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"95"},"tip":{"poly":{"p":{"x":"1593","y":"248"},"p":{"x":"1600","y":"248"},"p":{"x":"1600","y":"267"},"p":{"x":"1593","y":"267"}},"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"86"},"tip":{"poly":{"p":{"x":"1601","y":"249"},"p":{"x":"1608","y":"249"},"p":{"x":"1608","y":"265"},"p":{"x":"1601","y":"265"}},"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"63"}},"ncharacter":"6","characters":{"characater":{"poly":{"p":{"x":"1553","y":"248"},"p":{"x":"1560","y":"248"},"p":{"x":"1560","y":"266"},"p":{"x":"1554","y":"266"}},"bkcolor":"16777215","color":"0","code":"70","code_ascii":"F","confidence":"96"},"characater":{"poly":{"p":{"x":"1561","y":"248"},"p":{"x":"1568","y":"248"},"p":{"x"
:"1568","y":"267"},"p":{"x":"1561","y":"267"}},"bkcolor":"16777215","color":"0","code":"82","code_ascii":"R","confidence":"98"},"characater":{"poly":{"p":{"x":"1569","y":"247"},"p":{"x":"1576","y":"247"},"p":{"x":"1576","y":"267"},"p":{"x":"1569","y":"267"}},"bkcolor":"16777215","color":"0","code":"74","code_ascii":"J","confidence":"96"},"characater":{"poly":{"p":{"x":"1586","y":"248"},"p":{"x":"1592","y":"248"},"p":{"x":"1592","y":"267"},"p":{"x":"1586","y":"267"}},"bkcolor":"16777215","color":"0","code":"55","code_ascii":"7","confidence":"95"},"characater":{"poly":{"p":{"x":"1593","y":"248"},"p":{"x":"1600","y":"248"},"p":{"x":"1600","y":"267"},"p":{"x":"1593","y":"267"}},"bkcolor":"16777215","color":"0","code":"50","code_ascii":"2","confidence":"86"},"characater":{"poly":{"p":{"x":"1601","y":"249"},"p":{"x":"1608","y":"249"},"p":{"x":"1608","y":"265"},"p":{"x":"1601","y":"265"}},"bkcolor":"16777215","color":"0","code":"52","code_ascii":"4","confidence":"63"}},"det_time_us":"457492","poly":{"p":{"x":"1543","y":"238"},"p":{"x":"1618","y":"239"},"p":{"x":"1619","y":"274"},"p":{"x":"1543","y":"273"}}}},"det_time_us":"1311946"}}}
...

Generated output

4       99
9       95
2       94
3       94
9       97
B       96
A       92
B       94
L       76
E       88
B       90
R       95
1       85
4       99
9       87
2       98
3       97
9       98
B       98
A       94
4       91
9       97
2       90
3       92
9       96
B       98
A       99

Expected output (example)

A divider inserted after each JSON line, regardless of the number of items extracted per line (that number equals .ncharacter in the JSON):

4       99
9       95
2       94
3       94
9       97
B       96
----------
A       92
B       94
L       76
E       88
B       90
R       95
1       85
4       99
----------
9       87
2       98
3       97
9       98
B       98
A       94
4       91
----------
9       97
2       90
3       92
9       96
B       98
A       99

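【Comments】:

  • Could you post a sample of the input, especially of the broken part?
  • @Tomalak I updated the question
  • The JSON is invalid for several reasons. In any case, you want to insert a divider after each line, right? So replacing the newlines with commas, or appending a comma to the end of each line, would be enough, right? Something like sed 's/$/,/'
  • Oh! I thought you were getting a mass of input data that could not be parsed. Could you clarify what the problematic input is, what result you expect, and what code you use to get there? Because the code you wrote generates TSV output, right?
  • Thanks for the edit. Now I see what the problem is

Tags: json parsing awk sed extract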


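【Solution 1】:

OK,

I solved this by writing a Python script that can cope with the malformed JSON data. The idea is to go through each line separately and break its content down by substrings to extract code_ascii and confidence. I ended up with something like this:

#!/usr/bin/env python3

def mysplit(line):
    # Only look at the "characters" section; "tips" repeats the same fields.
    chars = line.partition('"characters"')[2]
    # Every chunk after the first begins with: ":"F","confidence":"88"...
    for chunk in chars.split("code_ascii")[1:]:
        fields = chunk.split('"')
        # fields[2] is the character, fields[6] is its confidence
        print(fields[2] + "     " + fields[6])

filepath = 'test2.json'
with open(filepath) as fp:
    for line in fp:
        print("----------")  # divider before the rows of each JSON line
        mysplit(line)

I think this should pretty much do it for me...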
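
A possible alternative to splitting on substrings, sketched under the assumption that every line is syntactically valid JSON and that the only obstacle is the repeated keys ("characater", "p", "tip"): Python's json.loads accepts an object_pairs_hook, which receives the key/value pairs of each object before duplicates are collapsed. The names Pairs, characters and format_lines are made up for this illustration, and only entries located somewhere below a "characters" key are collected.

```python
import json


class Pairs(list):
    """A JSON object kept as a list of (key, value) pairs, duplicates included."""


def characters(line):
    """Return (code_ascii, confidence) for every entry below a "characters" key."""
    doc = json.loads(line, object_pairs_hook=Pairs)
    rows = []

    def walk(node, inside):
        if isinstance(node, Pairs):
            fields = dict(node)  # fine here: a character entry has unique keys
            if inside and "code_ascii" in fields:
                rows.append((fields["code_ascii"], fields.get("confidence", "")))
            for key, value in node:
                walk(value, inside or key == "characters")
        elif isinstance(node, list):  # a real JSON array
            for item in node:
                walk(item, inside)

    walk(doc, False)
    return rows


def format_lines(lines, divider="-" * 10):
    """Tab-separated rows, with a divider between consecutive JSON lines."""
    out = []
    nonblank = (line for line in lines if line.strip())
    for index, line in enumerate(nonblank):
        if index:
            out.append(divider)
        out.extend(char + "\t" + conf for char, conf in characters(line))
    return "\n".join(out)
```

With the file from the question this could be called as print(format_lines(open("1st_run.json"))). Because the divider is emitted between lines rather than before each one, the output neither starts nor ends with a divider.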

【Comments】:

    【Solution 2】:

    You can use awk to wrap the lines in an extra pair of [ ],

    awk 'BEGIN {print "["} END {print "]"} {gsub(/characater/, "char" ++n); print $0 ","}'
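
For comparison, the same wrapping idea can be sketched in Python (chosen only to match Solution 1; number_keys, wrap_as_array and to_rows are hypothetical names). Two details are handled explicitly: the counter advances once per occurrence of "characater", and commas are placed only between array items, since JSON does not allow a comma after the last element. The sketch assumes every line holds exactly one plate under frames.frame.lps.lp, as in the sample input.

```python
import json
import re
from itertools import count


def number_keys(line, counter, key="characater", prefix="char"):
    """Rename every occurrence of "key" to a unique "prefix<N>"."""
    pattern = re.compile('"' + re.escape(key) + '"')
    return pattern.sub(lambda m: '"%s%d"' % (prefix, next(counter)), line)


def wrap_as_array(lines):
    """Join the non-blank lines into one JSON array text."""
    counter = count(1)
    items = [number_keys(line.strip(), counter) for line in lines if line.strip()]
    # join() puts commas between items only, never after the last one
    return "[\n" + ",\n".join(items) + "\n]"


def to_rows(array_text, divider="-" * 10):
    """Tab-separated rows per record, with a divider between records."""
    rows = []
    for index, record in enumerate(json.loads(array_text)):
        if index:
            rows.append(divider)
        # Assumed path, taken from the jq filter in the question
        chars = record["frames"]["frame"]["lps"]["lp"]["characters"]
        rows.extend(c["code_ascii"] + "\t" + c["confidence"] for c in chars.values())
    return rows
```

Other repeated keys such as "p" are left alone; json.loads keeps only the last value for them, which does not matter for the two fields extracted here.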
    

    【Comments】:

    • Does this replace the current awk snippet? If so, I get: parse error: Expected value before ','
    • Do you get that error from sed, awk, jq, or some other tool?