【Question title】: Reformat non-serializable JSON-ish data into a format suitable for value extraction in Python
【Posted】: 2018-02-24 01:41:25
【Question】:

With the following simple Python script:

import json
file = 'toy.json'
data = json.loads(file)
print(data['gas']) # example

my data produces the error `... is not JSON serializable`.

With this slightly more complicated Python script:

import json
import sys

#load the data into an element
data = open('transactions000000000029.json', 'r')

#dumps the json object into an element
json_str = json.dumps(data)

#load the json to a string
resp = json.loads(json_str)

#extract an element in the response
print(resp['gas'])

I get the same error.

What I want to do is extract all the values for particular keys, so ideally I would like to present the input like this:

...
"hash": "0xf2b5b8fb173e371cbb427625b0339f6023f8b4ec3701b7a5c691fa9cef9daf63",    
"gasUsed": "21000",
"hash": "0xf8f2a397b0f7bb1ff212b6bcc57e4a56ce3e27eb9f5839fef3e193c0252fab26"
"gasUsed": "21000"
...

The data looks like this:

{
  "blockNumber": "1941794",
  "blockHash": "0x41ee74e34cbf9ef4116febea958dbc260e2da3a6bf6f601bfaeb2cd9ab944a29",
  "hash": "0xf2b5b8fb173e371cbb427625b0339f6023f8b4ec3701b7a5c691fa9cef9daf63",
  "from": "0x3c0cbb196e3847d40cb4d77d7dd3b386222998d9",
  "to": "0x2ba24c66cbff0bda0e3053ea07325479b3ed1393",
  "gas": "121000",
  "gasUsed": "21000",
  "gasPrice": "20000000000",
  "input": "",
  "logs": [],
  "nonce": "14",
  "value": "0x24406420d09ce7440000",
  "timestamp": "2016-07-24 20:28:11 UTC"
}
{
  "blockNumber": "1941716",
  "blockHash": "0x75e1602cad967a781f4a2ea9e19c97405fe1acaa8b9ad333fb7288d98f7b49e3",
  "hash": "0xf8f2a397b0f7bb1ff212b6bcc57e4a56ce3e27eb9f5839fef3e193c0252fab26",
  "from": "0xa0480c6f402b036e33e46f993d9c7b93913e7461",
  "to": "0xb2ea1f1f997365d1036dd6f00c51b361e9a3f351",
  "gas": "121000",
  "gasUsed": "21000",
  "gasPrice": "20000000000",
  "input": "",
  "logs": [],
  "nonce": "1",
  "value": "0xde0b6b3a7640000",
  "timestamp": "2016-07-24 20:12:17 UTC"
}

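What is the best way to achieve this?

I have been thinking that the best approach might be to reformat it into valid JSON.

Or should I just treat it as plain text and extract the values with a regular expression?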

【Question discussion】:

    Tags: python json regex serialization


    【Solution 1】:

    If each block is valid JSON on its own, you can parse the blocks one at a time:

    import json

    data = []
    with open('transactions000000000029.json') as inpt:
        lines = []
        for line in inpt:
            if line.startswith('{'):  # block starts: begin a fresh buffer
                lines = [line]
            else:
                lines.append(line)
            if line.startswith('}'):  # block ends: parse the buffered lines
                data.append(json.loads(''.join(lines)))

    # print the fields of interest for every parsed block
    for block in data:
        print("hash: {}".format(block['hash']))
        print("gasUsed: {}".format(block['gasUsed']))
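
`json.loads` raises `ValueError: Extra data` when a string contains more than one top-level JSON value, which is the situation in this file. As an alternative to splitting on braces, here is a minimal sketch based on the standard library's `json.JSONDecoder.raw_decode`, which parses a single value and reports the index where it stopped. The function name `iter_json_objects` and the inline sample string are illustrative assumptions, not part of the original answer.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it by hand
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Hypothetical sample: two objects back to back, as in the question's file
sample = '{"hash": "0x1", "gasUsed": "21000"}\n{\n  "hash": "0x2",\n  "gasUsed": "21000"\n}\n'
for block in iter_json_objects(sample):
    print("hash: {}".format(block["hash"]))
    print("gasUsed: {}".format(block["gasUsed"]))
```

For a real file, the text would come from something like `open(path).read()`. This approach does not depend on where the braces fall on a line, so it should also cope with objects that are not pretty-printed.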
    

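    【Discussion】:

    • The trouble is that each block is not valid JSON.
    • In your sample data they are. What does your actual data look like?
    • The same as the sample, just longer.
    • When I tried it, I got this error: ValueError: Extra data: line 16 column 1
    【Solution 2】: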

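    Your JSON file is not valid as a whole. The data should be a list of dictionaries, with each dictionary separated from the next by a comma, like this: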

    [  
       {  
          "blockNumber":"1941794",
          "blockHash": "0x41ee74bf9ef411d9ab944a29",
          "hash":"0xf2ef9daf63",
          "from":"0x3c0cbb196e3847d40cb4d77d7dd3b386222998d9",
          "to":"0x2ba24c66cbff0bda0e3053ea07325479b3ed1393",
          "gas":"121000",
          "gasUsed":"21000",
          "gasPrice":"20000000000",
          "input":"",
          "logs":[  
    
          ],
          "nonce":"14",
          "value":"0x24406420d09ce7440000",
          "timestamp":"2016-07-24 20:28:11 UTC"
       },
       {  
          "blockNumber":"1941716",
          "blockHash":"0x75e1602ca8d98f7b49e3",
          "hash":"0xf8f2a397b0f7bb1ff212e193c0252fab26",
          "from":"0xa0480c6f402b036e33e46f993d9c7b93913e7461",
          "to":"0xb2ea1f1f997365d1036dd6f00c51b361e9a3f351",
          "gas":"121000",
          "gasUsed":"21000",
          "gasPrice":"20000000000",
          "input":"",
          "logs":[  
    
          ],
          "nonce":"1",
          "value":"0xde0b6b3a7640000",
          "timestamp":"2016-07-24 20:12:17 UTC"
       }
    ]
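
If editing the file by hand is impractical, the conversion can be scripted. The following is a rough sketch, assuming that no string value in the data contains a `}` followed by whitespace and a `{` (the regular expression would otherwise insert a comma inside that string). The function name `to_json_array_text` is a hypothetical helper, not something from the original answer.

```python
import json
import re


def to_json_array_text(text):
    """Turn back-to-back JSON objects into the text of one valid JSON array."""
    # Put a comma between the closing brace of one object and the opening
    # brace of the next, then wrap everything in square brackets.
    joined = re.sub(r'\}\s*\{', '},{', text.strip())
    records = json.loads('[' + joined + ']')  # raises ValueError if still invalid
    return json.dumps(records, indent=2)


# Hypothetical sample standing in for the contents of the original file
raw = '{\n  "hash": "0x1",\n  "logs": []\n}\n{\n  "hash": "0x2",\n  "logs": []\n}\n'
print(to_json_array_text(raw))
```

The returned text can be written back to a file, which `json.load` should then be able to read in one call.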
    

    Then open the file like this:

    import json

    # json.load reads and parses directly from the open file object
    with open('toy.json') as data_file:
        data = json.load(data_file)
    

    You can then produce the desired output, for example:

    for item in data:
        print(item['hash'])
        print(item['gasUsed'])
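
Once the records are in a list, collecting the values of chosen keys takes one comprehension. The sketch below uses a hypothetical helper, `extract_fields`, on two records cut down from the question's sample data. It relies on the fact that `gasUsed` is stored as a decimal string and needs `int()` before any arithmetic.

```python
def extract_fields(records, keys):
    """Return one tuple per record holding the values of `keys`, in order."""
    return [tuple(record[key] for key in keys) for record in records]


# Two records reduced to the fields of interest (values taken from the question)
records = [
    {"hash": "0xf2b5b8fb173e371cbb427625b0339f6023f8b4ec3701b7a5c691fa9cef9daf63",
     "gasUsed": "21000"},
    {"hash": "0xf8f2a397b0f7bb1ff212b6bcc57e4a56ce3e27eb9f5839fef3e193c0252fab26",
     "gasUsed": "21000"},
]

pairs = extract_fields(records, ["hash", "gasUsed"])
for tx_hash, gas_used in pairs:
    print(tx_hash, gas_used)

# The numbers are stored as strings, so convert before summing
total_gas_used = sum(int(gas_used) for _, gas_used in pairs)
print(total_gas_used)
```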
    

    【Discussion】:
