【Question title】: Manipulating Large JSON with Python
【Posted】: 2019-03-14 16:51:33
【Question】:

I have a fairly large data file in .json format that I would like to manipulate. It is laid out as follows, essentially many JSON objects together:

[
{ 
    "_id" : "...", 
    "idSession" : "...", 
    "createdAt" : "1526894989268", 
    "status" : "COMPLETE", 
    "raw" : "Bobsguide,Marketing Assistant,Sales / Marketing79642,Baitshepi,,etc", 
    "updatedAt" : "...", 
    "graphResults" : [

        [
            "lastName", 
            "stock"
        ], 
        [
            "country", 
            "Botswana"
        ], 
        [
            "location", 
            "Botswana  "
        ], 
        [
            "city", 
            "-"
        ], 
        [
            "state", 
            "-"
        ], 
        [
            "school", 
            "Heriot-Watt University"
        ], 
        [
            "skills", 
            "Budgeting,Business Process Improvement,Business Planning"
        ], 

    ], 

    "eid" : {
        "###" : "12020653-1889-35be-8009-b1c9d43768ac"
    }
}

{ 
    "_id" : "...", 
    "idSession" : "...", 
    "createdAt" : "1526894989268", 
    "status" : "COMPLETE", 
    "raw" : "Bobsguide,79619,Steven,example,steven.jones@example.com,Marketing Assistant,Sales,,etc", 
    "updatedAt" : "...", 
    "graphResults" : [
        [
            "country", 
            "United Kingdom"
        ], 
        [
            "location", 
            "United Kingdom London London"
        ], 
        [
            "city", 
            "London"
        ], 
        [
            "state", 
            "London"
        ], 
        [
            "skills", 
            "Solvency II,Liquidity Risk,Screening,etc"
        ]
    ], 

    "eid" : {
        "###" : "..."
    }
}

...



]

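Is there a straightforward way to read this into a Python script for manipulation and analysis? The parts of main interest are under the graphResults and raw keys. I have little experience with raw data in this form, so any help is much appreciated.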

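【Comments】:

  • Did you try searching Google (or whichever search engine you use) for it, e.g. "python how to read json file"? Please also post your own attempt at solving the problem: [SO]: How to create a Minimal, Complete, and Verifiable example (mcve).
  • pandas.DataFrame, read_json() (although that approach is better suited to consuming the JSON as a whole, not specific sub-fields or records). Perhaps you could use open() and pandas.io.json.loads(), manipulate the resulting Python dictionary, and then feed it directly to pandas.DataFrame.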

Tags: python json pandas dictionary


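【Solution 1】:

First, the data you posted is not valid JSON; it should look something like the following. To access the elements you mentioned, you can try the code that comes after it.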

{
    "test":[
    { 
        "_id" : "...", 
        "idSession" : "...", 
        "createdAt" : "1526894989268", 
        "status" : "COMPLETE", 
        "raw" : "Bobsguide,Marketing Assistant,Sales /             Marketing79642,Baitshepi,,etc", 
        "updatedAt" : "...", 
        "graphResults" : [

            [
                "lastName", 
                "stock"
            ], 
            [
                "country", 
                "Botswana"
            ], 
            [
                "location", 
                "Botswana  "
            ], 
            [
                "city", 
                "-"
            ], 
            [
                "state", 
                "-"
            ], 
            [
                "school", 
                "Heriot-Watt University"
            ], 
            [
                "skills", 
                "Budgeting,Business Process Improvement,Business Planning"
            ]
        ], 
        "eid" : {
            "###" : "12020653-1889-35be-8009-b1c9d43768ac"
        }
        },
        { 
            "_id" : "...", 
            "idSession" : "...", 
            "createdAt" : "1526894989268", 
            "status" : "COMPLETE", 
            "raw" : "Bobsguide,79619,Steven,example,steven.jones@example.com,Marketing     Assistant,Sales,,etc", 
            "updatedAt" : "...", 
            "graphResults" : [
                [
                    "country", 
                    "United Kingdom"
                ], 
                [
                    "location", 
                    "United Kingdom London London"
                ], 
                [
                    "city", 
                    "London"
                ], 
                [
                    "state", 
                    "London"
                ], 
                [
                    "skills", 
                    "Solvency II,Liquidity Risk,Screening,etc"
                ]
            ], 

            "eid" : {
                "###" : "..."
            }
        }
    ]
}
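
One way to see why the posted layout fails is to run a miniature version of each layout through `json.loads`. This is only a sketch: the `check_json` helper and the two short strings are hypothetical stand-ins, not the real file. The broken string omits the comma between objects, as the posted data does.

```python
import json

def check_json(text):
    """Return (True, parsed data) for valid JSON, else (False, error description)."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno/colno point at where the parser gave up
        return False, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"

# Hypothetical miniature versions of the two layouts (not the real data)
broken = '[ {"raw": "a,b"} {"raw": "c,d"} ]'            # no comma between objects
fixed = '{"test": [ {"raw": "a,b"}, {"raw": "c,d"} ]}'  # objects separated by commas

ok_broken, detail_broken = check_json(broken)
ok_fixed, detail_fixed = check_json(fixed)

print(ok_broken, detail_broken)
print(ok_fixed, detail_fixed["test"][1]["raw"])
```

The reported line and column can help locate the first missing or stray comma in a large file.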

# Answer

import json

# Open the file with a context manager so it is closed automatically
with open('data.json', 'r') as data_file:
    information = json.load(data_file)  # parses the file into a Python dict

# Pick the element at index 1 of the 'test' array, then print the value under its 'raw' key
print(information['test'][1]['raw'])

# Pick the element at index 1 of the 'test' array, then print the value under its 'graphResults' key
print(information['test'][1]['graphResults'])
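
Since each graphResults entry is a list of [key, value] pairs, a possible next step is to turn every record into a flat dictionary. The sketch below assumes the corrected layout with the top-level "test" key. The `flatten_record` helper, the `raw_parts` key, and the shortened sample are hypothetical. Splitting raw on commas is naive and would break on values that themselves contain commas.

```python
import json

# Hypothetical, shortened sample in the corrected layout; with the real file,
# use json.load() on the opened file instead of json.loads() on a string.
sample = '''
{"test": [
  {"raw": "Bobsguide,Marketing Assistant",
   "graphResults": [["country", "Botswana"], ["city", "-"]]},
  {"raw": "Bobsguide,79619,Steven",
   "graphResults": [["country", "United Kingdom"], ["city", "London"]]}
]}
'''

def flatten_record(record):
    """Build a flat dict from one record's graphResults pairs and raw string."""
    fields = dict(record.get("graphResults", []))  # [[k, v], ...] -> {k: v}
    # Naive split: assumes no field in 'raw' contains a comma of its own
    fields["raw_parts"] = record.get("raw", "").split(",")
    return fields

information = json.loads(sample)
rows = [flatten_record(record) for record in information["test"]]

print(rows[1]["city"])
print(rows[0]["raw_parts"])
```

A list of flat dictionaries like `rows` is also a shape that `pandas.DataFrame` accepts directly, should tabular analysis be the goal.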

【Discussion】:
