【Question title】: Parsing Column in Pandas DataFrame with one column that contains a nested JSON string
【Posted】: 2018-09-26 08:52:23
【Question】:

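I have a DataFrame in Python like the one below. One column (called "json" below) contains a large nested JSON string. How can I parse it so that I end up with a nice, clean DataFrame with many columns? I specifically need the cost and the monthly amount for each ID in separate columns. Ideally, my table would look like this: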

    id     name   cost   monthly
    10001  frank  15.85  15.85
    10002  mary   30.86  23.03

    import pandas as pd

    d = {'id': ['10001', '10002'], 'json': ['{"costs":[{"cost":15.85}],"policies":[{"logo":"HLIF-transparent-inhouse.png","monthly":15.85,"rating":"A++","waiverOfPremium":1.74,"carrier":"companyabc","face":250000,"term":20,"newFace":null,"newMonthly":null,"isCompanyD":true,"carrierCode":"xyz","product":"XYZt"}],"agentSuggestion":{"costs":[{"cost":15.85}],"options":{"product":"XYZt","gender":"male","healthClass":"0","smoker":"false","age":32,"term":"20","faceAmount":250000,"waiverOfPremiumAmount":1.74,"includeWaiverOfPremium":false,"state":"CT"},"policies":[{"logo":"HLIF-transparent-inhouse.png","monthly":15.85,"rating":"A++","waiverOfPremium":1.74,"carrier":"companyabc","face":250000,"term":20,"newFace":null,"newMonthly":null,"isCompanyD":true,"carrierCode":"xyz","product":"XYZt"}]}}', '{"costs":[{"cost":30.86}],"policies":[{"logo":"HLIF-transparent-inhouse.png","monthly":23.03,"rating":"A++","waiverOfPremium":7.83,"carrier":"companyabc","face":1000000,"term":10,"newFace":null,"newMonthly":null,"isCompanyD":true,"carrierCode":"xyz","product":"XYZt"}],"agentSuggestion":{"costs":[{"cost":30.86}],"options":{"product":"XYZt","gender":"female","healthClass":"0","smoker":"false","age":35,"term":10,"faceAmount":1000000,"waiverOfPremiumAmount":7.83,"includeWaiverOfPremium":true,"state":"GA"},"policies":[{"logo":"HLIF-transparent-inhouse.png","monthly":23.03,"rating":"A++","waiverOfPremium":7.83,"carrier":"companyabc","face":1000000,"term":10,"newFace":null,"newMonthly":null,"isCompanyD":true,"carrierCode":"xyz","product":"XYZt"}]}}'], 'name':['frank','mary']}

    test = pd.DataFrame(data=d)

【Discussion】:

    Tags: python json pandas parsing dataframe


    【Answer 1】:

    Here you go. Your JSON contains two different costs (the top-level "costs" and the "costs" under "agentSuggestion"), so both are added here:

    import json
    import pandas as pd

    test = pd.DataFrame(d, columns=['id', 'json', 'name'])
    # Decode each JSON string once, then read the nested fields from the dicts
    parsed = test['json'].apply(json.loads)
    test['cost'] = parsed.apply(lambda x: x['costs'][0]['cost'])
    test['agent_suggestion_cost'] = parsed.apply(
        lambda x: x['agentSuggestion']['costs'][0]['cost'])
    print(test)
    

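    You can follow the same logic to parse other fields, such as "monthly". For more reference, see e.g. here. A JSON prettifier (e.g. the JSTool plugin for Notepad++) lets you view the structure of the JSON, which makes it much easier to work out the path to each field.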
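    Following that logic, a minimal sketch that produces the table asked for in the question (id, name, cost, monthly). It assumes the "costs" and "policies" lists each hold at least one element and that the first element is the one wanted; the JSON strings are trimmed to the fields used, and extract is a hypothetical helper name.

```python
import json

import pandas as pd

# Trimmed copy of the question's data: only the fields used below are kept
d = {
    "id": ["10001", "10002"],
    "json": [
        '{"costs":[{"cost":15.85}],"policies":[{"monthly":15.85,"carrier":"companyabc"}]}',
        '{"costs":[{"cost":30.86}],"policies":[{"monthly":23.03,"carrier":"companyabc"}]}',
    ],
    "name": ["frank", "mary"],
}
test = pd.DataFrame(d)


def extract(raw):
    """Hypothetical helper: decode one JSON string and return the wanted fields.

    Assumes "costs" and "policies" each hold at least one element and
    that the first element is the one of interest.
    """
    obj = json.loads(raw)
    return {
        "cost": obj["costs"][0]["cost"],
        "monthly": obj["policies"][0]["monthly"],
    }


# One dict per row becomes one row of new columns, aligned on test's index
parsed = pd.DataFrame([extract(raw) for raw in test["json"]], index=test.index)
result = test[["id", "name"]].join(parsed)
print(result)
```

    Decoding each string only once inside the helper avoids calling json.loads separately for every field you pull out.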

    If you find this helpful, please accept the answer.

    【Comments】:

    • Thanks! This is exactly what I was looking for, and it works perfectly. Thanks for your help!
    【Answer 2】:

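    Pandas provides some utilities for working with JSON. The ones relevant to your case are pd.read_json and json_normalize (pd.io.json.json_normalize in older pandas, pd.json_normalize since pandas 1.0). However, they expect the input JSON in a different layout from yours; the orient parameter of read_json documents the accepted layouts: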

    orient : string,
    
    Indication of expected JSON string format. Compatible JSON strings can be produced by to_json() with a corresponding orient value. The set of possible orients is:
    
    'split' : dict like {index -> [index], columns -> [columns], data -> [values]}
    'records' : list like [{column -> value}, ... , {column -> value}]
    'index' : dict like {index -> {column -> value}}
    'columns' : dict like {column -> {index -> value}}
    'values' : just the values array
    The allowed and default values depend on the value of the typ parameter.
    
    when typ == 'series',
    allowed orients are {'split','records','index'}
    default is 'index'
    The Series index must be unique for orient 'index'.
    when typ == 'frame',
    allowed orients are {'split','records','index', 'columns','values'}
    default is 'columns'
    The DataFrame index must be unique for orients 'index' and 'columns'.
    The DataFrame columns must be unique for orients 'index', 'columns', and 'records'.
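    Since the strings in the question match none of those layouts, one option is to decode them with json.loads first and pass the resulting dicts to json_normalize. A sketch assuming pandas 1.0 or later (for pd.json_normalize), one cost and one policy per id, and JSON strings trimmed to the fields used:

```python
import json

import pandas as pd

# Trimmed copy of the question's data: only the fields used below are kept
d = {
    "id": ["10001", "10002"],
    "json": [
        '{"costs":[{"cost":15.85}],"policies":[{"monthly":15.85,"carrier":"companyabc"}]}',
        '{"costs":[{"cost":30.86}],"policies":[{"monthly":23.03,"carrier":"companyabc"}]}',
    ],
    "name": ["frank", "mary"],
}

# Decode each string and copy the row's id and name into the dict,
# so json_normalize can carry them along as metadata columns
records = [
    {**json.loads(raw), "id": id_, "name": name}
    for id_, raw, name in zip(d["id"], d["json"], d["name"])
]

# One row per element of each record's "policies" list, plus id/name columns
policies = pd.json_normalize(records, record_path="policies", meta=["id", "name"])
# Same idea for the "costs" list
costs = pd.json_normalize(records, record_path="costs", meta=["id"])

# Join the two flattened tables on id and keep the columns from the question
result = costs.merge(policies[["id", "name", "monthly"]], on="id")
result = result[["id", "name", "cost", "monthly"]]
print(result)
```

    If a record held several policies, the merge would yield one row per policy. Calling pd.json_normalize(records) without record_path instead flattens nested dicts such as agentSuggestion.options into dot-separated columns (e.g. "agentSuggestion.options.age"), while lists such as "policies" stay in a single column.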
    

    【Comments】:
