【Question Title】:How to loop over the values of different dictionaries and append them to a pandas DataFrame
【Posted】:2022-01-18 17:29:56
【Question】:

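I need to loop over the values of different dictionaries and build a DataFrame from them.

The data comes from an API that returns JSON, as shown below.

Source dictionary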

result = {
    "meta": {
        "request": {
            "segment_name": "Searches1",
            "metrics": ["Visits"]
        },
        "status": "Success"
    },
    "segments": [
        {"date": "2021-11-01", "visits": 100, "confidence": "High"},
        {"date": "2021-11-02", "visits": 200, "confidence": "High"},
        {"date": "2021-11-03", "visits": 300, "confidence": "Low"},
        {"date": "2021-11-04", "visits": 400, "confidence": "High"},
        {"date": "2021-11-05", "visits": 500, "confidence": "Low"},
    ]
},
{
    "meta": {
        "request": {
            "segment_name": "Searches2",
            "metrics": ["Visits"]
        },
        "status": "Success"
    },
    "segments": [
        {"date": "2021-11-01", "visits": 110, "confidence": "High"},
        {"date": "2021-11-02", "visits": 220, "confidence": "High"},
        {"date": "2021-11-03", "visits": 330, "confidence": "Low"},
        {"date": "2021-11-04", "visits": 440, "confidence": "High"},
        {"date": "2021-11-05", "visits": 540, "confidence": "Low"},
    ]
}

I tried the following approach, where I simply loop over the "segments" dictionary, but this obviously doesn't work.

My approach

def getSearches():

    Searches = []
    segment_name = result['meta']['request']['segment_name']

    if "segments" in result:
        for fs in result['segments']:
            Searches.append(
                {"date": fs['date'], "segment_name": segment_name, "visits": fs['visits'], "confidence": fs['confidence']})

    fs_df = pd.DataFrame(Searches)
    print(fs_df)


getSearches()

I get the following error message.

Error message

Traceback (most recent call last):
  File "/Users/ismail/Desktop/sw_dict_test", line 51, in <module>
    getFlightSearches()
  File "/Users/ismail/Desktop/sw_dict_test", line 40, in getFlightSearches
    segment_name = result['meta']['request']['segment_name']
TypeError: tuple indices must be integers or slices, not str

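To be precise, I need to access "segment_name" from the "request" dictionary as well as all the variables in the "segments" dictionaries, and append them to a pandas table.

Desired output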


         date segment_name  visits confidence
0  2021-11-01    Searches1     100       High
1  2021-11-02    Searches1     200       High
2  2021-11-03    Searches1     300        Low
3  2021-11-04    Searches1     400       High
4  2021-11-05    Searches1     500        Low
5  2021-11-01    Searches2     110       High
6  2021-11-02    Searches2     220       High
7  2021-11-03    Searches2     330        Low
8  2021-11-04    Searches2     440       High
9  2021-11-05    Searches2     540        Low

How can I achieve this?

【Comments】:

    Tags: python python-3.x pandas dictionary for-loop


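    【Solution 1】:

    You can also use json_normalize to flatten the JSON data. Because the list of records, i.e. the dictionaries you need to turn into rows, is stored under "segments", set record_path='segments'. You only use "segment_name" as metadata for each record, so pass its path as a list: meta=[['meta', 'request', 'segment_name']]

    Then use rename to change the column name and reindex to put the columns in the desired order.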

    df = (
        pd.json_normalize(
            result,
            record_path='segments',
            meta=[['meta', 'request', 'segment_name']],
        )
        .rename({'meta.request.segment_name': 'segment_name'}, axis=1)
        .reindex(['date', 'segment_name', 'visits', 'confidence'], axis=1)
    )
    

    Output:

             date segment_name  visits confidence
    0  2021-11-01    Searches1     100       High
    1  2021-11-02    Searches1     200       High
    2  2021-11-03    Searches1     300        Low
    3  2021-11-04    Searches1     400       High
    4  2021-11-05    Searches1     500        Low
    5  2021-11-01    Searches2     110       High
    6  2021-11-02    Searches2     220       High
    7  2021-11-03    Searches2     330        Low
    8  2021-11-04    Searches2     440       High
    9  2021-11-05    Searches2     540        Low
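
To see what rename and reindex are tidying up, it can help to look at the frame straight out of json_normalize. The following is a minimal sketch on a trimmed-down copy of the question's data (the shortened `result` list and the name `flat` are illustrative, not part of the original answer); it shows that the metadata column arrives under the dotted name `meta.request.segment_name`.

```python
import pandas as pd

# Trimmed-down copy of the question's data, written as a list of responses.
result = [
    {
        "meta": {"request": {"segment_name": "Searches1"}},
        "segments": [
            {"date": "2021-11-01", "visits": 100, "confidence": "High"},
            {"date": "2021-11-02", "visits": 200, "confidence": "High"},
        ],
    },
    {
        "meta": {"request": {"segment_name": "Searches2"}},
        "segments": [
            {"date": "2021-11-01", "visits": 110, "confidence": "High"},
        ],
    },
]

# One row per entry in "segments"; the nested segment_name is repeated per row.
flat = pd.json_normalize(
    result,
    record_path="segments",
    meta=[["meta", "request", "segment_name"]],
)

# The metadata column is named after its path, joined with dots.
print(list(flat.columns))
```

Renaming that dotted column to `segment_name` and reindexing then gives the layout shown in the output above.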
    

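    【Comments】:

    • I really like this approach, but why are the entries repeated for every day? I guess that's not the expected result, right?
    • @please_be_nice I've edited to fix this. Sorry about that. I also added more explanation.
    • Thanks for the reply and the additional information. This works really well.
    【Solution 2】:

    result is a tuple, hence the error. Turn it into a list and iterate over each element.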

    result = [{
        "meta": {
            "request": {
                "segment_name": "Searches1",
                "metrics": ["Visits"]
            },
            "status": "Success"
        },
        "segments": [
            {"date": "2021-11-01", "visits": 100, "confidence": "High"},
            {"date": "2021-11-02", "visits": 200, "confidence": "High"},
            {"date": "2021-11-03", "visits": 300, "confidence": "Low"},
            {"date": "2021-11-04", "visits": 400, "confidence": "High"},
            {"date": "2021-11-05", "visits": 500, "confidence": "Low"},
        ]
    },
    {
        "meta": {
            "request": {
                "segment_name": "Searches2",
                "metrics": ["Visits"]
            },
            "status": "Success"
        },
        "segments": [
            {"date": "2021-11-01", "visits": 110, "confidence": "High"},
            {"date": "2021-11-02", "visits": 220, "confidence": "High"},
            {"date": "2021-11-03", "visits": 330, "confidence": "Low"},
            {"date": "2021-11-04", "visits": 440, "confidence": "High"},
            {"date": "2021-11-05", "visits": 540, "confidence": "Low"},
        ]
    }]
    
    
    def getSearches(result):
    
        Searches = []
        segment_name = result['meta']['request']['segment_name']
    
        if "segments" in result:
            for fs in result['segments']:
                Searches.append(
                    {"date": fs['date'], "segment_name": segment_name, "visits": fs['visits'], "confidence": fs['confidence']})
    
        return Searches
    
    searches = []
    for r in result:
        searches += getSearches(r)
        
    pd.DataFrame(searches)
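
As a side note on why the original code failed, here is a small sketch (the variables `first`, `second`, `error_message` and `segment_names` are illustrative) showing that a comma between or after dict literals builds a tuple, and that a tuple cannot be indexed with a string key.

```python
first = {"meta": {"request": {"segment_name": "Searches1"}}}
second = {"meta": {"request": {"segment_name": "Searches2"}}}

# The comma, not parentheses, is what creates the tuple.
result = first, second
print(type(result).__name__)

# Indexing the tuple with a string key reproduces the TypeError from the question.
error_message = None
try:
    result["meta"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Iterating over the tuple (or a list) reaches the individual dictionaries.
segment_names = [r["meta"]["request"]["segment_name"] for r in result]
print(segment_names)
```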
    

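    【Comments】:

    • Thanks, this works perfectly. What if the result variable is defined inside a function? In that case it doesn't work. How would I handle that?
    • It would still work; just put everything inside getSearches and make it take no arguments. You could also wrap the code that generates result, together with all the appending code, in a new function. That way the logic for parsing the result stays somewhat separate.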
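
One way the suggestion in the last comment could look is sketched below. This is only an assumption about the intended structure: `fetch_result` is a hypothetical stand-in for whatever code produces the API responses, and `get_searches` and `build_dataframe` are illustrative names, not functions from the thread.

```python
import pandas as pd


def fetch_result():
    """Hypothetical stand-in for the API call; returns a list of response dicts."""
    return [
        {
            "meta": {"request": {"segment_name": "Searches1"}},
            "segments": [
                {"date": "2021-11-01", "visits": 100, "confidence": "High"},
                {"date": "2021-11-02", "visits": 200, "confidence": "High"},
            ],
        },
        {
            "meta": {"request": {"segment_name": "Searches2"}},
            "segments": [
                {"date": "2021-11-01", "visits": 110, "confidence": "High"},
            ],
        },
    ]


def get_searches(response):
    """Flatten one response into a list of row dictionaries."""
    segment_name = response["meta"]["request"]["segment_name"]
    return [
        {
            "date": fs["date"],
            "segment_name": segment_name,
            "visits": fs["visits"],
            "confidence": fs["confidence"],
        }
        for fs in response.get("segments", [])
    ]


def build_dataframe():
    """Generate the responses and append all rows in one place."""
    rows = []
    for response in fetch_result():
        rows.extend(get_searches(response))
    return pd.DataFrame(rows)


df = build_dataframe()
print(df)
```

Keeping `get_searches` separate means the parsing logic can be reused or tested on a single response, regardless of where the responses come from.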