Python Pandas: How to Assign JSON Values to a Pandas DF

【Question title】: Python Pandas How to Assign JSON Values to a Pandas DF
【Posted】: 2021-12-24 23:55:25
【Question description】:

I am requesting data from an API like this:

output = requests.get(url=url, auth=oauth, headers=headers, data=payload)
output_data_test = output.json()

output_data_test["customers"][0]["id"]
'1'

When I do this, the value is not assigned:

combined_output_temp_df = output_data_test["customers"][0]["id"]

combined_output_temp_df
Empty DataFrame

What am I doing wrong?

This is how I create the DataFrame:

combined_output_temp_df = pd.DataFrame(
    columns = [
        "id",
        "first_name",
        "last_name",
        "middle_initial",
        "email",
        ### - Preferences
        "preference_email_invoices",
        "preference_print_invoices",
        "preference_exclude_from_insurance_auto_enroll_on",
        ###
        "username",
        "created_at",
        "blocked_payments",
        ### - Phone Numbers
        "phone_number_id",
        "phone_number_primary",
        ### - Mailing Address
        "mailing_address_id",
        "mailing_address_address1",
        "mailing_address_address2",
        "mailing_address_city",
        "mailing_address_state",
        "mailing_address_latitude",
        ### - Addresses
        "address_id",
        "address_address1", 
        "address_address2",
        "address_city",
        "address_state",
        "address_invalid_data",
        "address_label"
        ###
        ]
    )

Here is roughly what the JSON looks like:

{
    'customers': [
        {
            'id': '1', 
            'first_name': 'James', 
            'last_name': 'Test', 
            'middle_initial': '', 
            'email': 'jamesemail@test.com', 
            'preferences': {
                'email_invoices': False, 
                'print_invoices': False, 
                'exclude_from_insurance_auto_enroll_on': None
                }, 
                
                'username': 'jamestesting', 
                'created_at': '2021-03-11T13:00:00.404-05:00', 
                'blocked_payments': False, 
                'phone_numbers': [
                    {
                        'id': '234234asdf', 
                        'primary': True
                    }, 
                    {
                        'id': '8438c19a', 
                        'primary': False
                    }
                ], 'mailing_address': {
                    'id': '431fe0b2', 
                    'address1': '15777 Fake Blvd', 
                    'address2': 'Lot 196', 
                    'city': 'Testing', 
                    'state': 'TX', 
                    'latitude': None
                }, 'addresses': [
                    {
                        'id': '431fe0b2', 
                        'address1': '157 whatever', 
                        'address2': 'Lot 196', 
                        'city': 'Sacramento', 
                        'state': 'NY', 
                        'invalid_data': False, 
                        'label': 'Home'
                    }
                ]
          }
   ]

}

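Some customers have multiple phone numbers and some have none. The same goes for mailing addresses and other attributes not shown here. When I tried to use explode, it gave me an error message.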

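【Question discussion】:

Based on the code you shared, you are not creating a DataFrame.
I create it earlier in the code. One of the columns is called "id".
You are not passing any data to pd.DataFrame, so it creates an empty df. Do you want to load output_data_test into pandas with these 5 columns? If so, please share a sample of output_data_test.
I only want the ID column.
df = pd.json_normalize(output_data_test, record_path=['customers'])[['id']] ?

Tags: python json python-3.x pandas api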

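【Solution 1】:

You have not specified which DataFrame column the string should be added to or replace. There are two ways to do this: pass the value as data when you create the DataFrame, or, if you obtain the value after the DataFrame has been created, assign it to the corresponding column.

See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

Here is some pseudocode: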

import pandas as pd

test_var = '1' 
print(test_var)

new_df = pd.DataFrame( [test_var], dtype="string" )
print(new_df)

new_df['col'] = '2'
print(new_df)

For multiple columns, you can do the following:


combined_output_temp_df = pd.DataFrame(
    columns = [
        "id",
        "first_name",
        "last_name",
        "middle_initial",
        "email"
        ]
    )

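# Append the values as a new row; rebinding the name to a plain list would discard the DataFrame
combined_output_temp_df.loc[len(combined_output_temp_df)] = [ 'test_id', 'test_fn', 'test_ln', 'test_middle', 'email' ]

print(combined_output_temp_df)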
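
To tie this back to the question, here is a minimal sketch that assumes the failing assignment targeted the "id" column of the pre-built, empty DataFrame. The `output_data_test` dict below is a trimmed-down, hypothetical stand-in for the real API response, and `rows_after_scalar_assignment` is a name introduced only for illustration.

```python
import pandas as pd

# Hypothetical, trimmed-down stand-in for the API response in the question.
output_data_test = {
    "customers": [
        {"id": "1", "first_name": "James", "last_name": "Test",
         "middle_initial": "", "email": "jamesemail@test.com"},
    ]
}

columns = ["id", "first_name", "last_name", "middle_initial", "email"]
combined_output_temp_df = pd.DataFrame(columns=columns)

# A scalar assigned to a column is broadcast over the existing rows.
# The frame has no rows yet, so it stays empty.
combined_output_temp_df["id"] = output_data_test["customers"][0]["id"]
rows_after_scalar_assignment = len(combined_output_temp_df)

# Passing the list of records as data creates one row per customer;
# `columns` selects and orders the fields to keep.
combined_output_temp_df = pd.DataFrame(output_data_test["customers"], columns=columns)
print(combined_output_temp_df)
```

Building the frame from the records in one step avoids having to pre-create an empty frame at all; nested fields such as phone numbers still need separate handling.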

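【Discussion】:

This doesn't work either; maybe there is something wrong with how I create the DataFrame. I will edit my original code to include it.
Edited my original post to add it.
I added more pseudocode; it is untested but should work. You should also define the data types of all columns, or, if they differ, you can use df.astype - pandas.pydata.org/docs/reference/api/…

【Solution 2】:

This is not a complete answer, but it may give you a start.

First, use json_normalize at the top level to create a DataFrame.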

df = pd.json_normalize(output_data_test, record_path=['customers'])

Then, if you need one row per phone number or address,

df = df.explode('phone_numbers')
df = df.explode('addresses')

You can then flatten the phone_numbers/addresses columns.
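
One possible way to do the flattening step for phone numbers is sketched below. The sample data is a hypothetical stand-in for the API response, and the names `phone_dicts`, `phone_cols`, and `flat` are introduced here for illustration. Customers with no phone numbers come out of explode as NaN, so the sketch swaps those for empty dicts before expanding.

```python
import pandas as pd

# Hypothetical sample: one customer with two phone numbers, one with none.
output_data_test = {
    "customers": [
        {"id": "1", "phone_numbers": [{"id": "234234asdf", "primary": True},
                                       {"id": "8438c19a", "primary": False}]},
        {"id": "2", "phone_numbers": []},
    ]
}

df = pd.json_normalize(output_data_test, record_path=["customers"])

# One row per phone number; an empty list becomes a single NaN row.
# Resetting the index keeps row labels unique for the concat below.
df = df.explode("phone_numbers").reset_index(drop=True)

# Replace the NaN placeholders with empty dicts so every cell is a dict.
phone_dicts = [p if isinstance(p, dict) else {} for p in df["phone_numbers"]]

# Expand the dicts into columns, prefixed to match the question's column names.
phone_cols = pd.DataFrame(phone_dicts).add_prefix("phone_number_")

flat = pd.concat([df.drop(columns="phone_numbers"), phone_cols], axis=1)
print(flat)
```

The same pattern should apply to the addresses column with a different prefix.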

Alternatively,

df = pd.json_normalize(output_data_test, record_path=['customers'])
df_phone = pd.json_normalize(output_data_test['customers'], record_path=['phone_numbers'], meta=['id'], record_prefix='phone_numbers_')
df = df.merge(df_phone, on='id', how='left')

Then do something similar for the addresses.
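
A sketch of what the analogous step for addresses might look like, using a hypothetical two-customer sample in place of the real response. The `address_` prefix is chosen to match the column names in the question, and dropping the raw addresses column before the merge is an optional addition, not part of the answer above.

```python
import pandas as pd

# Hypothetical sample: one customer with an address, one without.
output_data_test = {
    "customers": [
        {"id": "1", "first_name": "James",
         "addresses": [{"id": "431fe0b2", "city": "Sacramento", "label": "Home"}]},
        {"id": "2", "first_name": "Ann", "addresses": []},
    ]
}

# Top-level customer fields, one row per customer.
df = pd.json_normalize(output_data_test, record_path=["customers"])

# One row per address, carrying the customer id along as metadata.
# The prefix keeps the address's own "id" from clashing with the customer "id".
df_address = pd.json_normalize(
    output_data_test["customers"],
    record_path=["addresses"],
    meta=["id"],
    record_prefix="address_",
)

# A left merge keeps customers that have no address (their address columns are NaN).
df = df.drop(columns="addresses").merge(df_address, on="id", how="left")
print(df)
```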

【Discussion】: