在带有数组的嵌套 Json 上使用 Pandas json_normalize答案

【问题标题】：Using Pandas json_normalize on nested Json with arrays在带有数组的嵌套 Json 上使用 Pandas json_normalize
【发布时间】：2018-01-07 04:05:11
【问题描述】：

问题是用嵌套的 json 对象数组规范化 json。我看过类似的问题，并试图使用他们的解决方案无济于事。

这就是我的 json 对象的样子。

{
  "results": [
    {
      "_id": "25",
      "Product": {
        "Description": "3 YEAR",
        "TypeLevel1": "INTEREST",
        "TypeLevel2": "LONG"
      },
      "Settlement": {},
      "Xref": {
        "SCSP": "96"
      },
      "ProductSMCP": [
        {
          "SMCP": "01"
        }
      ]
    },
    {
      "_id": "26",
      "Product": {
        "Description": "10 YEAR",
        "TypeLevel1": "INTEREST",
        "Currency": "USD",
        "Operational": true,
        "TypeLevel2": "LONG"
      },
      "Settlement": {},
      "Xref": {
        "BBT": "CITITYM9",
        "TCK": "ZN"
      },
      "ProductSMCP": [
        {
          "SMCP": "01"
        },
        {
          "SMCP2": "02"
        }
      ]
    }
  ]
}

这是我规范化 json 对象的代码。

data = json.load(j)
data = data['results']
print pd.io.json.json_normalize(data)

我想要的结果应该是这样的

id   Description    TypeLevel1   TypeLevel2  Currency  \
25   3 YEAR US      INTEREST     LONG        NAN
26   10 YEAR US     INTEREST     NAN         USD

BBT   TCT  SMCP  SMCP2  SCSP   
NAN   NAN  521   NAN    01
M9    ZN   01    02     NAN

但是，我得到的结果是这样的：

  Product.Currency Product.Description Product.Operational Product.TypeLevel1  \
0              NaN              3 YEAR                 NaN           INTEREST
1              USD             10 YEAR                True           INTEREST

  Product.TypeLevel2                        ProductSMCP  Xref.BBT Xref.SCSP  \
0               LONG                   [{'SMCP': '01'}]       NaN        96
1               LONG  [{'SMCP': '01'}, {'SMCP2': '02'}]  CITITYM9       NaN

  Xref.TCK _id
0      NaN  25
1       ZN  26

如您所见，问题出在 ProductSCMP，它并未完全展平数组。

【问题讨论】：

标签： python json pandas normalize

【解决方案1】：

一旦我们通过了第一次标准化，我会申请 lambda 来完成这项工作。

from cytoolz.dicttoolz import merge

pd.io.json.json_normalize(data).pipe(
    lambda x: x.drop('ProductSMCP', 1).join(
        x.ProductSMCP.apply(lambda y: pd.Series(merge(y)))
    )
)

  Product.Currency Product.Description Product.Operational Product.TypeLevel1 Product.TypeLevel2  Xref.BBT Xref.SCSP Xref.TCK _id SMCP SMCP2
0              NaN              3 YEAR                 NaN           INTEREST               LONG       NaN        96      NaN  25   01   NaN
1              USD             10 YEAR                True           INTEREST               LONG  CITITYM9       NaN       ZN  26   01    02

修剪列名

pd.io.json.json_normalize(data).pipe(
    lambda x: x.drop('ProductSMCP', 1).join(
        x.ProductSMCP.apply(lambda y: pd.Series(merge(y)))
    )
).rename(columns=lambda x: re.sub('(Product|Xref)\.', '', x))

  Currency Description Operational TypeLevel1 TypeLevel2       BBT SCSP  TCK _id SMCP SMCP2
0      NaN      3 YEAR         NaN   INTEREST       LONG       NaN   96  NaN  25   01   NaN
1      USD     10 YEAR        True   INTEREST       LONG  CITITYM9  NaN   ZN  26   01    02

【讨论】：

有没有办法不使用“Product.Currency”而只使用“Currency”？
添加到末尾.rename(columns=lambda x: x.replace('Product.', ''))
抱歉，剩下的名字呢？喜欢Xref
@ChrisJohnson 我更新了我的帖子以包括列名清理。