Split a tuple of dicts into individual records in a dataframe: answer

【Question title】: split tuple of dict into individual records in dataframe
【Posted】: 2020-04-05 00:58:14
【Question】:

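I have a dataframe df. One of its columns holds data like the example below: each record in the match column contains a tuple of dicts that have been identified as a "match". I want to build a new dataframe from the df match column, like the output below. That is, split each tuple into individual records, turn each dict's keys into columns, and add a "type" field with the value "a" to indicate that the records match. I also want to add a typeId field, so that every tuple gets an id number showing that the matching values came from the same original record. Can anyone suggest a way to do this?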

Code:

df['match'][0]

Data:

{'__class__': 'tuple',
 '__value__': [{'': '363336',
   'unitofmeasure': 'each',
   'product_id': '11',
   'classification': 'top',
   'Id': '363336'},
  {'': '368654',
   'unitofmeasure': 'each',
   'product_id': '10',
   'classification': 'bottom',
   'Id': '368654'}]}

Output:

        unitofmeasure  product_id  classification  Id      type  typeId
363336  each           11          top             363336  a     1
368654  each           10          bottom          368654  a     1
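A minimal sketch of the transformation described in the question, assuming every non-null match cell has the {'__class__': 'tuple', '__value__': [...]} shape shown above. It swaps the explicit loop for Series.explode and numbers the tuples in order to build typeId; split_matches is a hypothetical helper name, and the one-row df is rebuilt from the sample data, not taken from the real dataset.

```python
import pandas as pd

# Sample input rebuilt from the question: one row whose 'match' cell is the
# serialized tuple of two dicts (assumption: the real df has many such rows)
df = pd.DataFrame({
    "match": [
        {
            "__class__": "tuple",
            "__value__": [
                {"": "363336", "unitofmeasure": "each", "product_id": "11",
                 "classification": "top", "Id": "363336"},
                {"": "368654", "unitofmeasure": "each", "product_id": "10",
                 "classification": "bottom", "Id": "368654"},
            ],
        }
    ]
})


def split_matches(df: pd.DataFrame, col: str = "match", label: str = "a") -> pd.DataFrame:
    """Hypothetical helper: one output row per dict inside each tuple cell."""
    # skip empty cells, then pull the list of dicts out of each tuple wrapper
    tuples = df[col].dropna().map(lambda cell: cell["__value__"])
    # typeId: 1-based position of the originating tuple, in row order
    type_ids = pd.Series(range(1, len(tuples) + 1), index=tuples.index)
    # one element per dict; exploded elements keep their tuple's row label
    records = tuples.explode()
    out = pd.DataFrame(records.tolist())
    out["type"] = label
    out["typeId"] = type_ids.loc[records.index].to_numpy()
    # use the '' key (the record id) as the row index, as in the Output above
    return out.set_index("").rename_axis(None)


print(split_matches(df))
```

Numbering by tuple position rather than by the original row label keeps typeId dense (1, 2, 3, ...) even when some match cells are empty.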

【Question comments】:

Tags: json python-3.x pandas numpy dictionary

【Solution 1】:

import pandas as pd

# read records in from 'match'

emptLst = []

for i, cell in enumerate(df['match'].dropna()):

    # each cell looks like {'__class__': 'tuple', '__value__': [dict, dict, ...]}
    df2 = pd.DataFrame(cell['__value__'])

    # add label column with value 'a' (matched records)
    df2['label'] = 'a'

    # add a set id so the rows from the same tuple share one value
    df2['labeling_set_id'] = i

    emptLst.append(df2)

# concatenate once instead of growing dfm inside a second loop
dfm = pd.concat(emptLst, ignore_index=True)


# read records in from 'distinct'

emptLst2 = []

# start the 'distinct' ids after the last 'match' id so the two never collide
offset = len(emptLst)

for i, cell in enumerate(df['distinct'].dropna()):

    df3 = pd.DataFrame(cell['__value__'])

    # add label column with value 'b'
    df3['label'] = 'b'

    df3['labeling_set_id'] = i + offset

    emptLst2.append(df3)

dfd = pd.concat(emptLst2, ignore_index=True)


df_label = pd.concat([dfm, dfd], ignore_index=True)

# make the set ids 1-based
df_label['labeling_set_id'] = df_label['labeling_set_id'] + 1

df_label.head()
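If the exact layout from the question is wanted (record id as the index, columns named type and typeId), df_label can be reshaped afterwards. A sketch, assuming df_label still carries the '' key from the source dicts (the loop above never drops it); to_question_layout is a hypothetical helper, and the small frame below only stands in for the real df_label.

```python
import pandas as pd

# Stand-in for the df_label built above (assumption: same columns,
# including the '' key that holds each record's id)
df_label = pd.DataFrame({
    "": ["363336", "368654"],
    "unitofmeasure": ["each", "each"],
    "product_id": ["11", "10"],
    "classification": ["top", "bottom"],
    "Id": ["363336", "368654"],
    "label": ["a", "a"],
    "labeling_set_id": [1, 1],
})


def to_question_layout(df_label: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reshape df_label into the question's Output layout."""
    # use the '' key as the row index and drop the index name
    out = df_label.set_index("").rename_axis(None)
    # rename the answer's columns to the names the question asked for
    return out.rename(columns={"label": "type", "labeling_set_id": "typeId"})


result = to_question_layout(df_label)
print(result)
```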

【Comments】: