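[Posted]: 2021-08-13 23:19:40
[Question]:
I've included a sample dataset (df), the expected output (df2), and my code so far below. I have a df in which some rows of column i2 contain a list of JSON-like dicts. Each list needs to be exploded into separate rows and inserted back into df at the position of the row it came from, but with the item text going into a different column (i1). I also need to extract a unique identifier (the "id_2" value) from each item and put it in the id_2 column.
So far my code parses the JSON-like data with pd.json_normalize, prepends the original string from column i1 to each extracted string (the expected output below should make this clearer), and then re-inserts the rows by index. However, I have to specify the indices by hand, which isn't good. I'd like it to depend less on manually entered indices, in case more of these nested cells appear in the future or the index changes in some way.
Any suggestions are welcome, thanks very much.
Sample data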
import pandas as pd
df = pd.DataFrame(data={'id': [1, 2, 3, 4, 5], 'id_2': ['a','b','c','d','e'], 'i1': ['How old are you?','Over the last month have you felt','Do you live alone?','In the last week have you had','When did you last visit a doctor?'], 'i2': [0,0,0,0,0]})
df['i2'] = df['i2'].astype('object')
a = [{'id': 'b1', 'item': 'happy?', 'id_2': 'hj59'}, {'id': 'b2', 'item': 'sad?', 'id_2': 'dgb'}, {'id': 'b3', 'item': 'angry?', 'id_2':'kj9'}, {'id': 'b4', 'item': 'frustrated?','id_2':'lp7'}]
b = [{'id': 'c1', 'item': 'trouble sleeping?'}, {'id': 'c2', 'item': 'changes in appetite?'}, {'id': 'c3', 'item': 'mood swings?'}, {'id': 'c4', 'item': 'trouble relaxing?'}]
df.at[1, 'i2'] = a
df.at[3, 'i2'] = b
Expected output
df2 = pd.DataFrame(data={'id': [1,2,2,2,2,3,4,4,4,4,5],
'id_2': ['a','hj59','dgb','kj9','lp7','c','d','d','d','d','e'],
'i1': ['How old are you?',
'Over the last month have you felt happy?',
'Over the last month have you felt sad?',
'Over the last month have you felt angry?',
'Over the last month have you felt frustrated?',
'Do you live alone?',
'In the last week have you had trouble sleeping?',
'In the last week have you had changes in appetite?',
'In the last week have you had mood swings?',
'In the last week have you had trouble relaxing?',
'When did you last visit a doctor?'],
'i2': [0,1,1,1,1,0,1,1,1,1,0]})
My ugly code so far
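# Rows whose i2 cell holds a nested list (every other row holds 0)
s = df[df.i2 != 0]
n = {}
for i in range(len(s)):
    # Flatten the list of dicts; reset_index keeps each item's position in an 'index' column
    n[i] = pd.json_normalize(s.loc[s.index[i]]['i2']).reset_index(drop=False)
    n[i]['i1'] = s.iloc[i].i1 + ' ' + n[i]['item']
# DataFrame.append was removed in pandas 2.0, so pd.concat is used here instead
def insert_row(i, d1, d2):
    return pd.concat([d1.iloc[:i], d2])
# The offsets below are entered by hand, which is the part I want to get rid of
for i in n:
    if i == 0:
        x = insert_row(s.iloc[i].name, df, n[i])
    elif i == 1:
        x = insert_row(s.iloc[i].name+1+n[i]['index'].count()+1, x, n[i])
y = pd.concat([x, df.iloc[s.iloc[i].name+1:]])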
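One possible way to avoid hand-entered indices is to let `DataFrame.explode` do the row insertion, since it repeats each row once per list element and keeps the original row order. The sketch below is only an illustration under a few assumptions: the helper name `expand_nested` and its parameters are made up here, the sample data is rebuilt inline with every item keyed by `id_2` (which is what the expected output implies), and an item without an `id_2` key is assumed to inherit the value from its parent row.

```python
import pandas as pd

# Sample data rebuilt inline so the sketch runs on its own.
a = [{'id': 'b1', 'item': 'happy?', 'id_2': 'hj59'},
     {'id': 'b2', 'item': 'sad?', 'id_2': 'dgb'},
     {'id': 'b3', 'item': 'angry?', 'id_2': 'kj9'},
     {'id': 'b4', 'item': 'frustrated?', 'id_2': 'lp7'}]
b = [{'id': 'c1', 'item': 'trouble sleeping?'},
     {'id': 'c2', 'item': 'changes in appetite?'},
     {'id': 'c3', 'item': 'mood swings?'},
     {'id': 'c4', 'item': 'trouble relaxing?'}]
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'id_2': ['a', 'b', 'c', 'd', 'e'],
    'i1': ['How old are you?', 'Over the last month have you felt',
           'Do you live alone?', 'In the last week have you had',
           'When did you last visit a doctor?'],
    'i2': [0, a, 0, b, 0],  # mixed ints and lists give an object column
})


def expand_nested(frame, text_col='i1', nested_col='i2', key_col='id_2'):
    """Hypothetical helper: one output row per nested item, original order kept."""
    # explode() turns each list element into its own row; scalar cells pass through.
    out = frame.explode(nested_col).reset_index(drop=True)
    # Exploded items are dicts, so they are found by type rather than by position.
    is_item = out[nested_col].map(lambda v: isinstance(v, dict))
    items = out.loc[is_item, nested_col]
    parent_keys = out.loc[is_item, key_col]
    # Append the item text to the parent question.
    out.loc[is_item, text_col] = (
        out.loc[is_item, text_col] + ' ' + items.map(lambda d: d['item'])
    )
    # Use the item's own identifier when present, else keep the parent's (assumption).
    out.loc[is_item, key_col] = [
        d.get(key_col, parent) for d, parent in zip(items, parent_keys)
    ]
    # Flag rows that came from a nested list with 1, all others with 0.
    out[nested_col] = is_item.astype(int)
    return out


result = expand_nested(df)
print(result)
```

Because the nested rows are detected by the type of the cell, the same call should keep working if more rows gain nested lists or the index labels change; it has only been checked against the sample data shown here.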
[Discussion]:
Tags: python json pandas indexing insert