【Question title】: Extract strings and insert as multiple rows based on original index
【Posted】: 2021-08-13 23:19:40
【Question description】:

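I've put a sample dataset (df), the expected output (df2), and my code so far below. I have a df where some rows in column i2 contain a list, in JSON format, that needs to be exploded and reinserted into the df at the row it was extracted from. However, the extracted strings need to go into a different column (i1). I also need to extract a unique identifier (the 'id_2' value) from each string and insert it into the id_2 column.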

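So far in my code I parse the JSON-like data with pd.json_normalize, then prepend the original string from column i1 to each extracted string (this should be clearer from the code below), and then reinsert the rows based on the index. But I have to specify the index by hand, which is not good. I'd like the code to depend less on manually entered indexes, in case more of these nested cells appear in the future or the index changes in some way.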

All suggestions welcome, many thanks.

Sample data

import pandas as pd

df = pd.DataFrame(data={'id': [1, 2, 3, 4, 5], 'id_2': ['a','b','c','d','e'], 'i1': ['How old are you?','Over the last month have you felt','Do you live alone?','In the last week have you had','When did you last visit a doctor?'], 'i2': [0,0,0,0,0]})
df['i2'] = df['i2'].astype('object')

a = [{'id': 'b1', 'item': 'happy?', 'id_2': 'hj59'}, {'id': 'b2', 'item': 'sad?', 'id_2': 'dgb'}, {'id': 'b3', 'item': 'angry?', 'id_2':'kj9'}, {'id': 'b4', 'item': 'frustrated?','id_2':'lp7'}]
b = [{'id': 'c1', 'item': 'trouble sleeping?'}, {'id': 'c2', 'item': 'changes in appetite?'}, {'id': 'c3', 'item': 'mood swings?'}, {'id': 'c4', 'item': 'trouble relaxing?'}]

df.at[1, 'i2'] = a 
df.at[3, 'i2'] = b 

Expected output

df2 = pd.DataFrame(data={'id': [1,2,2,2,2,3,4,4,4,4,5], 
                         'id_2': ['a','hj59','dgb','kj9','lp7','c','d','d','d','d','e'],
                         'i1': ['How old are you?',
                                'Over the last month have you felt happy?',
                                'Over the last month have you felt sad?',
                                'Over the last month have you felt angry?',
                                'Over the last month have you felt frustrated?',
                                'Do you live alone?',
                                'In the last week have you had trouble sleeping?',
                                'In the last week have you had changes in appetite?',
                                'In the last week have you had mood swings?',
                                'In the last week have you had trouble relaxing?',
                                'When did you last visit a doctor?'], 
                         'i2': [0,1,1,1,1,0,1,1,1,1,0]})

My ugly code so far

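# Rows whose i2 cell holds a list of dicts rather than the scalar 0
s = df[df.i2 != 0]

# One normalized frame per nested cell, keyed by its position in s
n = {}
for i in range(len(s)):
    n[i] = pd.json_normalize(s.loc[s.index[i]]['i2']).reset_index()
    # Prefix each extracted item with the original question stem
    n[i]['i1'] = s.iloc[i].i1 + ' ' + n[i]['item']

# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
def insert_row(i, d1, d2):
    return pd.concat([d1.iloc[:i], d2])

# The insertion offsets are hard-coded per nested cell, which is the problem
for i in n:
    if i == 0:
        x = insert_row(s.iloc[i].name, df, n[i])
    elif i == 1:
        x = insert_row(s.iloc[i].name+1+n[i]['index'].count()+1, x, n[i])
        y = pd.concat([x, df.iloc[s.iloc[i].name+1:]])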

【Question comments】:

    Tags: python json pandas indexing insert


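    【Solution 1】:

    Explode the dataframe on i2, then use the str accessor to retrieve the value associated with the key item from column i2, then use boolean indexing with loc to update the values in column i2 to 1 and concatenate the strings in i1 with the retrieved item values.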

    df2 = df.explode('i2', ignore_index=True)
    s = df2['i2'].str['item']
    df2.loc[s.notna(), 'i2'] =  1
    df2.loc[s.notna(), 'i1'] += ' ' + s
    

        id                                                  i1 i2
    0    1                                    How old are you?  0
    1    2            Over the last month have you felt happy?  1
    2    2              Over the last month have you felt sad?  1
    3    2            Over the last month have you felt angry?  1
    4    2       Over the last month have you felt frustrated?  1
    5    3                                  Do you live alone?  0
    6    4     In the last week have you had trouble sleeping?  1
    7    4  In the last week have you had changes in appetite?  1
    8    4          In the last week have you had mood swings?  1
    9    4     In the last week have you had trouble relaxing?  1
    10   5                   When did you last visit a doctor?  0
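
For readers unfamiliar with the two building blocks, here is a minimal sketch on a made-up two-row frame. The names toy, q and v are hypothetical and not from the question, and ignore_index is assumed to be available (pandas 1.1 or later).

```python
import pandas as pd

# Hypothetical toy data: one scalar cell and one cell holding a list of dicts.
toy = pd.DataFrame({'q': ['x', 'y'],
                    'v': [0, [{'item': 'p'}, {'item': 'q'}]]})

# explode turns the list cell into one row per element and repeats the
# other columns; the scalar cell is left as a single row.
out = toy.explode('v', ignore_index=True)

# .str['item'] looks the key up in each dict; cells that are not
# dicts (here the integer 0) come back as missing values.
items = out['v'].str['item']

print(out['q'].tolist())
print(items.notna().tolist())
```

The missing values are what make the boolean mask in the answer work: s.notna() is True only on the rows that came out of a nested cell.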
    

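    【Comments】:

    • Simple and works well
    • Thank you so much! Brilliant. Very elegant. I posted an update because I realized that for some of the nested cells I need to extract an additional item and put it in another column. I've been trying to get it out but I can't. Maybe you can help?
    • Oh wow, it's been a long day. This is it, isn't it? d = df2['i2'].str['id_2']
    • Yes! You can use the str accessor to retrieve any value present in the dictionary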
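
Putting the answer and the comments together, the id_2 update could look something like the sketch below. It runs on a shortened, made-up version of the sample data (three questions, one nested cell), and the variable names item and new_id are illustrative. It assumes that rows whose dict has no 'id_2' key should keep the original id_2, as in the expected output.

```python
import pandas as pd

# Shortened stand-in for the question's df (hypothetical values).
df = pd.DataFrame({
    'id': [1, 2, 3],
    'id_2': ['a', 'b', 'c'],
    'i1': ['How old are you?',
           'Over the last month have you felt',
           'Do you live alone?'],
    'i2': [0, [{'item': 'happy?', 'id_2': 'hj59'}, {'item': 'sad?'}], 0],
})

df2 = df.explode('i2', ignore_index=True)

# Pull both values out of the dicts before i2 is overwritten.
item = df2['i2'].str['item']
new_id = df2['i2'].str['id_2']

# Replace id_2 only where the dict actually supplied one.
df2.loc[new_id.notna(), 'id_2'] = new_id
# Append the item text to the question stem and flag the exploded rows.
df2.loc[item.notna(), 'i1'] += ' ' + item
df2.loc[item.notna(), 'i2'] = 1

print(df2)
```

No row positions are hard-coded here, so additional nested cells or a changed index should not require edits to the code.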