【Question title】: How to pandas explode a column of a string escaped list into a pandas column of int
【Posted】: 2022-01-06 02:56:27
【Question description】:

I referred to the pandas explode docs: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.explode.html

This code works when the lists contain strings.

import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [["1058","1057","1056","1055","1054"], np.nan, np.nan, ["10","57","56","55","54"]],
                   'B': 1,
                   'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']]})
df.explode('A')

This gives:

      A  B          C
0  1058  1  [a, b, c]
0  1057  1  [a, b, c]
0  1056  1  [a, b, c]
0  1055  1  [a, b, c]
0  1054  1  [a, b, c]
1   NaN  1        NaN
2   NaN  1         []
3    10  1     [d, e]
3    57  1     [d, e]
3    56  1     [d, e]
3    55  1     [d, e]
3    54  1     [d, e]

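How can I get the same exploded result for column A as above from this dataframe, where each list holds a single string with escaped quotes?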

df = pd.DataFrame({'A': [['\"1058\",\"1057\",\"1056\",\"1055\",\"1054\"'], np.nan, np.nan, ['\"10\",\"57\",\"56\",\"55\",\"54\"']],
                   'B': 1,
                   'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']]})

【Question discussion】:

    Tags: pandas explode


    【Solution 1】:

    Use ast.literal_eval, which is preferable to eval:

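    import ast

    # Each list cell wraps a single string such as '"1058","1057"'; parse it into a tuple.
    # NaN cells are not lists and pass through unchanged.
    df['A'] = df.A.apply(lambda x: ast.literal_eval(x[0]) if isinstance(x, list) else x)
    df = df.explode('A')
    print(df)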
          A  B          C
    0  1058  1  [a, b, c]
    0  1057  1  [a, b, c]
    0  1056  1  [a, b, c]
    0  1055  1  [a, b, c]
    0  1054  1  [a, b, c]
    1   NaN  1        NaN
    2   NaN  1         []
    3    10  1     [d, e]
    3    57  1     [d, e]
    3    56  1     [d, e]
    3    55  1     [d, e]
    3    54  1     [d, e]
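
The exploded values above are still strings, while the question title asks for integers. Here is a minimal sketch of one way to finish the conversion, assuming every non-missing value is an integer literal. The use of pd.to_numeric and the nullable Int64 dtype is an addition here, not part of the original answer:

```python
import ast

import numpy as np
import pandas as pd

# Same dataframe as in the question: each list in A wraps one quoted, comma-separated string.
df = pd.DataFrame({'A': [['"1058","1057","1056","1055","1054"'], np.nan, np.nan,
                         ['"10","57","56","55","54"']],
                   'B': 1,
                   'C': [['a', 'b', 'c'], np.nan, [], ['d', 'e']]})

# Parse the wrapped string into a tuple of strings, then explode as in Solution 1.
df['A'] = df['A'].apply(lambda x: ast.literal_eval(x[0]) if isinstance(x, list) else x)
df = df.explode('A')

# Assumption: all non-missing values are integer literals.
# The nullable Int64 dtype keeps the missing rows as <NA> instead of forcing floats.
df['A'] = pd.to_numeric(df['A']).astype('Int64')
print(df['A'].dtype)
```

A plain astype(int) would fail here because of the missing rows, which is why the nullable dtype is used in this sketch.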
    

    【Discussion】:

      【Solution 2】:

      Use pd.eval before explode:

      >>> df.assign(A=df['A'].apply(lambda x: pd.eval(x) if pd.notna(x) and x else x)) \
            .explode('A')
      
            A  B          C
      0  1058  1  [a, b, c]
      0  1057  1  [a, b, c]
      0  1056  1  [a, b, c]
      0  1055  1  [a, b, c]
      0  1054  1  [a, b, c]
      1   NaN  1        NaN
      2   NaN  1         []
      3    10  1     [d, e]
      3    57  1     [d, e]
      3    56  1     [d, e]
      3    55  1     [d, e]
      3    54  1     [d, e]
      

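      【Discussion】:

      • Thanks @corralien. It gets me closer, but I am now getting an error: "ValueError: expr cannot be an empty string"
      • I updated my answer. Can you check it? You can replace pd.notna(x) and x with isinstance(x, list).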
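
The isinstance(x, list) guard suggested in the last comment can be sketched as a small helper. The name parse_cell is hypothetical, and ast.literal_eval is swapped in for pd.eval here so that the sketch does not depend on how pd.eval treats list or empty input:

```python
import ast

import numpy as np
import pandas as pd


def parse_cell(x):
    """Hypothetical helper: parse a one-element list wrapping a quoted string."""
    # Only non-empty lists whose first item is a string are parsed.
    # NaN and empty lists are returned unchanged, so nothing empty reaches the parser.
    if isinstance(x, list) and x and isinstance(x[0], str):
        value = ast.literal_eval(x[0])
        # A single quoted item parses to a str rather than a tuple, so wrap it.
        return list(value) if isinstance(value, tuple) else [value]
    return x


s = pd.Series([['"1058","1057"'], np.nan, [], ['"10"']])
parsed = s.apply(parse_cell)
print(parsed.tolist())
```

Checking the type first avoids calling pd.notna on a list, which returns an element-wise array rather than a single boolean.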