【Question title】: Explode is not working on pandas dataframe
【Posted】: 2023-01-19 17:21:45
【Question】:

I have a dataframe with the following columns:

col1 col2       col3            col4            col5
0   HP:0005709  ['HP:0001770']  Toe syndactyly  SNOMEDCT_US:32113001, C0265660
1   HP:0005709  ['HP:0001780']  Abnormality of toe  C2674738
2   EFO:0009136 ['HP:0001507']  Growth abnormality  C0262361

I want to explode "col4". I have tried several approaches, but none of them worked. The dtype of the column is "object".

My attempts so far:

  1. df.explode('cross_ref')

  2. df['cross_ref'] = df['cross_ref'].str.split(',')
     df = df.set_index(['col2']).apply(pd.Series.explode).reset_index()

  3. import ast
     df[['cross_ref']] = df[['cross_ref']].applymap(ast.literal_eval)
     df = df.apply(pd.Series.explode)

    The expected output is:

    col1 col2       col3            col4                col5
    0   HP:0005709  ['HP:0001770']  Toe syndactyly      SNOMEDCT_US:32113001
    0   HP:0005709  ['HP:0001770']  Toe syndactyly      C0265660
    1   HP:0005709  ['HP:0001780']  Abnormality of toe  C2674738
    2   EFO:0009136 ['HP:0001507']  Growth abnormality  C0262361
    

【Comments】:

  • Is col5 a list? Try df.explode('col5')
  • Please reformat your dataframe or provide a dataframe constructor. The cross_ref column is col5, but you want to explode col4???

Tags: python pandas dataframe split explode


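【Solution 1】:

Your input data is confusing because it shows 5 headers for only 4 columns (or is the index a "normal" column?). To explode col4, first split each string so that its elements become a list, then explode: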

df['col4'] = df['col4'].str.split(r',\s*', regex=True)
df = df.explode('col4')

Output:

          col1            col2                col3                  col4
0   HP:0005709  ['HP:0001770']      Toe syndactyly  SNOMEDCT_US:32113001
0   HP:0005709  ['HP:0001770']      Toe syndactyly              C0265660
1   HP:0005709  ['HP:0001780']  Abnormality of toe              C2674738
2  EFO:0009136  ['HP:0001507']  Growth abnormality              C0262361
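
The bare df.explode(...) call from the question appears to do nothing because explode only expands list-like cells; a comma-separated string is a single scalar, so every row is returned unchanged. The following is a minimal, self-contained sketch of that behaviour and of the split-then-explode fix. The data is reconstructed from the question, and the four column names col1 to col4 are an assumption that follows this answer's reading of the table.

```python
import pandas as pd

# Data reconstructed from the question; the four column names are an
# assumption (the original post shows five headers for four data columns).
df = pd.DataFrame({
    "col1": ["HP:0005709", "HP:0005709", "EFO:0009136"],
    "col2": ["['HP:0001770']", "['HP:0001780']", "['HP:0001507']"],
    "col3": ["Toe syndactyly", "Abnormality of toe", "Growth abnormality"],
    "col4": ["SNOMEDCT_US:32113001, C0265660", "C2674738", "C0262361"],
})

# explode() on plain strings changes nothing: each cell is a scalar, not a list.
unchanged = df.explode("col4")

# Split on a comma followed by optional whitespace, then explode the lists.
exploded = (
    df.assign(col4=df["col4"].str.split(r",\s*", regex=True))
      .explode("col4")
)
print(exploded)
```

Using assign here leaves the original df untouched, which makes it easy to compare the two results side by side.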


    【Solution 2】:

    IIUC, try:

    out = df.assign(**{'col5': df['col5'].str.split(', ')}).explode('col5')
    print(out)
    
    # Output
       col1         col2            col3                col4                  col5
    0     0   HP:0005709  ['HP:0001770']      Toe syndactyly  SNOMEDCT_US:32113001
    0     0   HP:0005709  ['HP:0001770']      Toe syndactyly              C0265660
    1     1   HP:0005709  ['HP:0001780']  Abnormality of toe              C2674738
    2     2  EFO:0009136  ['HP:0001507']  Growth abnormality              C0262361
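
In the output above, the exploded rows keep their original index label, so 0 appears twice. If a fresh 0..n-1 index is preferred, or the spacing after the commas is not always exactly one space, a variation along these lines may help. It is only a sketch: the five-column frame is reconstructed from the output shown in this answer, with col1 assumed to be an ordinary integer column.

```python
import pandas as pd

# Frame reconstructed from this answer's output; treating col1 as a regular
# integer column is an assumption.
df = pd.DataFrame({
    "col1": [0, 1, 2],
    "col2": ["HP:0005709", "HP:0005709", "EFO:0009136"],
    "col3": ["['HP:0001770']", "['HP:0001780']", "['HP:0001507']"],
    "col4": ["Toe syndactyly", "Abnormality of toe", "Growth abnormality"],
    "col5": ["SNOMEDCT_US:32113001, C0265660", "C2674738", "C0262361"],
})

# Split on the bare comma, explode with a renumbered index, then strip any
# whitespace left around each piece.
out = (
    df.assign(col5=df["col5"].str.split(","))
      .explode("col5", ignore_index=True)
)
out["col5"] = out["col5"].str.strip()
print(out)
```

The ignore_index argument of explode requires pandas 1.1 or later; on older versions, calling reset_index(drop=True) after explode gives the same result.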
    
