【Question title】: Split a string of category into specific Dataframe columns [duplicate]
【Posted】: 2020-12-14 10:54:37
【Question】:

I have a DataFrame column that contains the following categories:

    import pandas as pd

    data = {'People': ['John', 'Mary', 'Andy', 'April'],
            'Class': ['Math, Science', 'English, Math, Science', 'Math, Science', 'Science, English, Math']}

    df = pd.DataFrame(data, columns=['People', 'Class'])

How can I create new columns and convert the DataFrame into the following?

> | People | Math | Science | English |
> ------------------------------------- 
> | John   | Math | Science |         | 
> | Mary   | Math | Science | English | 
> | Andy   | Math | Science |         |
> | April  | Math | Science | English |

【Question comments】:

Tags: python pandas string dataframe


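【Solution 1】:
  • Use .str.get_dummies to get a table of 1s and 0s for the Class column.
  • Use np.where to replace each 1 with the column name and each 0 with an empty string.
  • df.Class.str.get_dummies(', ').apply(lambda x: np.where(x == 1, x.name, '')) creates a separate DataFrame, which is combined back into df with .join.
  • .drop the Class column, since it is no longer needed.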

    import pandas as pd
    import numpy as np

    updated = df.join(df.Class.str.get_dummies(', ').apply(lambda x: np.where(x == 1, x.name, ''))).drop(columns=['Class'])

    # display(updated)
      People  English  Math  Science
    0   John           Math  Science
    1   Mary  English  Math  Science
    2   Andy           Math  Science
    3  April  English  Math  Science
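
The one-liner above packs three steps into a single expression. As a rough sketch of the same idea, the steps can be written out separately. The intermediate names `dummies` and `labels` are introduced here for illustration only, and the `np.where` result is wrapped in a `pd.Series` simply to keep the index explicit.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'People': ['John', 'Mary', 'Andy', 'April'],
    'Class': ['Math, Science', 'English, Math, Science',
              'Math, Science', 'Science, English, Math'],
})

# Step 1: one indicator column (1/0) per distinct class name.
dummies = df['Class'].str.get_dummies(sep=', ')

# Step 2: replace each 1 with its column name and each 0 with ''.
# apply() passes one column (a Series) at a time; col.name is its label.
labels = dummies.apply(
    lambda col: pd.Series(np.where(col == 1, col.name, ''), index=col.index)
)

# Step 3: attach the new columns and drop the original one.
updated = df.join(labels).drop(columns=['Class'])
```

Inspecting `dummies` before step 2 is a quick way to check that the separator matches the data; a separator of `','` instead of `', '` would leave a leading space in some column names.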

【Comments】:

  • Nice use of dummies :)
【Solution 2】:

The following code may help:

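    # Collect the distinct class names; a set has no guaranteed order
    columns = set(x for lst in df['Class'] for x in lst.replace(" ", "").split(","))
    for col in columns:
        df[col] = ""  # start each new column as empty strings

    for i, val in df["Class"].items():
        # Write each class name into its own column for this row
        for value in val.replace(" ", "").split(","):
            df.loc[i, value] = value  # a single .loc call avoids chained assignment
    df.drop('Class', axis=1, inplace=True)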

Output:

    People  Science English Math
0   John    Science         Math
1   Mary    Science English Math
2   Andy    Science         Math
3   April   Science English Math
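
The loop above removes every space from each entry and takes its column names from a set. This would alter a class name that contains a space, and the column order can differ between runs. A possible variant is sketched below, under the assumption that entries are separated by commas. The helper name `spread_classes` is hypothetical. It strips only the surrounding whitespace, sorts the column names, and leaves the input DataFrame unchanged.

```python
import pandas as pd


def spread_classes(df, source='Class'):
    """Hypothetical helper: one column per class name, '' where absent."""
    # Split each row on commas and strip only the surrounding whitespace,
    # so a name such as 'Computer Science' would keep its inner space.
    parts = df[source].apply(lambda s: [p.strip() for p in s.split(',')])
    out = df.drop(columns=[source])  # drop() returns a new DataFrame
    # sorted() gives a stable column order, unlike iterating over a set.
    for name in sorted({p for row in parts for p in row}):
        out[name] = parts.apply(lambda row, n=name: n if n in row else '')
    return out


df = pd.DataFrame({
    'People': ['John', 'Mary', 'Andy', 'April'],
    'Class': ['Math, Science', 'English, Math, Science',
              'Math, Science', 'Science, English, Math'],
})
result = spread_classes(df)
```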

【Comments】:

【Solution 3】:

Here is one solution:

    # Strip the whitespace after each comma, then use get_dummies to create indicator columns
    
    df = df.set_index('People')
    
    dummies = (
        df.Class.str.replace(r',\s+', ",", regex=True)
            .str.get_dummies(sep=",")
    )
    
            English  Math  Science
    People                        
    John          0     1        1
    Mary          1     1        1
    Andy          0     1        1
    April         1     1        1
    
    # Create a "hash map" to substitute categorical data
    replace_ = {i : j for i, j in enumerate(dummies.columns, 1)}
    
    # Multiply each column by its key, then replace the keys with the column names (absent classes stay 0).
    dummies.mul(list(replace_.keys())).replace(replace_)
    

            English  Math  Science
    People                        
    John          0  Math  Science
    Mary    English  Math  Science
    Andy          0  Math  Science
    April   English  Math  Science
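
In the result above, a class that a person does not take shows up as 0 rather than as an empty cell. One way to get empty strings instead, sketched here as an assumption about the intended output rather than as part of the original answer, is to add a `0 -> ''` entry to the mapping. The multipliers have to be computed before that entry is added, which is why the sketch keeps them in a separate, hypothetical `codes` list.

```python
import pandas as pd

df = pd.DataFrame({
    'People': ['John', 'Mary', 'Andy', 'April'],
    'Class': ['Math, Science', 'English, Math, Science',
              'Math, Science', 'Science, English, Math'],
}).set_index('People')

dummies = (
    df['Class'].str.replace(r',\s+', ',', regex=True)
      .str.get_dummies(sep=',')
)

# One integer code per column: 1, 2, 3, ...
codes = list(range(1, len(dummies.columns) + 1))

# Map each code to its column name, and 0 to '' for "class not taken".
replace_ = dict(zip(codes, dummies.columns))
replace_[0] = ''

# Multiply column-wise by the codes, then swap the codes for labels.
result = dummies.mul(codes).replace(replace_)
```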
    

【Comments】:
