【Question title】: Reversing a MultiLabelBinarizer to create a list within a column
【Posted】: 2020-09-20 06:22:06
【Question】:

In Python 3, I have a starting dataframe in multi-label binary format:

df1:

"a" "b" "c" "d" "e"

 1   1   0   0   1
 0   0   1   0   1
 1   0   0   0   0
 0   1   1   0   1

What I need to end up with is:

df2:

"a" "b" "c" "d" "e" "labels"

 1   1   0   0   1   ["a", "b", "e"]
 0   0   1   0   1   ["c", "e"]
 1   0   0   0   0   ["a"]
 0   1   1   0   1   ["b", "c", "e"]

First, I tried the inverse_transform() function of sklearn's MultiLabelBinarizer, based on a previous Stack Overflow question:

from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
mlb.fit(df1.columns)
mlb.inverse_transform(df1.values)

ValueError: Expected indicator for 15 classes, but got 5

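I tried to follow the sklearn documentation exactly, but I'm not sure where I went wrong. I tried adjusting some of the parameters, but I don't understand what the problem is.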
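
A note on the error above: `fit` expects an iterable of label collections, so `mlb.fit(df1.columns)` treats each column name as one sample and each character in it as a label. The 15 classes are then presumably the distinct characters in the real column names (an assumption, since those names are not shown). A minimal sketch of one way around this, assuming the columns are simply named a to e, is to pass the column names as a single sample and pin the class order with `classes=`:

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Sample data from the question; the column names a..e are an assumption.
df1 = pd.DataFrame(
    [[1, 1, 0, 0, 1], [0, 0, 1, 0, 1], [1, 0, 0, 0, 0], [0, 1, 1, 0, 1]],
    columns=list("abcde"),
)

# classes= fixes the class order to the column order instead of sorted order.
mlb = MultiLabelBinarizer(classes=list(df1.columns))
mlb.fit([df1.columns])  # one sample that contains every label

# inverse_transform returns one tuple of labels per row.
labels = [list(t) for t in mlb.inverse_transform(df1.to_numpy())]
df2 = df1.assign(labels=labels)
print(df2)
```

Without `classes=`, the fitted classes are sorted, so the columns of the indicator matrix would have to be in sorted order as well.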

【Comments】:

  • A late alternative approach

Tags: python-3.x pandas scikit-learn


【Solution 1】:
df2 = df.gt(0)  # boolean dataframe: True where a label is set

l = df.columns.to_numpy()  # put the column names into a numpy array

# Build the `labels` column with a list comprehension: index the column
# names with each row's boolean mask.
df['labels'] = [l[row].tolist() for row in df2.to_numpy()]

【Comments】:

    【Solution 2】:

    A NumPy approach

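    import numpy as np

    i, j = np.where(df)                 # row and column positions of the non-zero cells
    a = df.columns.to_numpy()[j]        # label name for each non-zero cell
    b = np.flatnonzero(np.diff(i)) + 1  # positions where the row index changes
    df.assign(labels=np.split(a, b))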
    
       a  b  c  d  e     labels
    0  1  1  0  0  1  [a, b, e]
    1  0  0  1  0  1     [c, e]
    2  1  0  0  0  0        [a]
    3  0  1  1  0  1  [b, c, e]
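
One caveat with splitting on np.diff(i): it relies on every row having at least one label. A row of all zeros never appears in `i`, so the number of pieces no longer matches the number of rows. A minimal sketch of a variant that splits on per-row counts instead (the sample data with an empty second row is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has no labels at all.
df = pd.DataFrame(
    [[1, 1, 0, 0, 1], [0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 1, 1, 0, 1]],
    columns=list("abcde"),
)

mask = df.to_numpy().astype(bool)
_, j = np.nonzero(mask)            # column index of every set cell, row by row
names = df.columns.to_numpy()[j]   # label name for every set cell

# Cut points are the running totals of labels per row, so an all-zero row
# yields an empty slice instead of being skipped.
cuts = np.cumsum(mask.sum(axis=1))[:-1]
labels = [part.tolist() for part in np.split(names, cuts)]
out = df.assign(labels=labels)
print(out)
```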
    

    【Comments】:

      【Solution 3】:

      Let's try dot and str.split

      df['labels'] = df.dot(df.columns+',').str[:-1].str.split(',')
      0    ["a", "b", "e"]
      1         ["c", "e"]
      2              ["a"]
      3    ["b", "c", "e"]
      dtype: object
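
The dot product works here because multiplying a string by 1 or 0 gives the string itself or an empty string, and summing strings concatenates them. Two edge cases are worth keeping in mind: a row of all zeros ends up as [''] rather than an empty list, and a label that contains the separator would be split in two. A rough sketch that handles the empty-row case (the sample data with an empty second row is made up for illustration):

```python
import pandas as pd

# Hypothetical data: the second row has no labels at all.
df = pd.DataFrame(
    [[1, 1, 0, 0, 1], [0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 1, 1, 0, 1]],
    columns=list("abcde"),
)

# Each row becomes a string such as "a,b,e"; an all-zero row becomes "".
joined = df.dot(df.columns + ",").str.rstrip(",")

# Split non-empty strings and map empty ones to an empty list.
labels = [s.split(",") if s else [] for s in joined]
out = df.assign(labels=labels)
print(out)
```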
      

      【Comments】:

      • df['labels'] = [[*s] for s in df @ df.columns] ;-)
      • Or: df['labels'] = [*map(list, df @ df.columns)]
      【Solution 4】:

      You can stack the data, filter the values, and group:

      df['labels'] = (df.stack()
         .loc[lambda x: x>0]
         .reset_index()
         .groupby('level_0')
         .agg({'level_1':list})
      )
      

      Output:

         "a"  "b"  "c"  "d"  "e"           labels
      0    1    1    0    0    1  ["a", "b", "e"]
      1    0    0    1    0    1       ["c", "e"]
      2    1    0    0    0    0            ["a"]
      3    0    1    1    0    1  ["b", "c", "e"]
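
The same stack, filter and group steps can be written with named index levels, which avoids depending on the auto-generated level_0 and level_1 column names. This is only a sketch: the level names "row" and "label" are chosen here for illustration, and the column names a to e are assumed.

```python
import pandas as pd

# Sample data from the question; the column names a..e are an assumption.
df = pd.DataFrame(
    [[1, 1, 0, 0, 1], [0, 0, 1, 0, 1], [1, 0, 0, 0, 0], [0, 1, 1, 0, 1]],
    columns=list("abcde"),
)

# Long format: one entry per (row, label) pair.
stacked = df.rename_axis(index="row", columns="label").stack()

# Keep the set cells, then collect the label names of each row into a list.
labels = (
    stacked[stacked > 0]
    .reset_index("label")["label"]
    .groupby(level="row")
    .agg(list)
)

# Assignment aligns on the row index.
out = df.assign(labels=labels)
print(out)
```

Because the assignment aligns on the index, a row without any labels would get NaN here rather than an empty list.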
      

      【Comments】:
