【Question title】: How to implement MultiLabelBinarizer on this dataframe?
【Posted】: 2019-11-11 12:59:14
【Question description】:

I have a dataframe like this:

    mid value   label
ID          
192 3   176.6   [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
192 4   73.6    [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
192 5   15.8    [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
194 3   9603.2  [0, 0, 0, 0, 0, 9, 6, 1, 8, ...

I want to apply MultiLabelBinarizer after removing the duplicate values from each list in the label column.
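A minimal sketch of that goal, assuming each label cell holds a plain Python list of hashable values. The small DataFrame is hypothetical toy data shaped like the frame above, not the real data. MultiLabelBinarizer only records whether a class occurs in a sample, so repeated values inside a list are collapsed anyway; the explicit de-duplication mainly keeps the stored lists tidy.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data shaped like the question's frame (ID is the index)
df = pd.DataFrame(
    {
        "mid": [3, 4, 5, 3],
        "value": [176.6, 73.6, 15.8, 9603.2],
        "label": [[9, 6, 8, 0, 8], [9, 6, 8, 0, 8], [9, 6, 8, 0, 8], [0, 0, 0, 9, 6, 1]],
    },
    index=pd.Index([192, 192, 192, 194], name="ID"),
)

# Drop repeated values inside each list; sorted() keeps the order deterministic
df["label"] = df["label"].apply(lambda labels: sorted(set(labels)))

# Fit on the Series of lists and get one 0/1 column per class
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(df["label"])

print(mlb.classes_)  # [0 1 6 8 9]
print(y)
```

Each row of y is an indicator vector over mlb.classes_, which is the form a multi-label estimator expects as its target.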

I tried looping over the frame to remove the duplicates. Also, MultiLabelBinarizer does not work and raises an exception:

    from sklearn.preprocessing import MultiLabelBinarizer
    mlb = MultiLabelBinarizer()
    mlb.fit(y_train.data)  # this call raises the exception
    # X_train includes the mid and value columns
    # y_train includes the label values
    # ID is the index
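One likely cause of the exception, offered as an assumption since the error message is not shown: y_train.data is not the label lists. Series.data exposed the underlying memory buffer, was deprecated, and was removed in pandas 1.0. MultiLabelBinarizer expects an iterable of label collections, so the Series itself can be passed directly. The X_train/y_train split below follows the description in the question; the toy frame is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy frame: ID is the index, label holds Python lists
df = pd.DataFrame(
    {
        "mid": [3, 4, 3],
        "value": [176.6, 73.6, 9603.2],
        "label": [[9, 6, 8, 8], [9, 6, 8, 8], [0, 0, 9, 1]],
    },
    index=pd.Index([192, 192, 194], name="ID"),
)

X_train = df[["mid", "value"]]  # features, as described in the question
y_train = df["label"]           # the label lists

# Pass the Series of lists itself, not y_train.data
mlb = MultiLabelBinarizer()
Y_train = mlb.fit_transform(y_train)

print(mlb.classes_)   # [0 1 6 8 9]
print(Y_train.shape)  # (3, 5)
```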

I expect to get predictions from the values above once the duplicates have been removed from each list in the label column.
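A sketch of the prediction step, assuming the goal is to predict a label set from mid and value. DecisionTreeClassifier is used only as an illustrative choice because it accepts a 2-D 0/1 target directly; the question does not name a model. mlb.inverse_transform maps predicted indicator rows back to label tuples. The toy data and the new row are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy frame: features mid/value, label lists contain repeats
df = pd.DataFrame(
    {
        "mid": [3, 4, 5, 3],
        "value": [176.6, 73.6, 15.8, 9603.2],
        "label": [[9, 6, 8, 8], [9, 6, 8, 8], [9, 6, 8, 8], [0, 0, 9, 1]],
    },
    index=pd.Index([192, 192, 192, 194], name="ID"),
)

X_train = df[["mid", "value"]]
mlb = MultiLabelBinarizer()
# Repeats inside a list do not matter: each class is only marked present (1)
Y_train = mlb.fit_transform(df["label"])

# DecisionTreeClassifier handles a multi-output 0/1 target natively
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, Y_train)

# Predict for a hypothetical new row and map the 0/1 row back to labels
X_new = pd.DataFrame({"mid": [3], "value": [9000.0]})
pred = clf.predict(X_new)
print(mlb.inverse_transform(pred))
```

Any estimator that supports multi-label targets could replace the tree; sklearn.multioutput.MultiOutputClassifier is another option for wrapping single-output classifiers.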

【Question discussion】:

Tags: python-3.x machine-learning scikit-learn scikit-multilearn


【Solution 1】:

Assuming your dataframe is named df:

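    import pandas as pd
    from sklearn.preprocessing import MultiLabelBinarizer

    # Group on the ID index level plus mid/value; because each label cell is a list,
    # merge the group's lists into one sorted tuple of unique labels (drops duplicates)
    df2 = pd.DataFrame(df.groupby(['ID', 'mid', 'value'])['label'].apply(
        lambda x: tuple(sorted({lab for labels in x for lab in labels}))))
    df2.reset_index(inplace=True)

    mlb = MultiLabelBinarizer()
    mlb.fit(df2['label'])
    mlb.transform(df2['label'])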
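A self-contained run of the grouping approach on hypothetical toy data, assuming each label cell is a list. It also wraps the indicator matrix in a DataFrame whose columns are mlb.classes_, so the features and the per-class 0/1 columns line up row by row.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy frame; ID is the index, label cells are lists
df = pd.DataFrame(
    {
        "mid": [3, 4, 3],
        "value": [176.6, 73.6, 9603.2],
        "label": [[9, 6, 8, 8], [9, 6, 8, 8], [0, 0, 9, 1]],
    },
    index=pd.Index([192, 192, 194], name="ID"),
)

# One row per (ID, mid, value); merge the group's lists into unique labels
df2 = (
    df.groupby(["ID", "mid", "value"])["label"]
    .apply(lambda x: tuple(sorted({lab for labels in x for lab in labels})))
    .reset_index()
)

mlb = MultiLabelBinarizer()
indicators = pd.DataFrame(
    mlb.fit_transform(df2["label"]), columns=mlb.classes_, index=df2.index
)

# Features next to one 0/1 column per label class
result = pd.concat([df2[["ID", "mid", "value"]], indicators], axis=1)
print(result)
```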

【Discussion】:
