【Posted】: 2019-11-11 12:59:14
【Question】:
I have a data frame like this:
     mid   value  label
ID
192    3   176.6  [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
192    4    73.6  [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
192    5    15.8  [9, 6, 8, 0, 8, 8, 7, 9, 2, 19...
194    3  9603.2  [0, 0, 0, 0, 0, 9, 6, 1, 8, ...
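I want to apply MultiLabelBinarizer after removing the duplicate values from each list in the label column.
I tried looping over the frame and removing the duplicates. However, MultiLabelBinarizer does not work and raises an exception: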
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
mlb.fit(y_train.data)
X_train contains the mid and value columns.
y_train contains the label values.
ID is the index.
I expect to get predictions from these values once the duplicate values have been removed from each list in the label column.
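For reference, the intended preprocessing could be sketched roughly as follows. This is only a minimal sketch under assumptions: the label lists in the post are truncated, so the short lists below are made-up stand-ins, and the variable names `df` and `y_bin` are hypothetical. It removes duplicates from each list with `dict.fromkeys` (which keeps first-seen order) and passes the Series of lists itself to `MultiLabelBinarizer`, rather than `y_train.data`.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical stand-in data: mid/value come from the post, but the label
# lists are invented because the originals are truncated ("...").
df = pd.DataFrame(
    {
        "mid": [3, 4, 5, 3],
        "value": [176.6, 73.6, 15.8, 9603.2],
        "label": [
            [9, 6, 8, 0, 8],
            [9, 6, 8, 0, 8],
            [9, 6, 8, 0, 8],
            [0, 0, 9, 6, 1],
        ],
    },
    index=pd.Index([192, 192, 192, 194], name="ID"),
)

# Remove duplicates inside each list while preserving first-seen order.
y_train = df["label"].apply(lambda labels: list(dict.fromkeys(labels)))

# Fit on the Series of lists (an iterable of iterables), one list per row.
mlb = MultiLabelBinarizer()
y_bin = mlb.fit_transform(y_train)

print(mlb.classes_)  # sorted unique labels seen across all rows
print(y_bin)         # one indicator row per sample, one column per class
```

As far as I can tell, `MultiLabelBinarizer` treats each sample's labels as a set, so duplicates inside a list should not change the resulting indicator matrix; the explicit de-duplication step is mainly there to match the stated goal.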
【Comments】:
-
This is the format of the data frame: 192 3 176.6 [9, 6, 8, 0, 8, 8, 7, 9, 2, 19... 192 4 73.6 [9, 6, 8, 0, 8, 8, 7, 9, 2, 19... 192 5 15.8 [9, 6, 8, 0, 8, 8, 7, 9, 2, 19... 194 3 9603.2 [0, 0, 0, 0, 0, 9, 6, 1, 8, ...
Tags: python-3.x machine-learning scikit-learn scikit-multilearn