What is the best value for the parameter class_weight in LinearSVC? Answers

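【Question title】: What is the best value for the parameter class_weight in LinearSVC?
【Posted】: 2020-03-13 01:26:53
【Question description】:

I have multi-label data (some classes have 2 labels and some have 10), and my model overfits with class_weight set to either 'balanced' or None. What is the best value to set for the class_weight parameter?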

from sklearn.svm import LinearSVC
svm = LinearSVC(C=0.01, max_iter=100, dual=False, class_weight=None, verbose=1)

【Question comments】:

Tags: python scikit-learn svm libsvm scikit-multilearn

【Solution 1】:

The class_weight parameter actually controls the C parameter in the following way (quoting the scikit-learn documentation):

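class_weight : {dict, 'balanced'}, optional

Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))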
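As a rough illustration of the "balanced" formula quoted above, the sketch below computes the weights by hand and compares them with scikit-learn's `compute_class_weight` helper. The toy label vector is an assumption made up for this example. Because the labels are not contiguous integers starting at 0, the per-class counts are taken with `np.unique(..., return_counts=True)` rather than calling `np.bincount` on the raw labels.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy, imbalanced label vector (illustrative assumption, not the asker's data)
y = np.array([121, 121, 121, 121, 122, 122, 123])

# Distinct labels (sorted) and how often each one occurs
classes, counts = np.unique(y, return_counts=True)

# Manual version of n_samples / (n_classes * count_per_class)
manual = len(y) / (len(classes) * counts)

# The same computation via scikit-learn's helper
auto = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# Dictionary form, usable as LinearSVC(class_weight=balanced_weights)
balanced_weights = dict(zip(classes.tolist(), auto.tolist()))
print(balanced_weights)
```

The rarest class receives the largest weight, so errors on it are penalised with a larger effective C.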

Try experimenting with class_weight while keeping C fixed, e.g. C=0.1
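One possible way to run that experiment is sketched below: C stays fixed and only class_weight changes, with each setting scored by cross-validation. The synthetic dataset, the custom weight dictionary, and the macro-F1 metric are all assumptions chosen for illustration; with real data you would substitute your own X, y, and a metric that fits the problem.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

# Synthetic imbalanced 3-class problem (illustrative assumption)
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5, n_classes=3,
    weights=[0.8, 0.15, 0.05], random_state=0,
)

# Candidate class_weight settings; the custom dict is hypothetical
candidates = {
    "none": None,
    "balanced": "balanced",
    "custom": {0: 1, 1: 3, 2: 8},
}

C = 0.1  # kept fixed across all runs
results = {}
for name, cw in candidates.items():
    clf = LinearSVC(C=C, dual=False, class_weight=cw)
    # Macro-F1 gives every class equal influence on the score
    scores = cross_val_score(clf, X, y, cv=3, scoring="f1_macro")
    results[name] = scores.mean()

for name, score in results.items():
    print(f"{name}: macro-F1 = {score:.3f}")
```

Which setting scores best depends on the data, so this is only a way to compare options rather than a recommendation of any particular value.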

EDIT

Here is a neat way to create the class_weight for your 171 classes.

# Store the weight for each class in a list
weights_per_class = [2, 3, 4, 5, 6]

# Let's assume the distinct class labels in `y` look like this
# (one entry per class, in the same order as the weights):
y = [121, 122, 123, 124, 125]

You need:

# Create the `class_weight` dictionary (class label -> weight)
class_weight = {val: weights_per_class[index] for index, val in enumerate(y)}

print(class_weight)
# {121: 2, 122: 3, 123: 4, 124: 5, 125: 6}

# Use it as the argument
from sklearn.svm import LinearSVC
svm = LinearSVC(class_weight=class_weight)
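The snippet above only works when `y` lists each class label exactly once. If `y` is instead the full training label vector with repeated labels, a variant along the following lines may help; it is a sketch under assumptions, not part of the original answer. The names `X_train` and `y_train` and the random features are hypothetical placeholders, and the weights are the same illustrative values as above.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical training labels: non-contiguous and repeated
y_train = np.repeat([121, 122, 123, 124, 125], [80, 50, 40, 20, 10])

# Hypothetical features, random noise just so the model can be fitted
rng = np.random.default_rng(0)
X_train = rng.normal(size=(len(y_train), 4))

# Sorted distinct labels; this scales to 171 classes without listing them by hand
labels = np.unique(y_train)

# One weight per distinct label (illustrative values)
weights_per_class = [2, 3, 4, 5, 6]
assert len(labels) == len(weights_per_class)

# Map each label to its weight
class_weight = dict(zip(labels.tolist(), weights_per_class))

clf = LinearSVC(C=0.1, dual=False, class_weight=class_weight)
clf.fit(X_train, y_train)
print(class_weight)
```

Deriving the keys from `np.unique` keeps the dictionary in step with the labels that actually occur in the training data.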

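【Discussion】:

I have 171 classes in my dataset. How do I set this for all classes? Can you give me an example?
Do the class labels in your y start from 0 or 1? In other words, is the label of the first class 0 or 1? I need to update my answer based on your reply.
Well, I have labels like 121, 122, 123. They are not contiguous, but that is what my labels look like. There are 171 in total. Is adding more weight to the labels with less weight the same as oversampling? If not, what is the difference?
See my updated answer. Consider accepting and upvoting.
Sure. Could you take a look at this question: stackoverflow.com/questions/58991545/…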