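【Posted】: 2021-06-01 03:27:12
【Question】:
I want to run the PAM algorithm (KMedoids, method='pam') with the Gower distance.
My dataset contains mixed features, numeric and categorical, and several of the categorical features have more than 1000 distinct values.
I found a suitable Gower distance implementation here: https://github.com/wwwjk366/gower/blob/master/gower/gower_dist.py
My problem is that the scikit-learn-extra implementation of PAM that I am using does not provide a metric='gower' option. So I tried to create a callable instead, but I am finding it hard to wire the pieces together.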
import gower
import sklearn.metrics
import sklearn_extra.cluster

# cat_mask is a boolean list marking which features in df_ext are categorical
D = gower.gower_matrix(df_ext, cat_features=cat_mask)

# https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html
def get_gower():
    return sklearn.metrics.pairwise_distances(D, metric='precomputed')

# https://scikit-learn-extra.readthedocs.io/en/latest/generated/sklearn_extra.cluster.KMedoids.html
kmedoids = sklearn_extra.cluster.KMedoids(df_ext, metric=get_gower, method='pam')
kmedoids.fit(df_ext)
I get this ValueError:
ValueError Traceback (most recent call last)
<ipython-input-13-9ae677cd636a> in <module>
1 # https://scikit-learn-extra.readthedocs.io/en/latest/generated/sklearn_extra.cluster.KMedoids.html
2 kmedoids = KMedoids(df_ext, metric=get_gower, method='pam')
----> 3 kmedoids.fit(df_ext)
D:\ProgramFiles\anaconda3\lib\site-packages\sklearn_extra\cluster\_k_medoids.py in fit(self, X, y)
183 random_state_ = check_random_state(self.random_state)
184
--> 185 self._check_init_args()
186 X = check_array(X, accept_sparse=["csr", "csc"])
187 if self.n_clusters > X.shape[0]:
D:\ProgramFiles\anaconda3\lib\site-packages\sklearn_extra\cluster\_k_medoids.py in _check_init_args(self)
154
155 # Check n_clusters and max_iter
--> 156 self._check_nonnegative_int(self.n_clusters, "n_clusters")
157 self._check_nonnegative_int(self.max_iter, "max_iter", False)
158
D:\ProgramFiles\anaconda3\lib\site-packages\sklearn_extra\cluster\_k_medoids.py in _check_nonnegative_int(self, value, desc, strict)
144 else:
145 negative = (value is None) or (value < 0)
--> 146 if negative or not isinstance(value, (int, np.integer)):
147 raise ValueError(
148 "%s should be a nonnegative integer. "
D:\ProgramFiles\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1327
1328 def __nonzero__(self):
-> 1329 raise ValueError(
1330 f"The truth value of a {type(self).__name__} is ambiguous. "
1331 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I think something is wrong with my callable. Do you know what I am doing wrong?
【Discussion】:
Tags: python-3.x scikit-learn cluster-analysis