【Posted】: 2017-09-27 14:33:56
【Question】:
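I wrote a simple program to classify a set of linearly separable random 2D points. I used a Perceptron and trained it with the fit method. Now I would like to train the perceptron one sample at a time, drawing the hyperplane (a line, in this case) with the updated weights after each step. What I want to obtain is an animation showing how the line separates the two sets more and more accurately. The fit method takes the whole training set; what about partial_fit? Can I write a loop that feeds the method one new input/output pair at a time and reads coef_ and intercept_ after each call?
I read the documentation at http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html, but I still have some doubts about how to implement this.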
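The loop described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's code: the data set is synthetic (the line y = -2x + 3 mirrors the values used further down), and `history` and `boundary_line` are hypothetical names introduced here. Each `partial_fit` call receives one sample, and the current `coef_` and `intercept_` are copied after every call so they can later be turned into one animation frame each.

```python
import numpy as np
from sklearn.linear_model import Perceptron

# Synthetic, linearly separable data (assumption: same kind of set as in the question).
rng = np.random.RandomState(0)
X = rng.rand(100, 2) * 20 - 10
y = np.where(X[:, 1] >= -2 * X[:, 0] + 3, 1, -1)

ppn = Perceptron(eta0=0.6, random_state=0)
classes = np.unique(y)  # every label must be declared on the first partial_fit call

history = []  # hypothetical container: one (coef, intercept) snapshot per sample
for i in range(X.shape[0]):
    # Slicing with i:i+1 keeps X two-dimensional, shape (1, 2), and y one-dimensional, shape (1,).
    ppn.partial_fit(X[i:i + 1], y[i:i + 1], classes=classes)
    history.append((ppn.coef_.copy(), ppn.intercept_.copy()))


def boundary_line(coef, intercept):
    """Slope and offset of the line w0*x + w1*y + b = 0, assuming w1 != 0."""
    w0, w1 = coef[0]
    b = intercept[0]
    return -w0 / w1, -b / w1


slope, offset = boundary_line(*history[-1])
```

Each entry of `history` can be passed to `boundary_line` and plotted as one frame. Copying the arrays matters because the estimator may update `coef_` in place on later calls.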
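Edit 1
Thanks to Vivek Kumar, I implemented the partial_fit method in my code. The program creates two sets of coordinates and produces an output for each pair: 1 if the point is on or above a line, -1 if it is below. The code works with the fit method, but this version has some problems with the shape of the data. I tried using reshape on the X data, without any improvement.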
import numpy as np
import matplotlib.pyplot as plt
def createLinearSet(nCamp, mTest, qTest):
    y_ = []
    X_ = np.random.rand(nCamp, 2)*20-10
    for n in range(nCamp):
        if X_[n][1] >= mTest*X_[n][0]+qTest :
            y_.append(1)
        else:
            y_.append(-1)
    return X_, y_
########################################################################
# VARIABLES
iterazioni = 100
eta = 0.6
y = []
error = []
########################################################################
# CREATING DATA SET
m_test = -2
q_test = 3
n_camp = 100
X, y = createLinearSet(n_camp, m_test, q_test)
########################################################################
# 70 % training data and 30 % test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 0)
########################################################################
# Data normalization
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
sc.fit(X_train) # Compute the mean and standard deviation of the training samples
X_train_std = sc.transform(X_train) # Standardize the training and test data
X_test_std = sc.transform(X_test) # NB: the training mean and deviation are used for both,
# so that they are comparable
########################################################################
# Perceptron initialization
from sklearn.linear_model import Perceptron
ppn = Perceptron(n_iter = iterazioni, eta0 = eta, random_state = 0)
########################################################################
# Online training
num_samples = X_train_std.shape[0]
classes_y = np.unique(y_train)
X_train_std = X_train_std.reshape(-1, 2)
for i in range(num_samples):
    ppn.partial_fit(X_train_std[i], y_train[i], classes = classes_y )
########################################################################
# Using test data for evaluation
y_pred = ppn.predict(X_test_std)
########################################################################
# Previsions accuracy
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, y_pred) * 100
print("Accuracy: {} %".format(round(accuracy,2)))
print(ppn.coef_, ppn.intercept_)
As you can see, the problem is in the "Online training" section. The error is:
/usr/local/lib/python3.5/dist-packages/sklearn/utils/validation.py:395: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
According to the documentation, X must be: X : {array-like, sparse matrix}, shape (n_samples, n_features)
If I print a single sample of X, the output is: [-0.25547959 -1.4763508]
Where is the mistake?
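A small sketch of what the warning is pointing at, using NumPy only. The first row is the sample printed above; the second row is made up for illustration. Indexing a 2D array with a single integer returns a 1D array of shape (2,), which does not match (n_samples, n_features). Note also that calling reshape(-1, 2) on the whole array before the loop leaves an (n, 2) array unchanged, so it cannot help.

```python
import numpy as np

X_train_std = np.array([[-0.25547959, -1.4763508],
                        [0.5, 1.2]])  # second row is invented for the example

row = X_train_std[0]
print(row.shape)                  # (2,)   one-dimensional: this triggers the warning
print(row.reshape(1, -1).shape)   # (1, 2) one sample with two features
print(X_train_std[0:1].shape)     # (1, 2) slicing keeps the array two-dimensional
```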
Edit 2
Putting X_train_std[i].reshape(1,-1) into the loop gives me the following:
Traceback (most recent call last):
  File "Perceptron_Retta_Online.py", line 57, in <module>
    ppn.partial_fit(X_train_std[i].reshape(1,-1), y_train[i], classes = classes_y )
  File "/usr/local/lib/python3.5/dist-packages/sklearn/linear_model/stochastic_gradient.py", line 512, in partial_fit
    coef_init=None, intercept_init=None)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/linear_model/stochastic_gradient.py", line 344, in _partial_fit
    X, y = check_X_y(X, y, 'csr', dtype=np.float64, order="C")
  File "/usr/local/lib/python3.5/dist-packages/sklearn/utils/validation.py", line 526, in check_X_y
    y = column_or_1d(y, warn=True)
  File "/usr/local/lib/python3.5/dist-packages/sklearn/utils/validation.py", line 562, in column_or_1d
    raise ValueError("bad input shape {0}".format(shape))
ValueError: bad input shape ()
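The traceback ends in column_or_1d, which validates y rather than X, so the shape () most likely refers to the scalar label y_train[i]. A minimal sketch of one possible workaround, assuming labels stored in a plain Python list as in the code above (the sample values and labels here are illustrative): pass a one-element slice so that y has shape (1,), matching the single row in X.

```python
import numpy as np
from sklearn.linear_model import Perceptron

X = np.array([[-0.25547959, -1.4763508],
              [1.0, 2.0]])  # illustrative samples; the second row is made up
y = [-1, 1]                 # illustrative labels in a plain Python list

print(np.shape(y[0]))    # ()   a scalar label, which the validation rejects
print(np.shape(y[0:1]))  # (1,) one label for one sample

ppn = Perceptron(random_state=0)
# One sample as a (1, 2) array and its label as a one-element sequence.
ppn.partial_fit(X[0].reshape(1, -1), y[0:1], classes=np.unique(y))
print(ppn.coef_.shape, ppn.intercept_.shape)
```

Wrapping the label as [y_train[i]] would have the same effect as the slice.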
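【Comments】:
-
From a programming standpoint this is very unclear. Please post the code that works. You should also say which scikit-learn estimator you want to use, and include some sample data and results. You also mention a perceptron, but the link is about SGD??
Tags: scikit-learn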