【Posted】:2019-09-12 15:59:34
【Question】:
import numpy as np
import pandas as pd

# Reading the dataset
data = pd.read_csv("energydata_complete.csv")
X = data.drop(['Appliances', 'date'], axis=1)
Y = data['Appliances'].values
y_np = Y.shape[0]
y = Y.reshape(y_np, 1)
L = 0.001
n = X.shape[1]
m = y.size
# Prepend a column of ones for the intercept term
ones = np.ones((X.shape[0], 1))
X = np.concatenate((ones, X), axis=1)
X = X.astype("float64")
y = y.astype("float64")
thetas = np.random.random([1, n+1])

# Splitting data into training and testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=5)

# Scale the training features to [0, 1]
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)

def batch_grad_desc(X, y, m, thetas, L):
    cost_list = []
    thetas_list = []
    L = 0.001
    L = []
    m = y.size
    y_hat_list = []
    grad = True
    i = 0
    cost_list.append(1e10)
    while grad:
        y_hat = np.dot(X, thetas.T)
        y_hat_list.append(y_hat)
        Error = (y_hat - y)
        cost = 1/(2*m) * np.dot(Error.T, Error)
        cost_list.append(cost)
        thetas = thetas - (L*(1/m) * np.dot(X.T, Error))
        thetas_list.append(thetas)
        if cost_list[i] - cost_list[i+1] < 1e-9:
            grad = False
        i += 1
    cost_list.pop(0)
    return y_hat_list, cost_list, thetas_list

y_hat_list, cost_list, thetas_list = batch_grad_desc(X, y, m, thetas, L)
thetas = thetas_list[-1]
Error:
TypeError                                 Traceback (most recent call last)
<ipython-input-66-5b85a5574e32> in <module>
----> 1 y_hat_list, cost_list, thetas_list = batch_grad_desc(X, y, m, thetas, L)
      2 thetas = thetas_list[-1]

<ipython-input-65-9097ee62fbd8> in batch_grad_desc(X, y, m, thetas, L)
     19
     20         cost_list.append(cost)
---> 21         thetas = thetas - (L*(1/m) * np.dot(X.T, Error))
     22
     23         thetas_list.append(thetas)

TypeError: can't multiply sequence by non-int of type 'float'
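I have implemented a basic batch gradient descent function, but I now get an error when updating theta and cannot figure out how to make it work. The variables are already declared as floats. What is causing the error?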
【Discussion】:
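Judging from the traceback, the likely cause is the line `L = []` inside `batch_grad_desc`: it rebinds the learning rate to an empty list right after `L = 0.001`, so `L*(1/m)` multiplies a list by a float, which Python does not allow. The sketch below first reproduces that message and then shows one way the function could be written. It is not the asker's exact code: it runs on small synthetic data instead of `energydata_complete.csv`, it keeps `thetas` as a column vector of shape `(n+1, 1)` (the question uses `[1, n+1]`, which would not line up with `np.dot(X.T, Error)` in the update), and the `tol` and `max_iter` parameters are hypothetical additions that guard against an endless loop.

```python
import numpy as np

# Reproduce the error: a list multiplied by a float raises TypeError.
try:
    [] * (1 / 100)
except TypeError as exc:
    error_message = str(exc)


def batch_grad_desc(X, y, thetas, L=0.001, tol=1e-9, max_iter=10000):
    """Batch gradient descent for linear regression (illustrative sketch).

    X: (m, n+1) with a leading column of ones
    y: (m, 1)
    thetas: (n+1, 1) initial parameters
    tol, max_iter: hypothetical stopping controls, not in the question
    """
    m = y.size
    cost_list = []
    thetas_list = []
    prev_cost = np.inf
    for _ in range(max_iter):
        y_hat = X @ thetas                          # predictions, shape (m, 1)
        error = y_hat - y
        cost = (error.T @ error).item() / (2 * m)   # scalar cost
        cost_list.append(cost)
        # L stays a float here, so the update is plain array arithmetic.
        thetas = thetas - (L / m) * (X.T @ error)
        thetas_list.append(thetas)
        if abs(prev_cost - cost) < tol:             # stop once the cost stalls
            break
        prev_cost = cost
    return cost_list, thetas_list


# Synthetic stand-in for the real dataset (assumption for this sketch).
rng = np.random.default_rng(0)
m, n = 200, 3
features = rng.random((m, n))                       # values in [0, 1)
X = np.hstack([np.ones((m, 1)), features])
true_thetas = np.array([[1.0], [2.0], [-3.0], [0.5]])
y = X @ true_thetas
thetas0 = rng.random((n + 1, 1))

cost_list, thetas_list = batch_grad_desc(X, y, thetas0, L=0.1, max_iter=5000)
thetas = thetas_list[-1]
```

With the question's own variables, the call would presumably take the scaled training split and a transposed `thetas`, for example `batch_grad_desc(X_train, y_train, thetas.T)`, since the original call passes the unscaled `X` and `y`.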
Tags: python numpy linear-regression gradient-descent