【Posted】: 2018-07-28 08:43:05
【Question】:
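I am trying to understand and implement these algorithms with a simple linear regression example. As I understand it, full-batch gradient descent uses all of the data to compute the gradient, whereas stochastic gradient descent uses only a single sample.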
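For reference, here is the cost that the code below minimizes and the two gradient variants, written with $\eta$ for `learning_rate` (a summary derived from the code, not part of the original post). Full-batch gradient descent averages over all $N$ samples; SGD uses one randomly chosen index $i$ per update:

$$
\begin{aligned}
J(m,b) &= \frac{1}{N}\sum_{i=1}^{N}\bigl(y_i - (m x_i + b)\bigr)^2 \\
\text{full batch:}\quad \frac{\partial J}{\partial m} &= -\frac{2}{N}\sum_{i=1}^{N} x_i\bigl(y_i - (m x_i + b)\bigr), \qquad
\frac{\partial J}{\partial b} = -\frac{2}{N}\sum_{i=1}^{N}\bigl(y_i - (m x_i + b)\bigr) \\
\text{SGD (one } i \text{ per step):}\quad g_m &= -2\,x_i\bigl(y_i - (m x_i + b)\bigr), \qquad
g_b = -2\bigl(y_i - (m x_i + b)\bigr) \\
\text{update:}\quad m &\leftarrow m - \eta\, g_m, \qquad b \leftarrow b - \eta\, g_b
\end{aligned}
$$

Averaging $g_m$ and $g_b$ over a uniformly random $i$ gives back the full-batch gradient, which is why the single-sample version carries no $1/N$ factor.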
Full-batch gradient descent:
```python
import pandas as pd
from math import sqrt

df = pd.read_csv("data.csv")
df = df.sample(frac=1)  # shuffle the rows
X = df['X'].values
y = df['y'].values

m_current = 0
b_current = 0
epochs = 100000
learning_rate = 0.0001
N = float(len(y))

for i in range(epochs):
    y_current = (m_current * X) + b_current
    cost = sum([data**2 for data in (y - y_current)]) / N  # mean squared error
    rmse = sqrt(cost)
    # gradients of the MSE with respect to m and b, summed over all N samples
    m_gradient = -(2/N) * sum(X * (y - y_current))
    b_gradient = -(2/N) * sum(y - y_current)
    m_current = m_current - (learning_rate * m_gradient)
    b_current = b_current - (learning_rate * b_gradient)

print("RMSE: ", rmse)
```
Full-batch gradient descent output: RMSE: 10.597894381512043
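The loop above runs Python's built-in `sum` and a list comprehension over the whole array on every one of the 100000 epochs, which is slow. A minimal sketch of the same full-batch update using vectorized NumPy operations follows. It is an illustration, not the poster's code: the synthetic data (slope 3, intercept 4, Gaussian noise) is a hypothetical stand-in for `data.csv`, and `learning_rate=0.1`, `epochs=2000` are assumed values suited to that data's scale.

```python
import numpy as np


def batch_gradient_descent(X, y, learning_rate=0.1, epochs=2000):
    """Full-batch gradient descent for y ≈ m*X + b using vectorized NumPy operations."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    m, b = 0.0, 0.0
    for _ in range(epochs):
        residual = y - (m * X + b)                     # errors for all n samples at once
        m_gradient = -(2.0 / n) * np.dot(X, residual)  # same formula as the loop above
        b_gradient = -(2.0 / n) * residual.sum()
        m -= learning_rate * m_gradient
        b -= learning_rate * b_gradient
    rmse = float(np.sqrt(np.mean((y - (m * X + b)) ** 2)))
    return m, b, rmse


# Synthetic stand-in for data.csv (hypothetical: slope 3, intercept 4, noise sd 0.5).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 2, size=200)
y_demo = 3.0 * X_demo + 4.0 + rng.normal(0, 0.5, size=200)

m_hat, b_hat, rmse = batch_gradient_descent(X_demo, y_demo)
print("m:", m_hat, "b:", b_hat, "RMSE:", rmse)
```

Because full-batch gradient descent on this convex cost converges to the least-squares solution, `np.polyfit(X_demo, y_demo, 1)` is a convenient cross-check for `m_hat` and `b_hat`.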
Next, I tried to implement stochastic gradient descent on top of this code. It looks like this:
```python
import pandas as pd
from math import sqrt

df = pd.read_csv("data.csv")
df = df.sample(frac=1)
X = df['X'].values
y = df['y'].values

m_current = 0
b_current = 0
epochs = 100000
learning_rate = 0.0001
N = float(len(y))

mini = df.sample(n=1)  # get one random row from dataset
X_mini = mini['X'].values
y_mini = mini['y'].values

for i in range(epochs):
    y_current = (m_current * X) + b_current
    cost = sum([data**2 for data in (y - y_current)]) / N
    rmse = sqrt(cost)
    m_gradient = -(2/N) * (X_mini * (y_mini - y_current))
    b_gradient = -(2/N) * (y_mini - y_current)
    m_current = m_current - (learning_rate * m_gradient)
    b_current = b_current - (learning_rate * b_gradient)

print("RMSE: ", rmse)
```
Output: RMSE: 27.941268469783633, RMSE: 20.919246260939282, RMSE: 31.100985268167648, RMSE: 21.023479528518386, RMSE: 19.920972478204785...
The results I get with sklearn's SGDRegressor (same settings):
```python
import pandas as pd
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from math import sqrt

data = pd.read_csv('data.csv')
x = data.X.values.reshape(-1, 1)
y = data.y.values.reshape(-1, 1).ravel()
Model = linear_model.SGDRegressor(alpha=0.0001, shuffle=True, max_iter=100000)
Model.fit(x, y)
y_predicted = Model.predict(x)
mse = mean_squared_error(y, y_predicted)
print("RMSE: ", sqrt(mse))
```
Output: RMSE: 10.995881334048224, RMSE: 11.75907544873036, RMSE: 12.981134247509486, RMSE: 12.298263437187988, RMSE: 12.549948073154608...
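The sklearn call above does not actually use the same settings as the hand-written loops. In `SGDRegressor`, `alpha` multiplies the regularization term (L2 by default), not the step size; the step size is governed by `learning_rate` (default `'invscaling'`, a decaying schedule) and `eta0` (default 0.01); and `max_iter` counts passes over the data, with training stopping early once the `tol` criterion is met. A sketch that configures `SGDRegressor` closer to plain constant-step SGD follows, assuming synthetic data in place of `data.csv` and illustrative values for `eta0` and `max_iter`:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic stand-in for data.csv (hypothetical: slope 3, intercept 4, noise sd 0.5).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 2, size=200)
y_demo = 3.0 * X_demo + 4.0 + rng.normal(0, 0.5, size=200)

model = SGDRegressor(
    alpha=0.0,                 # alpha scales the regularization penalty; 0 disables it
    learning_rate="constant",  # fixed step size instead of the default "invscaling"
    eta0=0.01,                 # the constant step size (assumed value)
    max_iter=100,              # number of passes (epochs) over the data
    tol=None,                  # no early stopping: run all max_iter epochs
    shuffle=True,              # reshuffle the samples before each epoch
    random_state=0,
)
model.fit(X_demo.reshape(-1, 1), y_demo)
predictions = model.predict(X_demo.reshape(-1, 1))
rmse = float(np.sqrt(np.mean((y_demo - predictions) ** 2)))
print("m:", model.coef_[0], "b:", model.intercept_[0], "RMSE:", rmse)
```

With regularization off and a constant step, the fitted slope and intercept should land close to the least-squares values, which makes the comparison with a hand-written SGD loop more meaningful.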
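The results from my algorithm above are worse than those of the scikit-learn model. Where did I go wrong? My algorithm is also slow (it takes a few seconds).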
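Several details of the SGD loop above differ from textbook SGD. The random row is drawn once, before the loop, so every update uses the same sample. `y_current` is computed for all of `X`, so `y_mini - y_current` has length N and `m_current`/`b_current` become arrays through NumPy broadcasting after the first update. The gradient is scaled by `2/N`, which makes each single-sample step N times smaller than the usual SGD step. Finally, the full-dataset RMSE is recomputed with Python's `sum` on every iteration, which is likely the main source of the slowness. A minimal sketch of plain SGD that uses a different sample for every update (here by reshuffling once per epoch) is below; the synthetic data and the `learning_rate`/`epochs` values are assumptions, and this is one way to write SGD, not the only one.

```python
import numpy as np


def stochastic_gradient_descent(X, y, learning_rate=0.005, epochs=100, seed=0):
    """Plain SGD for y ≈ m*X + b, updating on one sample at a time."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    m, b = 0.0, 0.0  # stay scalars: each update only involves a single sample
    for _ in range(epochs):
        # A new random order every epoch, so a different sample drives each update.
        for i in rng.permutation(len(y)):
            residual = y[i] - (m * X[i] + b)     # prediction error on sample i only
            m_gradient = -2.0 * X[i] * residual  # single-sample gradient, no 1/N factor
            b_gradient = -2.0 * residual
            m -= learning_rate * m_gradient
            b -= learning_rate * b_gradient
    # Evaluate the RMSE on the full dataset once, after training, not on every step.
    rmse = float(np.sqrt(np.mean((y - (m * X + b)) ** 2)))
    return m, b, rmse


# Synthetic stand-in for data.csv (hypothetical: slope 3, intercept 4, noise sd 0.5).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 2, size=200)
y_demo = 3.0 * X_demo + 4.0 + rng.normal(0, 0.5, size=200)

m_hat, b_hat, rmse = stochastic_gradient_descent(X_demo, y_demo)
print("m:", m_hat, "b:", b_hat, "RMSE:", rmse)
```

With a constant step size the final iterate keeps fluctuating slightly around the least-squares solution; a smaller `learning_rate` or a decaying schedule reduces that noise at the cost of slower progress.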
【Discussion】:
-
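For gradient descent algorithms, it always helps to visualize how m_current and b_current progress through the 2D parameter space, on top of a contour plot of the error. -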
Thanks for the comment, I'll keep that in mind.
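Following the suggestion in the first comment, here is a minimal sketch that records every (m, b) pair visited by single-sample SGD and evaluates the mean squared error on a grid, so the path can be drawn over the error contours. The synthetic data, the function names, and the parameter values are hypothetical; plotting requires matplotlib, which is imported only inside `plot_path`.

```python
import numpy as np


def sgd_path(X, y, learning_rate=0.005, epochs=50, seed=0):
    """Run single-sample SGD on y ≈ m*X + b and record (m, b) after every update."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    m, b = 0.0, 0.0
    path = [(m, b)]
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            residual = y[i] - (m * X[i] + b)
            m -= learning_rate * (-2.0 * X[i] * residual)
            b -= learning_rate * (-2.0 * residual)
            path.append((m, b))
    return np.array(path)  # shape (epochs * len(y) + 1, 2)


def mse_grid(X, y, m_values, b_values):
    """Mean squared error J(m, b) on a grid; rows follow b_values, columns m_values."""
    M, B = np.meshgrid(m_values, b_values)
    predictions = M[..., None] * X + B[..., None]  # broadcast over samples on the last axis
    J = np.mean((y - predictions) ** 2, axis=-1)
    return M, B, J


def plot_path(X, y, path):
    """Draw error contours with the (m, b) trajectory on top (requires matplotlib)."""
    import matplotlib.pyplot as plt

    m_values = np.linspace(path[:, 0].min() - 1, path[:, 0].max() + 1, 100)
    b_values = np.linspace(path[:, 1].min() - 1, path[:, 1].max() + 1, 100)
    M, B, J = mse_grid(X, y, m_values, b_values)
    plt.contour(M, B, J, levels=30)
    plt.plot(path[:, 0], path[:, 1], "r-", linewidth=0.5)
    plt.xlabel("m")
    plt.ylabel("b")
    plt.show()


# Synthetic stand-in for data.csv (hypothetical: slope 3, intercept 4, noise sd 0.5).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 2, size=200)
y_demo = 3.0 * X_demo + 4.0 + rng.normal(0, 0.5, size=200)
path = sgd_path(X_demo, y_demo)
```

Calling `plot_path(X_demo, y_demo, path)` then shows the trajectory; a path that never settles near the bottom of the error valley, or one that wanders away from it, is easy to spot this way.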
Tags: python machine-learning gradient gradient-descent