Gradient descent algorithm error: non-conformable arguments (answer)

【Question title】: Gradient descent algorithm error: non-conformable arguments
【Posted】: 2017-09-11 16:30:14
【Question】:

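I am trying to implement linear regression from scratch in R, without using any packages or libraries. The data I am using is:

UCI Machine Learning Repository, Bike Sharing Dataset

I have to apply a batch-update gradient descent algorithm to this regression.

I wrote the following code: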

> # Load the data
> data <- read.csv("Bike-Sharing-Dataset/hour.csv")
> 
> # Select the useable features
> data1 <- data[, c("season", "mnth", "hr", "holiday", "weekday", "workingday", "weathersit", "temp", "atemp", "hum", "windspeed", "cnt")]
> 
> # Examine the data structure
> str(data1)
'data.frame':   17379 obs. of  12 variables:
 $ season    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ mnth      : int  1 1 1 1 1 1 1 1 1 1 ...
 $ hr        : int  0 1 2 3 4 5 6 7 8 9 ...
 $ holiday   : int  0 0 0 0 0 0 0 0 0 0 ...
 $ weekday   : int  6 6 6 6 6 6 6 6 6 6 ...
 $ workingday: int  0 0 0 0 0 0 0 0 0 0 ...
 $ weathersit: int  1 1 1 1 1 2 1 1 1 1 ...
 $ temp      : num  0.24 0.22 0.22 0.24 0.24 0.24 0.22 0.2 0.24 0.32 ...
 $ atemp     : num  0.288 0.273 0.273 0.288 0.288 ...
 $ hum       : num  0.81 0.8 0.8 0.75 0.75 0.75 0.8 0.86 0.75 0.76 ...
 $ windspeed : num  0 0 0 0 0 0.0896 0 0 0 0 ...
 $ cnt       : int  16 40 32 13 1 1 2 3 8 14 ...
> 
> summary(data1)
     season           mnth              hr           holiday           weekday        workingday       weathersit   
 Min.   :1.000   Min.   : 1.000   Min.   : 0.00   Min.   :0.00000   Min.   :0.000   Min.   :0.0000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.: 4.000   1st Qu.: 6.00   1st Qu.:0.00000   1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:1.000  
 Median :3.000   Median : 7.000   Median :12.00   Median :0.00000   Median :3.000   Median :1.0000   Median :1.000  
 Mean   :2.502   Mean   : 6.538   Mean   :11.55   Mean   :0.02877   Mean   :3.004   Mean   :0.6827   Mean   :1.425  
 3rd Qu.:3.000   3rd Qu.:10.000   3rd Qu.:18.00   3rd Qu.:0.00000   3rd Qu.:5.000   3rd Qu.:1.0000   3rd Qu.:2.000  
 Max.   :4.000   Max.   :12.000   Max.   :23.00   Max.   :1.00000   Max.   :6.000   Max.   :1.0000   Max.   :4.000  
      temp           atemp             hum           windspeed           cnt       
 Min.   :0.020   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :  1.0  
 1st Qu.:0.340   1st Qu.:0.3333   1st Qu.:0.4800   1st Qu.:0.1045   1st Qu.: 40.0  
 Median :0.500   Median :0.4848   Median :0.6300   Median :0.1940   Median :142.0  
 Mean   :0.497   Mean   :0.4758   Mean   :0.6272   Mean   :0.1901   Mean   :189.5  
 3rd Qu.:0.660   3rd Qu.:0.6212   3rd Qu.:0.7800   3rd Qu.:0.2537   3rd Qu.:281.0  
 Max.   :1.000   Max.   :1.0000   Max.   :1.0000   Max.   :0.8507   Max.   :977.0  
> 
> x0 <- rep(1, nrow(data1)) # column of 1's
> x1 <- data1[, c("season", "mnth", "hr", "holiday", "weekday", "workingday", "weathersit", "temp", "atemp", "hum", "windspeed")]
> # create the x- matrix of explanatory variables
> x <- as.matrix(cbind(x0,x1))
> 
> # create the y-matrix of dependent variables
> 
> y <- as.matrix(data1$cnt)
> m <- nrow(y)
> 
> solve(t(x)%*%x)%*%t(x)%*%y 
                   [,1]
x0           29.1810525
season       18.9876496
mnth          0.1589082
hr            7.4613187
holiday     -20.5845740
weekday       1.7134883
workingday    3.6982194
weathersit   -1.3296468
temp         93.0022705
atemp       227.1855491
hum        -222.1211201
windspeed    28.4864449
> 
> # define the gradient function dJ/dtheta: 1/m * (h(x)-y)*x where h(x) = x*theta
> # in matrix form this is as follows:
> grad <- function(x, y, theta) {
+   gradient <- (1/m)* (t(x) %*% ((x %*% t(theta)) - y))
+   return(t(gradient))
+ }
> # define gradient descent update algorithm
> grad.descent <- function(x, maxit){
+   theta <- matrix(c(0, 0), nrow=1) # Initialize the parameters
+   
+   alpha = .05 # set learning rate
+   for (i in 1:maxit) {
+     theta <- theta - alpha  * grad(x, y, theta)   
+   }
+   return(theta)
+ }

When I call the function and try to print the result of gradient descent, I get the following error:

> print(grad.descent(x,1000))
 Error in x %*% t(theta) : non-conformable arguments 

> beta <- grad.descent(x,1000)
Error in x %*% t(theta) : non-conformable arguments

What does this mean, and how can I fix it?

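【Question comments】:

You initialized theta as a vector of length 2. That looks like the wrong dimension: there are twelve variables in the model!
@coffeinjunky So the vector should have length 12? i.e. theta <- matrix(c(0,0,0,0,0,0,0,0,0,0,0,0), nrow=1) or theta <- matrix(c(0, 0), nrow=12)? I'm new to this and not quite sure what to do :/
I tried the first approach, and it just returns 'NaN' for every column.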

Tags: r algorithm gradient-descent

【Answer 1】:

Try the following:

grad.descent <- function(x, maxit){
  theta <- matrix(rep(0, length=ncol(x)), nrow = 1) 
  alpha = .05 # set learning rate
  for (i in 1:maxit) {
    theta <- theta - alpha  * grad(x, y, theta)   
  }
  return(theta)
}

grad.descent(x,10)
               x0       season          mnth            hr
[1,] -14980121331 -39045685399 -103624114379 -217515123951
        holiday      weekday   workingday   weathersit        temp
[1,] -428141889 -45772773208 -10250464667 -21311163894 -7687568533
           atemp         hum   windspeed
[1,] -7340863806 -9108715961 -2927915227

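The error non-conformable arguments almost always means that the dimensions of your matrices don't match. Here, x %*% t(theta) multiplies x, which is 17379 x 12 (the column of 1's plus 11 features), by t(theta), which is 2 x 1, because you initialized theta as a 1 x 2 matrix. theta needs one entry per column of x, i.e. 12 entries.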
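A minimal sketch of that dimension rule on a small synthetic matrix (X_demo, theta_bad and theta_ok are hypothetical names, not from the question): x %*% t(theta) only works when t(theta) has as many rows as x has columns.

```r
set.seed(1)
# Hypothetical design matrix: 5 rows, an intercept column plus 3 features (4 columns)
X_demo <- cbind(x0 = 1, matrix(runif(15), nrow = 5))

# Wrong: a 1 x 2 row vector, like the question's initial theta.
# X_demo %*% t(theta_bad) is (5 x 4) times (2 x 1): the inner dimensions 4 and 2 differ.
theta_bad <- matrix(c(0, 0), nrow = 1)
err <- tryCatch(X_demo %*% t(theta_bad), error = function(e) conditionMessage(e))
print(err)  # the error message instead of a result

# Right: one parameter per column of the design matrix
theta_ok <- matrix(0, nrow = 1, ncol = ncol(X_demo))
pred <- X_demo %*% t(theta_ok)
print(dim(pred))  # 5 rows, 1 column
```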

On a related note, your step size is quite large, which is probably why you may end up with strange results. To see this, let's use the following code:

grad <- function(x, y, theta) { # note that for readability, I redefined theta as a column vector
  m <- nrow(x)  # number of observations, computed here instead of relying on the global m
  gradient <- (1/m) * t(x) %*% (x %*% theta - y)
  return(gradient)
}
grad.descent <- function(x, maxit, alpha){
  theta <- matrix(rep(0, length=ncol(x)), ncol = 1)
  for (i in 1:maxit) {
    theta <- theta - alpha  * grad(x, y, theta)   
  }
  return(theta)
}

Let's run it with alpha = 0.05 and with alpha = 0.005:

data.frame(alpha_0.05 = grad.descent(x, maxit = 1000, alpha = 0.05),
           alpha_0.005 = grad.descent(x, maxit = 1000, alpha = 0.005))
           alpha_0.05 alpha_0.005
x0                NaN    6.253737
season            NaN   31.968743
mnth              NaN   -2.317199
hr                NaN    9.904181
holiday           NaN   -2.986200
weekday           NaN    2.982280
workingday        NaN    8.961909
weathersit        NaN  -26.145486
temp              NaN   46.509991
atemp             NaN   41.258458
hum               NaN  -29.508986
windspeed         NaN    7.632146
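A minimal sketch, on hypothetical synthetic data (X_demo, y_demo, run_gd and cost are illustrative names, not from the answer), of why the step size matters. For the cost J(theta) = 1/(2m) * sum((X theta - y)^2), whose gradient is exactly the grad() above, fixed-step gradient descent converges only when alpha is below 2 / lambda_max, where lambda_max is the largest eigenvalue of t(X) %*% X / m. Above that bound the iterates grow geometrically until they overflow, which would explain the NaN column. This is the standard stability condition for least squares rather than something stated in the answer; the table above is consistent with the bound for the bike data lying somewhere between 0.005 and 0.05.

```r
# Cost J(theta) = 1/(2m) * sum((X theta - y)^2); its gradient is t(X) %*% (X theta - y) / m
cost <- function(X, y, theta) sum((X %*% theta - y)^2) / (2 * nrow(X))

# Fixed-step batch gradient descent, same update as the answer's grad.descent()
run_gd <- function(X, y, alpha, maxit) {
  theta <- matrix(0, nrow = ncol(X), ncol = 1)
  for (i in seq_len(maxit)) {
    theta <- theta - alpha * crossprod(X, X %*% theta - y) / nrow(X)
  }
  theta
}

# Hypothetical data: intercept, one feature in [0, 1], one in [0, 23] (similar in range to hr)
set.seed(42)
n <- 200
X_demo <- cbind(x0 = 1, a = runif(n, 0, 1), b = runif(n, 0, 23))
y_demo <- X_demo %*% c(2, 3, 0.5) + rnorm(n)

# The largest eigenvalue of the Hessian t(X) %*% X / m gives the step-size bound
H <- crossprod(X_demo) / n
lambda_max <- max(eigen(H, symmetric = TRUE, only.values = TRUE)$values)
alpha_max <- 2 / lambda_max
print(alpha_max)

theta_stable   <- run_gd(X_demo, y_demo, alpha = 0.5 * alpha_max, maxit = 200)
theta_unstable <- run_gd(X_demo, y_demo, alpha = 1.5 * alpha_max, maxit = 200)
print(max(abs(theta_unstable)))  # grows geometrically with maxit
```

Below the bound the cost decreases at every step; above it, running more iterations only makes the blow-up worse, until the values overflow.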

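【Comments】:

Thanks, that works. One thing that worries me, though, is that the gradient descent results are completely different from the regression results. Why is that? Also, I get a warning message: > warnings() Warning messages: 1: In (x %*% t(theta)) - y : longer object length is not a multiple of shorter object length
Great! The warning message is gone. However, my results are still completely different from the regression results. Can you explain why?
With maxit = 100000 and alpha = 0.005, the results come very close. For a more robust version, you can implement a convergence criterion based on the difference in cost between two successive updates.
How do I implement a convergence criterion? If you look at my question here: stackoverflow.com/questions/46163492/…, I think the first algorithm there has a convergence criterion, but I'm not sure how to apply the same idea to this algorithm.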
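A minimal sketch of the convergence criterion suggested above: stop once the cost changes by less than a tolerance between two successive updates. It assumes the cost J(theta) = 1/(2m) * sum((X theta - y)^2), whose gradient is the answer's grad(); the names cost, grad_descent_tol and tol, the default values, and the synthetic data are hypothetical. An absolute tolerance depends on the scale of y, so a relative test such as abs(J_old - J_new) < tol * J_old may suit the bike data better.

```r
# Cost J(theta) = 1/(2m) * sum((X theta - y)^2); its gradient is t(X) %*% (X theta - y) / m
cost <- function(X, y, theta) sum((X %*% theta - y)^2) / (2 * nrow(X))

# Batch gradient descent that stops once the cost changes by less than tol
# between two successive updates (or after maxit iterations)
grad_descent_tol <- function(X, y, alpha, tol = 1e-10, maxit = 1e6) {
  theta <- matrix(0, nrow = ncol(X), ncol = 1, dimnames = list(colnames(X), NULL))
  J_old <- cost(X, y, theta)
  for (i in seq_len(maxit)) {
    theta <- theta - alpha * crossprod(X, X %*% theta - y) / nrow(X)
    J_new <- cost(X, y, theta)
    if (!is.finite(J_new)) stop("cost is not finite; alpha is probably too large")
    if (abs(J_old - J_new) < tol) break  # converged: the cost barely moved
    J_old <- J_new
  }
  list(theta = theta, iterations = i, cost = J_new)
}

# Hypothetical, well-scaled synthetic data: intercept plus two features in [0, 1]
set.seed(7)
n <- 200
X_demo <- cbind(x0 = 1, a = runif(n), b = runif(n))
y_demo <- X_demo %*% c(2, 3, 0.5) + rnorm(n, sd = 0.1)

fit <- grad_descent_tol(X_demo, y_demo, alpha = 0.5, tol = 1e-14)
# Closed-form least-squares solution, computed as in the question, for comparison
theta_ne <- solve(t(X_demo) %*% X_demo) %*% t(X_demo) %*% y_demo
print(cbind(gradient_descent = drop(fit$theta), normal_equation = drop(theta_ne)))
print(fit$iterations)
```

On the bike data the call would look like grad_descent_tol(x, y, alpha = 0.005); going by the comment above, it may need on the order of 100000 iterations before the stopping test fires.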