吴恩达机器学习编程作业：logistic 回归（python）

参考代码：
源代码
效果对比
踩的坑 && 总结

参考代码：

https://blog.csdn.net/zy1337602899/article/details/84777396
初次写机器学习的代码，大部分参照了链接文章中的代码，稍有改动

源代码

// An highlighted block
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as opt
from sklearn.metrics import classification_report
import pandas as pd
from sklearn import linear_model

#获取原始数据
def raw_data(path):
    data = pd.read_csv(path, names=['exam1','exam2','admit'])
    return data

#绘制原始数据
def draw_data(data):
    accept = data[data['admit'] == 1]
    refuse = data[data['admit'] == 0]
    plt.scatter(accept['exam1'], accept['exam2'], c='g', label='admit')
    plt.scatter(refuse['exam1'], refuse['exam2'], c='r', label='refuse')
    plt.title('admission')
    plt.xlabel('exam1')
    plt.ylabel('exam2')
    return plt
#sigmoid函数
def sigmoid(z):
    return 1/(1 + np.exp(-z))
#代价函数
def cost_function(theta,x,y):
    m=x.shape[0]
    #print(theta_T.shape)
    first = -y.dot(np.log(sigmoid(x.dot(theta))))
    second = (1 -y).dot(np.log(1 - sigmoid(x.dot(theta))))
    #dot运算的过程中已经做了sum工作
    #因为有部分运算是内积  a1b1+a2b2+...+ anbn
    return (first - second)/m

def gradient_descent(theta,x,y):
    m=x.shape[0]
    b = sigmoid(x.dot(theta))
    #print(b)
    return (b - y).dot(x)
def predict(theta,x):
    h = sigmoid(x.dot(theta.T))
    return[1 if x >= 0.5 else 0 for x in h]
#决策边界
def boundary(theta,data):
    x1 = np.arange(20,100,0.01)

    x2 = -(theta[0] + theta[1]*x1)/theta[2]
    plt = draw_data(data)
    plt.title("boundary")
    plt.plot(x1,x2)
    plt.show()
    pass

def main():
    data=raw_data('F:\jupyterProject\Logistic Regression Exercise\ex2data1.txt')
    # print(data.head())
    # plt=draw_data(data)
    # plt.show()
    x1=data['exam1']
    x2=data['exam2']
    x=np.c_[np.ones(x1.shape[0]),x1,x2]
    y=data['admit']
    # 不要将theta初始化为1，初始化为1的话，h(x)会过大，sigmoid近似返回1，log(1-h(x))无意义
    theta=np.zeros(x.shape[1])
    times = 200000
    learningRate = 0.12
    #print(theta[0].shape)
    cost_data=np.ones(20001)
    #print(gradient_descent(theta,x,y))
    #minimize是已封装好的计算 fun 最小值的方法
    #theta_right=opt.minimize(fun=cost_function,x0=theta,args=(x,y),method='tnc',jac=gradient_descent)
    #循环20万次
    for i in range(times): 
        #cost_data[i] = cost_function(theta,x,y)
        a = gradient_descent(theta,x,y)
        theta= theta - learningRate * a
        #print(a)
    print(theta)
    boundary(theta,data)
    #再循环20万次，对比效果
    for i in range(times): 
        #cost_data[i] = cost_function(theta,x,y)
        a = gradient_descent(theta,x,y)
        theta= theta - learningRate * a
        #print(a)
    print(theta)
    boundary(theta,data)
    #plt.scatter(range(1000),cost_data[:1000], c='k', label='boundary')
main()

效果对比

1.使用 scipy.optimize.minimize方法计算使得costfunction最小的theta，得到如下边界：
theta：[-25.14325329 0.20608863 0.2013236 ]
吴恩达机器学习编程作业：logistic 回归（python）
图片来源：https://blog.csdn.net/zy1337602899/article/details/84777396

2.自己根据课程中老师讲到的repeat部分编写的代码进行的计算（20万次迭代）：
theta：[-73747.32582717 617.25196212 602.09900877]
吴恩达机器学习编程作业：logistic 回归（python）
3.迭代40万次：
theta：[-66277.53154909 571.89400675 561.94143501]

可以看到相比20万次迭代，直观上有了更好的划分效果

踩的坑 && 总结

1.不熟悉numpy中的dot运算，一度以为cost_function和gradient_descent中需要（实际上不需要sum）：

 return sum(first - second)/m
 return sum((b - y).dot(x))

输出部分数据的结构帮助我们理解dot的运算过程

#gradien_descent部分的运算
q = (sigmoid(x.dot(theta))-y)
w = q.dot(x)
print(q.shape)
print(x.shape)
print(w.shape)

output:
(100,)
(100, 3)
(3,)

x是 100×3的矩阵，theta 是个一维行向量，其shape为（3，），我们可以认为 x.dot(theta) 运算时自动把theta变成了 3×1的矩阵，这样得到一个 100×1的矩阵,但是theta实际上不是矩阵，所以得到的结果也不是一个矩阵，而是一个有100个数值的一维行向量，即q。
q是一个一维行向量，x 是100×3的矩阵，这里认为dot运算时把q看作1×100的矩阵，得到一个一维行向量（1×3），当然最后的输出是（3，）格式的数据，这3个数值分别是3个参数的梯度值。由于嫌编辑公式麻烦，具体的由公式推算出代码的过程就附上一份手写的吧。吴恩达机器学习编程作业：logistic 回归（python）

2.关于迭代更新theta
原博中给出的方法是利用已有的scipy.optimize.minimize方法，自己写好损失函数和梯度函数，作为参数传进去，直接获得最优的theta值，但是为了体验一波佛系调参的过程，我尝试着自己写theta的更新代码，关键的代码也就是手写部分的最后一行，一直重复这一过程就行。最初我一直将迭代次数设置在1万至5万，总是出不来好的效果，决策边界偏得离谱，也不停的修改学习率，都是白搭。后来我索性将迭代次数改为了20万次，没想到这次直接出来了比较好的效果，嗯，果然是佛系。
3.尚未解决的问题
我尝试画出cost_function的变化曲线

for i in range(times): 
#每迭代100次记录一次损失函数值
        if(i % 100 == 0):
            cost_data[int(i/100)] = cost_function(theta,x,y)
        a = gradient_descent(theta,x,y)
        theta= theta - learningRate * a
        #print(a)
plt.scatter(range(10000),cost_data[:10000], c='k', label='boundary')

但是得到的结果却是这样的：
吴恩达机器学习编程作业：logistic 回归（python）
很悲伤。
不过程序给报了个warring ：RuntimeWarning: divide by zero encountered in log
那么是不是 sigmoid(x.dot(theta))的结果太小，导致log（）的参数趋近于零了呢，这个问题等我研究明白了再回来填坑，或者有大神知道怎么回事的话还请带一带本小白。