Implementing Machine Learning Algorithms in Python: Softmax Regression

Softmax regression is also known as multinomial or multi-class logistic regression.

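Given:

a dataset $\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$
where each $x^{(i)}$ is a d-dimensional vector, $x^{(i)} = (x^{(i)}_1, \dots, x^{(i)}_d)$
and $y^{(i)}$ is the target variable for $x^{(i)}$; for example, in a classification problem with K = 3 classes, $y^{(i)} \in \{0, 1, 2\}$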

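A softmax regression model has the following features:

For each class k there is a separate, real-valued weight vector $w_k = (w_{k,1}, \dots, w_{k,d})$
The weight vectors are usually stored as the rows of a weight matrix W.
For each class k there is a separate, real-valued bias $b_k$
It uses the softmax function as its activation function
It uses the cross-entropy as its loss function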

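Training a softmax regression model involves several steps. First (step 0), the model parameters are initialized. The remaining steps are then repeated for a specified number of training iterations or until the parameters converge.

Step 0: Initialize the weight vectors and biases with 0 (or with small random values).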
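Step 0 could be sketched as follows. The sizes, the seed and the 0.01 scaling factor are illustrative assumptions, not values prescribed by the article.

```python
import numpy as np

# Hypothetical problem size, chosen only for illustration
n_features, n_classes = 2, 4

rng = np.random.default_rng(13)

# One weight vector per class, stored as a row of the weight matrix;
# the 0.01 factor keeps the initial values small (an assumption)
weights = 0.01 * rng.random((n_classes, n_features))

# One bias per class, kept as a row vector so it broadcasts over samples
bias = np.zeros((1, n_classes))

print(weights.shape, bias.shape)
```

Storing the bias with shape (1, n_classes) is a design choice: it lets the same array be added to a whole matrix of scores without reshaping.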

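Step 1: For each class k, compute a linear combination of the input features and the weight vector of that class; that is, compute one score per class for every training example. For class k and input vector $x^{(i)}$, the score is:

$$\text{score}_k(x^{(i)}) = w_k^T \cdot x^{(i)} + b_k$$

where $w_k$ is the weight vector of class k and · denotes the dot product.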

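Using vectorization and broadcasting, the scores for all classes and all training examples can be computed at once:

$$\text{scores} = X \cdot W^T + b$$

where X is a matrix of shape $(n_{samples}, n_{features})$ that holds all training examples, and W is a matrix of shape $(n_{classes}, n_{features})$ that holds the weight vector of each class.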
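To make the vectorized form concrete, the sketch below compares it with an explicit loop over examples and classes. The toy values of `X`, `W` and `b` are made up for illustration.

```python
import numpy as np

# Toy data (hypothetical values): 3 samples, 2 features, 4 classes
X = np.array([[1.0, 2.0],
              [0.5, -1.0],
              [3.0, 0.0]])
W = np.array([[0.1, 0.2],
              [0.0, -0.3],
              [0.5, 0.5],
              [-0.2, 0.1]])
b = np.array([[0.0, 0.1, -0.1, 0.2]])

# Vectorized: (3, 2) @ (2, 4) -> (3, 4); b with shape (1, 4) is broadcast
# across the rows, so every sample gets the same per-class bias
scores = X @ W.T + b

# Equivalent explicit loop: one dot product per (sample, class) pair
scores_loop = np.array([[W[k] @ X[i] + b[0, k] for k in range(W.shape[0])]
                        for i in range(X.shape[0])])

print(np.allclose(scores, scores_loop))
```

Both versions compute the same numbers; the vectorized one simply hands the loops to NumPy.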

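Step 2: Apply the softmax activation function to turn the scores into probabilities. The probability that input vector $x^{(i)}$ belongs to class k is:

$$\hat{p}_k(x^{(i)}) = \frac{\exp\left(\text{score}_k(x^{(i)})\right)}{\sum_{j} \exp\left(\text{score}_j(x^{(i)})\right)}$$

Again, vectorization lets us compute the probabilities for all classes and examples at once. The class the model predicts for $x^{(i)}$ is the one with the highest probability.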
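A minimal sketch of a row-wise softmax is shown below. It subtracts each row's maximum before exponentiating, a common stabilization trick that is an addition here rather than something the article prescribes. The result is mathematically unchanged, but very large scores no longer overflow.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax for an array of shape (n_samples, n_classes)."""
    # Shifting by the row maximum does not change the ratios of the
    # exponentials, but keeps np.exp from overflowing
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

# Hypothetical scores; the second row would overflow without the shift
scores = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1000.0, 1000.0]])
probs = softmax(scores)

# Predicted class = index of the highest probability in each row
predicted = np.argmax(probs, axis=1)
print(probs.sum(axis=1), predicted)
```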

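Step 3: Compute the loss over the whole training set.

We want the model to assign a high probability to the target class and low probabilities to the other classes. The cross-entropy loss function achieves this:

$$J(W, b) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k} y_k^{(i)} \log \hat{p}_k^{(i)}$$

Here m is the number of training examples and $\hat{p}_k^{(i)}$ is the predicted probability that $x^{(i)}$ belongs to class k. The target labels are one-hot encoded, so $y_k^{(i)}$ is 1 if the target class of $x^{(i)}$ is k, and 0 otherwise.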
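The following sketch shows how one-hot encoding makes the loss depend only on the probability assigned to the target class. The helper names `one_hot` and `cross_entropy` and the toy probabilities are illustrative assumptions.

```python
import numpy as np

def one_hot(y, n_classes):
    """Turn integer labels of shape (n_samples,) into a one-hot matrix."""
    y = np.asarray(y).ravel()
    encoded = np.zeros((y.shape[0], n_classes))
    encoded[np.arange(y.shape[0]), y] = 1
    return encoded

def cross_entropy(y_one_hot, probs):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    m = y_one_hot.shape[0]
    return -np.sum(y_one_hot * np.log(probs)) / m

# Hypothetical targets and predictions: 2 samples, 3 classes
y = np.array([0, 2])
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])

loss = cross_entropy(one_hot(y, 3), probs)

# The zeros in the one-hot rows cancel every term except the target class,
# so the loss equals the mean negative log-probability of the targets
manual = -(np.log(0.7) + np.log(0.6)) / 2
print(np.isclose(loss, manual))
```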

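Step 4: Compute the gradient of the loss function with respect to each weight vector and bias.

A detailed derivation can be found here (http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/).

The general form for class k is:

$$\nabla_{w_k} J(W, b) = \frac{1}{m} \sum_{i=1}^{m} x^{(i)} \left[\hat{p}_k^{(i)} - y_k^{(i)}\right]$$

where $\hat{p}_k^{(i)}$ is the softmax probability from step 2 and the sum runs over the m training examples. For the gradient with respect to the bias, $x^{(i)}$ is replaced by 1.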
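One way to gain confidence in the gradient formula is to compare it with a finite-difference approximation. The sketch below does this on random toy data; the function names, sizes and seed are assumptions made for illustration.

```python
import numpy as np

def softmax(scores):
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

def loss_fn(W, b, X, Y):
    """Mean cross-entropy for weights W (K, d), bias b (1, K), one-hot Y (m, K)."""
    probs = softmax(X @ W.T + b)
    return -np.sum(Y * np.log(probs)) / X.shape[0]

def gradients(W, b, X, Y):
    """Analytic gradients: row k of dW is (1/m) * sum_i x_i * (p_k - y_k)."""
    m = X.shape[0]
    diff = softmax(X @ W.T + b) - Y                  # shape (m, K)
    dW = diff.T @ X / m                              # shape (K, d)
    db = np.sum(diff, axis=0, keepdims=True) / m     # inputs replaced by 1
    return dW, db

rng = np.random.default_rng(0)
m, d, K = 5, 3, 4
X = rng.normal(size=(m, d))
Y = np.eye(K)[rng.integers(0, K, size=m)]            # random one-hot targets
W = rng.normal(size=(K, d))
b = rng.normal(size=(1, K))

dW, db = gradients(W, b, X, Y)

# Central finite differences, one weight at a time
eps = 1e-6
dW_num = np.zeros_like(W)
for k in range(K):
    for j in range(d):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[k, j] += eps
        W_minus[k, j] -= eps
        dW_num[k, j] = (loss_fn(W_plus, b, X, Y) - loss_fn(W_minus, b, X, Y)) / (2 * eps)

max_err = np.max(np.abs(dW - dW_num))
print(max_err)
```

A very small `max_err` indicates that the analytic and numerical gradients agree. The same check could be repeated for the bias.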

Step 5: Update the weights and bias of each class k:

$$w_k = w_k - \eta \, \nabla_{w_k} J$$

$$b_k = b_k - \eta \, \nabla_{b_k} J$$

where $\eta$ is the learning rate.
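Putting steps 0 to 5 together, a compact training loop might look like the sketch below. The toy data, the learning rate of 0.1 and the 20 iterations are illustrative assumptions. The point is only that repeated updates drive the loss down from its initial value of log K.

```python
import numpy as np

def softmax(scores):
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

# Hypothetical toy data: 60 points scattered around 3 class centers
rng = np.random.default_rng(1)
m, d, K = 60, 2, 3
centers = np.array([[0.0, 0.0], [2.0, 2.0], [-2.0, 2.0]])
y = rng.integers(0, K, size=m)
X = centers[y] + rng.normal(size=(m, d))
Y = np.eye(K)[y]                                   # one-hot targets

# Step 0: zero initialization
W = np.zeros((K, d))
b = np.zeros((1, K))
eta = 0.1                                          # learning rate

losses = []
for _ in range(20):
    probs = softmax(X @ W.T + b)                   # steps 1 and 2
    losses.append(-np.sum(Y * np.log(probs)) / m)  # step 3
    diff = probs - Y
    dW = diff.T @ X / m                            # step 4
    db = np.sum(diff, axis=0, keepdims=True) / m
    W = W - eta * dW                               # step 5
    b = b - eta * db

print(losses[0], losses[-1])
```

With zero weights every class gets probability 1/K, so the first recorded loss equals log K, and each later value should be smaller.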

In [1]:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
np.random.seed(13)

Dataset

In [2]:

X, y_true = make_blobs(centers=4, n_samples = 5000)
fig = plt.figure(figsize=(8,6))
plt.scatter(X[:,0], X[:,1], c=y_true)
plt.title("Dataset")
plt.xlabel("First feature")
plt.ylabel("Second feature")
plt.show()

[Figure: scatter plot "Dataset", first feature against second feature, points colored by class]

In [3]:

# reshape targets to get column vector with shape (n_samples, 1)
y_true = y_true[:, np.newaxis]
# Split the data into a training and test set
X_train, X_test, y_train, y_test = train_test_split(X, y_true)
print(f'Shape X_train: {X_train.shape}')
print(f'Shape y_train: {y_train.shape}')
print(f'Shape X_test: {X_test.shape}')
print(f'Shape y_test: {y_test.shape}')

Shape X_train: (3750, 2)
Shape y_train: (3750, 1)
Shape X_test: (1250, 2)
Shape y_test: (1250, 1)

Softmax regression class

class SoftmaxRegressor:
    def __init__(self):
        pass
    def train(self, X, y_true, n_classes, n_iters=10, learning_rate=0.1):
        """
        Trains a multinomial logistic regression model on given set of training data
        """
        self.n_samples, n_features = X.shape
        self.n_classes = n_classes
        self.weights = np.random.rand(self.n_classes, n_features)
        self.bias = np.zeros((1, self.n_classes))
        all_losses = []
        # The one-hot targets do not change between iterations, so encode them once
        y_one_hot = self.one_hot(y_true)
        for i in range(n_iters):
            # Steps 1 and 2: class scores, then probabilities
            scores = self.compute_scores(X)
            probs = self.softmax(scores)
            # Step 3: cross-entropy loss over the training set
            loss = self.cross_entropy(y_one_hot, probs)
            all_losses.append(loss)
            # Step 4: gradients of the loss w.r.t. weights and biases
            dw = (1 / self.n_samples) * np.dot(X.T, (probs - y_one_hot))
            db = (1 / self.n_samples) * np.sum(probs - y_one_hot, axis=0)
            # Step 5: gradient-descent update (dw is transposed to match self.weights)
            self.weights = self.weights - learning_rate * dw.T
            self.bias = self.bias - learning_rate * db
            if i % 100 == 0:
                print(f'Iteration number: {i}, loss: {np.round(loss, 4)}')
        return self.weights, self.bias, all_losses
    def predict(self, X):
        """
        Predict class labels for samples in X.
        Args:
            X: numpy array of shape (n_samples, n_features)
        Returns:
            numpy array of shape (n_samples, 1) with predicted classes
        """
        scores = self.compute_scores(X)
        probs = self.softmax(scores)
        return np.argmax(probs, axis=1)[:, np.newaxis]
    def softmax(self, scores):
        """
        Transforms matrix of predicted scores to matrix of probabilities
        Args:
            scores: numpy array of shape (n_samples, n_classes)
            with unnormalized scores
        Returns:
            softmax: numpy array of shape (n_samples, n_classes)
            with probabilities
        """
        # Subtract the row-wise maximum before exponentiating; the result is
        # unchanged, but np.exp can no longer overflow for large scores
        exp = np.exp(scores - np.max(scores, axis=1, keepdims=True))
        sum_exp = np.sum(exp, axis=1, keepdims=True)
        softmax = exp / sum_exp
        return softmax
    def compute_scores(self, X):
        """
        Computes class-scores for samples in X
        Args:
            X: numpy array of shape (n_samples, n_features)
        Returns:
            scores: numpy array of shape (n_samples, n_classes)
        """
        return np.dot(X, self.weights.T) + self.bias
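    def cross_entropy(self, y_true, probs):
        # y_true: one-hot encoded targets, probs: predicted probabilities,
        # both of shape (n_samples, n_classes)
        loss = - (1 / self.n_samples) * np.sum(y_true * np.log(probs))
        return loss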
    def one_hot(self, y):
        """
        Transforms vector y of labels to one-hot encoded matrix
        """
        one_hot = np.zeros((self.n_samples, self.n_classes))
        one_hot[np.arange(self.n_samples), y.T] = 1
        return one_hot

Initialize and train the model

regressor = SoftmaxRegressor()
w_trained, b_trained, loss = regressor.train(X_train, y_train, learning_rate=0.1, n_iters=800, n_classes=4)
fig = plt.figure(figsize=(8,6))
plt.plot(np.arange(800), loss)
plt.title("Development of loss during training")
plt.xlabel("Number of iterations")
plt.ylabel("Loss")
plt.show()

Iteration number: 0, loss: 1.393
Iteration number: 100, loss: 0.2051
Iteration number: 200, loss: 0.1605
Iteration number: 300, loss: 0.1371
Iteration number: 400, loss: 0.121
Iteration number: 500, loss: 0.1087
Iteration number: 600, loss: 0.0989
Iteration number: 700, loss: 0.0909

[Figure: "Development of loss during training", loss against number of iterations]

Test the model

n_test_samples, _ = X_test.shape
y_predict = regressor.predict(X_test)
print(f"Classification accuracy on test set: {(np.sum(y_predict == y_test)/n_test_samples) * 100}%")