Computing Gradient Descent by Hand with Partial Derivatives

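1. Problem Setup and Loss Function

Assume the model is:
$ \hat{y} = ax + b $

The loss function (squared error, i.e. the mean squared error of a single sample) is:

$ \text{Loss} = (y - \hat{y})^2 = (y - (ax + b))^2 $

We use the following initial parameters:

$ a = 0, \quad b = 0 $

The learning rate is $ \eta = 1 $.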
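
The gradient formulas used in each iteration follow from applying the chain rule to the loss defined above. A short derivation, using only the symbols already introduced ($ a $, $ b $, $ x $, $ y $, $ \hat{y} $, $ \eta $):

```latex
\begin{align*}
\frac{\partial \text{Loss}}{\partial a}
  &= 2\,(y - \hat{y}) \cdot \frac{\partial (y - (ax + b))}{\partial a}
   = 2\,(y - \hat{y}) \cdot (-x)
   = -2\,(y - \hat{y})\,x \\
\frac{\partial \text{Loss}}{\partial b}
  &= 2\,(y - \hat{y}) \cdot \frac{\partial (y - (ax + b))}{\partial b}
   = 2\,(y - \hat{y}) \cdot (-1)
   = -2\,(y - \hat{y}) \\
a &\leftarrow a - \eta \cdot \frac{\partial \text{Loss}}{\partial a},
\qquad
b \leftarrow b - \eta \cdot \frac{\partial \text{Loss}}{\partial b}
\end{align*}
```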

2. Sample Data

We have 3 samples, fed in one at a time in the following order:

Sample 1: $ (x_1, y_1) = (1, 1) $
Sample 2: $ (x_2, y_2) = (0, 1) $
Sample 3: $ (x_3, y_3) = (1, 0) $

First iteration, using sample $ (x_1, y_1) = (1, 1) $

Prediction:
$ \hat{y}_1 = a \cdot x_1 + b = 0 \cdot 1 + 0 = 0 $
Loss:
$ \text{Loss}_1 = (y_1 - \hat{y}_1)^2 = (1 - 0)^2 = 1 $
Gradients:
$ \frac{\partial \text{Loss}_1}{\partial a} = -2 \cdot (y_1 - \hat{y}_1) \cdot x_1 = -2 \cdot (1 - 0) \cdot 1 = -2 $
$ \frac{\partial \text{Loss}_1}{\partial b} = -2 \cdot (y_1 - \hat{y}_1) = -2 \cdot (1 - 0) = -2 $
Parameter update:
$ a = a - \eta \cdot \frac{\partial \text{Loss}_1}{\partial a} = 0 - 1 \cdot (-2) = 2 $
$ b = b - \eta \cdot \frac{\partial \text{Loss}_1}{\partial b} = 0 - 1 \cdot (-2) = 2 $
After the update:
$ a = 2, \quad b = 2 $
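
The arithmetic of this first update can be checked with a few lines of plain Python. This is a minimal sketch, not part of the original notes; the helper name `sgd_step` is an assumption introduced only for illustration.

```python
def sgd_step(a, b, x, y, lr):
    """One gradient-descent update on a single sample for y_hat = a*x + b."""
    y_hat = a * x + b                 # prediction
    loss = (y - y_hat) ** 2           # squared error of this sample
    grad_a = -2 * (y - y_hat) * x     # dLoss/da
    grad_b = -2 * (y - y_hat)         # dLoss/db
    return a - lr * grad_a, b - lr * grad_b, loss


# First iteration: a = 0, b = 0, sample (1, 1), learning rate 1
a1, b1, loss1 = sgd_step(a=0.0, b=0.0, x=1.0, y=1.0, lr=1.0)
print(a1, b1, loss1)  # 2.0 2.0 1.0
```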

Second iteration, using sample $ (x_2, y_2) = (0, 1) $

Prediction:
$ \hat{y}_2 = a \cdot x_2 + b = 2 \cdot 0 + 2 = 2 $
Loss:
$ \text{Loss}_2 = (y_2 - \hat{y}_2)^2 = (1 - 2)^2 = 1 $
Gradients:
$ \frac{\partial \text{Loss}_2}{\partial a} = -2 \cdot (y_2 - \hat{y}_2) \cdot x_2 = -2 \cdot (1 - 2) \cdot 0 = 0 $
$ \frac{\partial \text{Loss}_2}{\partial b} = -2 \cdot (y_2 - \hat{y}_2) = -2 \cdot (1 - 2) = 2 $
Parameter update:
$ a = a - \eta \cdot \frac{\partial \text{Loss}_2}{\partial a} = 2 - 1 \cdot 0 = 2 $
$ b = b - \eta \cdot \frac{\partial \text{Loss}_2}{\partial b} = 2 - 1 \cdot 2 = 0 $
After the update:
$ a = 2, \quad b = 0 $
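
One way to sanity-check the analytic partial derivatives is to compare them with a central finite-difference approximation. The sketch below does this at the starting point of the second iteration ($ a = 2, b = 2 $, sample $ (0, 1) $). The finite-difference check is an addition for illustration, not something the original notes use, and the names `loss`, `numeric_grad`, and the step size `h` are assumptions.

```python
def loss(a, b, x, y):
    """Squared error of the model y_hat = a*x + b on one sample."""
    return (y - (a * x + b)) ** 2


def numeric_grad(a, b, x, y, h=1e-6):
    """Central-difference approximation of (dLoss/da, dLoss/db)."""
    da = (loss(a + h, b, x, y) - loss(a - h, b, x, y)) / (2 * h)
    db = (loss(a, b + h, x, y) - loss(a, b - h, x, y)) / (2 * h)
    return da, db


# State before the second update, and the second sample
a, b, x, y = 2.0, 2.0, 0.0, 1.0

# Analytic gradients from the formulas used in the text
analytic = (-2 * (y - (a * x + b)) * x, -2 * (y - (a * x + b)))
numeric = numeric_grad(a, b, x, y)

print("analytic:", analytic)
print("numeric: ", numeric)
```

Because $ x_2 = 0 $, the loss does not depend on $ a $ for this sample, which is why both methods report a zero gradient for $ a $.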

Third iteration, using sample $ (x_3, y_3) = (1, 0) $

Prediction:
$ \hat{y}_3 = a \cdot x_3 + b = 2 \cdot 1 + 0 = 2 $
Loss:
$ \text{Loss}_3 = (y_3 - \hat{y}_3)^2 = (0 - 2)^2 = 4 $
Gradients:
$ \frac{\partial \text{Loss}_3}{\partial a} = -2 \cdot (y_3 - \hat{y}_3) \cdot x_3 = -2 \cdot (0 - 2) \cdot 1 = 4 $
$ \frac{\partial \text{Loss}_3}{\partial b} = -2 \cdot (y_3 - \hat{y}_3) = -2 \cdot (0 - 2) = 4 $
Parameter update:
$ a = a - \eta \cdot \frac{\partial \text{Loss}_3}{\partial a} = 2 - 1 \cdot 4 = -2 $
$ b = b - \eta \cdot \frac{\partial \text{Loss}_3}{\partial b} = 0 - 1 \cdot 4 = -4 $
After the update:
$ a = -2, \quad b = -4 $

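Final Result

After 3 iterations, the parameters evolve as follows:

After the first iteration: $ a = 2, \quad b = 2 $
After the second iteration: $ a = 2, \quad b = 0 $
After the third iteration: $ a = -2, \quad b = -4 $

This completes three iterations of stochastic gradient descent (mini-batch gradient descent with a batch size of 1): each iteration uses a single sample to update the parameters.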
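
The whole hand calculation can be replayed in plain Python, without any library, to confirm the three parameter pairs listed above. This is a sketch under the setup stated in the text (initial $ a = b = 0 $, $ \eta = 1 $, samples in the given order); the variable names `samples` and `history` are illustrative assumptions.

```python
# Samples in the order used by the hand calculation: (x, y)
samples = [(1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]

a, b, lr = 0.0, 0.0, 1.0
history = []  # (a, b, loss) after each update

for i, (x, y) in enumerate(samples, start=1):
    y_hat = a * x + b                 # prediction with the current parameters
    loss = (y - y_hat) ** 2           # loss before the update
    grad_a = -2 * (y - y_hat) * x     # dLoss/da
    grad_b = -2 * (y - y_hat)         # dLoss/db
    a -= lr * grad_a
    b -= lr * grad_b
    history.append((a, b, loss))
    print(f"iteration {i}: loss = {loss}, a = {a}, b = {b}")
```

The recorded losses are those computed before each update, matching $ \text{Loss}_1 $, $ \text{Loss}_2 $, and $ \text{Loss}_3 $ in the derivation.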

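import torch

# Define the model parameters a and b with gradient tracking enabled
a = torch.tensor([0.0], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)

# Learning rate
learning_rate = 1

# Training data as three batches of one sample each,
# in the same order as the hand calculation: (x, y)
batches = [
    [(1.0, 1.0)],  # batch 1: sample (x_1, y_1)
    [(0.0, 1.0)],  # batch 2: sample (x_2, y_2)
    [(1.0, 0.0)],  # batch 3: sample (x_3, y_3)
]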

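# Training loop: one parameter update per batch
for step, batch in enumerate(batches):
    # Accumulated loss over the batch
    total_loss = 0

    # Compute the loss of each sample and add it to the total
    for x_input, y_true in batch:
        x = torch.tensor([x_input])
        y = torch.tensor([y_true])

        # Forward pass: compute the prediction
        y_pred = a * x + b

        # Squared error of this sample, added to the batch total
        total_loss += ((y_pred - y) ** 2).mean()

    # Average loss over the batch
    batch_loss = total_loss / len(batch)

    # Backward pass: compute the gradients of the batch loss
    batch_loss.backward()

    # Update the parameters a and b
    with torch.no_grad():  # the update itself must not be tracked by autograd
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad

    # Print the updated parameters
    print(f"Step {step + 1}: batch_size = {len(batch)}, a = {a.item()}, b = {b.item()}")

    # Reset the gradients so they do not accumulate into the next step
    a.grad.zero_()
    b.grad.zero_()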
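
The loop above averages the loss over each batch before calling `backward()`, so the gradient it applies is the mean of the per-sample gradients. A library-free sketch of that averaged update follows. The helper `batch_step` is a hypothetical name introduced for illustration, and the second call (one batch holding all three samples) is an extra scenario that does not appear in the notes above.

```python
def batch_step(a, b, batch, lr):
    """One update using the gradient of the batch-averaged squared error."""
    n = len(batch)
    grad_a = sum(-2 * (y - (a * x + b)) * x for x, y in batch) / n
    grad_b = sum(-2 * (y - (a * x + b)) for x, y in batch) / n
    return a - lr * grad_a, b - lr * grad_b


samples = [(1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]

# Batch size 1: identical to the per-sample updates of the hand calculation
a, b = 0.0, 0.0
for sample in samples:
    a, b = batch_step(a, b, [sample], lr=1.0)
print(a, b)  # -2.0 -4.0

# Assumed extra scenario: a single batch containing all three samples
a_full, b_full = batch_step(0.0, 0.0, samples, lr=1.0)
print(a_full, b_full)
```

With a batch size of 1 the division by `n` has no effect, which is why the per-sample formulas and the batch-averaged code agree on this data.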