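1. Problem setup and loss function
Assume the model is:
$ \hat{y} = ax + b $
The loss function is the squared error on a single sample (the per-sample term of the mean squared error):
$ \text{Loss} = (y - \hat{y})^2 = (y - (ax + b))^2 $
We use the following initial parameters:
$ a = 0, \quad b = 0 $
and a learning rate of $ \eta = 1 $.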
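The setup above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names `predict`, `squared_error`, and `gradients` are chosen here for illustration and do not come from the original text. The gradient expressions follow from applying the chain rule to $ (y - (ax + b))^2 $.

```python
def predict(a, b, x):
    """Linear model: y_hat = a * x + b."""
    return a * x + b


def squared_error(a, b, x, y):
    """Per-sample loss: (y - y_hat)^2."""
    return (y - predict(a, b, x)) ** 2


def gradients(a, b, x, y):
    """Analytic gradients of the per-sample loss with respect to a and b."""
    residual = y - predict(a, b, x)
    grad_a = -2 * residual * x  # d Loss / d a
    grad_b = -2 * residual      # d Loss / d b
    return grad_a, grad_b


# Initial parameters and learning rate from the text.
a, b, eta = 0, 0, 1

print(squared_error(a, b, x=1, y=1))  # 1
print(gradients(a, b, x=1, y=1))      # (-2, -2)
```

Keeping the loss and its gradients as separate functions makes it easy to check each hand-computed quantity in the iterations on its own.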
2. Sample data
We have 3 samples, fed to the model in this order:
- Sample 1: $ (x_1, y_1) = (1, 1) $
- Sample 2: $ (x_2, y_2) = (0, 1) $
- Sample 3: $ (x_3, y_3) = (1, 0) $
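As a small sketch, the three samples can be stored as an ordered list of `(x, y)` pairs, and the loss of the initial model ($ a = 0, b = 0 $) can be evaluated on each of them before any update. The variable names `samples` and `initial_losses` are illustrative assumptions.

```python
# (x, y) pairs, in the order they are fed to the model.
samples = [(1, 1), (0, 1), (1, 0)]

# Initial parameters from the problem setup.
a, b = 0, 0

# Squared error of the untrained model on each sample.
initial_losses = [(y - (a * x + b)) ** 2 for x, y in samples]
print(initial_losses)  # [1, 1, 0]
```

The order of the list matters here, because each iteration uses exactly one sample and updates the parameters before the next sample is seen.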
**Iteration 1, using sample** $ (x_1, y_1) = (1, 1) $
- Prediction:
$ \hat{y}_1 = a \cdot x_1 + b = 0 \cdot 1 + 0 = 0 $
- Loss:
$ \text{Loss}_1 = (y_1 - \hat{y}_1)^2 = (1 - 0)^2 = 1 $
- Gradients (by the chain rule):
$ \frac{\partial \text{Loss}_1}{\partial a} = -2 \cdot (y_1 - \hat{y}_1) \cdot x_1 = -2 \cdot (1 - 0) \cdot 1 = -2 $
$ \frac{\partial \text{Loss}_1}{\partial b} = -2 \cdot (y_1 - \hat{y}_1) = -2 \cdot (1 - 0) = -2 $
- Parameter update:
$ a = a - \eta \cdot \frac{\partial \text{Loss}_1}{\partial a} = 0 - 1 \cdot (-2) = 2 $
$ b = b - \eta \cdot \frac{\partial \text{Loss}_1}{\partial b} = 0 - 1 \cdot (-2) = 2 $

After the update:
$ a = 2, \quad b = 2 $
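The four steps of this iteration (prediction, loss, gradients, update) can be wrapped in a single function. The sketch below assumes a hypothetical helper named `sgd_step`, which is not part of the original text, and applies it to the first sample.

```python
def sgd_step(a, b, x, y, eta):
    """Perform one gradient-descent update of (a, b) on a single sample."""
    y_hat = a * x + b                 # prediction
    grad_a = -2 * (y - y_hat) * x     # d Loss / d a
    grad_b = -2 * (y - y_hat)         # d Loss / d b
    return a - eta * grad_a, b - eta * grad_b


# First iteration: start from a = 0, b = 0 and use sample (1, 1).
a1, b1 = sgd_step(0, 0, x=1, y=1, eta=1)
print(a1, b1)  # 2 2
```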
**Iteration 2, using sample** $ (x_2, y_2) = (0, 1) $
- Prediction:
$ \hat{y}_2 = a \cdot x_2 + b = 2 \cdot 0 + 2 = 2 $
- Loss:
$ \text{Loss}_2 = (y_2 - \hat{y}_2)^2 = (1 - 2)^2 = 1 $
- Gradients:
$ \frac{\partial \text{Loss}_2}{\partial a} = -2 \cdot (y_2 - \hat{y}_2) \cdot x_2 = -2 \cdot (1 - 2) \cdot 0 = 0 $
$ \frac{\partial \text{Loss}_2}{\partial b} = -2 \cdot (y_2 - \hat{y}_2) = -2 \cdot (1 - 2) = 2 $
- Parameter update:
$ a = a - \eta \cdot \frac{\partial \text{Loss}_2}{\partial a} = 2 - 1 \cdot 0 = 2 $
$ b = b - \eta \cdot \frac{\partial \text{Loss}_2}{\partial b} = 2 - 1 \cdot 2 = 0 $

After the update:
$ a = 2, \quad b = 0 $
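In this iteration the gradient with respect to $ a $ is zero because $ x_2 = 0 $, so the sample carries no information about the slope. One way to double-check hand-derived gradients like these is a central finite-difference approximation. The sketch below is only a sanity check under assumed names (`loss`, `numerical_gradients`) and an assumed step size `h`.

```python
def loss(a, b, x, y):
    """Per-sample squared error."""
    return (y - (a * x + b)) ** 2


def numerical_gradients(a, b, x, y, h=1e-6):
    """Central finite-difference approximation of the loss gradients."""
    grad_a = (loss(a + h, b, x, y) - loss(a - h, b, x, y)) / (2 * h)
    grad_b = (loss(a, b + h, x, y) - loss(a, b - h, x, y)) / (2 * h)
    return grad_a, grad_b


# Parameters before the second update (a = 2, b = 2) and sample (0, 1).
grad_a, grad_b = numerical_gradients(a=2, b=2, x=0, y=1)
print(grad_a, grad_b)  # approximately 0 and 2, matching the analytic values
```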
**Iteration 3, using sample** $ (x_3, y_3) = (1, 0) $
- Prediction:
$ \hat{y}_3 = a \cdot x_3 + b = 2 \cdot 1 + 0 = 2 $
- Loss:
$ \text{Loss}_3 = (y_3 - \hat{y}_3)^2 = (0 - 2)^2 = 4 $
- Gradients:
$ \frac{\partial \text{Loss}_3}{\partial a} = -2 \cdot (y_3 - \hat{y}_3) \cdot x_3 = -2 \cdot (0 - 2) \cdot 1 = 4 $
$ \frac{\partial \text{Loss}_3}{\partial b} = -2 \cdot (y_3 - \hat{y}_3) = -2 \cdot (0 - 2) = 4 $
- Parameter update:
$ a = a - \eta \cdot \frac{\partial \text{Loss}_3}{\partial a} = 2 - 1 \cdot 4 = -2 $
$ b = b - \eta \cdot \frac{\partial \text{Loss}_3}{\partial b} = 0 - 1 \cdot 4 = -4 $

After the update:
$ a = -2, \quad b = -4 $
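It can be instructive to re-evaluate the loss on the same sample after the update. The sketch below (with an illustrative helper named `loss`) compares the loss on sample 3 before and after the third update. In this example the loss rises rather than falls, which suggests that a step with $ \eta = 1 $ overshoots the minimum for this sample.

```python
def loss(a, b, x, y):
    """Per-sample squared error."""
    return (y - (a * x + b)) ** 2


x3, y3 = 1, 0

loss_before = loss(2, 0, x3, y3)    # parameters before the third update
loss_after = loss(-2, -4, x3, y3)   # parameters after the third update

print(loss_before, loss_after)  # 4 36
```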
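**Final result**
After 3 iterations, the parameters have evolved as follows:
- After iteration 1: $ a = 2, \quad b = 2 $
- After iteration 2: $ a = 2, \quad b = 0 $
- After iteration 3: $ a = -2, \quad b = -4 $

This completes three iterations of stochastic gradient descent, that is, mini-batch gradient descent with a batch size of 1: each iteration uses a single sample to update the parameters.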
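The whole walkthrough can be reproduced with a short loop in plain Python. This is a minimal sketch that follows the hand calculation step by step. The variable names (`samples`, `history`, `residual`) are illustrative assumptions, and no library is required.

```python
# (x, y) pairs in the order used in the walkthrough.
samples = [(1, 1), (0, 1), (1, 0)]

a, b = 0, 0   # initial parameters
eta = 1       # learning rate

history = []  # (a, b) after each iteration
for x, y in samples:
    y_hat = a * x + b            # prediction
    residual = y - y_hat
    grad_a = -2 * residual * x   # d Loss / d a
    grad_b = -2 * residual       # d Loss / d b
    a -= eta * grad_a            # parameter update
    b -= eta * grad_b
    history.append((a, b))

print(history)  # [(2, 2), (2, 0), (-2, -4)]
```

Each pass through the loop body corresponds to one iteration above, so the entries of `history` can be compared directly against the hand-computed parameters.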
import torch