The $E$ in $E(\theta)$ comes from the first letter of the word "Error". Our goal is to find values of $\theta_0$ and $\theta_1$ that minimize $E(\theta)$.
Here we use partial derivatives to derive separate update expressions for $\theta_0$ and $\theta_1$.
$$ \theta_0 := \theta_0 - \eta \frac{\partial E}{\partial \theta_0} $$ $$ \theta_1 := \theta_1 - \eta \frac{\partial E}{\partial \theta_1} $$ $\eta$ is the learning rate, a positive constant pronounced "eta". The learning rate determines how many updates are needed to reach the minimum; in other words, the convergence speed changes with it. If it is set badly, the updates may even fail to converge at all and diverge instead.
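The effect of the learning rate can be seen on a toy one-dimensional objective. The sketch below (illustrative, not from the text) runs the update rule $\theta := \theta - \eta \, dE/d\theta$ on $E(\theta) = \theta^2$, whose derivative is $2\theta$:

```python
def descend(eta, theta=1.0, steps=20):
    """Run `steps` gradient-descent updates on E(theta) = theta**2."""
    for _ in range(steps):
        theta = theta - eta * 2 * theta  # theta := theta - eta * dE/dtheta
    return theta

print(descend(eta=0.1))  # small eta: theta shrinks toward the minimum at 0
print(descend(eta=1.1))  # too-large eta: |theta| grows every step (diverges)
```

With $\eta = 0.1$ each step multiplies $\theta$ by $0.8$, so it converges; with $\eta = 1.1$ each step multiplies it by $-1.2$, so it oscillates with growing magnitude.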
Because $\theta_0$ and $\theta_1$ appear inside the function $f_\theta(x)$, we need the chain rule for composite functions.
$$ u=E(\theta) $$ $$ v=f_\theta(x) $$ $$ \frac{\partial u}{\partial \theta_0}=\frac{\partial u}{\partial v} \cdot \frac{\partial v}{\partial \theta_0} $$ $$ \frac{\partial u}{\partial \theta_1}=\frac{\partial u}{\partial v} \cdot \frac{\partial v}{\partial \theta_1} $$ First, differentiate $u$ with respect to $v$:
$$ \begin{flalign} \frac{\partial u}{\partial v} &= \frac{\partial}{\partial v}\left( \frac{1}{2}\displaystyle\sum_{i=1}^{n}(y^{(i)}-v)^2 \right) &\\ &=\frac{1}{2}\displaystyle\sum_{i=1}^{n} \left( \frac{\partial}{\partial v}(y^{(i)}-v)^2 \right) &\\ &=\frac{1}{2}\displaystyle\sum_{i=1}^{n} \left( \frac{\partial}{\partial v}({y^{(i)}}^2-2y^{(i)}v+v^2) \right) &\\ &=\frac{1}{2}\displaystyle\sum_{i=1}^{n} \left( -2y^{(i)}+2v \right) &\\ &=\displaystyle\sum_{i=1}^{n} \left( v - y^{(i)} \right) \end{flalign} $$
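This result is easy to sanity-check numerically. The snippet below (an illustrative check with made-up data, not from the text) compares a central finite difference of $u = \frac{1}{2}\sum_i (y^{(i)} - v)^2$ against the analytic derivative $\sum_i (v - y^{(i)})$:

```python
def u(v, ys):
    """Error as a function of the prediction v: (1/2) * sum((y - v)^2)."""
    return 0.5 * sum((y - v) ** 2 for y in ys)

ys = [1.0, 3.0, 2.0]  # arbitrary example targets
v = 0.5
h = 1e-6

numeric = (u(v + h, ys) - u(v - h, ys)) / (2 * h)  # finite difference
analytic = sum(v - y for y in ys)                   # derived formula

print(numeric, analytic)  # the two values should agree closely
```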
Next, differentiate $v$ with respect to $\theta_0$:
$$ \begin{flalign} &\frac{\partial v}{\partial \theta_0} = \frac{\partial}{\partial \theta_0} (\theta_0+\theta_1 x) = 1 &\\ \end{flalign} $$
Then differentiate $v$ with respect to $\theta_1$:
$$ \begin{flalign} &\frac{\partial v}{\partial \theta_1} = \frac{\partial}{\partial \theta_1} (\theta_0+\theta_1 x) = x &\\ \end{flalign} $$
Therefore,
$$ \begin{flalign} \frac{\partial u}{\partial \theta_0} &=\frac{\partial u}{\partial v} \cdot \frac{\partial v}{\partial \theta_0} &\\ &=\displaystyle\sum_{i=1}^{n} \left( v - y^{(i)} \right) \cdot 1 &\\ &=\displaystyle\sum_{i=1}^{n} \left( f_\theta(x^{(i)}) - y^{(i)} \right) \end{flalign} $$ $$ \begin{flalign} \frac{\partial u}{\partial \theta_1} &=\frac{\partial u}{\partial v} \cdot \frac{\partial v}{\partial \theta_1} &\\ &=\displaystyle\sum_{i=1}^{n} \left( v - y^{(i)} \right) \cdot x^{(i)} &\\ &=\displaystyle\sum_{i=1}^{n} \left( f_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)} \end{flalign} $$ Finally, we obtain the update expressions for the parameters $\theta_0$ and $\theta_1$:
$$ \theta_0 := \theta_0 - \eta \displaystyle\sum_{i=1}^{n} \left( f_\theta(x^{(i)}) - y^{(i)} \right) \quad \; $$ $$ \theta_1 := \theta_1 - \eta \displaystyle\sum_{i=1}^{n} \left( f_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)} $$
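Putting the update expressions to work, here is a minimal sketch of batch gradient descent for $f_\theta(x) = \theta_0 + \theta_1 x$. The data, learning rate, and iteration count are illustrative assumptions, not values from the text:

```python
# Synthetic training data, roughly following y = 1 + 2x with some noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.1, 10.8]

theta0, theta1 = 0.0, 0.0  # initial parameters
eta = 0.01                 # learning rate (assumed)

for _ in range(1000):
    # f_theta(x^(i)) - y^(i) for every training example
    errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
    grad0 = sum(errors)                             # dE/dtheta0
    grad1 = sum(e * x for e, x in zip(errors, xs))  # dE/dtheta1
    # simultaneous update of both parameters
    theta0, theta1 = theta0 - eta * grad0, theta1 - eta * grad1

print(theta0, theta1)
```

Note that both gradients are computed from the *current* parameters before either parameter is updated; updating $\theta_0$ first and then using the new value to update $\theta_1$ would not match the derivation above.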