What is Gradient Descent?

Gradient Descent is an optimisation algorithm used in Deep Learning. Its goal is to iteratively minimise a convex objective function \(f(x)\).

It's time to get our hands dirty and dive into the theoretical details of how we actually optimise the parameters to learn from the training data. Mathematically, this is quite challenging. The good news, however, is that it is based entirely on A-level standard mathematics, and it may give you some inspiration: the pure maths you may have studied really does have some very important practical applications!

First, let's remind ourselves of what we are trying to do. Our MLP has lots of parameters (the weights and biases of every neuron). We will refer to all of these together as \(\mathbf{w}\) (you can think of writing all the parameters out in one long list, so that \(\mathbf{w}\) is a vector of all the parameters). Now, recall that the goal is to solve the following optimisation problem:

Find the \(\mathbf{w}\) that minimises \(L(\mathbf{w})\) where

\[L(\mathbf{w}) = \sum_{i=1}^n E(f_{\mathbf{w}}(x_i), y_i)\]

\(E\) is our loss function, \(L\) the total loss, \(y_i\) the labels, \(x_i\) the inputs, and \(f_{\mathbf{w}}\) represents our parametric function, for example an MLP. We will make an initial guess for \(\mathbf{w}\) using random values. It turns out that the precise way you choose these random values matters, and there are now standard tricks for doing this, but those details don't matter to us for now.
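As a concrete illustration, here is a minimal sketch (in Python with NumPy) of how the total loss \(L(\mathbf{w})\) could be computed. The linear model standing in for \(f_{\mathbf{w}}\), the squared-error loss standing in for \(E\), and the toy data are all assumptions made for this example; a real MLP would simply swap in its own forward pass.

```python
import numpy as np

# Hypothetical stand-ins for the example: a tiny linear model f_w(x) = w[0]*x + w[1]
# and a squared-error loss E. A real MLP would replace f_w with its forward pass.
def f_w(w, x):
    return w[0] * x + w[1]

def E(prediction, label):
    return (prediction - label) ** 2

def total_loss(w, xs, ys):
    # L(w) = sum_i E(f_w(x_i), y_i)
    return sum(E(f_w(w, x), y) for x, y in zip(xs, ys))

# Initial guess for w made with random values (the standard initialisation tricks are skipped here).
rng = np.random.default_rng(0)
w = rng.standard_normal(2)

xs = np.array([0.0, 1.0, 2.0])   # toy inputs
ys = np.array([1.0, 3.0, 5.0])   # toy labels
print(total_loss(w, xs, ys))
```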

Now, we need to know how to change our current estimate of \(\mathbf{w}\) so that the loss reduces. If we can keep doing this, we’ll gradually improve the performance of our MLP. An adjustment to \(\mathbf{w}\) that reduces the loss is called a descent direction. We’ll now see the simplest way to compute such a direction.

Gradient Descent

We will use as our descent direction the gradient of our loss with respect to the parameters of our MLP. In case you’ve never heard of a gradient, it is a generalisation of the derivative to functions with more than one input.

Let’s start with a very simple example. Suppose you have the function:

\[z(x,y) = x^2 + xy + y^2\]

This function has two inputs: \(x\) and \(y\). We can pretend it only has one input (say \(x\)), treat the other input as a constant value, and then do standard differentiation. We call this a partial derivative and write it using a symbol that looks a bit like a curly d:

\[\frac{\partial z}{\partial x}(x,y) = 2x + y\]
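To see where this comes from, hold \(y\) fixed and differentiate each term of \(z\) with respect to \(x\):

\[\frac{\partial z}{\partial x}(x,y) = \frac{d}{dx}\left(x^2\right) + \frac{d}{dx}\left(xy\right) + \frac{d}{dx}\left(y^2\right) = 2x + y + 0\]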

Hopefully that looks straightforward from the differentiation you may have studied before. Now let’s write down the other partial derivative (with respect to \(y\) this time):

\[\frac{\partial z}{\partial y}(x,y) = 2y + x\]

Finally, we can put these partial derivatives together into a vector that we call the gradient and denote by an upside down triangle:

\[\nabla z(x,y) = \left[ \frac{\partial z}{\partial x}(x,y), \frac{\partial z}{\partial y}(x,y) \right] = [2x + y,\ 2y + x]\]
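If you’d like to sanity-check this gradient, here is a small sketch (Python/NumPy) that compares the analytic gradient above against a central finite-difference approximation; the chosen point and the step size \(h\) are just example values.

```python
import numpy as np

def z(x, y):
    return x**2 + x*y + y**2

def grad_z(x, y):
    # The analytic gradient derived above: [2x + y, 2y + x].
    return np.array([2*x + y, 2*y + x])

def numerical_grad(x, y, h=1e-6):
    # Central finite differences approximate each partial derivative.
    dz_dx = (z(x + h, y) - z(x - h, y)) / (2 * h)
    dz_dy = (z(x, y + h) - z(x, y - h)) / (2 * h)
    return np.array([dz_dx, dz_dy])

point = (1.5, -0.5)
print(grad_z(*point))          # analytic:    [2.5, 0.5]
print(numerical_grad(*point))  # numerical: ~ [2.5, 0.5]
```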

Now, here’s the clever part. The gradient effectively tells us which direction is “uphill” at any given point. In other words, it tells us what small change we would need to make to our parameters to increase the function value. Since we want to reduce the loss, we will move in the opposite direction, i.e. the negative of the gradient. Repeatedly stepping in the direction of the negative gradient is called gradient descent.

Suppose that after iteration \(t\) we have estimates of the parameters, \(x_t\) and \(y_t\). To obtain estimates at the next iteration \(t+1\) that give a smaller function value, we do:

\[\left[x_{t+1}, y_{t+1}\right] = \left[x_t, y_t\right] - \gamma \nabla z(x_t, y_t)\]

The value \(\gamma\) is called the step size or (more commonly in machine learning) the learning rate. This will determine how fast our optimisation reduces the loss. The learning rate is a hyperparameter: a value we need to choose by hand that affects how the system trains. Too large and we will overshoot. Too small and it may take a very long time to reach a good solution. If we find a place where the gradient is all zeros, we can stop, because we have found a local minimum. At such a point, the loss cannot be reduced any further by moving a small distance in any direction.
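Putting the pieces together, here is a minimal sketch of gradient descent applied to the example function \(z(x,y)\) above. The learning rate, starting point and iteration count are arbitrary illustrative choices; since \(z\) has its minimum at \((0,0)\), the iterates should head there.

```python
import numpy as np

def grad_z(x, y):
    # Gradient of z(x, y) = x^2 + xy + y^2, as derived above.
    return np.array([2*x + y, 2*y + x])

gamma = 0.1                       # learning rate (a hyperparameter chosen by hand)
params = np.array([3.0, -2.0])    # arbitrary starting point [x_0, y_0]

for t in range(100):
    g = grad_z(*params)
    if np.all(np.abs(g) < 1e-8):  # gradient (almost) all zeros: local minimum reached
        break
    params = params - gamma * g   # step in the negative gradient direction

print(params)  # should be close to [0, 0], the minimum of z
```

Try making \(\gamma\) much larger or much smaller to see overshooting or slow convergence for yourself.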

So, we now know how to do gradient descent for a very simple function.

© University of York
This article is from the free online course Intelligent Systems: An Introduction to Deep Learning and Autonomous Systems.
