WEBVTT

00:02.120 --> 00:07.190
Hello and welcome to this tutorial. Here we will learn gradient descent.

00:07.400 --> 00:08.300
Let us begin.

00:09.960 --> 00:16.940
So far, we have understood that there are two key aspects with which a neural network learns, and these are

00:17.150 --> 00:19.940
activation function and cost function.

00:21.440 --> 00:23.990
We are still missing a key step here.

00:24.200 --> 00:28.100
That is the actual learning process of a neural network.

00:28.640 --> 00:32.750
To understand the learning process, we have to understand gradient descent.

00:35.840 --> 00:43.220
Gradient descent is an optimization algorithm that is used for minimizing the cost function. Minimizing

00:43.220 --> 00:46.640
the cost function means we are minimizing the error.

00:48.020 --> 00:54.980
Gradient descent updates the various parameters of a machine learning model to minimize the cost function.

00:55.640 --> 01:00.290
Let us understand how gradient descent works and minimizes the cost function.

01:03.430 --> 01:11.050
As you can see here in this diagram, on the x-axis we have the weights and on the y-axis we have the cost function,

01:11.500 --> 01:14.590
and we're considering this in only one dimension.

01:15.880 --> 01:19.570
So here we have taken a random value of the cost function.

01:20.020 --> 01:22.700
Now we have to minimize that cost function.

01:22.900 --> 01:26.080
And to do that, we have to apply the gradient.

01:26.770 --> 01:31.830
Applying the gradient means we have to take the derivative of the cost function at that value.

01:33.520 --> 01:37.900
The gradient descent is applied here to minimize the cost function.

01:39.460 --> 01:43.350
After applying the gradient, the cost function is now being minimized.

01:45.250 --> 01:49.930
And at the end of this process, we will get an optimized cost function.

01:50.500 --> 01:55.300
So this is the pictorial representation of gradient descent in one dimension.
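
NOTE
Editor's sketch, not part of the narration: a minimal one-dimensional example of the steps just described.
The toy cost J(w) = (w - 3)^2, the starting weight, the learning rate, and the step count are all assumed values
chosen for illustration; the tutorial's diagram does not give a formula.
```python
# Assumed toy cost function with its minimum at w = 3 (not from the tutorial).
def cost(w):
    return (w - 3.0) ** 2
# Derivative of the assumed cost with respect to the weight w.
def cost_derivative(w):
    return 2.0 * (w - 3.0)
# Repeatedly step the weight against the derivative to lower the cost.
def gradient_descent(w_start, learning_rate=0.1, steps=100):
    w = w_start
    for _ in range(steps):
        w = w - learning_rate * cost_derivative(w)
    return w
# Start from an arbitrary weight, as in the diagram, and descend.
w_final = gradient_descent(w_start=10.0)
print(w_final, cost(w_final))
```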

01:58.300 --> 02:03.570
Finding the minimum value of a cost function looks very simple in one dimension.

02:03.940 --> 02:07.540
But in other cases, we will have multiple parameters.

02:07.840 --> 02:10.600
That is, we will have multiple dimensions.

02:11.940 --> 02:18.860
So we will use the built-in linear algebra libraries in deep learning to minimize the cost function.

02:22.020 --> 02:29.460
After understanding gradient descent, we have to understand back propagation. Using gradient descent,

02:29.510 --> 02:34.110
we can figure out the best parameters for minimizing the cost function.

02:35.400 --> 02:43.020
Now the question arises: how can we adjust the optimal parameters, or weights, across the entire network?

02:44.400 --> 02:47.370
And to do that, we use back propagation.

02:50.420 --> 02:57.390
Back propagation is used to calculate the error contribution of each neuron after a batch of data is

02:57.440 --> 02:58.160
processed.

02:59.360 --> 03:02.450
Back propagation calculates the error at the output.

03:02.630 --> 03:07.990
And after that, it distributes that error back through the network layers.

03:09.350 --> 03:14.060
To do that, it requires a known desired output for each input value.

03:14.720 --> 03:17.250
So this is how back propagation works.

03:20.380 --> 03:26.920
As you can see here, this is the pictorial representation of forward propagation. In this process,

03:26.980 --> 03:31.930
the error contribution of each neuron, as well as the error at the output, is calculated.

03:35.070 --> 03:40.770
And in back propagation, these calculated errors are distributed throughout the network.

03:41.490 --> 03:46.440
The process of forward propagation and backward propagation is done multiple times.

03:46.650 --> 03:50.610
And at the end, we get an optimized value of the cost function.
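
NOTE
Editor's sketch, not part of the narration: a minimal example of repeating forward and back propagation.
The network (one input, one sigmoid hidden neuron, one linear output), the squared-error cost, and every
number below are assumptions made for illustration; the tutorial does not specify an architecture.
```python
import math
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
def forward(x, w1, w2):
    # Forward propagation: input, then hidden neuron, then output.
    h = sigmoid(w1 * x)
    y = w2 * h
    return h, y
def backward(x, target, w2, h, y):
    # Back propagation: start from the error at the output and pass it
    # back through the layers with the chain rule.
    error = y - target                      # derivative of 0.5 * error^2 w.r.t. y
    grad_w2 = error * h                     # contribution of the output weight
    grad_h = error * w2                     # error passed back to the hidden neuron
    grad_w1 = grad_h * h * (1.0 - h) * x    # through the sigmoid to the first weight
    return grad_w1, grad_w2
def train(x, target, w1, w2, learning_rate=0.5, epochs=200):
    losses = []
    for _ in range(epochs):
        h, y = forward(x, w1, w2)
        losses.append(0.5 * (y - target) ** 2)
        grad_w1, grad_w2 = backward(x, target, w2, h, y)
        # Gradient descent update using the back-propagated gradients.
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2
    return w1, w2, losses
# Assumed training example: input 1.0 with a known desired output of 0.5.
w1, w2, losses = train(x=1.0, target=0.5, w1=0.5, w2=-0.3)
print(losses[0], losses[-1])
```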

03:51.330 --> 03:55.920
So this is all about gradient descent and back propagation.

03:56.460 --> 03:58.230
I will see you in the next one.

03:58.510 --> 04:00.270
Till then, happy learning.