WEBVTT

00:02.120 --> 00:04.220
Hello and welcome to this tutorial.

00:04.550 --> 00:11.120
Here we learn about activation functions. To understand activation functions,

00:11.300 --> 00:14.680
we first have to understand the elements of a neural network.

00:15.500 --> 00:18.140
There are three main elements of a neural network:

00:18.170 --> 00:22.790
the input layer, the hidden layers, and the output layer.

00:23.030 --> 00:25.340
Let us understand these three main elements

00:25.550 --> 00:26.990
one by one in detail.

00:28.460 --> 00:29.390
Input layer.

00:29.660 --> 00:35.630
It takes the input features and provides information from the outside world to the neural network.

00:35.780 --> 00:40.790
No computation is performed at this layer. Hidden layer.

00:41.120 --> 00:44.990
The nodes of this layer are not exposed to the outside world.

00:45.390 --> 00:50.940
The hidden layer performs all sorts of computation on the features entered through the input layer.

00:51.530 --> 00:54.380
So most of the processing is done in the hidden layer.

00:55.790 --> 01:02.360
And at the end we have an output layer. The output layer brings the information learned by the neural network

01:02.360 --> 01:03.530
to the outside world.

01:04.340 --> 01:07.010
So this is all about the elements of a neural network:

01:07.040 --> 01:12.290
the input layer, the hidden layer, and the output layer.

01:15.450 --> 01:18.970
This is the pictorial representation of the elements of a neural network:

01:19.080 --> 01:24.300
the input layer, the hidden layers, and the output layer.

01:27.410 --> 01:35.570
After understanding the elements of a neural network, we can understand the activation function. First, a neuron

01:35.570 --> 01:38.660
calculates the weighted sum of its inputs.

01:38.900 --> 01:41.180
After that, it adds a bias.

01:41.510 --> 01:45.920
Then the activation function decides whether the neuron should be activated or not.
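
NOTE
A minimal Python sketch (not from the tutorial) of the steps just described: a neuron takes the weighted sum of its inputs, adds a bias, and an activation function decides the output. The names step and neuron_output, and the simple "fire if positive" step activation, are illustrative assumptions.
```python
def step(z):
    # Illustrative "fire or not" activation: 1.0 if the input is positive, else 0.0
    return 1.0 if z > 0 else 0.0
def neuron_output(inputs, weights, bias, activation=step):
    # Weighted sum of the inputs
    z = sum(w * x for w, x in zip(weights, inputs))
    # Add the bias, then let the activation function decide the output
    return activation(z + bias)
# Weighted sum is 0.5*1.0 + (-0.25)*2.0 = 0.0; adding the bias 0.1 gives 0.1, so the neuron fires
print(neuron_output([1.0, 2.0], [0.5, -0.25], 0.1))  # prints 1.0
```
Any other activation function (sigmoid, tanh, ReLU) can be passed in place of step.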

01:46.670 --> 01:53.270
In simple words, we can say that the activation function decides whether a signal should pass through or not.

01:54.830 --> 02:01.400
The purpose of an activation function is to introduce non-linearity into the output of a neuron.

02:04.470 --> 02:05.320
Let us understand.

02:05.340 --> 02:08.640
Why do we need a non-linear activation function?

02:10.170 --> 02:18.810
A neural network without an activation function is just a linear regression model. With this non-linear transformation,

02:18.990 --> 02:23.820
the neural network is capable of learning and performing more complex tasks.

02:24.480 --> 02:29.700
So these are the two main reasons why we need a non-linear activation function.

02:32.870 --> 02:39.020
These are the five main activation functions that we use: linear function, sigmoid function,

02:39.230 --> 02:43.640
tanh function, ReLU, and softmax function.

02:44.180 --> 02:45.620
Let us understand these five

02:45.620 --> 02:48.350
activation functions one by one in detail.

02:51.530 --> 02:52.790
Linear function.

02:54.320 --> 02:56.540
This is the equation of the linear function:

02:56.720 --> 03:01.190
Y = aX, where Y is the dependent variable,

03:01.370 --> 03:05.690
X is the independent variable, and a is a constant.

03:07.370 --> 03:14.240
No matter how many layers there are in a neural network, if all the layers are linear in nature,

03:14.540 --> 03:21.460
then the output of the last layer is nothing but a linear function of the input layer.
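
NOTE
A minimal plain-Python check (not from the tutorial) of this claim: two stacked linear layers with no activation in between compute exactly the same map as one linear layer with weights W2·W1 and bias W2·b1 + b2. The helper names and matrix values are arbitrary illustrative choices.
```python
def linear_layer(W, b, x):
    # y = W x + b, with W given as a list of rows
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
def matmul(A, B):
    # Plain-Python matrix product A @ B
    return [[sum(a * c for a, c in zip(row, col)) for col in zip(*B)] for row in A]
W1, b1 = [[1.0, 2.0], [3.0, 4.0]], [0.5, -1.0]
W2, b2 = [[2.0, 0.0], [1.0, -1.0]], [1.0, 2.0]
x = [1.0, -2.0]
# Two stacked linear layers, no activation function in between
two_layers = linear_layer(W2, b2, linear_layer(W1, b1, x))
# The same map collapsed into ONE linear layer: W = W2 W1, b = W2 b1 + b2
W = matmul(W2, W1)
b = linear_layer(W2, b2, b1)
one_layer = linear_layer(W, b, x)
print(two_layers, one_layer)  # prints [-4.0, 5.5] [-4.0, 5.5]
```
Adding depth without a non-linearity therefore adds no expressive power.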

03:22.280 --> 03:30.440
In simple words, a linear function produces an output that is linear in the input. The range of the linear

03:30.440 --> 03:33.980
function is from minus infinity to plus infinity.

03:35.330 --> 03:38.360
We can use the linear function in just one place,

03:38.510 --> 03:40.530
that is, the output layer.

03:42.200 --> 03:48.740
If we differentiate a linear function, as backpropagation requires, the result no longer depends

03:48.830 --> 03:52.910
on the input X, and the derivative becomes a constant.
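
NOTE
A small Python sketch (not from the tutorial) of the linear activation Y = aX and its derivative. The value A = 3.0 is an arbitrary assumption; the finite-difference helper is only there to confirm that the slope is the same constant for every input.
```python
A = 3.0  # the constant "a" in Y = aX (illustrative value)
def linear(x):
    # Linear activation: output is proportional to the input, unbounded in both directions
    return A * x
def linear_derivative(x):
    # The derivative is the constant A, whatever the input x is
    return A
def numeric_derivative(f, x, h=1e-6):
    # Central finite difference, used only to check the analytic derivative
    return (f(x + h) - f(x - h)) / (2 * h)
for x in (-10.0, 0.0, 10.0):
    print(x, linear(x), linear_derivative(x), round(numeric_derivative(linear, x), 6))
```
Because the gradient carries no information about X, backpropagation cannot use it to adjust the response to different inputs.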

03:53.600 --> 03:57.980
So it won't introduce any groundbreaking behavior to our algorithm.

03:58.670 --> 04:01.070
So this is all about the linear function.

04:04.170 --> 04:11.050
The sigmoid activation function. As you can see here, this is the graphical representation of the sigmoid

04:11.050 --> 04:11.830
function.

04:13.420 --> 04:22.570
This is the equation of the sigmoid function: sigmoid(x) = 1 / (1 + e^(-x)). The nature

04:22.570 --> 04:25.240
of this activation function is non-linear.

04:26.380 --> 04:29.410
The range of this function is from zero to one.

04:30.640 --> 04:36.250
We can use the sigmoid function in the output layer for binary classification.
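
NOTE
A minimal Python sketch (not from the tutorial) of the sigmoid and its use in a binary-classification output layer. The split into two branches only avoids overflow in math.exp for very negative inputs; the 0.5 decision threshold and the name predict_binary are illustrative assumptions.
```python
import math
def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^(-x)); mathematically the output lies between 0 and 1
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form for negative x that avoids overflow in math.exp(-x)
    e = math.exp(x)
    return e / (1.0 + e)
def predict_binary(z, threshold=0.5):
    # Read sigmoid(z) as the probability of class 1, then threshold it
    p = sigmoid(z)
    return p, int(p >= threshold)
print(sigmoid(0.0))  # prints 0.5
print(predict_binary(2.0))
print(predict_binary(-2.0))
```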

04:37.030 --> 04:40.420
So this is all about the sigmoid activation function.

04:43.530 --> 04:48.020
The third activation function that we use is the tanh function.

04:49.500 --> 04:52.470
Most of the time, the tanh function works better

04:52.650 --> 04:54.240
than the sigmoid function.

04:54.450 --> 04:57.900
It is also known as the hyperbolic tangent function.

04:59.220 --> 05:02.400
Mathematically, this function is a scaled and shifted version of the sigmoid

05:02.400 --> 05:03.450
function.

05:03.840 --> 05:06.300
Both functions are similar functions,

05:06.480 --> 05:09.720
And we can derive these two functions from each other.

05:11.010 --> 05:14.700
These are the equations of the tanh function and the sigmoid function.

05:15.180 --> 05:17.970
We can derive these two functions from each other.
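
NOTE
A short Python check (not from the tutorial) of how the two functions are derived from each other: tanh(x) = 2·sigmoid(2x) - 1, and conversely sigmoid(x) = (1 + tanh(x/2)) / 2. The helper names are illustrative; math.tanh from the standard library serves as the reference.
```python
import math
def sigmoid(x):
    # Naive form; fine for the small inputs used here
    return 1.0 / (1.0 + math.exp(-x))
def tanh_from_sigmoid(x):
    # tanh(x) = 2 * sigmoid(2x) - 1: input and output scaled by 2, then shifted down by 1
    return 2.0 * sigmoid(2.0 * x) - 1.0
def sigmoid_from_tanh(x):
    # Inverse relationship: sigmoid(x) = (1 + tanh(x / 2)) / 2
    return (1.0 + math.tanh(x / 2.0)) / 2.0
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert math.isclose(tanh_from_sigmoid(x), math.tanh(x), abs_tol=1e-12)
    assert math.isclose(sigmoid_from_tanh(x), sigmoid(x), abs_tol=1e-12)
print("identities hold")
```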

05:18.180 --> 05:24.990
As you can see here, the range of this function is from minus one to plus one, and the nature of this

05:24.990 --> 05:26.550
function is non-linear.

05:28.060 --> 05:31.170
Generally, we use the tanh function in hidden layers,

05:31.350 --> 05:35.850
and this is because its values lie between minus one and plus one.

05:36.360 --> 05:43.350
When we apply the tanh function, the mean of the hidden layer's outputs comes out to zero or very close to zero.

05:43.680 --> 05:48.630
Hence, this function is very helpful for centering the data close to zero.
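
NOTE
A small Python sketch (not from the tutorial) of the zero-centering effect: for pre-activations spread symmetrically around zero, tanh outputs average to about 0 while sigmoid outputs average to about 0.5. The symmetric input list is an illustrative assumption; with skewed inputs neither mean is guaranteed.
```python
import math
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))
def mean(values):
    return sum(values) / len(values)
# Pre-activations symmetric around zero (illustrative values)
z = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
tanh_out = [math.tanh(v) for v in z]
sigmoid_out = [sigmoid(v) for v in z]
print(round(mean(tanh_out), 6))     # tanh outputs are centred near 0
print(round(mean(sigmoid_out), 6))  # sigmoid outputs are centred near 0.5
```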

05:49.260 --> 05:52.200
So this is all about the tanh activation function.

05:55.400 --> 06:01.440
ReLU activation function. ReLU stands for rectified linear unit.

06:01.690 --> 06:05.350
It is a widely used activation function.

06:06.670 --> 06:09.910
And this is the equation for the ReLU activation function: A(x) = max(0, x).

06:11.170 --> 06:17.890
The output of this function is zero for negative inputs and X for positive inputs, so its range is from zero to infinity.
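
NOTE
A minimal Python sketch (not from the tutorial) of ReLU, A(x) = max(0, x), applied to a few illustrative inputs. It also counts how many of these hypothetical neurons end up active (non-zero), which relates to the later point that only a few neurons are activated at a time.
```python
def relu(x):
    # ReLU: zero for negative inputs, x itself otherwise
    return max(0.0, x)
inputs = [-3.0, -0.5, 0.0, 0.5, 3.0]
outputs = [relu(x) for x in inputs]
active = sum(1 for y in outputs if y > 0)
print(outputs)  # prints [0.0, 0.0, 0.0, 0.5, 3.0]
print(active, "of", len(inputs), "neurons active")  # prints 2 of 5 neurons active
```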

06:19.240 --> 06:23.260
The nature of this function is non-linear. With this function,

06:23.290 --> 06:30.730
we can easily backpropagate errors and have multiple layers of neurons activated by the ReLU function.
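
NOTE
A small Python sketch (not from the tutorial) of backpropagating through ReLU via the chain rule: the upstream gradient passes through unchanged where a unit was active and is blocked where it was not. Using a derivative of 0 at exactly x = 0 is a common convention and an assumption here; the names relu_grad and backprop_through_relu are hypothetical.
```python
def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 at x == 0 is a convention)
    return 1.0 if x > 0 else 0.0
def backprop_through_relu(upstream_grad, pre_activations):
    # Chain rule: multiply each upstream gradient by the local ReLU derivative
    return [g * relu_grad(z) for g, z in zip(upstream_grad, pre_activations)]
print(backprop_through_relu([0.3, 1.2, 0.8], [2.0, -1.0, 0.5]))  # prints [0.3, 0.0, 0.8]
```
Since the local derivative is exactly 1 for active units, the gradient does not shrink as it passes through them, which helps when stacking many layers.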

06:32.050 --> 06:37.690
This function is computationally much less expensive than the tanh and sigmoid functions.

06:37.900 --> 06:42.390
This is because it only involves a simple comparison with zero, and only a few neurons are activated at a time.

06:43.090 --> 06:46.690
So theoretically, this is all about the ReLU activation function.

06:49.780 --> 06:57.700
The softmax function. The softmax function is also a type of sigmoid function, but it is handy

06:57.710 --> 07:01.330
when we are dealing with classification problems.

07:02.620 --> 07:04.900
Nature of this function is non-linear.

07:06.280 --> 07:10.750
We use this activation function in the case of multiple classes of output.

07:12.220 --> 07:19.090
The softmax function is ideally used in the output layer of a classifier, where we're trying to attain the

07:19.090 --> 07:22.240
probabilities to define the class of each input.
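
NOTE
A minimal Python sketch (not from the tutorial) of softmax in a classifier's output layer: raw scores become probabilities that sum to 1, and the largest one picks the class. Subtracting the maximum score is a standard numerical-stability trick that does not change the result. The scores are illustrative; the last line shows why softmax is described as a kind of sigmoid, since softmax([z, 0]) gives the same first probability as sigmoid(z).
```python
import math
def softmax(logits):
    # Subtract the max logit first; this avoids overflow in exp without changing the result
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))
scores = [2.0, 1.0, 0.1]  # illustrative raw outputs of a 3-class classifier
probs = softmax(scores)
predicted_class = probs.index(max(probs))
print([round(p, 3) for p in probs], predicted_class)
# With two classes, softmax([z, 0]) matches sigmoid(z)
print(softmax([1.5, 0.0])[0], sigmoid(1.5))
```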

07:22.990 --> 07:26.800
So theoretically, this is all about the softmax activation function.

07:29.880 --> 07:37.200
So these are the five main activation functions that we use: linear function, sigmoid function, tanh

07:37.200 --> 07:41.040
function, ReLU, and softmax function.

07:44.170 --> 07:47.740
Let us understand how to select the right activation function.

07:48.910 --> 07:53.550
If you really don't know which activation function to use, then use ReLU.

07:53.980 --> 07:57.310
And this is because it is a general activation function.

07:57.370 --> 07:59.830
And it is used in most cases.

08:01.120 --> 08:07.630
And if your output is for binary classification, then use the sigmoid function for the output layer.
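
NOTE
A hypothetical Python helper (not from the tutorial, and not any library's API) that encodes these rules of thumb, together with the earlier points that softmax suits multi-class outputs and the linear function belongs in the output layer. Mapping "regression" to a linear output is an assumption based on common practice.
```python
def suggest_activation(layer, task=None):
    # Hypothetical helper encoding the tutorial's rules of thumb
    if layer == "hidden":
        return "relu"  # default choice when unsure
    if layer == "output":
        rules = {
            "binary_classification": "sigmoid",
            "multiclass_classification": "softmax",
            "regression": "linear",  # assumption: common practice, not stated in the tutorial
        }
        if task in rules:
            return rules[task]
        raise ValueError(f"unknown task: {task!r}")
    raise ValueError(f"unknown layer type: {layer!r}")
print(suggest_activation("hidden"))  # prints relu
print(suggest_activation("output", "binary_classification"))  # prints sigmoid
```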

08:08.290 --> 08:13.030
So these are the two small suggestions to select the right activation function.

08:13.870 --> 08:17.750
So this tutorial was all about activation functions.

08:18.400 --> 08:20.710
I will see you in the next one.

08:20.730 --> 08:22.270
Till then, happy learning.