1
00:00:00,630 --> 00:00:01,740
Hi and welcome back.

2
00:00:02,160 --> 00:00:07,800
We're about to start the lesson on activation layers, and we'll talk a bit about ReLU, which is

3
00:00:07,800 --> 00:00:12,190
the most important and useful activation layer for CNNs.

4
00:00:12,810 --> 00:00:16,350
So let's take a look and see why activation is important.

5
00:00:16,890 --> 00:00:21,120
So it's important you think about the purpose of an activation function.

6
00:00:21,720 --> 00:00:27,000
This could confuse a lot of you, so I'll go slowly and try to introduce this topic gently.

7
00:00:27,750 --> 00:00:31,560
Firstly, remember, we're trying to learn complex patterns in our data.

8
00:00:32,010 --> 00:00:39,360
These complex patterns mean that the data is so vast and has so many different variations that small

9
00:00:39,360 --> 00:00:45,330
changes matter. Imagine you're trying to identify digits, and the difference between

10
00:00:45,330 --> 00:00:49,890
a one and a seven isn't that much. It's just that a one is

11
00:00:50,370 --> 00:00:56,640
a bit more straight and vertical, whereas a seven can have a slight angle and has a horizontal

12
00:00:56,640 --> 00:00:57,720
bar at the top.

13
00:00:58,410 --> 00:01:00,480
But those are minor differences.

14
00:01:00,480 --> 00:01:07,350
If you think about it, the way our mind actually tells a seven from a one

15
00:01:07,920 --> 00:01:13,830
is that there's a decision boundary in our brains, and our brains basically are just a bunch of neurons.

16
00:01:14,130 --> 00:01:20,430
That's effectively what they are: neurons that fire and activate and are able to identify what's what quite efficiently

17
00:01:20,430 --> 00:01:21,210
and quite quickly.

18
00:01:21,210 --> 00:01:23,940
It's actually amazing how efficient our brains are at this.

19
00:01:24,780 --> 00:01:28,620
So this gets to the point where we talk about nonlinearity.

20
00:01:29,190 --> 00:01:32,580
So remember, I talked about that decision boundary between a one and a seven.

21
00:01:33,120 --> 00:01:35,280
Now imagine we have a whole host of digits.

22
00:01:35,280 --> 00:01:39,360
You have all the numbers, all the digits from zero to nine.

23
00:01:39,780 --> 00:01:44,310
That's a lot of different classes, so you need to have a complicated way.

24
00:01:44,340 --> 00:01:51,570
Basically, you need a nonlinear function that can take these inputs and produce the output that we want.

25
00:01:51,690 --> 00:01:53,550
Well, the output that's correct, I should say.

26
00:01:54,180 --> 00:01:56,280
So that's the nonlinearity.

27
00:01:56,280 --> 00:02:02,250
It's basically decision boundaries that vary and depend on many different factors.

28
00:02:02,250 --> 00:02:04,590
It's not a linear mapping function.

29
00:02:05,250 --> 00:02:11,220
So hopefully that gives you some intuition on why activation functions are important and what

30
00:02:11,220 --> 00:02:13,560
they allow CNNs to do.

31
00:02:14,400 --> 00:02:17,220
So let's take a look at some simple activation functions.

32
00:02:17,880 --> 00:02:23,970
So we're going to talk a bit about ReLU in depth. ReLU has advantages in CNN training

33
00:02:23,970 --> 00:02:28,980
because it's actually quite a simple computation, which means it's fast

34
00:02:28,980 --> 00:02:31,380
to train, and it doesn't saturate for positive inputs.

35
00:02:31,710 --> 00:02:32,970
So that's also a good point.
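
To make the "doesn't saturate" point a bit more concrete, here is a minimal sketch in plain Python. The comparison against a sigmoid is an assumption added for illustration (the lesson itself does not name it), and the helper names `relu_grad` and `sigmoid_grad` are hypothetical. It shows that the slope of ReLU stays at 1 for large positive inputs, while the sigmoid's slope shrinks toward zero.

```python
import math

def relu_grad(z):
    # Slope of max(0, z). Using 0 at exactly z == 0 is a convention assumed here.
    return 1.0 if z > 0 else 0.0

def sigmoid_grad(z):
    # Slope of the sigmoid, s(z) * (1 - s(z)); used only as a contrast case.
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

# For growing positive inputs, ReLU's slope stays at 1 while the sigmoid's shrinks.
for z in (0.5, 5.0, 50.0):
    print(z, relu_grad(z), sigmoid_grad(z))
```

Note that for negative inputs the ReLU slope is zero, so the "no saturation" property only holds on the positive side.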

36
00:02:33,570 --> 00:02:38,640
So let's take a look at some simple activation functions, and you're seeing this function here, this

37
00:02:38,640 --> 00:02:42,900
function here, and this function here, which is ReLU, the rectified linear unit.

38
00:02:42,940 --> 00:02:43,800
That's what it stands for.

39
00:02:44,280 --> 00:02:45,810
So what does this mean?

40
00:02:46,440 --> 00:02:48,300
Let's take a look at the ReLU in this case.

41
00:02:48,630 --> 00:02:54,810
This means that all of the values, all of the inputs into the function that are less than zero, will always be

42
00:02:54,810 --> 00:02:55,200
zero.

43
00:02:55,380 --> 00:02:57,820
So that's what this mathematical function is saying here.

44
00:02:58,230 --> 00:03:00,060
If Z is less than zero, it's zero.

45
00:03:00,390 --> 00:03:03,660
However, if Z is greater than zero, then it takes the value of Z.

46
00:03:04,620 --> 00:03:08,340
So that's the nonlinear part of this function right here.

47
00:03:09,870 --> 00:03:12,510
So as you can see, this is another way to represent ReLU.

48
00:03:12,990 --> 00:03:17,730
Basically, it's the max of zero and X, meaning that for everything over zero it just gets the value

49
00:03:17,730 --> 00:03:21,870
of X here and everything below zero is zero, basically.

50
00:03:22,470 --> 00:03:27,390
So we leave all positive values alone and we change all negative values to zero.

51
00:03:27,600 --> 00:03:29,640
That's ReLU in a nutshell.
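
The two ways of writing the function described above (the piecewise rule and the max form) can be sketched in a few lines of plain Python. The function names `relu` and `relu_piecewise` are hypothetical, chosen only for this illustration.

```python
def relu(z):
    # Max form: the larger of zero and the input.
    return max(0.0, z)

def relu_piecewise(z):
    # Piecewise form: zero for inputs at or below zero, the input itself otherwise.
    return z if z > 0 else 0.0

# Both forms agree on negative, zero and positive inputs.
for z in (-3.0, 0.0, 2.5):
    print(z, relu(z), relu_piecewise(z))
```

Both versions leave positive values alone and turn negative values into zero, which is all the function does.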

52
00:03:31,020 --> 00:03:33,960
So let's take a look and see how we apply ReLU mathematically.

53
00:03:34,260 --> 00:03:36,510
Let's see what it does to our output feature map.

54
00:03:37,110 --> 00:03:43,440
So this is the output feature map we get by convolving this input image with this filter or kernel,

55
00:03:44,040 --> 00:03:51,390
and you can see we have two, one, minus one, minus one, one, three, two, one and minus five.

56
00:03:51,980 --> 00:03:54,210
Remember what we said ReLU does?

57
00:03:54,510 --> 00:03:58,680
We change all the negative values to zero and leave all the positive values alone.

58
00:03:59,160 --> 00:04:00,450
Well, that's simply what we do.

59
00:04:00,480 --> 00:04:03,090
All the negative values here become zero.

60
00:04:03,150 --> 00:04:06,680
You can see them here, and all the positive values remain the same.

61
00:04:07,140 --> 00:04:08,310
It's quite simple.

62
00:04:08,310 --> 00:04:09,150
That's what ReLU does.
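
As a rough sketch of that step, the snippet below applies ReLU element by element to a small feature map in plain Python. The nine values follow the numbers read out in the lesson as best they can be made out, and arranging them as a 3x3 grid is an assumption, since the slide itself is not part of the transcript. The names `feature_map` and `apply_relu` are hypothetical.

```python
# Assumed 3x3 layout of the feature map values mentioned in the lesson.
feature_map = [
    [2, 1, -1],
    [-1, 1, 3],
    [2, 1, -5],
]

def apply_relu(fmap):
    # Replace every negative entry with zero and keep the rest unchanged.
    return [[max(0, value) for value in row] for row in fmap]

activated = apply_relu(feature_map)
for row in activated:
    print(row)
```

The shape of the feature map does not change. Only the negative entries are affected.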

63
00:04:09,540 --> 00:04:15,750
So ReLU is often considered to be part of the same layer as a convolution layer.

64
00:04:15,780 --> 00:04:19,020
We just specify what activation function we're using.

65
00:04:19,500 --> 00:04:25,010
However, in a lot of the literature you will read, some authors or researchers like to call a ReLU

66
00:04:25,020 --> 00:04:28,740
a separate layer, whereas other people lump them together.

67
00:04:29,590 --> 00:04:31,170
I'm of the opinion that it can be both.

68
00:04:31,530 --> 00:04:36,840
However you want to refer to it, just make sure you're consistent. When you're saying that this neural

69
00:04:36,840 --> 00:04:41,310
net has nine layers and you want to compare it to another one, just make sure you compare them correctly.

70
00:04:42,990 --> 00:04:49,080
So this is an example of how the ReLU operation works after we take this feature map and we apply

71
00:04:49,470 --> 00:04:49,660
it.

72
00:04:50,860 --> 00:04:51,180
Sorry.

73
00:04:51,780 --> 00:04:53,160
So you can see the output here.

74
00:04:53,490 --> 00:04:59,670
You can see what's happening is that all of the dark values, which are the negative values, are being

75
00:05:00,300 --> 00:05:03,780
basically eliminated in a way, and all the white values are roughly staying the same.

76
00:05:04,680 --> 00:05:07,590
So that's it for the ReLU lesson.

77
00:05:08,040 --> 00:05:11,460
Next, we'll take a look at the pooling layer of CNNs.

78
00:05:11,970 --> 00:05:13,440
So stay tuned for that lesson.

79
00:05:13,560 --> 00:05:14,040
Thank you.