1
00:00:00,360 --> 00:00:07,060
Hi, and welcome to the section on the softmax layer. The softmax is actually what produces the final output.

2
00:00:07,230 --> 00:00:09,810
The probabilities that come out of our CNN.

3
00:00:09,900 --> 00:00:11,400
So let's take a look at this layer.

4
00:00:12,360 --> 00:00:14,480
So remember, this was a fully connected layer.

5
00:00:14,490 --> 00:00:18,420
We had our feature maps here from the max pooling layer, coming out of that operation.

6
00:00:18,840 --> 00:00:22,110
Then we flattened them and connected them to these output nodes.

7
00:00:22,590 --> 00:00:25,650
But these output nodes don't output probabilities just yet.

8
00:00:26,070 --> 00:00:31,110
There's another layer. It's kind of considered a layer, but technically it's just a little mathematical

9
00:00:31,110 --> 00:00:33,090
operation on these outputs.

10
00:00:33,630 --> 00:00:35,560
And that operation is the softmax layer.

11
00:00:35,580 --> 00:00:41,250
And the reason we do it is because we need to produce probability outcomes for each class in the network.

12
00:00:41,880 --> 00:00:44,850
So the softmax converts the logits into probabilities.

13
00:00:45,420 --> 00:00:50,610
And when I say logits, I mean the actual outputs; we take the exponential of the outputs, which you'll

14
00:00:50,610 --> 00:00:51,300
see shortly.

15
00:00:51,930 --> 00:00:58,040
So, as I said, it takes the exponential of every output and normalizes each output by the

16
00:00:58,050 --> 00:00:59,190
sum of the exponentials.

17
00:00:59,610 --> 00:01:04,260
And that guarantees that they all sum up to one and no value is actually zero.

18
00:01:04,770 --> 00:01:09,720
So this is the mathematical formula for the softmax.

19
00:01:10,290 --> 00:01:12,270
Let's take a look and see how it works.

20
00:01:12,310 --> 00:01:16,170
So imagine that we have these scores coming out of the output layer.

21
00:01:17,130 --> 00:01:23,940
We can apply the softmax function here, which would be e to the power of two divided by the sum of the exponentials,

22
00:01:23,940 --> 00:01:27,390
which is e to the two plus e to the one plus e to the 0.1.

23
00:01:27,840 --> 00:01:32,220
And in the end, that'll give us the probabilities corresponding to each class.

24
00:01:32,730 --> 00:01:40,860
So imagine that each node here corresponds to a class; these nodes are the final output of the CNN, and

25
00:01:40,860 --> 00:01:47,610
in CNNs, or in neural networks in general, each output corresponds to a

26
00:01:47,610 --> 00:01:49,830
class in a classification problem.

27
00:01:50,310 --> 00:01:55,860
So if we have a classification problem where we're trying to identify ten digits, like zero, one, two,

28
00:01:55,860 --> 00:02:00,000
three, four, five, six, seven, eight, nine, you will have 10 output nodes.

29
00:02:00,420 --> 00:02:05,840
Similarly, if you're trying to identify cats, dogs and penguins, you will have three output nodes,

30
00:02:06,420 --> 00:02:07,170
and so on.

31
00:02:07,170 --> 00:02:14,640
And basically the output nodes hold scores, where a high score is

32
00:02:14,640 --> 00:02:17,790
a strong indication of the input being that specific class.

33
00:02:18,210 --> 00:02:22,650
But we want that in a probability form because probabilities are easy to work with mathematically,

34
00:02:23,160 --> 00:02:25,620
and that's what the softmax operation gives us.

35
00:02:25,950 --> 00:02:31,650
So in the end, we convert these output values to 0.7, 0.2 and 0.1 in this case.

36
00:02:32,280 --> 00:02:33,810
So let's stop there.

37
00:02:33,810 --> 00:02:39,180
And now we'll start putting everything together to show how we build a full CNN.

38
00:02:39,630 --> 00:02:41,040
So I'll see you in the next section.

39
00:02:41,220 --> 00:02:41,640
Thank you.