1
00:00:00,810 --> 00:00:01,290
Hello.

2
00:00:01,350 --> 00:00:02,300
Welcome back.

3
00:00:02,370 --> 00:00:08,430
In machine learning, the number of training examples is often denoted by the letter m.

4
00:00:08,430 --> 00:00:16,740
We often break our training examples into two sets: the training set and the test set. The training set is

5
00:00:16,830 --> 00:00:24,380
often denoted by m subscript train, and the test set is denoted by m subscript test.

6
00:00:24,960 --> 00:00:27,930
We denote the number of features by n.

7
00:00:28,170 --> 00:00:33,690
You can think of the number of features as the number of inputs in our previous examples.

8
00:00:33,690 --> 00:00:38,740
Our neural network had two features in the last example.

9
00:00:38,790 --> 00:00:44,560
The features were hours of rest and hours of workout in our bodybuilders

10
00:00:44,560 --> 00:00:51,630
example. I mean, when we are working with image recognition, the number of features becomes the number of pixels

11
00:00:51,660 --> 00:00:57,810
in the image, because we will have to feed each pixel into the neural network.

12
00:00:57,810 --> 00:01:02,880
So let's say we want to design a neural network for a binary classification problem.

13
00:01:02,880 --> 00:01:09,270
Let's say we want a neural network to take a 64 by 64 pixel image and tell us if

14
00:01:09,270 --> 00:01:15,260
the image has a cat or not. To store an image in our computer,

15
00:01:15,260 --> 00:01:22,500
the computer stores three separate matrices corresponding to the red, the green, and the blue channels

16
00:01:22,680 --> 00:01:28,620
of the image, 64 pixels by 64 pixels each.

17
00:01:28,620 --> 00:01:34,970
This means we would have 3 multiplied by 64 by 64 values. Each image is 64

18
00:01:34,970 --> 00:01:43,470
by 64 pixels, and we multiply by 3 because we have three channels: red, green, and blue. To

19
00:01:43,470 --> 00:01:48,100
turn these pixel intensity values into feature vectors,

20
00:01:48,100 --> 00:01:57,300
what we need to do is unroll the pixel values into a vector like

21
00:01:57,300 --> 00:01:58,470
this.

22
00:01:58,470 --> 00:02:05,370
We simply need to take the red matrix, which is a 2D array, and convert it to a vector, meaning a 1D

23
00:02:05,430 --> 00:02:06,460
array.

24
00:02:06,480 --> 00:02:08,280
We do that by listing

25
00:02:08,390 --> 00:02:15,480
the pixel values vertically. We take the first row: we write the first value, and then we write the second

26
00:02:15,480 --> 00:02:20,400
value below the first value, and then we write the third value below the second value.

27
00:02:20,910 --> 00:02:27,600
Once we are done with the first row, we continue by listing the values of the second row in the same

28
00:02:27,600 --> 00:02:28,500
vector.

29
00:02:28,590 --> 00:02:33,090
We put the second row of values under the first row's values.

30
00:02:33,090 --> 00:02:37,630
We continue doing this until we've added all the rows of the red channel.

31
00:02:37,740 --> 00:02:43,890
Once that is done, we move on to the green channel, and then we list all the pixel values of the green

32
00:02:43,890 --> 00:02:45,970
channel under the red channel.

33
00:02:45,990 --> 00:02:52,500
Once that is done, we do it for the blue channel by listing everything under the green one.

34
00:02:52,500 --> 00:03:00,000
Once this is done, we will have 12,288 features, like we can

35
00:03:00,000 --> 00:03:01,020
see over here.

36
00:03:01,020 --> 00:03:02,870
That is why this vector

37
00:03:02,880 --> 00:03:08,880
starts from 0 and ends at 12,288, which is the same as this,

38
00:03:09,150 --> 00:03:13,380
which is the same as 64 multiplied by 64 multiplied by 3.
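
The unrolling just described can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper name `unroll_image` and the placeholder channel values are made up for this example and do not come from the lesson.

```python
# Hypothetical helper (not from the lesson): unroll three 2D channel
# matrices into one long feature vector.
def unroll_image(red, green, blue):
    """List each channel row by row: red first, then green, then blue."""
    features = []
    for channel in (red, green, blue):
        for row in channel:
            # Append this row's values after everything listed so far.
            features.extend(row)
    return features


# Placeholder 64 x 64 channels filled with made-up intensity values.
size = 64
red = [[1] * size for _ in range(size)]
green = [[2] * size for _ in range(size)]
blue = [[3] * size for _ in range(size)]

x = unroll_image(red, green, blue)
n = len(x)  # number of features: 64 * 64 * 3
```

With three 64 by 64 channels, `n` comes out as 64 * 64 * 3, and the green values begin right after the 4,096 red values.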

39
00:03:13,470 --> 00:03:20,550
We've taken this and essentially converted it to a vector. The training examples are put together in

40
00:03:20,550 --> 00:03:28,020
a matrix denoted here as capital X. Capital X then becomes our input matrix, representing all training

41
00:03:28,080 --> 00:03:36,120
examples and features. Each column in the input matrix capital X represents a single training example. The

42
00:03:36,120 --> 00:03:43,140
first row of the input matrix capital X represents the first feature of the training examples. I should

43
00:03:43,140 --> 00:03:51,120
remind you that whenever I say features, you should simply think of inputs. The input matrix X shall be processed

44
00:03:51,150 --> 00:03:59,790
by the neural network column by column. Remember, again, each column represents one training example. Capital

45
00:03:59,850 --> 00:04:06,540
X belongs to the real numbers, and the dimension of capital X is the number of features by the number of

46
00:04:06,540 --> 00:04:14,970
examples. The training labels shall be kept in a vector, capital Y, where y 1 is the answer to the question

47
00:04:15,210 --> 00:04:24,810
"is x 1 a cat or not a cat?" and y 2 is the answer to the question "is x 2 a cat or not a cat?", and so on

48
00:04:24,930 --> 00:04:32,420
and so forth. y hat is the value that we predicted, whereas y is the value we expected. We can say that

49
00:04:32,420 --> 00:04:41,700
y hat is the probability of y given that we know x, and this is how you write it: P, y, vertical line,

50
00:04:41,790 --> 00:04:51,150
x, like that. A neural network often has two parameters, W and b. We shall talk about something else known

51
00:04:51,150 --> 00:04:58,020
as hyperparameters later on in this course. W is known as the weight matrix and b is known as the bias.

52
00:04:58,860 --> 00:05:05,200
We compute y hat by performing a dot product operation between the transpose of the weight matrix

53
00:05:05,560 --> 00:05:08,550
and the input matrix, and then adding the bias.

54
00:05:08,590 --> 00:05:13,800
And then we apply an activation function to the result.
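
As a rough sketch of this computation, here is one way to write it in plain Python for a single output unit, so the weight matrix reduces to a vector w. The sigmoid activation, the function names, and all the numbers are assumptions chosen for illustration; the lesson only says "an activation function".

```python
import math


def sigmoid(z):
    """Assumed activation function; squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Compute y hat for one training example x (one column of X)."""
    # Dot product of the weights with the inputs, plus the bias.
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    return sigmoid(z)


# Made-up parameters for a network with two features.
w = [0.5, -0.25]
b = 0.1

# Each inner list stands for one column of X, i.e. one training example.
X_columns = [[1.0, 2.0], [0.0, 4.0]]

# Process the input matrix column by column.
y_hats = [predict(w, b, x) for x in X_columns]
```

Each entry of `y_hats` lies strictly between 0 and 1, which is what lets it be read as a probability.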

55
00:05:13,870 --> 00:05:19,990
Now let's look at our logistic regression loss function. The loss function is simply the function

56
00:05:19,990 --> 00:05:22,020
for calculating error.

57
00:05:22,060 --> 00:05:29,410
The simplified loss function we saw in our previous lessons looked similar to this: predicted value

58
00:05:29,530 --> 00:05:31,720
minus expected value.

59
00:05:31,780 --> 00:05:39,460
We square the difference and then multiply by half. A more accurate loss function for logistic regression

60
00:05:39,520 --> 00:05:46,910
is this one over here: minus, brackets open, y log y hat,

61
00:05:47,080 --> 00:05:57,510
plus, brackets open, 1 minus y, bracket close, and then log, brackets open, 1 minus y hat.

62
00:05:57,760 --> 00:05:59,730
You can pause the video and take a look at it.

63
00:06:00,490 --> 00:06:05,860
So this is a more accurate loss function for our logistic regression.
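
Written out in symbols, the two loss functions read aloud in this lesson would look roughly like this. The symbol L for the loss is an assumed name, and the log is taken to be the natural logarithm.

```latex
% Simplified (squared-error) loss from the previous lessons
L(\hat{y}, y) = \tfrac{1}{2}\,(\hat{y} - y)^2

% Logistic regression loss
L(\hat{y}, y) = -\bigl( y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \bigr)
```

When y is 1, only the first term survives, so the loss is small when y hat is close to 1. When y is 0, only the second term survives, so the loss is small when y hat is close to 0.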

64
00:06:05,860 --> 00:06:12,580
Remember, the loss function is applied to just a single training example. The cost function is the cost

65
00:06:12,580 --> 00:06:14,610
of all the parameters.

66
00:06:14,680 --> 00:06:18,760
In other words, the cost function is the average of all the loss functions.

67
00:06:18,790 --> 00:06:21,220
This is what the cost function looks like.

68
00:06:21,220 --> 00:06:27,100
We are adding up all the loss functions and then dividing the result by the number of training examples.

69
00:06:27,130 --> 00:06:32,970
This is the cost function which we'll use to train our neural network when we go to code.
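
A minimal sketch of that cost function in plain Python could look like the following. The function names `loss` and `cost`, the variable name `J`, and the sample predictions and labels are assumptions made up for this illustration, not the code used later in the course.

```python
import math


def loss(y_hat, y):
    """Logistic regression loss for a single training example.

    Assumes 0 < y_hat < 1; a prediction of exactly 0 or 1 would make
    math.log raise an error.
    """
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))


def cost(y_hats, ys):
    """Average of the per-example losses over all m training examples."""
    m = len(ys)
    return sum(loss(y_hat, y) for y_hat, y in zip(y_hats, ys)) / m


# Made-up predictions and labels for three training examples.
y_hats = [0.9, 0.2, 0.6]
ys = [1, 0, 1]

J = cost(y_hats, ys)
```

Here `loss` handles one example at a time, and `cost` adds the losses up and divides by the number of training examples, matching the description above.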

70
00:06:33,650 --> 00:06:39,300
So this brings us to the end of this lesson, and I shall see you in the next lesson.