1
00:00:00,170 --> 00:00:02,415
In the previous lesson,

2
00:00:02,415 --> 00:00:04,295
you learned what
the machine learning paradigm

3
00:00:04,295 --> 00:00:05,940
is and how you use

4
00:00:05,940 --> 00:00:07,920
data and labels and
have a computer

5
00:00:07,920 --> 00:00:10,305
infer the rules
between them for you.

6
00:00:10,305 --> 00:00:13,140
You looked at a very simple
example where it figured

7
00:00:13,140 --> 00:00:16,185
out the relationship between
two sets of numbers.

8
00:00:16,185 --> 00:00:18,360
Let's now take this
to the next level by

9
00:00:18,360 --> 00:00:20,970
solving a real problem,
computer vision.

10
00:00:20,970 --> 00:00:23,925
Computer vision is the field
of having a computer

11
00:00:23,925 --> 00:00:27,345
understand and label what
is present in an image.

12
00:00:27,345 --> 00:00:28,890
Consider this slide.

13
00:00:28,890 --> 00:00:30,840
When you look at it,
you can interpret

14
00:00:30,840 --> 00:00:32,985
what a shirt is or
what a shoe is,

15
00:00:32,985 --> 00:00:35,115
but how would you
program for that?

16
00:00:35,115 --> 00:00:37,410
If an extraterrestrial
who had never seen

17
00:00:37,410 --> 00:00:39,560
clothing walked into
the room with you,

18
00:00:39,560 --> 00:00:41,465
how would you explain
the shoes to him?

19
00:00:41,465 --> 00:00:45,245
It's really difficult, if
not impossible, to do, right?

20
00:00:45,245 --> 00:00:47,765
And it's the same problem
with computer vision.

21
00:00:47,765 --> 00:00:49,790
So one way to solve that is to

22
00:00:49,790 --> 00:00:51,830
use lots of pictures
of clothing and

23
00:00:51,830 --> 00:00:54,770
tell the computer what that's
a picture of and then have

24
00:00:54,770 --> 00:00:56,390
the computer figure
out the patterns

25
00:00:56,390 --> 00:00:58,220
that give you the difference
between a shoe,

26
00:00:58,220 --> 00:01:00,475
and a shirt, and a
handbag, and a coat.

27
00:01:00,475 --> 00:01:02,120
That's what you're going to learn

28
00:01:02,120 --> 00:01:03,905
how to do in this section.

29
00:01:03,905 --> 00:01:05,930
Fortunately, there's a data set

30
00:01:05,930 --> 00:01:08,090
called Fashion MNIST which gives

31
00:01:08,090 --> 00:01:09,890
70 thousand images

32
00:01:09,890 --> 00:01:12,470
spread across 10 different
items of clothing.

33
00:01:12,470 --> 00:01:15,980
These images have been scaled
down to 28 by 28 pixels.

34
00:01:15,980 --> 00:01:17,840
Now usually, the
smaller the better

35
00:01:17,840 --> 00:01:19,880
because the computer has
less processing to do.

36
00:01:19,880 --> 00:01:21,620
But of course, you need to retain

37
00:01:21,620 --> 00:01:23,540
enough information
to be sure that

38
00:01:23,540 --> 00:01:26,005
the features of the object
can still be distinguished.

39
00:01:26,005 --> 00:01:28,340
If you look at this slide,
you can still tell

40
00:01:28,340 --> 00:01:30,620
the difference between
shirts, shoes, and handbags.

41
00:01:30,620 --> 00:01:32,825
So this size does
seem to be ideal,

42
00:01:32,825 --> 00:01:35,945
and it makes it great for
training a neural network.

43
00:01:35,945 --> 00:01:38,625
The images are also
in grayscale,

44
00:01:38,625 --> 00:01:41,585
so the amount of information
is also reduced.

45
00:01:41,585 --> 00:01:44,360
Each pixel can be represented
in values from zero to

46
00:01:44,360 --> 00:01:47,620
255, and so it's
only one byte per pixel.

47
00:01:47,620 --> 00:01:50,260
With 28 by 28 pixels in an image,

48
00:01:50,260 --> 00:01:54,455
only 784 bytes are needed
to store the entire image.

49
00:01:54,455 --> 00:01:56,360
Despite that, we can still see

50
00:01:56,360 --> 00:01:57,890
what's in the image,
and in this case,

51
00:01:57,890 --> 00:01:59,970
it's an ankle boot, right?
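
To make the storage arithmetic above concrete, here's a minimal sketch in Python. It assumes a blank 28 by 28 grayscale image stored as one byte per pixel. The names WIDTH, HEIGHT, and blank_image are just for illustration and don't come from the lesson or from the Fashion MNIST data set itself.

```python
# Illustrative sketch: a 28x28 grayscale image stored as one byte per pixel.
# WIDTH, HEIGHT, and blank_image are hypothetical names, not from the lesson.
WIDTH, HEIGHT = 28, 28

# bytes(n) creates n zero-valued bytes, i.e. an all-black image.
blank_image = bytes(WIDTH * HEIGHT)

# One byte has 8 bits, so the largest value a single pixel can hold is 2**8 - 1.
max_pixel_value = 2 ** 8 - 1

print(len(blank_image))   # number of bytes needed to store the image
print(max_pixel_value)    # largest grayscale value that fits in one byte
```

A color image of the same size would typically need three bytes per pixel, which is one reason grayscale keeps the amount of information down.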