1
00:00:00,000 --> 00:00:01,980
To this point,
we've been creating

2
00:00:01,980 --> 00:00:04,230
convolutional neural
networks that train to

3
00:00:04,230 --> 00:00:06,615
recognize images
in binary classes.

4
00:00:06,615 --> 00:00:09,075
Horses or humans, cats or dogs.

5
00:00:09,075 --> 00:00:11,520
They've worked quite
well despite having

6
00:00:11,520 --> 00:00:14,550
relatively small amounts
of data to train on.

7
00:00:14,550 --> 00:00:17,115
But we're at risk of
falling into a trap of

8
00:00:17,115 --> 00:00:20,170
overconfidence caused
by overfitting.

9
00:00:20,170 --> 00:00:22,630
Namely, when
the dataset is small,

10
00:00:22,630 --> 00:00:25,725
we have relatively few examples
and as a result,

11
00:00:25,725 --> 00:00:28,245
we can have some mistakes
in our classification.

12
00:00:28,245 --> 00:00:30,510
You've probably heard us
use the term overfitting

13
00:00:30,510 --> 00:00:33,510
a lot and it's important to
understand what that is.

14
00:00:33,510 --> 00:00:35,700
Think of it as being very good at

15
00:00:35,700 --> 00:00:38,490
spotting something from
a limited dataset,

16
00:00:38,490 --> 00:00:40,340
but getting confused when you see

17
00:00:40,340 --> 00:00:43,160
something that doesn't
match your expectations.

18
00:00:43,160 --> 00:00:45,320
So for example,
imagine that these are

19
00:00:45,320 --> 00:00:48,275
the only shoes you've
ever seen in your life.

20
00:00:48,275 --> 00:00:50,435
Then, you learn that
these are shoes

21
00:00:50,435 --> 00:00:52,790
and this is what shoes look like.

22
00:00:52,790 --> 00:00:55,175
So if I were to show you these,

23
00:00:55,175 --> 00:00:57,620
you would recognize them
as shoes even if they are

24
00:00:57,620 --> 00:01:00,425
different sizes than
what you would expect.

25
00:01:00,425 --> 00:01:02,824
But if I were to show you this,

26
00:01:02,824 --> 00:01:04,520
even though it's
a shoe, you would

27
00:01:04,520 --> 00:01:06,875
likely not recognize it as such.

28
00:01:06,875 --> 00:01:08,775
In that scenario, you have

29
00:01:08,775 --> 00:01:11,870
overfit in your understanding
of what a shoe looks like.

30
00:01:11,870 --> 00:01:13,700
You weren't flexible
enough to see

31
00:01:13,700 --> 00:01:16,160
this high heel as
a shoe because all of

32
00:01:16,160 --> 00:01:18,050
your training and all of
your experience in what

33
00:01:18,050 --> 00:01:20,920
shoes look like are
these hiking boots.

34
00:01:20,920 --> 00:01:24,200
Now, this is a common problem
in training classifiers,

35
00:01:24,200 --> 00:01:26,440
particularly when you
have limited data.

36
00:01:26,440 --> 00:01:28,120
If you think about
it, you would need

37
00:01:28,120 --> 00:01:30,954
an infinite dataset to
build a perfect classifier,

38
00:01:30,954 --> 00:01:33,275
but that might take a
little too long to train.

39
00:01:33,275 --> 00:01:36,340
So in this lesson, I want to
look at some tools that are

40
00:01:36,340 --> 00:01:37,570
available to you to make

41
00:01:37,570 --> 00:01:40,045
your smaller datasets
more effective.

42
00:01:40,045 --> 00:01:43,825
We'll start with
a simple concept, augmentation.

43
00:01:43,825 --> 00:01:46,360
When using convolutional
neural networks,

44
00:01:46,360 --> 00:01:48,340
we've been passing
convolutions over

45
00:01:48,340 --> 00:01:51,385
an image in order to learn
particular features.

46
00:01:51,385 --> 00:01:53,470
Maybe it's the
pointy ears for a cat,

47
00:01:53,470 --> 00:01:56,575
two legs instead of four for
a human, that kind of thing.

48
00:01:56,575 --> 00:01:59,350
Convolutions have been
very good at spotting

49
00:01:59,350 --> 00:02:02,050
these if they're clear and
distinct in the image.

50
00:02:02,050 --> 00:02:03,930
But what if we could go further?

51
00:02:03,930 --> 00:02:05,450
What if, for example, we could

52
00:02:05,450 --> 00:02:07,970
transform the image of the cat so

53
00:02:07,970 --> 00:02:09,740
that it could match
other pictures of

54
00:02:09,740 --> 00:02:12,275
cats where the ears are
oriented differently?

55
00:02:12,275 --> 00:02:14,555
So if the network
was never trained

56
00:02:14,555 --> 00:02:16,985
on an image of
a cat reclining like this,

57
00:02:16,985 --> 00:02:18,605
it may not recognize it.

58
00:02:18,605 --> 00:02:21,320
If you don't have the data
for a cat reclining,

59
00:02:21,320 --> 00:02:24,395
then you could end up in
an overfitting situation.

60
00:02:24,395 --> 00:02:26,825
But if your images are fed
into the training with

61
00:02:26,825 --> 00:02:29,690
augmentation such as a rotation,

62
00:02:29,690 --> 00:02:32,140
the feature might
then be spotted.

63
00:02:32,140 --> 00:02:34,370
Even if you don't
have a cat reclining,

64
00:02:34,370 --> 00:02:36,590
your upright cat, when rotated,

65
00:02:36,590 --> 00:02:38,910
could end up looking the same.
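
The rotation idea described above can be sketched in a few lines. This is only a minimal illustration, not the method used in the lesson: it assumes an image is a small 2D grid of numbers and limits rotation to quarter turns, whereas a real training pipeline would typically rotate by arbitrary angles using an image library. The names `rotate90`, `augment_with_rotations`, `upright`, and `reclining` are hypothetical and chosen for this example.

```python
import random


def rotate90(image):
    """Rotate a 2D grid (a list of rows) 90 degrees clockwise."""
    # Reversing the row order and then transposing gives a clockwise turn.
    return [list(row) for row in zip(*image[::-1])]


def augment_with_rotations(image, rng=None):
    """Return a copy of the image rotated by a random number of quarter turns.

    Assumption for this sketch: only 0, 90, 180, or 270 degree rotations.
    """
    rng = rng or random.Random()
    turns = rng.randrange(4)
    rotated = [list(row) for row in image]
    for _ in range(turns):
        rotated = rotate90(rotated)
    return rotated, turns


# A tiny stand-in for an "upright" subject: a vertical bar in a 3x3 image.
upright = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]

# Rotating it yields a horizontal bar, a stand-in for the "reclining" pose
# that was never present in the original data.
reclining = rotate90(upright)
```

Feeding randomly rotated copies like these into training, alongside the originals, is one way a small dataset can expose the network to orientations it would otherwise never see.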