1
00:00:00,000 --> 00:00:01,799
In the previous lessons,

2
00:00:01,799 --> 00:00:04,470
you looked at building a binary
classifier that predicted

3
00:00:04,470 --> 00:00:07,425
cats versus dogs or
horses versus humans.

4
00:00:07,425 --> 00:00:09,930
You also saw how
overfitting can occur

5
00:00:09,930 --> 00:00:12,660
and explored some practices
for avoiding it.

6
00:00:12,660 --> 00:00:14,280
The problem with these, of course,

7
00:00:14,280 --> 00:00:16,860
is that the training data
was very small,

8
00:00:16,860 --> 00:00:18,930
and there are only so
many common features

9
00:00:18,930 --> 00:00:20,190
that can be extracted,

10
00:00:20,190 --> 00:00:23,250
even if we do some tricks
like image augmentation.

11
00:00:23,250 --> 00:00:24,800
But in both these cases,

12
00:00:24,800 --> 00:00:26,685
you built the model from scratch.

13
00:00:26,685 --> 00:00:28,935
What if you could take
an existing model

14
00:00:28,935 --> 00:00:31,260
that's trained on far more data,

15
00:00:31,260 --> 00:00:34,020
and use the features
that that model learned?

16
00:00:34,020 --> 00:00:36,070
That's the concept of
transfer learning,

17
00:00:36,070 --> 00:00:38,270
and we'll explore
that in this lesson.

18
00:00:38,270 --> 00:00:39,860
So for example, if you

19
00:00:39,860 --> 00:00:42,170
visualize your model
like this with

20
00:00:42,170 --> 00:00:43,970
a series of convolutional layers

21
00:00:43,970 --> 00:00:46,610
before a dense layer that leads
to your output layer,

22
00:00:46,610 --> 00:00:48,845
you feed your data
into the top layer,

23
00:00:48,845 --> 00:00:50,720
the network learns
the convolutions that

24
00:00:50,720 --> 00:00:53,645
identify the features in
your data and all that.

25
00:00:53,645 --> 00:00:56,060
But consider
somebody else's model,

26
00:00:56,060 --> 00:00:59,269
perhaps one that's far more
sophisticated than yours,

27
00:00:59,269 --> 00:01:01,460
trained on a lot more data.

28
00:01:01,460 --> 00:01:04,250
They have convolutional layers
and they're here

29
00:01:04,250 --> 00:01:07,505
intact with features that
have already been learned.

30
00:01:07,505 --> 00:01:09,320
So you can lock them

31
00:01:09,320 --> 00:01:11,600
instead of retraining
them on your data,

32
00:01:11,600 --> 00:01:14,090
and have those just
extract the features from

33
00:01:14,090 --> 00:01:16,430
your data using the convolutions

34
00:01:16,430 --> 00:01:18,200
that they've already learned.

35
00:01:18,200 --> 00:01:20,780
Then you can take a model
that has been trained on

36
00:01:20,780 --> 00:01:23,210
a very large dataset and use

37
00:01:23,210 --> 00:01:24,710
the convolutions that it

38
00:01:24,710 --> 00:01:27,155
learned when
classifying its data.

39
00:01:27,155 --> 00:01:29,930
If you recall how
convolutions are created and

40
00:01:29,930 --> 00:01:32,929
used to identify
particular features,

41
00:01:32,929 --> 00:01:35,510
and the journey of a feature
through the network,

42
00:01:35,510 --> 00:01:38,345
it makes sense to just use those,

43
00:01:38,345 --> 00:01:40,505
and then retrain the dense layers

44
00:01:40,505 --> 00:01:42,715
from that model with your data.

45
00:01:42,715 --> 00:01:45,170
Of course, while it's
typical that you might

46
00:01:45,170 --> 00:01:47,720
lock all the convolutions,
you don't have to.

47
00:01:47,720 --> 00:01:50,390
You can choose to retrain
some of the lower ones

48
00:01:50,390 --> 00:01:52,160
too because they may be too

49
00:01:52,160 --> 00:01:54,485
specialized for
the images at hand.

50
00:01:54,485 --> 00:01:56,210
It takes some trial and error

51
00:01:56,210 --> 00:01:58,085
to discover
the right combination.

52
00:01:58,085 --> 00:02:01,550
So let's take a well-trained
state-of-the-art model.

53
00:02:01,550 --> 00:02:02,960
There's one called Inception,

54
00:02:02,960 --> 00:02:05,405
which you can learn
more about at this site.

55
00:02:05,405 --> 00:02:09,305
This has been pre-trained
on a dataset from ImageNet,

56
00:02:09,305 --> 00:02:14,490
which has 1.4 million images
in 1,000 different classes.
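
The idea of locking the convolutional layers and retraining only the dense layers can be sketched in a few lines of plain Python. This is a toy illustration, not the lesson's code and not any library's API: the `Layer` class, the `freeze` and `training_step` helpers, and all weights and gradients are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Layer:
    """Hypothetical stand-in for a network layer; not a real library class."""
    name: str
    weights: List[float]
    trainable: bool = True


def freeze(layers: List[Layer]) -> None:
    """Lock layers so a training step leaves their weights untouched."""
    for layer in layers:
        layer.trainable = False


def training_step(layers: List[Layer],
                  gradients: Dict[str, List[float]],
                  lr: float = 0.1) -> None:
    """Apply one gradient-descent update, skipping every frozen layer."""
    for layer in layers:
        if not layer.trainable:
            continue
        grads = gradients[layer.name]
        layer.weights = [w - lr * g for w, g in zip(layer.weights, grads)]


# Made-up "pretrained" convolutional base and a fresh dense head.
conv_base = [Layer("conv1", [0.5, -0.2]), Layer("conv2", [0.1, 0.3])]
dense_head = [Layer("dense", [0.0, 0.0]), Layer("output", [0.0])]

freeze(conv_base)  # keep the already-learned convolutions as they are
model = conv_base + dense_head

# Made-up gradients for every layer; only the trainable layers use them.
fake_gradients = {
    "conv1": [1.0, 1.0],
    "conv2": [1.0, 1.0],
    "dense": [1.0, -1.0],
    "output": [2.0],
}
training_step(model, fake_gradients)

for layer in model:
    print(layer.name, layer.trainable, layer.weights)
```

After the step, the frozen convolutional weights are unchanged and only the dense and output weights have moved. To retrain some of the convolutional layers as well, you would set `trainable = True` on just those layers before training.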