1
00:00:01,416 --> 00:00:05,440
So an image is closed once we build it,

2
00:00:05,440 --> 00:00:07,670
once these instructions have been executed.

3
00:00:07,670 --> 00:00:09,610
That's why we have to rebuild it

4
00:00:09,610 --> 00:00:12,040
if we need to update something in there,

5
00:00:12,040 --> 00:00:13,990
for example, when our code changed,

6
00:00:13,990 --> 00:00:17,920
and we wanna copy the new code into a new image,

7
00:00:17,920 --> 00:00:20,513
that's what we covered in the previous lecture.

8
00:00:21,370 --> 00:00:22,850
Building on that,

9
00:00:22,850 --> 00:00:26,780
there is another important concept related to images

10
00:00:26,780 --> 00:00:29,570
that you should also be aware of.

11
00:00:29,570 --> 00:00:31,780
Images are layer-based.

12
00:00:31,780 --> 00:00:34,050
Now what do I mean by that?

13
00:00:34,050 --> 00:00:37,130
I mean that when you build an image,

14
00:00:37,130 --> 00:00:38,869
or when you rebuild it,

15
00:00:38,869 --> 00:00:42,860
only the instructions where something changed,

16
00:00:42,860 --> 00:00:47,440
and all the instructions after them are re-evaluated.

17
00:00:47,440 --> 00:00:49,653
Keep in mind that I changed the code,

18
00:00:50,630 --> 00:00:52,720
and then I rebuilt this image.

19
00:00:52,720 --> 00:00:54,510
We did this in the last lecture.

20
00:00:54,510 --> 00:00:58,210
Now, I did not change the code again since then.

21
00:00:58,210 --> 00:01:01,220
So if I now rebuild this image again,

22
00:01:01,220 --> 00:01:03,170
by running "docker build .",

23
00:01:03,170 --> 00:01:05,040
you see this is super fast.

24
00:01:05,040 --> 00:01:08,520
It finished in like a quarter of a second.

25
00:01:08,520 --> 00:01:10,240
It was super fast

26
00:01:10,240 --> 00:01:15,240
as all these "Using cache" messages here indicate.

27
00:01:15,510 --> 00:01:17,690
Because Docker basically recognized

28
00:01:17,690 --> 00:01:19,740
that for all these instructions,

29
00:01:19,740 --> 00:01:23,470
the result of executing them again

30
00:01:23,470 --> 00:01:25,650
will be the same as before.

31
00:01:25,650 --> 00:01:27,510
We have the same working directory,

32
00:01:27,510 --> 00:01:29,950
the code I copy has not changed at all,

33
00:01:29,950 --> 00:01:31,440
there is no new file,

34
00:01:31,440 --> 00:01:33,170
no file has changed,

35
00:01:33,170 --> 00:01:36,250
and therefore Docker is able to infer

36
00:01:36,250 --> 00:01:39,150
that it doesn't really need to go

37
00:01:39,150 --> 00:01:40,910
through these instructions again.

38
00:01:40,910 --> 00:01:43,670
In fact, whenever you build an image,

39
00:01:43,670 --> 00:01:48,150
Docker caches every instruction result,

40
00:01:48,150 --> 00:01:50,600
and when you then rebuild an image,

41
00:01:50,600 --> 00:01:53,320
it will use these cached results

42
00:01:53,320 --> 00:01:56,690
if there is no need to run an instruction again.

43
00:01:56,690 --> 00:02:00,330
And this is called a layer-based architecture.

44
00:02:00,330 --> 00:02:05,043
Every instruction in your Dockerfile represents a layer.

45
00:02:06,010 --> 00:02:09,919
And an image is simply built up from multiple layers

46
00:02:09,919 --> 00:02:12,480
based on these different instructions.
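The caching idea described above can be sketched as a small simulation. This is a simplified model, not Docker's actual implementation: each step's cache key is a hash of the previous step's key plus the instruction text, and for COPY also the contents of the copied files. The Dockerfile instructions, file names and file contents below are hypothetical stand-ins for the lecture's project.

```python
import hashlib

def build(steps, files, cache):
    """Simulate layer caching; return 'CACHED' or 'RUN' for each step."""
    parent = ""
    statuses = []
    for step in steps:
        h = hashlib.sha256(f"{parent}\n{step}".encode())
        if step.startswith("COPY "):
            src = step.split()[1]
            # Assumption: "COPY ." depends on every file, "COPY <file>" on that file only.
            for path in sorted(files if src == "." else [src]):
                h.update(f"\0{path}\0{files[path]}".encode())
        key = h.hexdigest()
        statuses.append("CACHED" if key in cache else "RUN")
        cache.add(key)  # remember this layer for future builds
        parent = key    # the next layer's key depends on this one
    return statuses

# Hypothetical Dockerfile, one string per instruction.
STEPS = [
    "FROM node",
    "WORKDIR /app",
    "COPY . /app",
    "RUN npm install",
    "EXPOSE 80",
    'CMD ["node", "server.js"]',
]
files = {"package.json": '{"dependencies": {}}', "server.js": "res.send('Hello!')"}

cache = set()
first = build(STEPS, files, cache)   # empty cache: every step runs
second = build(STEPS, files, cache)  # nothing changed: every step comes from the cache
print(first)
print(second)
```

The first build reports every step as RUN and the second reports every step as CACHED. FROM is treated as an ordinary step here purely for simplicity.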

47
00:02:12,480 --> 00:02:14,830
In addition, an image is read-only,

48
00:02:14,830 --> 00:02:17,500
which means once an instruction has been executed

49
00:02:17,500 --> 00:02:19,700
and once the image is built,

50
00:02:19,700 --> 00:02:21,430
the image is locked in

51
00:02:21,430 --> 00:02:24,290
and code in there can't change

52
00:02:24,290 --> 00:02:26,210
unless you rebuild the image,

53
00:02:26,210 --> 00:02:28,480
which technically means you create a new image.

54
00:02:28,480 --> 00:02:30,230
That's what I covered before.

55
00:02:30,230 --> 00:02:32,080
But let's come back to these layers.

56
00:02:32,080 --> 00:02:35,340
So images are layer-based: every instruction creates a layer,

57
00:02:35,340 --> 00:02:37,463
and these layers are cached.

58
00:02:38,300 --> 00:02:41,420
If you then run a container based on an image,

59
00:02:41,420 --> 00:02:45,150
that container basically adds a new extra layer

60
00:02:45,150 --> 00:02:46,680
on top of the image,

61
00:02:46,680 --> 00:02:50,840
which is where the running application, the running code, lives:

62
00:02:50,840 --> 00:02:55,050
basically the result of executing the command

63
00:02:55,050 --> 00:02:57,180
which you specified in your Dockerfile.

64
00:02:57,180 --> 00:03:01,120
This adds the final layer which only becomes active

65
00:03:01,120 --> 00:03:03,403
once you run an image as a container.

66
00:03:04,260 --> 00:03:08,650
All the instructions before that final instruction

67
00:03:08,650 --> 00:03:12,793
are already part of the image, though, as separate layers.
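The relationship between the read-only image layers and the extra container layer on top can be loosely modelled with Python's collections.ChainMap, which looks keys up through several mappings in order but writes only to the first one. This is an analogy, not how Docker's storage drivers (such as overlay2) are implemented; the paths and contents are hypothetical, and deletions (which real union filesystems handle with "whiteout" entries) are not modelled.

```python
from collections import ChainMap
from types import MappingProxyType

# Hypothetical image layers; MappingProxyType makes them read-only views.
node_layer = MappingProxyType({"/usr/local/bin/node": "<node binary>"})
app_layer = MappingProxyType({"/app/server.js": "res.send('Hello!')"})

# The container adds one writable layer on top of the image layers.
container_layer = {}
container_fs = ChainMap(container_layer, app_layer, node_layer)

# Reads fall through to the first layer that contains the path.
print(container_fs["/app/server.js"])

# Writes always land in the container layer ...
container_fs["/app/log.txt"] = "request received"
container_fs["/app/server.js"] = "patched at runtime"

# ... so the image layer keeps its original content.
print(app_layer["/app/server.js"])
print(container_fs["/app/server.js"])
```

Because the image layers are never written to, the same image can back many containers, each with its own top layer.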

68
00:03:13,750 --> 00:03:15,510
And when nothing changes,

69
00:03:15,510 --> 00:03:18,390
all these layers can be used from cache.

70
00:03:18,390 --> 00:03:20,580
Now, if I do change something in the code,

71
00:03:20,580 --> 00:03:23,720
if I add more exclamation marks here or anything else,

72
00:03:23,720 --> 00:03:25,280
no matter what you change,

73
00:03:25,280 --> 00:03:27,460
if I now build this again,

74
00:03:27,460 --> 00:03:30,960
by repeating "docker build .",

75
00:03:30,960 --> 00:03:33,440
you will see that now it takes longer

76
00:03:33,440 --> 00:03:37,640
because now it only uses some results from the cache.

77
00:03:37,640 --> 00:03:38,820
It used the work

78
00:03:38,820 --> 00:03:41,650
directory instruction result from the cache,

79
00:03:41,650 --> 00:03:44,810
but it noticed that for the copy instruction,

80
00:03:44,810 --> 00:03:46,400
it needs to run it again.

81
00:03:46,400 --> 00:03:49,340
Because it scans the files which it should copy in,

82
00:03:49,340 --> 00:03:52,270
and Docker detects that one file changed,

83
00:03:52,270 --> 00:03:55,870
and hence it copies in all files again.

84
00:03:55,870 --> 00:03:57,670
Now, here's the thing.

85
00:03:57,670 --> 00:03:59,730
Whenever one layer changes,

86
00:03:59,730 --> 00:04:03,560
as I said, all subsequent layers are rebuilt.

87
00:04:03,560 --> 00:04:07,160
Docker is not able to tell whether npm install

88
00:04:07,160 --> 00:04:10,384
would now yield the same result as before.

89
00:04:10,384 --> 00:04:13,530
After all, we copied in our files again,

90
00:04:13,530 --> 00:04:16,910
and Docker does not do a deep analysis

91
00:04:16,910 --> 00:04:19,070
of which file changed where

92
00:04:19,070 --> 00:04:22,640
and if this could affect npm install.

93
00:04:22,640 --> 00:04:24,370
So whenever one layer changes,

94
00:04:24,370 --> 00:04:27,870
all subsequent layers are also re-executed,

95
00:04:27,870 --> 00:04:31,113
which is why npm install ran again here.
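The cascade can be illustrated with a simplified cache model (not Docker's real implementation). It roughly mirrors the rule in Docker's build-cache documentation: a RUN step is matched only on its command string and its parent layer, while COPY also compares the contents of the copied files. The instructions and files are hypothetical. Because each key includes the parent's key, a changed COPY layer changes the key of every step after it.

```python
import hashlib

def build(steps, files, cache):
    """Simulate layer caching; return 'CACHED' or 'RUN' for each step."""
    parent = ""
    statuses = []
    for step in steps:
        h = hashlib.sha256(f"{parent}\n{step}".encode())
        if step.startswith("COPY "):
            src = step.split()[1]
            # Assumption: "COPY ." depends on every file, "COPY <file>" on that file only.
            for path in sorted(files if src == "." else [src]):
                h.update(f"\0{path}\0{files[path]}".encode())
        key = h.hexdigest()
        statuses.append("CACHED" if key in cache else "RUN")
        cache.add(key)
        parent = key  # chaining: a new key here forces new keys for all later steps
    return statuses

# Hypothetical original Dockerfile: all source code is copied before npm install.
STEPS = [
    "FROM node",
    "WORKDIR /app",
    "COPY . /app",
    "RUN npm install",
    "EXPOSE 80",
    'CMD ["node", "server.js"]',
]
files = {"package.json": '{"dependencies": {}}', "server.js": "res.send('Hello!')"}

cache = set()
build(STEPS, files, cache)  # initial build fills the cache

files["server.js"] = "res.send('Hello!!!')"  # edit source code only
rebuild = build(STEPS, files, cache)
for step, status in zip(STEPS, rebuild):
    print(f"{status:6}  {step}")
```

Only FROM and WORKDIR stay cached; COPY and every step after it, including npm install, run again, even though package.json did not change.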

96
00:04:32,270 --> 00:04:35,560
So I hope this layer-based architecture makes sense

97
00:04:35,560 --> 00:04:36,660
and is clear.

98
00:04:36,660 --> 00:04:40,230
It exists to speed up the creation of images

99
00:04:40,230 --> 00:04:43,510
since Docker only rebuilds and re-executes

100
00:04:43,510 --> 00:04:46,180
what needs to be re-executed.

101
00:04:46,180 --> 00:04:49,140
And that's, of course, a very useful mechanism.

102
00:04:49,140 --> 00:04:51,550
Now it also means that at the moment,

103
00:04:51,550 --> 00:04:54,420
whenever we change anything in our code,

104
00:04:54,420 --> 00:04:56,900
we also run npm install again,

105
00:04:56,900 --> 00:04:58,920
even though we as developers know

106
00:04:58,920 --> 00:05:00,703
that this is unnecessary.

107
00:05:01,640 --> 00:05:04,630
Unless we change something in package.json,

108
00:05:04,630 --> 00:05:07,730
which manages the dependencies of our project,

109
00:05:07,730 --> 00:05:11,450
there is no need to run npm install again at all.

110
00:05:11,450 --> 00:05:14,600
If we only change something in our source code,

111
00:05:14,600 --> 00:05:18,320
this has no impact on the dependencies this project needs,

112
00:05:18,320 --> 00:05:20,437
and therefore, in the Node world,

113
00:05:20,437 --> 00:05:24,410
npm install does not need to be re-executed.

114
00:05:24,410 --> 00:05:26,600
And here we have our first

115
00:05:26,600 --> 00:05:30,763
tiny bit of optimization potential for this Dockerfile.

116
00:05:31,700 --> 00:05:34,040
Instead of copying everything like this,

117
00:05:34,040 --> 00:05:36,610
and then running npm install,

118
00:05:36,610 --> 00:05:40,090
it would be better to copy

119
00:05:40,090 --> 00:05:43,360
this after npm install,

120
00:05:43,360 --> 00:05:45,630
but before we run npm install,

121
00:05:45,630 --> 00:05:49,030
we also copy the package.json file,

122
00:05:49,030 --> 00:05:52,313
and we copy that into the app folder.

123
00:05:54,100 --> 00:05:57,720
With that, we would pick up this package.json file,

124
00:05:57,720 --> 00:05:59,810
copy that into the app folder,

125
00:05:59,810 --> 00:06:01,920
then run npm install,

126
00:06:01,920 --> 00:06:05,680
and then copy over our other code.

127
00:06:05,680 --> 00:06:08,917
With this, we would ensure that this layer,

128
00:06:08,917 --> 00:06:13,550
the npm install layer comes before we copy our source code.

129
00:06:13,550 --> 00:06:14,850
So in the future,

130
00:06:14,850 --> 00:06:17,440
whenever we change our source code,

131
00:06:17,440 --> 00:06:21,470
the layers in front of the copy source code command

132
00:06:21,470 --> 00:06:23,500
will not be invalidated.

133
00:06:23,500 --> 00:06:26,160
and npm install will not run again

134
00:06:26,160 --> 00:06:29,203
just because we copied in our source code again.

135
00:06:30,100 --> 00:06:33,110
So now only these layers would run again

136
00:06:33,110 --> 00:06:35,270
and that will be more performant

137
00:06:35,270 --> 00:06:37,120
than running npm install again,

138
00:06:37,120 --> 00:06:40,310
which simply takes a certain amount of time to finish.
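The reordered Dockerfile can be checked with the same simplified cache model (hypothetical instructions and files again, not Docker's real implementation). Editing server.js now leaves the COPY package.json and npm install layers cached, while editing package.json still forces npm install to run.

```python
import hashlib

def build(steps, files, cache):
    """Simulate layer caching; return 'CACHED' or 'RUN' for each step."""
    parent = ""
    statuses = []
    for step in steps:
        h = hashlib.sha256(f"{parent}\n{step}".encode())
        if step.startswith("COPY "):
            src = step.split()[1]
            # Assumption: "COPY ." depends on every file, "COPY <file>" on that file only.
            for path in sorted(files if src == "." else [src]):
                h.update(f"\0{path}\0{files[path]}".encode())
        key = h.hexdigest()
        statuses.append("CACHED" if key in cache else "RUN")
        cache.add(key)
        parent = key
    return statuses

# Hypothetical optimized Dockerfile: dependencies are installed before the source is copied.
STEPS = [
    "FROM node",
    "WORKDIR /app",
    "COPY package.json /app",
    "RUN npm install",
    "COPY . /app",
    "EXPOSE 80",
    'CMD ["node", "server.js"]',
]
files = {"package.json": '{"dependencies": {}}', "server.js": "res.send('Hello!')"}

cache = set()
build(STEPS, files, cache)  # initial build fills the cache

files["server.js"] = "res.send('Hello!!!')"  # source change only
after_code_change = build(STEPS, files, cache)

files["package.json"] = '{"dependencies": {"express": "^4.18.0"}}'  # hypothetical dependency change
after_dependency_change = build(STEPS, files, cache)

print(after_code_change)
print(after_dependency_change)
```

The design choice is simply to place the slow, rarely invalidated step (npm install) before the step that changes most often (copying the source code), so that most rebuilds stop invalidating the cache only at "COPY . /app".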

139
00:06:40,310 --> 00:06:42,770
I hope this makes sense.

140
00:06:42,770 --> 00:06:44,513
So if I now build this again,

141
00:06:45,430 --> 00:06:46,580
for the first time with the updated Dockerfile,

142
00:06:46,580 --> 00:06:50,090
it will run npm install and copy in everything,

143
00:06:50,090 --> 00:06:53,113
but then here we got our image ID,

144
00:06:54,460 --> 00:06:57,433
and if we now use that to run our container,

145
00:06:58,700 --> 00:06:59,533
and we reload,

146
00:06:59,533 --> 00:07:01,670
we see the change in the source code, of course,

147
00:07:01,670 --> 00:07:03,660
which I made before.

148
00:07:03,660 --> 00:07:08,660
But if we now stop this container, first of all,

149
00:07:08,700 --> 00:07:11,040
with "docker stop",

150
00:07:11,040 --> 00:07:15,100
and then go to server.js and update the source code again

151
00:07:15,100 --> 00:07:17,713
to remove all the exclamation marks again.

152
00:07:18,960 --> 00:07:22,110
You will notice that if I rebuild the image

153
00:07:22,110 --> 00:07:23,840
with "docker build .",

154
00:07:23,840 --> 00:07:26,290
it's now again super fast

155
00:07:26,290 --> 00:07:28,160
because it was able

156
00:07:28,160 --> 00:07:32,400
to use the cached result from npm install.

157
00:07:32,400 --> 00:07:36,000
That's because the steps prior to npm install didn't change:

158
00:07:36,000 --> 00:07:39,130
Docker sees that the package.json file

159
00:07:39,130 --> 00:07:41,410
did not change; it's the same as before,

160
00:07:41,410 --> 00:07:43,600
and therefore there was no need

161
00:07:43,600 --> 00:07:47,890
to copy that again and to run npm install again.

162
00:07:47,890 --> 00:07:50,580
The only change happened in this step,

163
00:07:50,580 --> 00:07:53,520
but that comes after npm install.

164
00:07:53,520 --> 00:07:56,350
So that's the first small optimization,

165
00:07:56,350 --> 00:07:59,460
but more important than that optimization

166
00:07:59,460 --> 00:08:00,690
is that you understand

167
00:08:00,690 --> 00:08:02,650
why we are doing it

168
00:08:02,650 --> 00:08:06,000
and that you understand this layer-based approach,

169
00:08:06,000 --> 00:08:08,320
this layer-based architecture.

170
00:08:08,320 --> 00:08:10,030
It's really important because

171
00:08:10,030 --> 00:08:14,360
it's a core concept in Docker and Docker images,

172
00:08:14,360 --> 00:08:18,453
and it exists for the reasons outlined in the last few minutes.

