1
00:00:02,120 --> 00:00:04,950
So now we learned about Kubernetes volumes,

2
00:00:04,950 --> 00:00:06,220
and how we can use them

3
00:00:06,220 --> 00:00:08,530
and that there are different types of volumes.

4
00:00:08,530 --> 00:00:12,770
And we saw the hostPath and the emptyDir type

5
00:00:12,770 --> 00:00:14,080
in action already.

6
00:00:14,080 --> 00:00:17,670
I also briefly talked about the CSI type.

7
00:00:17,670 --> 00:00:21,270
Now these volumes have one disadvantage.

8
00:00:21,270 --> 00:00:24,330
They are destroyed when a Pod is removed,

9
00:00:24,330 --> 00:00:26,710
when a Pod is terminated and replaced

10
00:00:26,710 --> 00:00:29,140
by a new Pod for example.

11
00:00:29,140 --> 00:00:31,400
Or if you scale your Pods,

12
00:00:31,400 --> 00:00:34,950
and you go from one Pod to two Pods

13
00:00:34,950 --> 00:00:38,630
then depending on the type of volume you're using,

14
00:00:38,630 --> 00:00:41,680
the new Pod might not have access to the data

15
00:00:41,680 --> 00:00:43,650
written by the first Pod.

16
00:00:43,650 --> 00:00:46,440
For example, when using the emptyDir.
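
To make the emptyDir case concrete, here is a minimal sketch of a Pod that declares such a volume. All names, the image, and the mount path are hypothetical placeholders, not taken from the course project.

```yaml
# Sketch only: names, image and paths are assumptions for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: scratch-demo
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: scratch          # must match the volume name below
          mountPath: /app/scratch
  volumes:
    - name: scratch
      emptyDir: {}               # created empty for this Pod, removed with this Pod
```

Because the volume is declared inside the Pod spec, every Pod instance gets its own empty directory, so a second replica would not see files written by the first one.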

17
00:00:46,440 --> 00:00:48,820
Now, when you use the hostPath type

18
00:00:48,820 --> 00:00:51,370
this problem was kind of solved

19
00:00:51,370 --> 00:00:53,780
but it is super important to keep in mind,

20
00:00:53,780 --> 00:00:57,970
that this solution only works here with minikube,

21
00:00:57,970 --> 00:01:01,540
because it's a One-Node environment.

22
00:01:01,540 --> 00:01:05,349
All the Pods always run on the same worker node

23
00:01:05,349 --> 00:01:09,770
because with minikube, we only have one worker Node.

24
00:01:09,770 --> 00:01:13,690
Now of course, once you move from your local environment

25
00:01:13,690 --> 00:01:18,690
and from minikube to a real deployment and a real cluster,

26
00:01:18,850 --> 00:01:22,970
for example, on AWS, you will have multiple Nodes

27
00:01:22,970 --> 00:01:26,830
and then hostPath also won't help you anymore.
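
As a rough illustration of why hostPath stops helping on multiple Nodes, consider a Deployment with two replicas that mounts a hostPath volume. The names, image, and paths below are hypothetical.

```yaml
# Sketch only: names, image and paths are assumptions for illustration.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: story-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: story
  template:
    metadata:
      labels:
        app: story
    spec:
      containers:
        - name: story
          image: nginx
          volumeMounts:
            - name: story-volume
              mountPath: /app/story
      volumes:
        - name: story-volume
          hostPath:
            path: /data              # a directory on whichever Node runs the Pod
            type: DirectoryOrCreate  # create the directory if it does not exist
```

On a single Node, both replicas see the same /data directory. If the scheduler places them on two different Nodes, each replica reads and writes a different /data, so the data is no longer shared.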

28
00:01:26,830 --> 00:01:29,220
Then you will really have the problem

29
00:01:29,220 --> 00:01:32,060
that volumes are attached to Pods

30
00:01:32,060 --> 00:01:35,950
and therefore, multiple Pods might not share the same data.

31
00:01:35,950 --> 00:01:39,200
And when a Pod is destroyed or replaced

32
00:01:39,200 --> 00:01:42,103
the data stored in the volume will be lost.

33
00:01:43,120 --> 00:01:45,560
But of course, sometimes you therefore need

34
00:01:45,560 --> 00:01:48,740
Pod and node independent volumes.

35
00:01:48,740 --> 00:01:50,690
For example, if you had a container

36
00:01:50,690 --> 00:01:52,310
with a database in there

37
00:01:52,310 --> 00:01:54,530
or a container writing files

38
00:01:54,530 --> 00:01:57,440
which should survive Pod replacement.

39
00:01:57,440 --> 00:02:00,290
So it's not always great, if you lose your data,

40
00:02:00,290 --> 00:02:01,960
once a Pod is removed.

41
00:02:01,960 --> 00:02:05,810
For some data, especially intermediate results

42
00:02:05,810 --> 00:02:08,639
and temporary data, that might be fine.

43
00:02:08,639 --> 00:02:10,550
But for long-term data,

44
00:02:10,550 --> 00:02:13,470
like the key data, your application generates

45
00:02:13,470 --> 00:02:15,620
you of course don't wanna lose it

46
00:02:15,620 --> 00:02:20,620
just because a Pod was scaled up or replaced.

47
00:02:20,620 --> 00:02:24,150
And therefore, Kubernetes also has a solution for that.

48
00:02:24,150 --> 00:02:27,920
Besides the normal volumes, which we saw up to this point,

49
00:02:27,920 --> 00:02:32,920
Kubernetes also has this concept of persistent volumes.

50
00:02:33,190 --> 00:02:36,450
And as the name implies, the difference here is,

51
00:02:36,450 --> 00:02:39,170
that they will always persist.

52
00:02:39,170 --> 00:02:42,993
They will be Pod and Node independent.

53
00:02:44,430 --> 00:02:47,080
Now, of course you might argue

54
00:02:47,080 --> 00:02:50,320
that a lot of the volume types we can use,

55
00:02:50,320 --> 00:02:54,710
like awsElasticBlockStore, azureDisk, azureFile, NFS

56
00:02:55,940 --> 00:03:00,230
and a lot of other options here, indeed in the end give us

57
00:03:00,230 --> 00:03:03,130
Pod and Node independent storage.

58
00:03:03,130 --> 00:03:05,650
Because of course, we still set up a volume

59
00:03:05,650 --> 00:03:08,340
when setting up a Pod, after all

60
00:03:08,340 --> 00:03:13,020
we add our volume definition to the deployment YAML file

61
00:03:13,020 --> 00:03:15,930
or to the Pod template, to be precise.

62
00:03:15,930 --> 00:03:19,030
Whilst we do that, of course by nature,

63
00:03:19,030 --> 00:03:22,670
a lot of the solutions here actually store the data

64
00:03:22,670 --> 00:03:27,240
outside of the Pod or node our application is running on.

65
00:03:27,240 --> 00:03:29,800
If we use awsElasticBlockStore,

66
00:03:29,800 --> 00:03:33,010
our data is getting stored on AWS servers

67
00:03:33,010 --> 00:03:35,180
and it's not getting removed there

68
00:03:35,180 --> 00:03:37,190
just because a Pod shuts down.

69
00:03:37,190 --> 00:03:39,590
You actually find this here as well

70
00:03:39,590 --> 00:03:42,690
unlike emptyDir, which is erased when a Pod is removed,

71
00:03:42,690 --> 00:03:44,803
the data here persists.
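
A regular volume backed by external storage might look like the following sketch, which uses the nfs volume type. The server address, export path, and all names are made-up placeholders.

```yaml
# Sketch only: server, paths and names are assumptions for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: nfs-demo
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: shared
          mountPath: /app/data
  volumes:
    - name: shared
      nfs:
        server: nfs.example.internal   # hypothetical NFS server outside the cluster
        path: /exports/data            # hypothetical exported directory
```

The data lives on the NFS server, so it survives the Pod. The storage details, however, are still written into the Pod spec itself and have to be repeated in every Pod or Deployment that needs them.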

72
00:03:46,030 --> 00:03:49,610
So why do we then have this other type of volume,

73
00:03:49,610 --> 00:03:53,640
the persistent volume, if for regular volumes

74
00:03:53,640 --> 00:03:56,390
we also can use volume types

75
00:03:56,390 --> 00:04:00,070
which give us this Node and Pod independence.

76
00:04:00,070 --> 00:04:02,840
Well, as you will see over the next lectures,

77
00:04:02,840 --> 00:04:04,790
the persistent volume concept

78
00:04:04,790 --> 00:04:08,320
is more than just independent storage.

79
00:04:08,320 --> 00:04:10,630
The key idea however is,

80
00:04:10,630 --> 00:04:14,450
that the volume will be detached from the Pod.

81
00:04:14,450 --> 00:04:17,019
And that includes a total detachment

82
00:04:17,019 --> 00:04:19,420
from the Pod life cycle.

83
00:04:19,420 --> 00:04:22,010
Instead with persistent volumes,

84
00:04:22,010 --> 00:04:25,230
we will have that Pod and Node independence

85
00:04:25,230 --> 00:04:28,520
and as a cluster administrator

86
00:04:28,520 --> 00:04:33,450
we will have full power over how this volume is configured.

87
00:04:33,450 --> 00:04:36,190
We don't need to configure it multiple times

88
00:04:36,190 --> 00:04:39,360
for different Pods and in different deployment

89
00:04:39,360 --> 00:04:41,580
YAML files or anything like that.

90
00:04:41,580 --> 00:04:44,770
Instead, we'll be able to define it once

91
00:04:44,770 --> 00:04:48,140
and then use it in multiple Pods if you want to.

92
00:04:48,140 --> 00:04:51,900
So persistent volumes are built around the idea

93
00:04:51,900 --> 00:04:54,890
of Pod and Node independence.

94
00:04:54,890 --> 00:04:58,700
And that helps us when it comes to how data is stored,

95
00:04:58,700 --> 00:05:02,780
that it's not lost if a Pod is destroyed and recreated

96
00:05:02,780 --> 00:05:06,380
but that also helps us with defining volumes

97
00:05:06,380 --> 00:05:10,170
independent from Pods, defining them in a central place

98
00:05:10,170 --> 00:05:13,300
and then using volumes in different Pods

99
00:05:13,300 --> 00:05:16,960
without editing multiple Pod YAML files.

100
00:05:16,960 --> 00:05:19,890
Which for bigger deployments, as you can imagine,

101
00:05:19,890 --> 00:05:21,670
can be very cumbersome

102
00:05:21,670 --> 00:05:24,890
and which might also not give us all the control

103
00:05:24,890 --> 00:05:28,700
we as a cluster administrator might want.

104
00:05:28,700 --> 00:05:32,520
That is where persistent volumes can help us.

105
00:05:32,520 --> 00:05:35,230
So how do persistent volumes work

106
00:05:35,230 --> 00:05:37,170
and how do they differ

107
00:05:37,170 --> 00:05:40,840
if we compare them to regular volumes?

108
00:05:40,840 --> 00:05:43,520
Well, if we have a cluster with multiple Nodes

109
00:05:43,520 --> 00:05:45,260
and different Pods,

110
00:05:45,260 --> 00:05:47,990
either different instances of the same Pod

111
00:05:47,990 --> 00:05:50,450
or different Pods with different containers

112
00:05:50,450 --> 00:05:52,550
which are running on these Nodes.

113
00:05:52,550 --> 00:05:54,240
And you learned that volumes

114
00:05:54,240 --> 00:05:56,830
would be inside of these Pods.

115
00:05:56,830 --> 00:05:58,980
Now with persistent volumes,

116
00:05:58,980 --> 00:06:02,140
the idea is that you have new resources,

117
00:06:02,140 --> 00:06:05,410
new entities in your cluster.

118
00:06:05,410 --> 00:06:09,713
Which are detached from your Nodes and from your Pods.

119
00:06:10,570 --> 00:06:15,570
Instead you can create so-called persistent volume claims.

120
00:06:15,930 --> 00:06:17,780
These will be part of Pods

121
00:06:17,780 --> 00:06:20,520
but I added them next to the Pod here

122
00:06:20,520 --> 00:06:24,160
to just show that these claims belong to the Pods

123
00:06:24,160 --> 00:06:26,310
and the Nodes on which the Pods run.

124
00:06:26,310 --> 00:06:29,960
And these claims could reach out to these standalone

125
00:06:29,960 --> 00:06:33,090
Pod and Node independent entities,

126
00:06:33,090 --> 00:06:37,630
the persistent volumes to request access to them.

127
00:06:37,630 --> 00:06:40,640
So that for example, the container running in the Pod

128
00:06:40,640 --> 00:06:43,360
is able to write into this volume.
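
The claim idea can be sketched as two resources: a PersistentVolumeClaim that requests storage, and a Pod that mounts whatever volume the claim is bound to. The names, the requested size, and the mount path are hypothetical, and which persistent volume the claim actually binds to depends on the volumes and storage classes available in the cluster.

```yaml
# Sketch only: names, size and paths are assumptions for illustration.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  accessModes:
    - ReadWriteOnce        # mountable read-write by a single Node
  resources:
    requests:
      storage: 1Gi         # how much storage the claim asks for
---
apiVersion: v1
kind: Pod
metadata:
  name: claim-demo
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /app/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-claim   # refers to the claim above, not to a storage backend
```

Note that the Pod only names the claim. It does not contain any storage details, which is what keeps the volume configuration out of the Pod definition.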

129
00:06:43,360 --> 00:06:46,280
And of course you can have claims to multiple

130
00:06:46,280 --> 00:06:48,600
different persistent volumes.

131
00:06:48,600 --> 00:06:51,610
And you can have different claims to different volumes

132
00:06:51,610 --> 00:06:54,770
on different Pods on different Nodes

133
00:06:54,770 --> 00:06:57,690
so you got full flexibility here.

134
00:06:57,690 --> 00:07:01,590
And the idea is that these persistent volumes, of course,

135
00:07:01,590 --> 00:07:04,720
don't store data on one of these Nodes,

136
00:07:04,720 --> 00:07:07,963
but instead they are really independent from the Node.

137
00:07:08,940 --> 00:07:11,880
And therefore, for example, in the official docs

138
00:07:11,880 --> 00:07:15,580
if you have a look at the persistent volume documentation

139
00:07:15,580 --> 00:07:17,200
you can learn more about them.

140
00:07:17,200 --> 00:07:20,130
But the most important thing there can be found

141
00:07:20,130 --> 00:07:24,150
if you have a look at the types of persistent volumes.

142
00:07:24,150 --> 00:07:27,380
And there you will see that the types here

143
00:07:27,380 --> 00:07:31,250
are kind of similar to the types we saw before.

144
00:07:31,250 --> 00:07:35,850
But for example, the emptyDir is missing.

145
00:07:35,850 --> 00:07:37,750
And hostPath is there,

146
00:07:37,750 --> 00:07:40,460
but you see one important restriction.

147
00:07:40,460 --> 00:07:42,200
It's really only available

148
00:07:42,200 --> 00:07:44,620
if you have a single Node setup,

149
00:07:44,620 --> 00:07:47,410
like our local minikube setup.

150
00:07:47,410 --> 00:07:51,410
So in reality, in a real deployment on a real cluster

151
00:07:51,410 --> 00:07:55,730
with multiple Nodes, hostPath also wouldn't be available.

152
00:07:55,730 --> 00:07:57,670
And of course it's not available

153
00:07:57,670 --> 00:08:02,290
because the entire idea behind persistent volumes

154
00:08:02,290 --> 00:08:06,520
is that it's detached from your Nodes and Pods.

155
00:08:06,520 --> 00:08:09,690
Therefore you can, for example, use cloud storage

156
00:08:09,690 --> 00:08:14,690
like awsElasticBlockStore, or azureFile, azureDisk,

157
00:08:15,060 --> 00:08:19,370
or again, this very flexible CSI type

158
00:08:19,370 --> 00:08:23,720
to attach any kind of storage to your cluster.

159
00:08:23,720 --> 00:08:25,550
But the key thing is that the storage

160
00:08:25,550 --> 00:08:27,770
will not be on the cluster Nodes

161
00:08:27,770 --> 00:08:30,380
it will be somewhere else, for example,

162
00:08:30,380 --> 00:08:32,679
some cloud storage service.

163
00:08:32,679 --> 00:08:35,720
And again, we'll see the CSI type in action later

164
00:08:35,720 --> 00:08:39,210
once we really deploy our application.

165
00:08:39,210 --> 00:08:43,030
But we can right away get started with persistent volumes

166
00:08:43,030 --> 00:08:44,900
to understand how they work

167
00:08:44,900 --> 00:08:48,330
and how the claim and the volume work together

168
00:08:48,330 --> 00:08:50,880
by again, using hostPath.

169
00:08:50,880 --> 00:08:53,420
Again, it's only here for testing

170
00:08:53,420 --> 00:08:56,040
and for this local dummy environment

171
00:08:56,040 --> 00:08:57,840
which uses only one Node,

172
00:08:57,840 --> 00:08:59,820
but the idea is the same

173
00:08:59,820 --> 00:09:02,270
as if you would use any other type.

174
00:09:02,270 --> 00:09:04,630
So it's perfect for getting started

175
00:09:04,630 --> 00:09:06,273
with persistent volumes.
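
As a starting point, a persistent volume of type hostPath for a single-Node test cluster could be defined roughly like this. The name, capacity, and path are hypothetical, and the exact settings may differ from what the following lectures use.

```yaml
# Sketch only: name, capacity and path are assumptions for illustration.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: host-pv
spec:
  capacity:
    storage: 1Gi                # size this volume offers to claims
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data                 # directory on the single Node
    type: DirectoryOrCreate
```

This is a standalone cluster resource rather than part of a Pod spec. Pods get access to it through a persistent volume claim, and the hostPath backing is only suitable for single-Node testing.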

