WEBVTT

0
00:00.510 --> 00:06.660
I am now going to throw you a bit in the deep end. We're going to be discussing the accept-charset.

1
00:07.230 --> 00:10.530
And all this is, is it's a form attribute.

2
00:11.040 --> 00:12.480
So, it pretty much looks like this.

3
00:13.710 --> 00:17.580
And here you can see we've set the accept-charset to "utf8"...

4
00:17.580 --> 00:20.790
and I'm going to be discussing exactly what this means, shortly.
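
NOTE
The slide is not captured in this transcript, so here is a rough sketch of
what such a form might look like, held in a JavaScript string so it can be
logged from the console. The action URL and the field name are placeholders,
not taken from the lecture, and "utf-8" is used as the encoding label.
```javascript
// Sketch only: markup for a form that sets the accept-charset attribute.
// "/submit" and "comment" are hypothetical placeholder values.
const formMarkup = `<form action="/submit" method="post" accept-charset="utf-8">
  <input type="text" name="comment">
  <button type="submit">Send</button>
</form>`;
console.log(formMarkup);
```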

5
00:21.330 --> 00:24.660
But I want you to be aware that this is quite an advanced topic.

6
00:24.900 --> 00:29.460
And I'm not joking when I say I'm pretty much throwing you in the deep end, and it may feel like it's...

7
00:29.460 --> 00:33.230
a bit too deep for so early on in the course, but I'm here for you.

8
00:33.240 --> 00:34.080
I'm going to get you through it...

9
00:34.080 --> 00:36.660
and I've tried to lay things out as simply as I can.

10
00:37.020 --> 00:38.920
And very few developers even know this.

11
00:39.030 --> 00:44.280
If you ask a developer what the accept-charset is, yes, they'll probably say, listen, it's a type...

12
00:44.280 --> 00:44.940
of encoding...

13
00:45.120 --> 00:48.360
but very few understand the true meaning of utf8...

14
00:48.360 --> 00:51.720
and why it differs from other types, such as ASCII, for example.

15
00:52.140 --> 00:53.120
Do you know what I've just thought of?

16
00:53.130 --> 00:58.220
I just thought of that old song, um, "Sum 41 - in too deep". 

17
00:58.230 --> 00:59.110
Do you remember that song?

18
00:59.610 --> 01:00.360
Let me remind you.

19
01:01.180 --> 01:07.180
(music by Sum 41)

20
01:11.240 --> 01:15.110
Oh, brilliant... it reminds me of the old times.

21
01:15.470 --> 01:19.100
But anyway, the point is, I'm throwing you in the deep end, but don't worry, we're going to get...

22
01:19.100 --> 01:20.430
through it together. Okay, 

23
01:20.540 --> 01:21.470
where do we start?

24
01:22.010 --> 01:27.500
Well, in order to understand the accept-charset attribute, we need to have a geek 🤓 lesson, because...

25
01:27.500 --> 01:31.990
it really requires you to understand how we treat text on computers.

26
01:32.270 --> 01:32.990
So let's have a look.

27
01:33.860 --> 01:41.090
You need to understand that when we have text, and let's just literally write the word "text" to the screen,

28
01:41.780 --> 01:43.820
that's just a sequence of characters.

29
01:44.150 --> 01:51.590
And when we want to either store it inside a computer, aka the machine, or if we want to transfer...

30
01:51.590 --> 01:57.560
it over a digital network, for example, like submitting it to a server, we need to convert this text...

31
01:57.560 --> 02:05.130
into something the machine can understand - into a binary representation, because that's the only language...

32
02:05.130 --> 02:07.550
a binary-based computer can understand.

33
02:07.550 --> 02:13.460
And you and I know that our computers, at their core, can only understand 1s and 0s. In fact, they can only...

34
02:13.460 --> 02:17.130
understand electrical inputs - either on or off.

35
02:17.690 --> 02:20.780
And how do we convert this text into binary?

36
02:21.200 --> 02:24.260
We do this by using character encoding.

37
02:24.470 --> 02:31.250
And just remember, "encoding", that specific word, just means converting data from one format to another.

38
02:31.940 --> 02:32.530
Does that make sense?

39
02:32.750 --> 02:40.490
So "character encoding" specifically, that just means it's a way of converting text data into binary...

40
02:40.490 --> 02:41.090
numbers.

41
02:41.570 --> 02:47.480
In a nutshell, we can assign unique numeric values to specific characters and convert those numbers...

42
02:47.480 --> 02:48.770
into binary language.

43
02:49.100 --> 02:54.540
And these binary numbers can later be converted back to the original characters based on their values.

44
02:55.370 --> 02:58.160
That's all that "character encoding" means.
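
NOTE
A minimal sketch of that round trip, assuming plain ASCII-range text where
every character fits in 8 bits. The helper names toBinary and fromBinary are
made up for this example and are not part of any browser API.
```javascript
// Sketch: character -> number -> binary string, and back again.
// Assumes every character has a value below 256 (true for ASCII text).
function toBinary(text) {
  // codePointAt(0) returns the numeric value assigned to the character
  return Array.from(text, (ch) => ch.codePointAt(0).toString(2).padStart(8, "0"));
}
function fromBinary(bits) {
  // parseInt(b, 2) reads a binary string back into a number
  return bits.map((b) => String.fromCodePoint(parseInt(b, 2))).join("");
}
const encoded = toBinary("text");
console.log(encoded.join(" ")); // 01110100 01100101 01111000 01110100
console.log(fromBinary(encoded)); // text
```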

45
02:58.910 --> 03:01.070
And can you see that word, "charset"?

46
03:01.670 --> 03:02.260
That charset...

47
03:02.270 --> 03:05.360
is just short for "character set" and a character...

48
03:05.480 --> 03:09.470
set, is just the set of characters that are allowed to be encoded.

49
03:09.830 --> 03:15.710
As we'll see shortly, the ASCII encoding encompasses a character set of 128 characters.

50
03:15.920 --> 03:20.390
UTF-8 has a ton more. But let me not get ahead of myself.

51
03:20.930 --> 03:26.860
So moving on, we know the accept-charset attribute specifies the character encoding type, how...

52
03:26.870 --> 03:34.280
those characters are going to get encoded into binary data. The default value of accept-charset, 

53
03:34.400 --> 03:39.470
for example, when you don't even include it on your form, is, in the older HTML 4 spec, the reserved string "UNKNOWN". 

54
03:39.890 --> 03:45.800
And this indicates that the encoding equals the encoding of the document containing your form element.

55
03:46.400 --> 03:51.500
Pretty simple, and most of the time you don't even need to include this attribute because by default...

56
03:51.710 --> 03:55.550
your entire HTML file is set to UTF-8. 

57
03:55.790 --> 03:56.910
But I digress...

58
03:56.930 --> 03:58.130
let me get back onto topic.

59
03:58.610 --> 04:04.850
Over the years, people have developed different ways to map characters to 0s and 1s.

60
04:05.690 --> 04:13.190
The most common, though, are these three: (1) ISO, (2) ASCII and (3) UTF-8.

61
04:13.190 --> 04:18.140
Hang on, how do I know what the encoding type is on any given webpage that I visit?

62
04:18.680 --> 04:23.240
Well, let me just jump over to a random web page now and I'll show you how you can see it.

63
04:25.350 --> 04:27.970
OK, so here I am, I'm just in incognito 🤺.

64
04:28.170 --> 04:33.630
And as you'll see, most web pages today, as I mentioned, use UTF-8. But how do we see what the...

65
04:33.630 --> 04:34.770
encoding type is for...

66
04:34.770 --> 04:42.150
our document? Well, remember a few lectures ago, I said that many attributes are also properties on...

67
04:42.150 --> 04:42.840
their object.

68
04:43.680 --> 04:47.790
Well, the charset can be accessed directly from the document, like this.

69
04:48.710 --> 04:49.790
So, how do we do it?

70
04:49.820 --> 04:51.740
Well, it's very simple. Let's just go to the console.

71
04:53.890 --> 04:57.400
And we can just access this attribute directly as a property. 

72
04:58.060 --> 05:03.340
Remember when we were discussing attributes a bit earlier, we mentioned how important it is to know...

73
05:03.340 --> 05:09.070
what attributes are attached to elements, because those attributes in many cases are converted into...

74
05:09.070 --> 05:10.670
properties of an object.

75
05:11.140 --> 05:13.090
And this is one example why it's really useful.

76
05:13.300 --> 05:14.950
All we have to do is access our "document"...

77
05:14.950 --> 05:19.000
and on this document, we literally have the charset property.

78
05:19.450 --> 05:24.070
And as we can see here, the character set on this Google page is UTF-8.
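
NOTE
A small sketch of the console check, wrapped in a helper so it also runs
outside a browser. getEncoding is a hypothetical name for this example.
document.characterSet is the standard property and document.charset is an
older alias for it. In the console you can simply type document.characterSet.
```javascript
// Sketch: read the encoding from a document-like object.
function getEncoding(doc) {
  // Prefer the standard property, fall back to the legacy alias
  return doc.characterSet || doc.charset;
}
// Only touch the real document when one exists (i.e. in a browser)
if (typeof document !== "undefined") {
  console.log(getEncoding(document));
}
```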

79
05:24.280 --> 05:24.800
How awesome.

80
05:24.820 --> 05:26.890
But now, let's jump back into the lecture.

81
05:27.730 --> 05:28.460
Does that make sense?

82
05:28.780 --> 05:30.390
Hope it's starting to feel more intuitive.

83
05:30.430 --> 05:31.360
It's not that difficult.

84
05:31.810 --> 05:36.850
But let's now discuss each one of these three common encoding types in a little bit more detail.

85
05:37.270 --> 05:39.040
So let's start off with ISO.

86
05:39.550 --> 05:45.670
Firstly, I just want to point out that this ISO standard is a legacy standard dating all the way back...

87
05:45.670 --> 05:48.100
to the 1980s - old-school.

88
05:48.640 --> 05:54.340
It is similar, kind of, to ASCII and it can only represent 256 characters...

89
05:54.700 --> 05:57.850
so it's only suitable for some languages in the Western world.

90
05:58.150 --> 06:03.490
But even for many supported languages, some characters are missing. If you create a text file in this encoding...

91
06:03.490 --> 06:08.320
and try to copy and paste some Chinese characters, for example, you're going to see weird results 🥴.

92
06:08.620 --> 06:13.320
And because it's so old-school, I just don't want us to even discuss this anymore.

93
06:13.450 --> 06:16.540
So let's move on. The next common type...

94
06:16.540 --> 06:25.510
is this ASCII character encoding set, and this contains a total of 128 characters and each character...

95
06:25.510 --> 06:29.230
has a unique value between 0 and 127.

96
06:29.800 --> 06:36.970
Of course, if you include the 0, that's why we have 128 characters in total. And because it only...

97
06:36.970 --> 06:39.400
has 128 characters, a 7 bit...

98
06:39.400 --> 06:46.930
binary number is sufficient to represent a character from the ASCII character set, since a 7...

99
06:46.930 --> 06:51.880
bit number can hold values from 0 to 127.

100
06:52.190 --> 06:54.460
Remember, a bit is just a 0 or 1.

101
06:54.850 --> 06:55.960
That's all a bit is.

102
06:55.960 --> 07:01.330
And if we have 7 bits, we can represent 128 unique values.

103
07:01.750 --> 07:02.580
How do I know that?

104
07:02.590 --> 07:04.660
Well, it's just simple math, but here we go.

105
07:04.660 --> 07:11.260
Here's 7 bits on the screen, and you can literally have so many different combinations.

106
07:11.260 --> 07:12.250
You can have all 0s.

107
07:12.250 --> 07:13.330
You can put a 1 on the end.

108
07:13.330 --> 07:17.050
And when you go through all the different combinations, you'll eventually end up with all 1s...

109
07:17.410 --> 07:21.610
and that, my dear friends, is 128 unique values.
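
NOTE
If you want to check the math yourself, this sketch lists every 7-bit
pattern and counts them. allBitPatterns is a made-up helper for this example.
```javascript
// Sketch: enumerate every combination of a given number of bits.
function allBitPatterns(bits) {
  const patterns = [];
  // 2 ** bits is the number of distinct values that many bits can hold
  for (let value = 0; value < 2 ** bits; value++) {
    patterns.push(value.toString(2).padStart(bits, "0"));
  }
  return patterns;
}
const sevenBit = allBitPatterns(7);
console.log(sevenBit.length); // 128
console.log(sevenBit[0]); // 0000000
console.log(sevenBit[1]); // 0000001
console.log(sevenBit[127]); // 1111111
```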

110
07:22.090 --> 07:24.640
I wish it could be this simple though, and we could just stop here.

111
07:25.120 --> 07:30.610
But actually what happens in the real world is that your typical computer system has memory. And memory...

112
07:30.610 --> 07:38.770
is made of unit cells, and each individual cell contains 8 bits, not 7 bits ... 8 bits. And 8...

113
07:38.770 --> 07:40.270
bits are called a "byte".

114
07:40.720 --> 07:46.720
And this means that although ASCII only needs 7 bits to encode a character, it is stored in 8 bits.

115
07:46.870 --> 07:50.110
And this just means we've got a redundant bit

116
07:50.110 --> 07:50.470
right?

117
07:50.890 --> 07:55.630
And by convention, the first bit is always 0, for that reason.

118
07:56.050 --> 08:01.930
In fact, since the first bit of an ASCII character is always 0, it's called a dead bit. 

119
08:01.930 --> 08:03.990
Let me show you what I mean.

120
08:04.300 --> 08:04.960
So here we go.

121
08:04.960 --> 08:09.610
Here are some ASCII characters, and here are the bits that represent the letters A through to F.

122
08:09.970 --> 08:13.780
Pretty simple. But you'll notice that the very first bits are dead bits. 
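
NOTE
The slide is not in the transcript, so here is a sketch that prints the
same kind of table for the letters A through F, padded to a full byte.
asciiByte is a hypothetical helper name.
```javascript
// Sketch: show the 8-bit pattern stored for an ASCII character.
function asciiByte(ch) {
  // charCodeAt(0) is the ASCII value; padStart fills the byte out to 8 bits
  return ch.charCodeAt(0).toString(2).padStart(8, "0");
}
for (const ch of "ABCDEF") {
  console.log(ch, asciiByte(ch)); // first line printed: A 01000001
}
// Every one of these bytes starts with the leading 0 bit
console.log(Array.from("ABCDEF").every((ch) => asciiByte(ch)[0] === "0")); // true
```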

123
08:13.780 --> 08:20.230
They are always 0. So that's a very quick way for you and I to kind of spot the character encoding...

124
08:20.230 --> 08:20.470
type...

125
08:20.470 --> 08:25.780
whenever you see binary data like this. That's just a bit of FYI. You know what else I could show you, 

126
08:25.780 --> 08:30.970
I just thought of right now, why don't I just jump to Microsoft Word and I'll show you the ASCII character

127
08:30.970 --> 08:31.290
set.

128
08:31.670 --> 08:32.230
Let me show you.

129
08:33.130 --> 08:36.640
I love the stuff, and I hope you're having a ton of fun like me.

130
08:37.000 --> 08:41.500
But anyway, as I mentioned, I want to show you where we can see this ASCII character set.

131
08:41.980 --> 08:43.540
All you have to do is click on insert.

132
08:43.540 --> 08:45.220
And this is just Microsoft Word, by the way.

133
08:45.460 --> 08:45.840
I don't know why...

134
08:45.940 --> 08:47.350
I wanted to show you this example.

135
08:47.350 --> 08:48.400
I just think it's interesting.

136
08:49.090 --> 08:51.700
And we just can click on "Symbol", then "more symbols".

137
08:53.400 --> 08:54.210
And look at that.

138
08:55.870 --> 09:01.870
We've got Unicode here, which we don't want. I want to show you this ASCII character set. And here

139
09:01.870 --> 09:02.190
they are.

140
09:02.290 --> 09:05.830
You can literally scroll through all of them and there are not that many.

141
09:06.850 --> 09:09.670
But, look here. If I click on the last one...

142
09:09.670 --> 09:15.670
you'll see that there's a character code, and that has a number of 255.

143
09:15.670 --> 09:16.390
255. 

144
09:16.660 --> 09:22.930
And if we include the character code "0" itself, we can see that there are 256 unique values.

145
09:23.260 --> 09:24.240
Why is this 🤷🏻‍♂️?

146
09:24.400 --> 09:31.510
I've just said that the ASCII character encoding set contains 128 characters and now I've just shown...

147
09:31.510 --> 09:33.160
you that there are actually 256.

148
09:33.610 --> 09:34.810
Well, don't get confused.

149
09:34.810 --> 09:39.370
Microsoft Word uses the "extended" ASCII character set.

150
09:39.830 --> 09:43.390
Remember, we had that dead bit? That bit that's just 0.

151
09:43.690 --> 09:50.830
Well, the extended character set uses that dead bit to store other characters, and that gives us an additional...

152
09:51.190 --> 09:53.980
128 characters, which is why we get this.

153
09:53.980 --> 09:57.660
But anyway, putting that aside, just understand, this is the ASCII character set.

154
09:58.090 --> 10:02.470
I just wanted to show you what it looks like, and where you can even see some of these things, because

155
10:02.470 --> 10:04.750
it can be very daunting when you first start out.

156
10:04.940 --> 10:07.140
Enough said, let's hop back into the lecture.

157
10:07.990 --> 10:09.470
This is quite interesting, isn't it?

158
10:09.730 --> 10:13.300
Hope you're learning a ton. I hope it's starting to become a "bit" (no pun intended) more intuitive now.

159
10:13.300 --> 10:18.400
The ASCII character set is just a group of characters that are part of that set.

160
10:18.700 --> 10:24.040
And very clever people in the background have mapped those characters in order to be converted into binary...

161
10:24.040 --> 10:24.460
data.

162
10:24.700 --> 10:25.260
Quite cool.

163
10:25.540 --> 10:30.700
But now the question you're probably having is, "Clyde, are 128 unique values, enough?"

164
10:31.270 --> 10:37.270
Well, let me say that there are 95 human readable characters specified in the ASCII table. Letters A through

165
10:37.270 --> 10:39.460
to Z, both uppercase and lowercase...

166
10:39.850 --> 10:46.540
the numbers 0-9, a handful of punctuation marks and characters like the $, and the ampersand...

167
10:46.540 --> 10:47.380
and a bunch of others.

168
10:47.650 --> 10:53.680
It also includes 33 values for control characters like tabs, backspaces, line feeds, and a few more. Notice that...

169
10:53.680 --> 10:58.870
those weird characters are not printable per se, but they're still visible in some form and useful...

170
10:58.870 --> 10:59.590
for humans.

171
10:59.890 --> 11:05.740
So, in total there are 128 characters in the ASCII encoding type, which is a nice round number.
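
NOTE
A quick sketch to confirm the 95 + 33 split. It assumes the usual
convention that codes 32 (space) through 126 are the printable ones, and
that codes 0 to 31 plus 127 are control characters. countAscii is a
made-up helper name.
```javascript
// Sketch: count printable versus control characters in 7-bit ASCII.
function countAscii() {
  let printable = 0;
  let control = 0;
  for (let code = 0; code <= 127; code++) {
    // 32 (space) through 126 (~) are printable; everything else is control
    if (code >= 32 && code <= 126) {
      printable++;
    } else {
      control++;
    }
  }
  return { printable, control };
}
const counts = countAscii();
console.log(counts.printable); // 95
console.log(counts.control); // 33
console.log(counts.printable + counts.control); // 128
```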

172
11:06.430 --> 11:12.070
But let's just be honest, 95 readable characters just doesn't cut it.

173
11:12.580 --> 11:16.090
There's no spec on how to represent emojis, for example 😱.

174
11:16.420 --> 11:18.040
Or what about Chinese?

175
11:18.580 --> 11:24.490
Well, if we only had the ASCII character set, we would not be allowed to ever write Chinese or include emojis...

176
11:24.490 --> 11:25.360
on a computer screen 😡.
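
NOTE
To see why ASCII falls short, this sketch checks whether every character
in a string has a value in the 0 to 127 range. isAscii is a hypothetical
helper written for this example.
```javascript
// Sketch: does this text fit inside the 128-value ASCII range?
function isAscii(text) {
  // Array.from splits by code point, so an emoji counts as one character
  return Array.from(text).every((ch) => ch.codePointAt(0) <= 127);
}
console.log(isAscii("text")); // true
console.log(isAscii("中文")); // false
console.log(isAscii("😱")); // false
```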

177
11:25.660 --> 11:28.150
You can really tell how average that would be.

178
11:28.630 --> 11:34.470
So to cut a long story short, we need more! We need more characters. Whew...

179
11:34.570 --> 11:38.830
but because this lecture is already going on quite long, I want to stop here.

180
11:39.220 --> 11:44.980
And in the next lecture, I want to blow your mind by discussing what exactly UTF-8 is. 

181
11:45.190 --> 11:45.750
I can't wait.

182
11:45.880 --> 11:46.330
See you now. 

183
11:48.450 --> 11:51.670
(music)

184
11:54.650 --> 11:59.800
(music)

185
12:02.840 --> 12:08.870
(music)