WEBVTT

0
00:00.750 --> 00:05.280
I know, we're covering very advanced things, and I hope you're following along. But you know what's...

1
00:05.280 --> 00:08.300
weird, is that this doesn't really solve our problem 😥.

2
00:09.430 --> 00:13.030
The page's encoding type is still ISO-8859-1.

3
00:13.360 --> 00:18.790
This means that the theta character cannot be represented in that encoding. But that does not stop the browser.

4
00:18.790 --> 00:23.680
The browser will still try to store that theta sign in a way it understands, and we've seen that it does...

5
00:23.680 --> 00:31.210
it, by storing it as a numeric character reference, in decimal form. That's key, in decimal form. And that...

6
00:31.210 --> 00:33.400
is, 952.
(alarm sound)
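
NOTE
A minimal sketch of the fallback just described, written in Python as an assumption (the lecture itself talks about servlets).
Python's "xmlcharrefreplace" error handler is used here only to imitate what the browser does when a character does not fit ISO-8859-1.
The variable names are illustrative.
```python
from urllib.parse import quote
theta = "\u03b8"  # Greek small letter theta
# The decimal code point of theta is 952.
code_point = ord(theta)
# ISO-8859-1 cannot represent theta, so fall back to a decimal numeric character reference.
as_latin1 = theta.encode("iso-8859-1", errors="xmlcharrefreplace")
# Percent-encode those bytes the way they would appear in a submitted form URL.
in_url = quote(as_latin1, safe="")
print(code_point, as_latin1, in_url)  # 952 b'&#952;' %26%23952%3B
```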

7
00:35.320 --> 00:37.180
We've got a problem...

8
00:37.180 --> 00:38.000
(sound: Houston, we have a problem)

9
00:38.350 --> 00:39.130
What is the problem?

10
00:39.880 --> 00:47.230
Server-side programs that process GET and POST requests typically only understand URL-encoded

11
00:47.230 --> 00:47.950
strings.

12
00:48.910 --> 00:52.210
They rarely understand numeric character references.

13
00:52.720 --> 00:59.020
So in our example, theta is represented by the decimal code point 952. But in Unicode

14
00:59.050 --> 01:02.100
hex form, which is what servers typically want,

15
01:02.440 --> 01:06.430
its code point would be U+03B8, which UTF-8 percent-encodes as %CE%B8.
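
NOTE
A short Python sketch (Python is an assumption, not the course's language) relating the decimal and hex forms of theta's code point.
It also shows the UTF-8 percent-encoding a server would normally expect. Names are illustrative.
```python
from urllib.parse import quote
theta = chr(952)  # build the character from its decimal code point
# 952 in decimal and 03B8 in hexadecimal are the same code point.
same_code_point = (952 == 0x03B8)
hex_form = f"U+{ord(theta):04X}"
# In UTF-8, theta is the two bytes CE B8, and those bytes are what gets percent-encoded.
utf8_bytes = theta.encode("utf-8")
percent_encoded = quote(theta, safe="")  # quote() encodes text as UTF-8 by default
print(hex_form, utf8_bytes.hex(), percent_encoded)  # U+03B8 ceb8 %CE%B8
```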

16
01:06.910 --> 01:07.900
And what does this mean?

17
01:07.910 --> 01:14.560
This means that the servlet code that reads the name parameter with the value 952 won't...

18
01:14.560 --> 01:20.260
know that the 952 represents anything other than the number...

19
01:20.260 --> 01:20.890
952. 

20
01:21.010 --> 01:21.850
How would it?

21
01:22.360 --> 01:29.680
In other words, the real character, theta, won't be discovered by the servlet code unless you intentionally...

22
01:29.680 --> 01:31.300
planned for the situation.

23
01:32.840 --> 01:33.540
Does that make sense?

24
01:34.010 --> 01:41.930
So the bottom line is that most servers will decode the string to get "&#952;".

25
01:41.930 --> 01:47.240
The server won't know that the "952" is a numeric character reference produced by the browser.

26
01:47.420 --> 01:48.140
It just won't know.

27
01:48.680 --> 01:55.310
And this means the server is going to stop here, thinking that this represents the final characters that you...

28
01:55.340 --> 01:57.510
are after, the ones the browser actually sent.
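
NOTE
A rough Python sketch (an assumption, standing in for the servlet) of why the server stops at the reference text.
The query string is hypothetical. The extra unescape step is one possible way to plan for this case, not the course's method.
```python
from html import unescape
from urllib.parse import parse_qs
query = "name=%26%23952%3B"  # hypothetical query string sent by the ISO-8859-1 page
# Ordinary URL decoding stops at the literal text of the reference.
decoded = parse_qs(query)["name"][0]
# A deliberate extra step is needed to turn the reference into the real character.
recovered = unescape(decoded)
print(decoded, recovered)  # &#952; θ
```
Note that this extra step would also convert a reference the user typed in literally, which is one reason to prefer UTF-8 forms.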

29
01:58.190 --> 02:01.310
Well, the solution is pretty straightforward and self-explanatory.

30
02:01.340 --> 02:07.700
We should always select an encoding type for a form that can handle all the characters that we intend...

31
02:07.700 --> 02:10.040
users to be able to write. I know...

32
02:10.340 --> 02:12.200
it's a pretty obvious solution, right?

33
02:12.260 --> 02:13.760
But it is what it is.

34
02:13.790 --> 02:18.650
So if you expect to have multilingual users and hope to process multiple scripts, why don't we just...

35
02:18.650 --> 02:21.020
use a character encoding that represents those scripts?

36
02:21.020 --> 02:28.100
And UTF-8, as we've discussed, is a Unicode character encoding that can correctly represent all characters...

37
02:28.100 --> 02:30.520
in active use throughout the entire world 🌍.

38
02:30.740 --> 02:35.270
And of course, we've seen that when we use UTF-8 in the page instead of ISO-8859-1...

39
02:35.270 --> 02:41.540
the browser produces an entirely different URL, one that the server can understand. I hope...

40
02:42.140 --> 02:43.400
this has made sense.

41
02:43.790 --> 02:45.910
If you are still a bit confused, don't worry too much.

42
02:45.920 --> 02:46.670
It really is...

43
02:46.850 --> 02:51.470
you know, these are very advanced concepts and we're going to be discussing more of them throughout the course.

44
02:51.950 --> 02:53.860
So, can't wait to see you in the next lecture.

45
02:54.260 --> 02:54.860
Adios 👋.