WEBVTT

0
00:00.650 --> 00:01.060
Okay.

1
00:01.070 --> 00:03.920
As promised, let's get into URL encoding.

2
00:03.920 --> 00:10.730
And before I get into the nitty-gritty, let's just take a step back and understand why it's necessary

3
00:11.330 --> 00:12.200
in the first place.

4
00:12.200 --> 00:15.490
We know that computers can only deal in numbers.

5
00:15.500 --> 00:18.350
In fact, they can only deal in electrical impulses.

6
00:18.350 --> 00:20.390
Something's either there, or it's not.

7
00:20.690 --> 00:22.400
In other words, it's binary.

8
00:22.400 --> 00:25.520
At the end of the day, it's a 1 or a 0.

9
00:25.670 --> 00:31.160
But we know that language itself has a lot more than just two values.

10
00:31.160 --> 00:37.250
And of course, we use combinations of numbers to represent letters and other characters.

11
00:37.760 --> 00:40.790
So if that's the case, then let's look at an example.

12
00:40.790 --> 00:49.430
Let's say that my computer uses the following character map. For A, B, C, and D, it assigns the numbers

13
00:49.430 --> 00:51.110
1, 2, 3 and 4.

14
00:51.710 --> 00:52.820
Pretty simple.

15
00:52.910 --> 00:59.670
But now let's say your computer uses a different character map. For the letter A, it assigns the number

16
00:59.670 --> 01:06.240
5, for B the number 6, for C the number 7, and D is represented by the number 8.

17
01:06.630 --> 01:08.430
Why would this pose a problem?

18
01:08.430 --> 01:11.070
Well, let's say I wanted to send you the message,

19
01:11.070 --> 01:11.640
"Hi".

20
01:11.670 --> 01:17.640
Then the numbers, according to my character set would be 8 and 9, and those numbers would whiz

21
01:17.640 --> 01:19.680
across the wire to your machine.

22
01:19.680 --> 01:27.300
But for your computer, 8 and 9 will represent the letters D and E - they won't represent H and I, like

23
01:27.300 --> 01:27.900
mine.

24
01:27.900 --> 01:36.150
So your machine's going to decode the message to DE ... more like, heh 🤷‍♀️??? So can you see the problem we have

25
01:36.150 --> 01:36.810
with this?

26
01:36.840 --> 01:37.640
I'm sure you can.

27
01:37.650 --> 01:40.440
It just means that two computers won't be able to speak to each other.
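
NOTE
A minimal sketch of the mismatch just described, assuming both made-up maps
simply continue through the alphabet (mine starts at A=1, yours at A=5).
The names MY_MAP, YOUR_MAP, encode and decode are hypothetical, and only
uppercase letters are covered.
```python
# Two made-up character maps, following the numbers used in the example.
MY_MAP = {chr(ord("A") + i): i + 1 for i in range(26)}    # A=1, B=2, ... H=8, I=9
YOUR_MAP = {chr(ord("A") + i): i + 5 for i in range(26)}  # A=5, B=6, ... D=8, E=9
def encode(text, char_map):
    """Turn each character into its number under the given map."""
    return [char_map[ch] for ch in text]
def decode(numbers, char_map):
    """Turn numbers back into characters by reversing the given map."""
    reverse = {number: ch for ch, number in char_map.items()}
    return "".join(reverse[n] for n in numbers)
wire = encode("HI", MY_MAP)        # what travels across the wire
received = decode(wire, YOUR_MAP)  # what the other machine reads
print(wire, received)
```
Decoding the same numbers with MY_MAP gives back the original message, which
is the whole argument for both sides agreeing on one map.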

28
01:40.910 --> 01:48.350
So to communicate effectively, we need a standard way of encoding characters to numbers.

29
01:48.470 --> 01:50.000
And of course, we've looked at a few.

30
01:50.030 --> 01:54.470
We've looked at ASCII, ISO, UTF-8, etc etc.

31
01:54.500 --> 01:58.190
These are just different encoding types that have been developed over the years.
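
NOTE
A small sketch of what a shared standard buys us, using Python's built-in
codecs. Treating "ISO" as ISO-8859-1 (latin-1) is an assumption here; the
lecture does not say which ISO encoding is meant.
```python
# The same text turned into numbers (bytes) under a shared standard.
message = "Hi"
ascii_numbers = list(message.encode("ascii"))  # H is 72, i is 105
utf8_numbers = list(message.encode("utf-8"))   # identical for plain ASCII text
# A character outside ASCII shows where encodings differ.
latin1_e = list("é".encode("latin-1"))  # one byte in ISO-8859-1
utf8_e = list("é".encode("utf-8"))      # two bytes in UTF-8
print(ascii_numbers, utf8_numbers, latin1_e, utf8_e)
```
Any two machines that agree on the encoding will turn these numbers back
into the same characters.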

32
01:58.400 --> 01:58.720
"Okay

33
01:58.730 --> 02:01.580
Clyde, but this course isn't about encoding, it's about forms."

34
02:01.760 --> 02:02.270
Okay.

35
02:02.270 --> 02:06.470
Well, let's talk about encoding in the context of forms.

36
02:06.470 --> 02:13.730
We know that when data in an HTML form is submitted, the names and values are sent to the server using

37
02:13.730 --> 02:16.790
either the GET or the POST method.

38
02:16.820 --> 02:18.970
By default, it's the GET method.

39
02:18.980 --> 02:24.740
But anyway, let's look at a very simple form, just asking for a user's name and their password.

40
02:24.740 --> 02:30.140
And one of my courses wouldn't be complete if I didn't incorporate Wally into everything.

41
02:30.140 --> 02:31.160
So here we go.

42
02:31.160 --> 02:35.990
Let's say the name of the user is Wally Warthog, and his password is secret1.

43
02:35.990 --> 02:39.170
And as I mentioned, the default method on a form is GET.

44
02:39.170 --> 02:44.610
So let's just assume for now, because we're talking about URL encoding, that we're using a GET method

45
02:44.610 --> 02:45.420
on our form.

46
02:45.420 --> 02:48.000
What's going to happen now when the user clicks submit?

47
02:48.030 --> 02:48.750
Well, that's right.

48
02:48.750 --> 02:57.480
We're going to get this very long URL string. And I guess those characters, the actual W A L L Y, for example,

49
02:57.480 --> 02:58.350
that makes sense.

50
02:58.350 --> 03:00.240
That's also in the URL.

51
03:00.270 --> 03:05.940
But what's useful and what's interesting to us developers is what are all these random characters?

52
03:05.970 --> 03:11.790
The ?, the =, the +, the & ... what are they, and why are they there?
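
NOTE
A rough sketch of how a GET submission could end up as a URL, using
urlencode from Python's standard library. The field names "name" and
"password" and the example.com address are assumptions for illustration;
the lecture only gives the values.
```python
from urllib.parse import urlencode
# Field names are assumed; the values come from the Wally example.
fields = {"name": "Wally Warthog", "password": "secret1"}
query = urlencode(fields)  # joins name=value pairs with & and turns spaces into +
# The action URL is a placeholder for illustration.
url = "http://example.com/login?" + query
print(url)
```
The ? starts the query string, each = separates a name from its value, the &
separates one pair from the next, and the + stands in for the space.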

53
03:12.030 --> 03:20.570
Well, my dear students, in order to understand this, you need to know a little bit about URLs themselves.

54
03:20.580 --> 03:22.380
So let's look at a URL.

55
03:23.010 --> 03:25.770
Let's just take a very simple URL.

56
03:27.560 --> 03:30.050
Nothing too crazy about that, is there?

57
03:30.140 --> 03:31.950
But let's break it up.

58
03:31.970 --> 03:34.250
In fact, we can break it up into four parts.

59
03:34.670 --> 03:36.290
Can you see which four parts?

60
03:37.830 --> 03:41.520
Well, firstly, we've got what's known as the scheme.

61
03:41.640 --> 03:47.370
It always comes before the colon and the two forward slashes and it tells the web client, aka, it

62
03:47.370 --> 03:51.030
tells the web browser, how to access the resource.

63
03:51.420 --> 03:56.100
In this case, it's telling the web client to use the HyperText Transfer Protocol.

64
03:56.100 --> 04:01.950
In other words, it's telling the browser to use HTTP to make the request for the resource.

65
04:01.950 --> 04:08.100
But we don't only have HTTP. We have other protocols, we have other schemes like FTP, mailto or

66
04:08.100 --> 04:12.570
Git, but the one we mostly use is HTTP.

67
04:12.810 --> 04:14.730
So that's the first part of this URL.

68
04:14.760 --> 04:20.820
The second part is this "www.foo.com" - that's referred to as the host.

69
04:20.820 --> 04:26.370
And you can think of this as just telling the browser where the resource is hosted or located.

70
04:26.400 --> 04:30.960
The next part of the URL is what's known as the path.

71
04:30.960 --> 04:35.460
And the path of the URL is optional, and it basically just digs deeper.

72
04:35.460 --> 04:40.260
It's basically telling the browser what local resource is being requested.

73
04:40.270 --> 04:44.830
And then finally we've got this funky thing on the end that begins with that question mark.

74
04:44.830 --> 04:45.760
What is that?

75
04:45.760 --> 04:47.830
That's referred to as the query string.

76
04:47.830 --> 04:51.220
We're going to be talking more about this later and you'll see it in some of our code.

77
04:51.430 --> 04:58.030
The query string is just made up of query parameters and it's used to send data to the server.

78
04:58.030 --> 05:00.340
And of course when you're not sending data, you don't need it.

79
05:00.340 --> 05:03.580
So this part of the URL is also optional.
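
NOTE
The four parts can be pulled apart with urlsplit from Python's standard
library. This is only a sketch: the host www.foo.com comes from the lecture,
while the path and query string below are made up for illustration.
```python
from urllib.parse import urlsplit, parse_qs
# Only the host comes from the lecture; path and query are hypothetical.
parts = urlsplit("http://www.foo.com/products/shoes?color=red&size=9")
print(parts.scheme)  # the part before ://
print(parts.netloc)  # the host
print(parts.path)    # the optional path
print(parts.query)   # the optional query string, without the ?
params = parse_qs(parts.query)  # query parameters as a dict of lists
print(params)
```
When a URL has no path or no query string, those attributes simply come back
as empty strings.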

80
05:03.850 --> 05:04.210
Okay.

81
05:04.210 --> 05:07.360
So that's a very high-level overview of a URL.

82
05:07.360 --> 05:13.630
And as I mentioned in the previous lecture, just as there are specifications for CSS, HTML and JavaScript,

83
05:13.660 --> 05:18.520
there are also specs 📜 for working with URIs or URLs.

84
05:18.520 --> 05:24.280
And one of the major, major specs that we're going to be looking at is the one specified by the Internet

85
05:24.280 --> 05:32.860
Engineering Task Force in a document called RFC 3986, and also in fact 3987.

86
05:33.040 --> 05:34.810
I don't want to get ahead of myself.

87
05:34.810 --> 05:38.410
We're going to be talking in more detail about these specs shortly.

88
05:38.410 --> 05:46.060
But for now, just know that URLs are designed to only accept certain characters, not all characters.

89
05:46.090 --> 05:48.880
Well, if they can only accept certain characters, what are they?

90
05:48.910 --> 05:54.280
Well, historically, only US-ASCII characters were allowed to be in the URL.

91
05:54.430 --> 06:01.330
But this poses a massive problem because often a URL contains characters outside of the ASCII character

92
06:01.330 --> 06:07.990
set, and sometimes reserved characters like ? and &, or characters that aren't allowed at all,

93
06:07.990 --> 06:10.930
like spaces and tabs, etc etc.

94
06:10.930 --> 06:18.520
And when these characters are used, the browser needs a way to convert them to valid ASCII format in

95
06:18.520 --> 06:22.990
the URL itself, because the spec says we only allow certain characters.

96
06:23.080 --> 06:31.450
And my dear students, this is exactly what URL encoding, aka percent-encoding, is all about.

97
06:31.750 --> 06:39.670
It's the process of converting characters of a URL so that they can be safely transmitted over the internet.
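
NOTE
A short sketch of percent-encoding with quote and unquote from Python's
standard library. The sample strings are made up for illustration; browsers
apply the same idea, though their exact rules differ per URL component.
```python
from urllib.parse import quote, unquote
# Each unsafe character becomes % followed by the hex value of its UTF-8 byte(s).
encoded_space = quote("Wally Warthog")      # space is byte 0x20
encoded_accent = quote("café")              # é is bytes 0xC3 0xA9 in UTF-8
encoded_reserved = quote("a&b=c", safe="")  # reserved characters used as plain data
print(encoded_space, encoded_accent, encoded_reserved)
print(unquote(encoded_space))  # decoding restores the original text
```
Everything that comes out is plain ASCII, which is what the URL is allowed
to carry.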

98
06:39.790 --> 06:42.030
So don't get lost in all the detail.

99
06:42.040 --> 06:49.060
This lecture was just a very high-level look at why we need to have encoding in the first

100
06:49.060 --> 06:53.440
place, because we need a standard way of machines talking to each other over the web.

101
06:53.740 --> 06:58.090
And we've got these specs that define how a URL should be constructed.

102
06:58.120 --> 06:59.840
Lump all of those together, and

103
06:59.860 --> 07:05.950
we've got URL encoding to ensure that all the rules are being followed.

104
07:05.980 --> 07:07.540
So that's what it's all about.

105
07:07.540 --> 07:09.430
But enough of introductions.

106
07:09.430 --> 07:11.170
Let's get into the meat.