1
00:00:11,060 --> 00:00:15,890
So in this lecture, we'll continue our discussion on how the techs rank summariser works.

2
00:00:16,520 --> 00:00:21,170
This lecture will be a continuation of the previous lecture, so make sure you've watched that first.

3
00:00:22,710 --> 00:00:28,230
In particular, this lecture will go through the mechanics of how PageRank works, which can then obviously

4
00:00:28,230 --> 00:00:30,000
be applied to text rank as well.

5
00:00:30,810 --> 00:00:35,850
Now please note that in order to go through this subject properly, it really takes half, of course,

6
00:00:35,850 --> 00:00:37,170
to a full course of time.

7
00:00:37,800 --> 00:00:41,340
So this lecture is really a compressed version of what I've taught in the past.

8
00:00:42,450 --> 00:00:46,860
Now, this doesn't mean I've left out any details, but it does mean that I'm going to go faster than

9
00:00:46,860 --> 00:00:47,370
usual.

10
00:00:48,150 --> 00:00:53,460
So after this lecture, if you still have any questions or if you need more details, please email me

11
00:00:53,460 --> 00:00:55,650
on my website using the contact form.

12
00:01:00,380 --> 00:01:05,239
OK, so to begin our discussion, we'll note that a random walk is an example of a Markov chain.

13
00:01:06,050 --> 00:01:10,730
In fact, this is very similar to the Markov models we discussed elsewhere in this course.

14
00:01:11,480 --> 00:01:15,080
Generally speaking, Markov chains are made up of a set of states.

15
00:01:15,590 --> 00:01:20,550
We can then go from any state at time T to any other state at time T plus one.

16
00:01:20,900 --> 00:01:26,090
According to some given set of probabilities, suppose that we have states in total.

17
00:01:26,750 --> 00:01:30,080
In the case of PageRank, this means that we have M Web pages.

18
00:01:31,440 --> 00:01:38,430
The probability of going from state eye at time T to state at time T plus one is written like what you

19
00:01:38,430 --> 00:01:39,120
see here.

20
00:01:40,020 --> 00:01:42,840
Note that these transitions follow the mark of property.

21
00:01:43,170 --> 00:01:48,570
Since this probability only depends on where you were at the previous state, but not any state before

22
00:01:48,570 --> 00:01:49,050
that.

23
00:01:51,150 --> 00:01:57,420
Furthermore, note that we can store these probabilities in a matrix since I can be any state from one

24
00:01:57,420 --> 00:02:00,840
up to M and J can be any state from one up to M..

25
00:02:01,170 --> 00:02:05,820
This is an m by M matrix, which we simply denote as a bi convention.

26
00:02:06,480 --> 00:02:11,310
So AJ is the probability of going from state to state J.

27
00:02:11,940 --> 00:02:14,880
Note that we call a the state transition matrix.

28
00:02:15,690 --> 00:02:19,860
Also note that a is independent of time, which is why it does not depend on T.

29
00:02:24,530 --> 00:02:29,630
The next step is to consider how we represent the probability of being in a particular state.

30
00:02:30,140 --> 00:02:37,640
We call this the state distribution, so possessive tea will be a vector of improbability values, telling

31
00:02:37,640 --> 00:02:43,940
us the probability of being in any particular state at time, T noted by convention.

32
00:02:43,940 --> 00:02:49,070
We think of pervasive T as a rogue vector instead of a column vector like we normally do.

33
00:02:53,810 --> 00:03:00,020
Importantly, we can use this along with the Matrix to compute the state distribution at the next time

34
00:03:00,020 --> 00:03:07,640
step in particular, a p of S of T plus one is simply pervasive of T multiplied by the matrix.

35
00:03:08,330 --> 00:03:12,860
As an exercise, you can do a simple example yourself to prove that this works.

36
00:03:17,350 --> 00:03:22,630
Now, as you recall, what we would like to do is compute the state distribution after an infinitely

37
00:03:22,630 --> 00:03:27,940
long random walk that is we want to know of as of T one T is infinity.

38
00:03:28,540 --> 00:03:30,520
We call this the limiting distribution.

39
00:03:32,160 --> 00:03:37,470
Now, in general, this would be very hard to compute since it could take an infinite amount of steps.

40
00:03:38,040 --> 00:03:43,320
In fact, there's no guarantee it will actually converge such that the limiting distribution actually

41
00:03:43,320 --> 00:03:43,980
exists.

42
00:03:44,700 --> 00:03:49,740
But luckily, there's a trick from linear algebra, which we can use, which not only guarantees that

43
00:03:49,740 --> 00:03:55,050
this distribution does exist, but also that we won't need to take an infinite number of steps to find

44
00:03:55,050 --> 00:03:55,380
it.

45
00:04:00,050 --> 00:04:06,320
Let's start by recognizing one fact if we take our state transition equation, poverty of two plus one

46
00:04:06,320 --> 00:04:11,000
equals press of tee times a we can simply plug in T is equal to infinity.

47
00:04:11,870 --> 00:04:13,280
What we get is p of us.

48
00:04:13,280 --> 00:04:15,980
Infinity equals p of s infinity times.

49
00:04:16,910 --> 00:04:19,910
This is because infinity plus one is still infinity.

50
00:04:21,510 --> 00:04:22,470
Now you might think.

51
00:04:22,650 --> 00:04:25,920
Doesn't this simply mean a is one which is a contradiction?

52
00:04:26,580 --> 00:04:31,110
And the answer is no, because we are working with matrices instead.

53
00:04:31,140 --> 00:04:33,840
This is what is known as the eigenvalue problem.

54
00:04:34,650 --> 00:04:40,830
As you recall, the eigenvalue problem is typically stated as eight times v equals lambda times v,

55
00:04:40,830 --> 00:04:45,570
where a is a matrix V is an eigen vector in lambda is an eigenvalue.

56
00:04:46,410 --> 00:04:53,130
It means that eigenvectors are special vectors such that when you multiply them by the matrix a, this

57
00:04:53,130 --> 00:04:57,690
is the same as simply multiplying by scalar, which is the eigenvalue lambda.

58
00:04:59,690 --> 00:05:05,960
For our problem, this simply means that Pervez Inf. is the eigenvectors, while the eigenvalue is just

59
00:05:05,960 --> 00:05:06,440
one.

60
00:05:07,700 --> 00:05:12,860
And recall that there's no need to discuss how we actually find eigenvalues and eigenvectors, since

61
00:05:12,860 --> 00:05:15,560
that's a low level operation, included a nampai.

62
00:05:20,230 --> 00:05:25,810
Now, what I just showed you was kind of a sleight of hand, as you recall, what we originally wanted

63
00:05:25,810 --> 00:05:32,140
to find was the limiting distribution that is the distribution we would get if we kept multiplying by

64
00:05:32,140 --> 00:05:34,510
a an infinite number of times.

65
00:05:35,170 --> 00:05:40,420
What we actually found is called the stationary distribution, which is not necessarily the same.

66
00:05:41,320 --> 00:05:46,000
The stationary distribution is simply a distribution that doesn't change after a transition.

67
00:05:46,870 --> 00:05:48,970
So here are some questions to consider.

68
00:05:49,840 --> 00:05:54,700
How do we know that the stationary distribution is the same as the limiting distribution?

69
00:05:55,780 --> 00:05:59,080
Furthermore, how do we know that this distribution is unique?

70
00:05:59,740 --> 00:06:04,540
And finally, how do we know that the corresponding eigenvalue is one to begin with?

71
00:06:09,210 --> 00:06:11,540
The answer is called the parent for a zero.

72
00:06:12,360 --> 00:06:19,350
In short, the states that if the Matrix A is a mark of matrix, also known as a stochastic matrix and

73
00:06:19,350 --> 00:06:25,710
we can travel from any state to any other state with positive probability, then all the previous assumptions

74
00:06:25,710 --> 00:06:27,030
we made will be true.

75
00:06:28,890 --> 00:06:33,510
Now, luckily, you'll never have to prove this theorem, since it's more common to simply apply it.

76
00:06:34,260 --> 00:06:40,530
So to recap, there are two conditions and all we need to do is ensure that our matrix meets these conditions.

77
00:06:42,180 --> 00:06:45,180
The first condition is that a is a mark of matrix.

78
00:06:45,990 --> 00:06:50,280
This simply means that each row, some still one and all these values are not negative.

79
00:06:50,760 --> 00:06:53,790
This makes each row a valid probability distribution.

80
00:06:55,410 --> 00:06:58,680
The second condition is that all the values in a are positive.

81
00:06:59,340 --> 00:07:04,530
Normally we only have to ensure they are not negative as per condition one which allows some values

82
00:07:04,530 --> 00:07:05,220
to be zero.

83
00:07:06,060 --> 00:07:10,980
But if we want all our nice assumptions to hold, then we also have to ensure they are positive.

84
00:07:12,550 --> 00:07:14,560
OK, so these are the two conditions.

85
00:07:15,160 --> 00:07:20,050
If these conditions are true, then as mentioned, all the assumptions we made are true.

86
00:07:20,770 --> 00:07:24,730
As you recall, this means that the Matrix does have an eigenvalue of one.

87
00:07:25,120 --> 00:07:29,080
It means that the limiting distribution and the stationary distribution are the same.

88
00:07:30,220 --> 00:07:32,920
And it means that this distribution is unique.

89
00:07:33,940 --> 00:07:39,040
Furthermore, this means that it's easy to find or limiting distribution, since it simply amounts to

90
00:07:39,040 --> 00:07:43,330
finding the eigen vector for which the corresponding eigenvalue is one.

91
00:07:44,290 --> 00:07:49,450
Since we can do this, there is no need to multiply by a an infinite number of times.

92
00:07:50,020 --> 00:07:54,040
In the case of PageRank, it's very easy to ensure that these conditions are true.

93
00:07:58,880 --> 00:08:04,160
So let's discuss how we build the PageRank matrix, which will make it simpler to understand the tech's

94
00:08:04,160 --> 00:08:05,060
rank matrix.

95
00:08:06,080 --> 00:08:09,980
Let's consider a single row of The Matrix, some arbitrary row at index.

96
00:08:09,980 --> 00:08:18,080
I now suppose that there are any links on web page I, since we want to have equal probability of going

97
00:08:18,080 --> 00:08:19,160
to any web page.

98
00:08:19,460 --> 00:08:25,280
We'll make the probability for each of these outgoing links to be won over in I and of course, all

99
00:08:25,280 --> 00:08:27,020
the rest of the values will be zero.

100
00:08:27,800 --> 00:08:29,510
So hopefully this makes sense.

101
00:08:29,990 --> 00:08:30,950
We'll call this matrix.

102
00:08:32,299 --> 00:08:38,360
Now we can't only use G because as you recall, although this is a mark of matrix where each row comes

103
00:08:38,360 --> 00:08:41,929
to one, it does not contain only positive numbers.

104
00:08:42,500 --> 00:08:44,990
Currently, since this matrix contains zeros.

105
00:08:45,260 --> 00:08:49,250
It's not possible to go from any webpage to any other webpage.

106
00:08:49,790 --> 00:08:52,820
Therefore, we need to apply smoothing to this matrix.

107
00:08:57,360 --> 00:09:03,570
In other words, we're going to artificially insert a small probability of going to any web page in

108
00:09:03,570 --> 00:09:09,270
particular, what we want to do is create a new matrix called you, which contains the same value one

109
00:09:09,270 --> 00:09:11,010
over m in each position.

110
00:09:11,850 --> 00:09:16,800
This would be our matrix if every page on the internet links to every other page on the internet.

111
00:09:18,030 --> 00:09:22,500
The final step is to take a convex combination of both G and U.

112
00:09:23,220 --> 00:09:25,380
For example, zero point one five U.

113
00:09:25,500 --> 00:09:27,270
Plus zero point eighty five g.

114
00:09:28,140 --> 00:09:33,750
As an exercise, you should confirm to yourself that each row of this new matrix still comes to one.

115
00:09:38,530 --> 00:09:43,410
OK, so now that you know how the PageRank matrix is built, let's move on to tax rank.

116
00:09:44,260 --> 00:09:47,140
As you recall, we've just computed the TFI TAF.

117
00:09:47,770 --> 00:09:51,730
This gives us a tier of IDF vector for each sentence in our document.

118
00:09:53,220 --> 00:09:59,010
The next step is to compute the cosine similarity between every sentence and every other sentence.

119
00:09:59,610 --> 00:10:03,240
Note that there are other ways to compute the similarity between sentences.

120
00:10:03,720 --> 00:10:08,610
You're encouraged to check out these other methods yourself, which are described in the text rank paper,

121
00:10:08,910 --> 00:10:10,990
which you can find an extra reading dot text.

122
00:10:12,330 --> 00:10:16,470
Either way, this will give us an end by a matrix, just like with PageRank.

123
00:10:16,980 --> 00:10:21,830
So at position, we have the similarity between sentence, eye and sentence.

124
00:10:23,310 --> 00:10:28,590
However, note that it's not possible to use this as is, since each road is not sum to one.

125
00:10:29,550 --> 00:10:34,170
To fix this, we simply divide every row by some to enforce this constraint.

126
00:10:35,040 --> 00:10:38,040
As with PageRank, there can still be zeros in this matrix.

127
00:10:38,340 --> 00:10:44,430
And so we use the same method to smooth it by adding a small uniform probability for every pair of states.