1
00:00:11,660 --> 00:00:17,010
In this lecture we are going to look at how to build a recommender system using deep learning.

2
00:00:17,030 --> 00:00:20,110
This lecture is going to walk you through a prepared Colab notebook.

3
00:00:20,330 --> 00:00:25,820
That said, a very good exercise, which I always recommend, is to try, once you know how this is done, to

4
00:00:25,820 --> 00:00:29,140
recreate it yourself with as few references as possible.

5
00:00:29,510 --> 00:00:34,400
As usual, you can look at the title of the notebook to see which notebook we are currently looking

6
00:00:34,400 --> 00:00:34,610
at.

7
00:00:35,720 --> 00:00:41,850
None of the imports are too surprising, so let's start by downloading the data.

8
00:00:41,930 --> 00:00:49,160
This data is hosted at grouplens.org, and we'll be looking at the famous MovieLens 20M dataset, which has 20 million ratings.

9
00:00:49,160 --> 00:00:54,800
Note that there are multiple versions of the MovieLens dataset, and most tutorials and courses use the

10
00:00:54,800 --> 00:00:58,380
version of the dataset that has only 100,000 ratings.

11
00:00:58,400 --> 00:01:03,030
But that's kind of lame, so we're going to demonstrate that we can do better.

12
00:01:03,320 --> 00:01:06,470
Since this is a zip file, we're going to unzip it,

13
00:01:10,820 --> 00:01:13,220
and you'll see that it comes with quite a few files.

14
00:01:13,220 --> 00:01:17,930
I'd recommend looking at these. I've already done the hard work for you of figuring out which is

15
00:01:17,930 --> 00:01:18,570
which,

16
00:01:18,770 --> 00:01:24,340
but you'll want to do that yourself at some point, since that's part of being a data scientist.

17
00:01:24,410 --> 00:01:30,610
Next, we run ls to see what we now have in our working directory.

18
00:01:30,740 --> 00:01:39,420
Next, I load in the data using pandas. The data is a CSV, where the first column is the user

19
00:01:39,420 --> 00:01:39,840
ID.

20
00:01:40,080 --> 00:01:41,980
The second column is the movie ID.

21
00:01:42,210 --> 00:01:46,000
The third column is the rating and the fourth column is the timestamp.

22
00:01:46,110 --> 00:01:51,690
So if you wanted to make some kind of really complex recommender system that maybe recommends the right

23
00:01:51,690 --> 00:01:56,460
thing at the right time, you could take into account the timestamp or the order of the events as they

24
00:01:56,460 --> 00:01:57,290
occur.

25
00:01:57,420 --> 00:02:00,980
But that's not something that is usually taught in recommender systems,

26
00:02:00,990 --> 00:02:02,370
so we're not going to discuss it.

27
00:02:09,410 --> 00:02:15,620
Next, we recognize that although the dataset already uses integers for the user IDs and movie IDs,

28
00:02:15,980 --> 00:02:18,670
they might not be in the format that we want.

29
00:02:18,680 --> 00:02:25,190
Remember what these integers are eventually used for: they're used to index an embedding, so user ID 0

30
00:02:25,190 --> 00:02:30,220
will index the first row, user ID 1 will index the second row, and so on.

31
00:02:30,260 --> 00:02:32,650
So think of a far-from-ideal scenario.

32
00:02:33,140 --> 00:02:39,200
Let's say you have one hundred users, but these one hundred users have the IDs 1,000,000, 1,000,001,

33
00:02:39,200 --> 00:02:42,260
1,000,002, and so forth.

34
00:02:42,290 --> 00:02:49,250
Well, since each ID must be an index into an array, that array must have size 1,000,100.

35
00:02:49,250 --> 00:02:55,100
That's about 1 million rows of wasted space that you won't be using.

36
00:02:55,490 --> 00:03:01,790
Therefore, you would like your user IDs to be numbered from 0 up to N minus 1.

37
00:03:01,790 --> 00:03:05,830
The thing is we can't just trust that the data is formatted in this way.

38
00:03:05,840 --> 00:03:10,520
In fact, I would recommend verifying this for yourself as an exercise.
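A minimal sketch of that check, written in plain Python so it runs without pandas. It reports how many embedding rows the raw IDs would require and whether the distinct IDs are exactly 0 to N-1. The helper names are hypothetical, and the 100 users starting at ID 1,000,000 are the made-up scenario described earlier. With the real data, you would pass in the user ID column, for example via .tolist().

```python
def embedding_rows_needed(ids):
    """Rows an embedding table needs if the raw IDs are used directly as indices."""
    return max(ids) + 1


def ids_are_dense(ids):
    """True if the distinct IDs are exactly 0, 1, ..., n-1 for n distinct IDs."""
    unique = set(ids)
    return unique == set(range(len(unique)))


# Hypothetical scenario: 100 users whose IDs start at 1,000,000.
user_ids = [1_000_000 + i for i in range(100)]

rows = embedding_rows_needed(user_ids)   # rows the array would need
wasted = rows - len(set(user_ids))       # rows that would never be used
print(rows, wasted, ids_are_dense(user_ids))  # 1000100 1000000 False
```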

39
00:03:10,760 --> 00:03:13,330
In any case that's what this next block of code does.

40
00:03:15,140 --> 00:03:20,000
So I've commented out some code that you might use if you wanted to implement an index assignment from

41
00:03:20,000 --> 00:03:22,850
first principles using the apply function.

42
00:03:23,030 --> 00:03:28,010
Of course, if you read the documentation for long enough, you'd eventually stumble upon this useful feature,

43
00:03:28,010 --> 00:03:34,130
where you can just cast the entire column to a pd.Categorical object, which automatically assigns

44
00:03:34,160 --> 00:03:36,560
integer IDs starting from zero.

45
00:03:36,800 --> 00:03:44,380
Then we can grab those IDs by calling .cat.codes.
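To make the pd.Categorical trick less magical, here is a plain-Python sketch of the same idea. It assumes the IDs are sortable and contain no missing values. The categories are the sorted distinct values, and each value's code is its position in that list. The function name is hypothetical. The number of categories is also the N (or M) that sizes the corresponding embedding.

```python
def categorical_codes(values):
    """Map each value to its position among the sorted distinct values.

    Mirrors the codes pandas assigns when a column of sortable values with no
    missing entries is cast to pd.Categorical: categories are the sorted unique
    values, and codes run from 0 to len(categories) - 1.
    """
    categories = sorted(set(values))
    index = {value: code for code, value in enumerate(categories)}
    return [index[v] for v in values], categories


raw_user_ids = [1_000_002, 1_000_000, 1_000_002, 1_000_001]
codes, categories = categorical_codes(raw_user_ids)
N = len(categories)  # number of users = embedding rows actually needed
print(codes, N)  # [2, 0, 2, 1] 3
```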

46
00:03:44,890 --> 00:03:47,630
Next, it's the same story for the movie IDs.

47
00:03:47,840 --> 00:03:54,050
Again, as an exercise, you'll want to verify whether or not the original IDs were in the format

48
00:03:54,050 --> 00:03:54,800
we would like them to be in.

49
00:03:59,040 --> 00:04:01,780
Next, since our data is currently in a DataFrame,

50
00:04:01,800 --> 00:04:07,320
we're going to split it into individual NumPy arrays. We can do that by calling .values on the

51
00:04:07,320 --> 00:04:09,620
relevant columns.

52
00:04:09,660 --> 00:04:15,420
We're also going to center the ratings, which is one part of the standardization process.

53
00:04:15,420 --> 00:04:21,480
The reason I'm not also dividing by the standard deviation is that, in the recommender systems literature,

54
00:04:21,930 --> 00:04:25,370
results are often reported in terms of the root mean squared error,

55
00:04:25,500 --> 00:04:31,340
or RMSE, and this metric is reported on the scale of the original ratings.

56
00:04:31,410 --> 00:04:38,370
So if we changed the scale, our results wouldn't be directly comparable to other results.

57
00:04:38,530 --> 00:04:43,260
One alternative is to just scale your predictions and targets back before calculating the metric,

58
00:04:43,330 --> 00:04:49,940
but I'll leave that up to you.
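The following is a small numeric check of why centering alone is safe for RMSE, using made-up toy ratings. Subtracting the same mean from predictions and targets cancels inside each error term, so the RMSE is unchanged. Dividing by the standard deviation, by contrast, shrinks the RMSE by that factor, and it has to be multiplied back before comparing with other results. The code is plain Python, and the names are illustrative only.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error."""
    n = len(targets)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n)


ratings = [4.0, 3.5, 5.0, 1.0]       # toy targets on the original star scale
predictions = [3.5, 3.0, 4.5, 2.0]   # toy model outputs on the same scale

mu = sum(ratings) / len(ratings)     # mean used for centering
centered_targets = [r - mu for r in ratings]
centered_preds = [p - mu for p in predictions]

# The same mu is subtracted from p and t, so (p - t) and the RMSE are unchanged.
print(rmse(predictions, ratings), rmse(centered_preds, centered_targets))

# Also dividing by sigma divides the RMSE by sigma; multiplying back restores it.
sigma = math.sqrt(sum(c ** 2 for c in centered_targets) / len(ratings))
scaled_rmse = rmse([c / sigma for c in centered_preds],
                   [c / sigma for c in centered_targets])
print(scaled_rmse * sigma)
```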

59
00:04:50,140 --> 00:04:54,570
Next, we're going to find N, the number of users, and M, the number of movies.

60
00:04:54,700 --> 00:05:03,020
We'll also set D, the embedding dimension.

61
00:05:03,190 --> 00:05:08,220
The next step is to define our custom neural network for recommenders. In the constructor,

62
00:05:08,230 --> 00:05:13,600
we'll take in the number of users, the number of movies, the embedding dimension, and the number of hidden

63
00:05:13,600 --> 00:05:15,730
units. As an exercise,

64
00:05:15,730 --> 00:05:20,350
you could extend this to have multiple hidden layers. Inside the constructor,

65
00:05:20,350 --> 00:05:25,270
we create all the necessary modules: we have two embeddings and two linear layers.

66
00:05:30,610 --> 00:05:31,600
In the forward function,

67
00:05:31,610 --> 00:05:36,280
we define how to get a prediction given some input user and some input movie.

68
00:05:36,550 --> 00:05:43,090
As usual, it's important and useful to pay attention to the shapes. The u variable starts out as an array of length

69
00:05:43,090 --> 00:05:46,600
num_samples, as does the m variable.

70
00:05:46,600 --> 00:05:51,540
The first step is to pass the user IDs and the movie IDs into their respective embeddings.

71
00:05:51,940 --> 00:05:59,090
This gives us two tensors of size num_samples by D. Next, we concatenate these along axis 1, and we

72
00:05:59,090 --> 00:06:02,450
get back a tensor of size num_samples by 2D.

73
00:06:02,570 --> 00:06:07,490
After this, it's just a regular neural net: we pass it through a dense layer, then a ReLU, and then

74
00:06:07,490 --> 00:06:08,540
a final dense layer.
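In PyTorch, this architecture would typically be built from nn.Embedding(N, D), nn.Embedding(M, D), nn.Linear(2 * D, K), and nn.Linear(K, 1), joined with torch.cat((u_emb, m_emb), 1). To keep the shapes visible without depending on torch, the sketch below redoes the forward pass in plain Python with random toy weights. It is not a trained model. K (the hidden size), the toy sizes, and the helper names are assumptions for illustration. Note that the weight matrices here are stored as (in, out), whereas nn.Linear stores its weight as (out, in).

```python
import random


def make_matrix(rows, cols, rng):
    """Random rows x cols matrix standing in for learned parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(x, weight, bias):
    """Dense layer: x is (batch, in), weight is (in, out), bias is (out,)."""
    return [[sum(xi * w[j] for xi, w in zip(row, weight)) + bias[j]
             for j in range(len(bias))] for row in x]


def relu(x):
    return [[max(0.0, v) for v in row] for row in x]


def forward(u, m, params):
    """u and m are lists of integer IDs, each of length num_samples."""
    u_emb = [params["user_emb"][i] for i in u]        # (num_samples, D): row lookup
    m_emb = [params["movie_emb"][j] for j in m]       # (num_samples, D)
    out = [a + b for a, b in zip(u_emb, m_emb)]       # concat on axis 1: (num_samples, 2D)
    out = relu(linear(out, params["W1"], params["b1"]))  # (num_samples, K)
    return linear(out, params["W2"], params["b2"])       # (num_samples, 1)


N, M, D, K = 5, 7, 3, 4  # toy sizes: users, movies, embedding dim, hidden units
rng = random.Random(0)
params = {
    "user_emb": make_matrix(N, D, rng),
    "movie_emb": make_matrix(M, D, rng),
    "W1": make_matrix(2 * D, K, rng), "b1": [0.0] * K,
    "W2": make_matrix(K, 1, rng), "b2": [0.0],
}
preds = forward([0, 4, 2], [6, 1, 1], params)
print(len(preds), len(preds[0]))  # 3 1
```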

75
00:06:12,900 --> 00:06:16,110
Next, we create a device object.

76
00:06:16,150 --> 00:06:18,360
Next, we instantiate our model and move it to the GPU.

77
00:06:18,380 --> 00:06:25,330
We create the loss and optimizer, and we shuffle the data.

78
00:06:25,350 --> 00:06:31,900
This is so that we effectively randomize the train/test split. We convert the data to torch tensors.

79
00:06:35,760 --> 00:06:40,400
Next, we create TensorDataset objects for both the train set and the test set.

80
00:06:40,410 --> 00:06:45,210
We define the train set to be the first 80 percent of the shuffled data, and the test set to be the

81
00:06:45,210 --> 00:06:49,460
last 20 percent.
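The one detail that matters when shuffling is that the user, movie, and rating arrays all get the same permutation, so each row stays a valid (user, movie, rating) triple. Below is a minimal plain-Python sketch of shuffle-then-split 80/20. The function name is hypothetical, and the fixed seed is an arbitrary choice for reproducibility.

```python
import random


def shuffle_and_split(users, movies, ratings, train_frac=0.8, seed=42):
    """Shuffle three parallel lists with one permutation, then split them."""
    idx = list(range(len(ratings)))
    random.Random(seed).shuffle(idx)          # one permutation shared by all arrays
    users = [users[i] for i in idx]
    movies = [movies[i] for i in idx]
    ratings = [ratings[i] for i in idx]
    n_train = int(train_frac * len(ratings))  # first 80% -> train, rest -> test
    train = (users[:n_train], movies[:n_train], ratings[:n_train])
    test = (users[n_train:], movies[n_train:], ratings[n_train:])
    return train, test


users = list(range(10))
movies = [u * 10 for u in users]
ratings = [float(u) for u in users]
train, test = shuffle_and_split(users, movies, ratings)
print(len(train[0]), len(test[0]))  # 8 2
```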

82
00:06:49,670 --> 00:06:52,710
Next, we create DataLoader objects from our datasets.

83
00:06:56,150 --> 00:06:56,770
After this,

84
00:06:56,780 --> 00:06:58,670
things are more or less the same as usual.

85
00:06:58,670 --> 00:07:02,560
We create our training function, which, by the way, has to move the data to the GPU

86
00:07:02,570 --> 00:07:15,530
manually. Previously, the specialized vision and text libraries were doing that for us.

87
00:07:15,550 --> 00:07:17,800
Next we run the training function.

88
00:07:17,800 --> 00:07:21,370
You'll notice that I've got this special %prun magic command.

89
00:07:21,370 --> 00:07:25,970
This is a profiler that tells us things like how much time we spent in each function.
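%prun is an IPython magic that runs a statement under Python's profiler. Outside a notebook, a comparable report can be produced with the standard-library cProfile and pstats modules. In this sketch, train_epoch and slow_batches are hypothetical placeholders for the real training function and its per-sample work.

```python
import cProfile
import io
import pstats


def slow_batches(n):
    """Toy stand-in for per-sample work, e.g. assembling a batch item by item."""
    return [sum(range(i % 100)) for i in range(n)]


def train_epoch():
    for _ in range(20):
        slow_batches(2_000)


profiler = cProfile.Profile()
profiler.enable()
train_epoch()
profiler.disable()

# Sort by cumulative time, similar to reading the %prun report, and keep the top 5 rows.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
print(buffer.getvalue())
```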

90
00:07:26,260 --> 00:07:28,720
So why did I want to do this?

91
00:07:28,720 --> 00:07:33,150
Well, you'll notice that each epoch took quite a long time: over five minutes.

92
00:07:33,160 --> 00:07:34,600
I thought this was very strange.

93
00:07:44,340 --> 00:07:49,770
And if you look at the output from the profiler, you can see that the majority of the time spent isn't

94
00:07:49,770 --> 00:07:53,190
coming from our for loops or the model, since our model is quite small.

95
00:07:54,060 --> 00:07:57,380
It's actually coming from the PyTorch dataset functions.

96
00:07:57,420 --> 00:08:02,700
So while you might assume that if you're using PyTorch, you should use PyTorch's data loaders,

97
00:08:02,700 --> 00:08:04,350
maybe that's not always the best idea.

98
00:08:06,060 --> 00:08:11,310
Sure, you could try to figure out whether there are some arguments we could have set to improve the performance,

99
00:08:11,310 --> 00:08:16,860
or something like that, but for a basic dataset like this, you shouldn't have to. It should just work,

100
00:08:16,890 --> 00:08:20,550
which is the whole point of built-in functions in the first place.

101
00:08:20,610 --> 00:08:22,860
In any case, we'll discuss that more in the next lecture.
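One common workaround is sketched below. It is an assumption, not necessarily the approach the next lecture takes: bypass per-sample indexing by shuffling the arrays once per epoch and slicing contiguous mini-batches out of them. With default settings, a DataLoader over a TensorDataset typically fetches samples one index at a time and then collates them, which is a lot of Python-level overhead for tiny rows like these. The sketch uses plain lists. With tensors, the reorder could be a single indexing operation with a random permutation, and each slice (e.g. X[start:end]) returns a whole batch at once. The function name is hypothetical.

```python
import random


def iterate_minibatches(users, movies, ratings, batch_size, shuffle=True, seed=None):
    """Yield aligned (users, movies, ratings) mini-batches.

    The data is shuffled once per call (one epoch), then each batch is a
    contiguous slice of the shuffled lists.
    """
    n = len(ratings)
    if shuffle:
        order = list(range(n))
        random.Random(seed).shuffle(order)   # one shared permutation keeps rows aligned
        users = [users[i] for i in order]
        movies = [movies[i] for i in order]
        ratings = [ratings[i] for i in order]
    for start in range(0, n, batch_size):
        end = start + batch_size
        yield users[start:end], movies[start:end], ratings[start:end]


users = list(range(10))
movies = [u + 10 for u in users]
ratings = [float(u) for u in users]
sizes = [len(b[2]) for b in iterate_minibatches(users, movies, ratings, batch_size=4, seed=0)]
print(sizes)  # [4, 4, 2]
```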

102
00:08:31,040 --> 00:08:33,470
So the final step is to plot the loss per iteration,

103
00:08:36,200 --> 00:08:38,590
and that shows us that the model has converged.

104
00:08:39,930 --> 00:08:46,800
You should pay attention to the final MSE here, because we are going to improve this value in

105
00:08:46,800 --> 00:08:47,520
the next lecture.