1
00:00:11,070 --> 00:00:16,800
OK, so in this lecture, I'm going to give you an explicit exercise prompt in preparation for the next

2
00:00:16,800 --> 00:00:17,340
lecture.

3
00:00:18,000 --> 00:00:23,370
Ideally, you will complete this exercise before watching the next lecture so that the next lecture

4
00:00:23,580 --> 00:00:26,310
will serve as the solution to the exercise.

5
00:00:26,850 --> 00:00:31,890
As always, you can feel free to look at the notebook to get the data set, but avoid looking at any

6
00:00:31,890 --> 00:00:34,260
code, which would taint your solution.

7
00:00:35,340 --> 00:00:37,080
OK, so what's the exercise?

8
00:00:37,650 --> 00:00:40,230
The exercise is to build a recommendation system.

9
00:00:41,250 --> 00:00:47,220
We're going to download a database of movies which includes information such as keywords, genres,

10
00:00:47,220 --> 00:00:52,680
title, synopsis, tagline, production companies, production countries, and so forth.

11
00:00:53,100 --> 00:00:56,190
You can feel free to use any or all of this information.

12
00:01:00,810 --> 00:01:06,670
One key step in the exercise will be to combine all the information from a movie into a single string.

13
00:01:07,950 --> 00:01:14,610
As you recall, the TfidfVectorizer and CountVectorizer expect a list of strings as input, where

14
00:01:14,610 --> 00:01:16,200
each string is a document.

15
00:01:16,740 --> 00:01:20,610
So feel free to get creative and think about how you might accomplish this.

16
00:01:22,030 --> 00:01:27,670
Once you have your documents for each movie and you've transformed them using TF-IDF, you're then going

17
00:01:27,670 --> 00:01:30,100
to use those vectors to build a recommender.

18
00:01:30,790 --> 00:01:34,060
Essentially, your recommendation engine should work like this.

19
00:01:34,870 --> 00:01:39,430
We're going to assume that our query will always be an existing movie in the database.

20
00:01:39,910 --> 00:01:44,110
For example, suppose that I have a user who likes the movie Scream 3.

21
00:01:44,740 --> 00:01:50,130
I'd like to recommend other movies to this user based on the fact that they already like this movie.

22
00:01:51,730 --> 00:01:56,470
The next step would be to get the TF-IDF representation for the movie Scream 3.

23
00:01:57,310 --> 00:02:03,460
Once we have that vector, we will then compute the similarity between that vector and every other vector

24
00:02:03,460 --> 00:02:04,480
in the database.

25
00:02:04,990 --> 00:02:07,420
This will give us a whole list of similarities.

26
00:02:08,020 --> 00:02:12,160
We will then sort by similarity and we will choose the closest matching movies.

27
00:02:12,760 --> 00:02:16,840
The result should be that we print out the top five matches for each query.

28
00:02:17,590 --> 00:02:23,200
Once you have your function that does this, test it on several movies from different genres and confirm for

29
00:02:23,200 --> 00:02:24,520
yourself that it works.

30
00:02:26,260 --> 00:02:28,720
OK, so hopefully the exercise makes sense.

31
00:02:29,140 --> 00:02:31,240
Good luck, and I'll see you in the next lecture.