WEBVTT

00:05.160 --> 00:05.740
Hi everyone.

00:05.740 --> 00:06.900
Welcome back.

00:06.910 --> 00:12.880
So in this video we're going to start implementing the API that we talked about, the text similarity

00:12.880 --> 00:17.470
API, and for that I need a new folder here called text similarity.

00:17.530 --> 00:24.630
So I'm going to go into the folder here, text similarity, and we're basically going to

00:24.630 --> 00:30.310
start building this from scratch, right, without any of the previous code, just to get used to the idea

00:30.310 --> 00:35.200
of developing an API from scratch, so you don't have to reference older code.

00:35.470 --> 00:41.020
So the first thing is we're going to be using Docker and Docker Compose as usual, so I'm going to do touch

00:41.260 --> 00:43.610
docker-compose.yml.

00:43.720 --> 00:50.540
So in the outer directory, text similarity, I do touch docker-compose.yml.

00:51.070 --> 00:54.460
And then I'm going to also make the two directories that we have.

00:54.540 --> 00:59.480
So I'm going to do mkdir web and mkdir db.

00:59.500 --> 01:02.350
So we have two new ones: we have web and db.

01:02.350 --> 01:04.600
Web is going to have the API.

01:04.750 --> 01:09.880
And then db is going to have, as usual, MongoDB, where we can store the information about the users

01:10.120 --> 01:11.630
and their passwords and so on.

01:11.710 --> 01:12.430
OK.

01:12.760 --> 01:13.390
All right.

01:13.510 --> 01:14.910
So a couple of things though.

01:14.950 --> 01:21.380
Let's first open this directory in Atom.

01:21.580 --> 01:22.230
OK.

01:22.540 --> 01:27.150
So we're going to open Atom in this directory and let's fill in the docker-compose file.

01:27.160 --> 01:27.380
Right.

01:27.380 --> 01:30.400
It shouldn't be too hard to finish.

01:30.550 --> 01:33.720
So let's close these welcome guides.

01:33.860 --> 01:39.790
Make sure they don't show up again, and yeah.

01:39.860 --> 01:41.130
So let's close that.

01:41.300 --> 01:45.890
And then we go to the docker-compose file, and it's going to be very similar to the previous ones.

01:45.890 --> 01:50.160
We're going to write version and then three.

01:50.190 --> 01:51.690
So our version is 3.

01:51.860 --> 01:57.830
And then we're going to have services. Inside these services we're going to have one called web, and

01:57.830 --> 02:03.370
I can build it, so I'm going to build from ./web.

02:03.370 --> 02:08.580
So from this directory over here, and I'm also going to open up ports,

02:11.850 --> 02:14.040
and then it's going to be a list.

02:15.480 --> 02:22.740
So it's going to be a list with 5000:5000, so I'm going to map port 5000 to 5000.

02:23.370 --> 02:25.130
I'm also going to do some links.

02:25.330 --> 02:32.140
Because I need to depend on another one, I'm going to do links and then db.

02:32.610 --> 02:36.070
So it's going to basically depend on this db over here.

02:36.120 --> 02:40.010
So that depends on this db service here.

02:40.260 --> 02:48.340
So for this one we're going to build it, right, from, as you might have guessed, ./db.

02:48.420 --> 02:51.480
We're going to have a Dockerfile that we should build for it.

02:51.960 --> 02:56.970
Okay, so that's it for the docker-compose file, very similar to the ones that we built in previous

02:57.090 --> 02:58.460
videos.
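
NOTE
The compose file dictated above can be sketched as follows. This is a minimal illustration in Python that only holds the file's text and writes it to disk. The helper name write_compose is hypothetical, and the exact indentation and quoting are assumptions, since the transcript does not capture the screen.
```python
from pathlib import Path
# Text of docker-compose.yml as described in the video: a web service built
# from ./web with port 5000 mapped and a link to db, plus a db service built from ./db.
COMPOSE_TEXT = """\
version: '3'
services:
  web:
    build: ./web
    ports:
      - "5000:5000"
    links:
      - db
  db:
    build: ./db
"""
def write_compose(project_dir):
    """Write docker-compose.yml into project_dir and return its path (hypothetical helper)."""
    path = Path(project_dir) / "docker-compose.yml"
    path.write_text(COMPOSE_TEXT)
    return path
```
Calling write_compose(".") from the outer project directory would produce the same file that touch plus manual editing produces in the video.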

02:58.980 --> 03:04.920
Now the second thing we want to do is the Dockerfiles, right.

03:05.020 --> 03:10.770
So let's do the Dockerfile for db, which is much easier, right.

03:10.810 --> 03:11.180
Very.

03:11.190 --> 03:19.010
Only one line of code, where we get MongoDB from Docker Hub, which we discussed previously.

03:19.030 --> 03:24.880
So I'm going to go into db and then I'm going to touch a new file called Dockerfile, okay,

03:24.920 --> 03:29.900
Dockerfile, just like this, and then I'll see the Dockerfile is now showing here.

03:29.890 --> 03:35.590
I'll open it, and then it's only one line: I write FROM mongo.

03:35.590 --> 03:38.750
And then our version was 3.6.4.

03:38.770 --> 03:41.390
Now if at the time you watch this video there's a newer version,

03:41.380 --> 03:43.630
feel free to put the newer version here.

03:43.630 --> 03:46.360
It shouldn't be too different in terms of the API.

03:47.250 --> 03:52.830
All right, so we've finished the Dockerfile for this one, the database.

03:52.860 --> 04:00.280
So the next thing is to do the Dockerfile for the web side, or the API.

04:00.440 --> 04:04.780
So I'm going to go into web.

04:04.780 --> 04:13.240
So go back first and then go into web, and then when I get in, touch Dockerfile. Make sure the

04:13.300 --> 04:17.820
spelling is correct.

04:18.160 --> 04:20.340
So now we're going to see that in web

04:20.380 --> 04:22.370
we have a new Dockerfile here.

04:22.720 --> 04:27.580
So the Dockerfile is here, and this one is going to be a little bit longer, and we're

04:27.580 --> 04:35.770
going to add an additional thing in this API that we haven't added in previous ones.

04:35.770 --> 04:43.390
So if you remember when we were discussing, let's go back to this chart protocol, we were discussing

04:43.390 --> 04:45.560
that we're going to be doing detection right.

04:45.580 --> 04:51.310
The similarity detection between two documents, but we never really discussed how we're going to compare

04:51.310 --> 04:53.000
these two documents together.

04:53.200 --> 05:00.230
And the answer is, actually, we're going to be using a natural language processing library called spaCy. spaCy

05:00.270 --> 05:07.810
is a very easy natural language processing library that has already been trained; it has a lot of models

05:07.960 --> 05:12.080
already trained, and I'm going to show you how to use it. It's very, very simple to use.

05:12.220 --> 05:14.310
The API is very simple to use.

05:14.530 --> 05:20.490
And so we're basically going to be comparing the two strings using a new library called spaCy.

05:20.490 --> 05:24.390
I'm going to show you how to use this library in just a few minutes.
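
NOTE
To make "comparing two documents" concrete, here is a small sketch of cosine similarity, which is the default measure behind spaCy's similarity scores. It is plain Python on made-up toy vectors, not the code from this course; the function name cosine_similarity and the three vectors are assumptions for illustration only.
```python
import math
def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
# Toy "document vectors", invented for this example.
doc_a = [1.0, 2.0, 0.0]
doc_b = [2.0, 4.0, 0.0]  # same direction as doc_a
doc_c = [0.0, 0.0, 3.0]  # orthogonal to doc_a
print(cosine_similarity(doc_a, doc_b))  # very close to 1.0
print(cosine_similarity(doc_a, doc_c))  # 0.0
```
With spaCy installed and a model loaded through spacy.load, the equivalent call would be nlp(text1).similarity(nlp(text2)), which returns a similar score for two texts.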

05:24.400 --> 05:26.580
Let's go back to our virtual machine.

05:27.070 --> 05:30.580
So the first thing is, again, we need Python, right.

05:30.580 --> 05:32.010
This is a Python API.

05:32.020 --> 05:35.970
So we're going to write FROM python:3.

05:35.990 --> 05:39.270
The second thing is I'm going to choose the work directory as we always do.

05:39.300 --> 05:42.080
I'm going to go to /usr/src/app.

05:42.180 --> 05:45.810
Okay, we always do that; it's a convention.

05:45.940 --> 05:49.860
Next we're going to copy requirements.txt,

05:50.030 --> 05:54.480
but we don't have requirements.txt yet.

05:54.490 --> 05:56.940
So that's what we're going to be doing next.

05:57.080 --> 06:00.730
So we have to remember that we still don't have requirements.txt.

06:00.770 --> 06:09.040
So for now we don't have it but we're going to be making it in a couple of seconds in this directory.

06:09.050 --> 06:15.930
So we're copying requirements.txt into the work directory, which is /usr/src/app.

06:16.490 --> 06:20.110
And then what I'm going to do is a RUN.

06:20.630 --> 06:25.560
So RUN pip install, and then I'm going to use --no-cache-dir.

06:25.790 --> 06:26.470
Right.

06:26.480 --> 06:34.760
Make sure there's no cache directory, and then -r, and then requirements.txt.

06:35.150 --> 06:41.510
OK, so I'm going to do pip install --no-cache-dir and then -r and then requirements.txt,

06:41.510 --> 06:47.040
the file that I want to install. Next I'm going to do the COPY.

06:47.160 --> 06:52.590
So I'm going to copy this entire directory over here into /usr/src/app.

06:53.000 --> 06:54.050
And then the final,

06:54.080 --> 07:00.290
well, now the final thing: usually, if you remember, we basically do CMD and then python

07:01.830 --> 07:04.460
and then app.py.

07:04.500 --> 07:05.370
Right.

07:05.370 --> 07:07.310
So for now this is perfectly fine.

07:07.350 --> 07:07.870
Right.

07:08.010 --> 07:12.610
But the thing is, spaCy already has some models, right.

07:12.720 --> 07:16.550
So what we're going to be doing is basically install that.

07:16.560 --> 07:22.860
So if I go to the desktop here, you'll see that I downloaded one of the models that spaCy was already trained

07:22.860 --> 07:23.390
on.

07:23.400 --> 07:30.210
So usually in machine learning or data science you have a model and you train it right.

07:30.210 --> 07:33.650
And this training process takes a lot of computation power.

07:33.720 --> 07:37.450
You need a strong GPU to do all these computations.

07:37.530 --> 07:42.590
But, for example, the library spaCy has already trained a model.

07:42.600 --> 07:49.080
So this is the model over here. It's called en_core_web_sm and then 2.0.0.

07:49.080 --> 07:57.080
So this is a library here. You can download it from the website of spaCy, so if I go here to

07:58.250 --> 08:04.690
Firefox and then I write spaCy download, and then I'm going to write,

08:05.010 --> 08:13.870
so here spaCy and then download and then models, and then press Enter.

08:14.120 --> 08:19.070
So what I find here is Models and Languages, and you can see I already downloaded from here.

08:19.190 --> 08:26.540
So if you click here and then you're going to have a bunch of available models right so this one here

08:27.230 --> 08:29.050
en_core_web_sm.

08:29.090 --> 08:34.370
So this is the one that I downloaded, and you can download it from here using spaCy and pip, or you can

08:34.370 --> 08:37.070
download it right away from the GitHub repository.

08:37.070 --> 08:37.560
Right.

08:37.580 --> 08:45.490
So if you click here, then basically this is the library, and if you look here it's only 35 megabytes.

08:45.590 --> 08:47.250
And this is the smallest model.

08:47.270 --> 08:47.770
Right.

08:47.960 --> 08:55.300
And if you want to download it right away to your device then you can go to the GitHub repository, or

08:55.300 --> 08:59.280
you can write GitHub and then spaCy models,

08:59.390 --> 09:01.320
spacy-models.

09:01.790 --> 09:06.330
And then you click here on the GitHub link.

09:06.680 --> 09:12.480
And then you go for releases, so you want to download their latest releases.

09:12.560 --> 09:14.000
So you're going to

09:17.720 --> 09:27.410
and then you're going to go to slash releases slash download and then slash the model name that you

09:27.650 --> 09:28.170
want.

09:28.250 --> 09:34.430
So in this case what I downloaded is the smallest model of them, which is en_core_web_sm,

09:34.430 --> 09:42.300
so I'm going to write en_core_web_sm. So you go to spacy-models

09:43.050 --> 09:49.790
on GitHub, so github.com/explosion/spacy-models and then /releases/

09:49.790 --> 09:57.250
download/en_core_web_sm-2.0.0 and then /

09:57.720 --> 10:05.190
en_core_web_sm-2.0.0.tar.gz, and you should find

10:05.190 --> 10:11.910
it on the page, and then if you press Enter then it should automatically show you the link.
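
NOTE
The URL spelled out above follows a fixed pattern, so it can be assembled from the model name and version. A minimal sketch, assuming the release layout stays as described in the video; the function name model_release_url is hypothetical.
```python
def model_release_url(name, version):
    """Build the GitHub release download URL for a spaCy model archive."""
    archive = f"{name}-{version}"
    return (
        "https://github.com/explosion/spacy-models/releases/download/"
        f"{archive}/{archive}.tar.gz"
    )
print(model_release_url("en_core_web_sm", "2.0.0"))
# prints https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz
```
Swapping in another model name or version gives the link for that archive, provided such a release exists.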

10:11.910 --> 10:13.580
So here is the file.

10:13.680 --> 10:17.620
So this is en_core_web_sm.

10:17.790 --> 10:22.410
And this is 35.6 megabytes, and this is exactly this file over here.

10:22.410 --> 10:25.150
So you're just going to basically save the file.

10:25.200 --> 10:29.670
But for now, since I already downloaded it, I won't do that.

10:29.840 --> 10:30.100
OK.

10:30.110 --> 10:31.700
So why are we discussing this.

10:31.730 --> 10:38.060
So these models are already pre-trained and they're good to go, as I say. You can use them right away

10:38.130 --> 10:43.980
to predict whether two documents are similar or not.

10:44.150 --> 10:49.670
So what we're going to do is we're going to take this package over here and copy it, and then I'm going

10:49.670 --> 10:51.650
to go into text similarity.

10:51.710 --> 10:55.610
So this is our new API.

10:55.790 --> 10:57.890
And then I'm going to go into web.

10:58.100 --> 11:02.400
And then I'm going to paste this model here because I'm going to be using this more frequently.

11:02.410 --> 11:02.980
Right.

11:03.290 --> 11:05.740
OK so now let's go back here.

11:05.840 --> 11:13.490
Now I want to say that I want to pip install this. So basically, if you go back

11:13.490 --> 11:17.720
here, this model is just a Python package.

11:17.720 --> 11:19.450
So you want to install it.

11:19.490 --> 11:20.500
So I use pip.

11:20.520 --> 11:27.140
So I'm going to go back into my Dockerfile here and then I'm going to write, first of all,

11:27.140 --> 11:35.840
RUN, to run a command, and then pip install, and then I'm going to give the path to my model, which

11:35.840 --> 11:42.320
is in this directory here, so I'm going to do ./en_core_web_sm

11:42.320 --> 11:54.050
and then -2.0.0 and then .tar.gz, and then that's it.

11:54.050 --> 11:57.710
So I basically told pip to install this library over here.
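
NOTE
Putting the dictated steps together, the two Dockerfiles might look like the text below. This is a sketch in Python that only stores the file contents and writes them out. The base tag python:3, the placement of the model install after the full copy (so the archive exists inside the image), and the helper name write_dockerfiles are assumptions, not confirmed by the transcript.
```python
from pathlib import Path
# Web image, following the steps dictated in the video (base tag is an assumption).
WEB_DOCKERFILE = """\
FROM python:3
WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN pip install ./en_core_web_sm-2.0.0.tar.gz
CMD ["python", "app.py"]
"""
# Database image: the single line described earlier in the video.
DB_DOCKERFILE = "FROM mongo:3.6.4\n"
def write_dockerfiles(project_dir):
    """Create web/ and db/ under project_dir and write a Dockerfile into each (hypothetical helper)."""
    written = {}
    for name, text in (("web", WEB_DOCKERFILE), ("db", DB_DOCKERFILE)):
        folder = Path(project_dir) / name
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / "Dockerfile"
        target.write_text(text)
        written[name] = target
    return written
```
Installing the model from the local archive keeps the image build independent of the download site, which is the reason given in the video for keeping the file in the web directory.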

11:58.560 --> 12:05.370
Now there are other ways: you can go to the spaCy website here to download this model

12:05.370 --> 12:11.850
right away using pip, right, or pip install spaCy something. So there is a command here, which is,

12:12.090 --> 12:13.600
So I'll show it here.

12:13.650 --> 12:21.410
python -m spacy download and then en_core_web_sm,

12:21.420 --> 12:27.890
and this automatically downloads the library for you and sets everything up, right.

12:28.020 --> 12:33.960
But for me, I'd prefer to have my model offline, not online, because the website might break at any

12:33.960 --> 12:34.570
point.

12:34.740 --> 12:39.450
So I prefer to have the model locally available. Oops, I shouldn't have opened that.

12:39.450 --> 12:46.530
So I prefer to have the model available at all times so that you don't have to open it or download it

12:46.560 --> 12:51.780
before you want to run this program right because the Web site might be broken so it's better to keep

12:51.840 --> 12:52.890
everything locally.

12:52.890 --> 12:55.260
Even the models you're going to be using.

12:55.830 --> 12:57.600
OK, so this is the Dockerfile.

12:57.600 --> 12:57.830
Right.

12:57.830 --> 13:03.440
This is the model that we're going to be using with spaCy to predict whether

13:03.510 --> 13:06.150
two documents are similar or not.

13:06.180 --> 13:12.220
Now, if you remember, I said a little while ago that we still need to do the requirements.txt file.

13:12.540 --> 13:13.410
So let's do that.

13:13.410 --> 13:20.700
So we're going to do touch requirements.txt, and then we have that here.

13:21.240 --> 13:26.460
And then we have a couple of requirements. So first of all we need, of course, Flask, right, as we always

13:26.460 --> 13:32.090
discuss, with flask_restful, right, flask_restful.

13:32.130 --> 13:41.340
We also need pymongo to communicate with MongoDB. We need bcrypt, because bcrypt is used

13:41.340 --> 13:46.730
to hash the passwords and store them as hashed passwords.

13:47.100 --> 13:50.710
And finally we're going to need a new library called spaCy.

13:50.730 --> 13:53.170
So spaCy is s-p-a-c-y.

13:53.190 --> 13:56.560
So let me perhaps zoom in here.

13:56.610 --> 14:02.250
So spacy, and spaCy is basically the library that we'll be using for natural language processing.

14:02.330 --> 14:03.080
OK.

14:03.630 --> 14:09.900
So we're going to save that, and we're done with requirements.txt and the overall architecture

14:09.900 --> 14:11.160
of the project.
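
NOTE
The requirements file listed above can be sketched as one package per line. Leaving the versions unpinned is an assumption, since the transcript does not say whether they were pinned; the helper name render_requirements is hypothetical.
```python
# Packages named in the video; unpinned versions are an assumption.
REQUIREMENTS = ["Flask", "flask_restful", "pymongo", "bcrypt", "spacy"]
def render_requirements(packages):
    """Return requirements.txt content with one package per line."""
    return "\n".join(packages) + "\n"
print(render_requirements(REQUIREMENTS), end="")
```
The printed text is what the Dockerfile's pip install -r requirements.txt step would read during the image build.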

14:11.160 --> 14:17.790
So we're going to pause here and we're going to come back in future videos and start implementing

14:17.790 --> 14:18.870
the app.py.

14:18.900 --> 14:24.020
Right, so we have one last file we haven't done yet, which is, touch app.py.

14:24.060 --> 14:28.040
And this is where the main logic of our API is stored.

14:28.140 --> 14:31.490
If you remember everything here is in this API.

14:31.810 --> 14:33.380
It is in this

14:33.400 --> 14:34.910
app.py file.

14:35.430 --> 14:39.350
So yeah we're going to stop here and we're going to pick up in the next video.

14:39.420 --> 14:41.790
So until the next video, happy coding.
