WEBVTT

00:04.280 --> 00:08.880
The purpose of this lecture is to answer the question how can I view the data in my data warehouse?

00:08.920 --> 00:12.120
In our case, there are two ways that this can be done.

00:12.160 --> 00:18.880
Can either go inside the Postgres container and use SQL, or you can set up connection using a database

00:18.880 --> 00:25.440
management tool like Dbeaver or Datagrid, which provides a more user friendly UI over the SQL syntax.

00:25.480 --> 00:27.280
We'll go over both in this section.

00:27.280 --> 00:33.040
So starting off with the SQL to access the Docker container, you can run the following command, which

00:33.040 --> 00:34.920
we have seen in previous lectures.

00:35.440 --> 00:39.440
Docker exec t going inside the Postgres container.

00:39.680 --> 00:45.400
And now we can start writing the actual SQL commands to connect to the Postgres instance.

00:45.400 --> 00:51.240
So we can write psql minus u which defines the username.

00:51.400 --> 00:56.720
In this case we have our username is YouTube underscore API underscore user.

00:57.560 --> 01:02.000
And also we specify the database which is ellt underscore db.

01:02.040 --> 01:09.370
This as you know we can find in the dot env These two should agree with these two over here.

01:09.410 --> 01:10.330
Let's click enter.

01:10.330 --> 01:14.690
And here we have the interface with the actual database.

01:14.770 --> 01:16.730
L't underscore db.

01:16.770 --> 01:23.690
If we were to run a simple SQL command to get all the data from the course schema inside the YouTube

01:23.690 --> 01:32.050
API table, we would write this simple SQL, press enter and as you can see the data is here.

01:32.050 --> 01:35.490
And you can continue to interact with the table using SQL CLI.

01:35.530 --> 01:38.130
Just as we saw with the Select statement.

01:38.130 --> 01:41.250
If you want to exit the statement, just press Q.

01:41.610 --> 01:43.810
Some other psql commands.

01:43.850 --> 01:47.410
This is good to know r backslash view.

01:48.010 --> 01:49.690
So this gives users.

01:49.690 --> 01:54.650
For this connection I'll remind you that we are connected as the YouTube API user.

01:54.890 --> 01:56.810
There is backslash L.

01:57.450 --> 01:59.570
This gives a list of databases.

01:59.810 --> 02:07.250
Here you can see the metadata backend database, the salary and the ones we are connected to right now.

02:07.570 --> 02:09.450
The rest are gated by default.

02:10.210 --> 02:12.690
Backslash n gives the schemas.

02:12.850 --> 02:16.010
Here we see the core and the staging schemas.

02:16.210 --> 02:18.810
And there's also default public one.

02:25.090 --> 02:27.250
So this took a while to actually write.

02:27.250 --> 02:33.490
What I wanted to show here is that we can find all the tables under schema by writing the backslash

02:33.530 --> 02:36.290
TW command and then specifying the schema in question.

02:36.290 --> 02:39.130
So here we want to have the YouTube API table.

02:39.170 --> 02:41.930
Obviously there are other SQL commands that you can use here.

02:41.930 --> 02:44.130
We just went over the more common ones.

02:44.130 --> 02:46.450
You can research other commands in your own time.

02:46.490 --> 02:51.970
On a final note, if you want to exit from this database connection, just write backslash q.

02:52.330 --> 02:56.330
Now that we have gone over the CLI, let's direct our attention to the UI.

02:56.450 --> 02:59.330
For this we need to install an SQL client.

02:59.530 --> 03:01.810
And in my case I use Dbeaver.

03:01.850 --> 03:06.610
I will leave a link in the appendix of this section where you can install the beaver, and you can go

03:06.610 --> 03:11.010
right ahead and install Dbeaver community, which is the open source version.

03:11.050 --> 03:13.570
Once you have installed it, you have this UI.

03:13.770 --> 03:17.700
You simply need to go to database need database connection.

03:17.700 --> 03:23.820
Since we have and we are using Postgres on Postgres, and from here we can start filling in the information

03:23.820 --> 03:26.100
needed for host name or host.

03:26.100 --> 03:32.660
In our case, since we have the container running on our local machine, we can set host to localhost

03:32.700 --> 03:33.540
the ports.

03:33.580 --> 03:35.860
We can leave it as a default of five volts.

03:36.340 --> 03:42.620
This is the port on our laptop that maps to the container's ports, which we had set so default of 5432

03:42.660 --> 03:43.220
as well.

03:43.260 --> 03:48.580
Username and password which you see here we can get from the EMV.

03:48.740 --> 03:50.300
So I will get those right now.

03:58.660 --> 04:05.500
We can also press the show all databases which will give all databases for this connection.

04:06.140 --> 04:13.580
You can also change this to LTE underscore DB which will give us the database that contains the LTE

04:13.780 --> 04:14.540
data.

04:14.540 --> 04:21.300
So make sure that you have inputted all of these credentials correctly and you can press the test connection

04:21.300 --> 04:25.820
button so you can see that we have successfully connected.

04:26.300 --> 04:27.420
That's a very good sign.

04:28.100 --> 04:29.500
And we can press finish.

04:29.860 --> 04:32.460
Here we have the connection that we just set up.

04:32.580 --> 04:38.900
And if we open it up you will find that we will have the ELT database that we just connected to, but

04:38.900 --> 04:45.100
also the other databases for the airflow metadata database and the salary as well.

04:45.140 --> 04:52.260
If you want to change the connection name, you can simply go on the connection edit connection and

04:52.260 --> 04:53.220
go on general.

04:53.700 --> 05:03.060
Here you can change this connection to something like Airflow connection YouTube.

05:03.420 --> 05:05.860
And you can also change the connection type.

05:06.540 --> 05:08.820
Let's change this to test for example.

05:09.620 --> 05:12.900
I will press okay and we will reconnect again.

05:16.740 --> 05:23.340
Now in terms of exploring if we have to go inside the database, we will look and find the schemas that

05:23.340 --> 05:25.350
we built in the previous lectures.

05:25.790 --> 05:30.750
If we were to go, for example, inside core, we will find the YouTube API table.

05:31.310 --> 05:33.350
And same goes for staging.

05:34.070 --> 05:41.350
Now, if we wanted to do the same SQL that we did in the SQL, using the CLI we can do is simply right

05:41.350 --> 05:46.870
click on the database connection SQL editor and open SQL script.

05:46.870 --> 05:49.350
From here we can run the same command that we ran before.

05:49.790 --> 05:53.870
So select all from core.

05:54.830 --> 05:56.070
Let's use your API.

05:58.350 --> 05:59.190
Let's press run.

06:00.030 --> 06:04.110
And here you will see the same data that we saw on SQL.

06:04.510 --> 06:10.870
And this is all the videos that we have up until today's date that we have run the ELT for.

06:12.430 --> 06:20.990
Feel free to also run different queries to for example, filter for video types that are only of the

06:20.990 --> 06:22.070
type shorts.

06:22.070 --> 06:29.350
So this you can simply just change to where video type equals shorts.

06:29.350 --> 06:29.440
Shirts.

06:31.720 --> 06:34.720
As you can see, all of the video types are shirts.

06:34.720 --> 06:40.640
And here, if we had to order the sending the maximum value, the duration should be of one minute,

06:41.040 --> 06:45.280
and it is in fact the case you can continue to explore in your own time.

06:45.760 --> 06:51.200
You can also explore the other databases that we have, and you will see the associated tables that

06:51.200 --> 06:52.240
are present here.

06:52.280 --> 06:57.160
If you go, for example, under the table you can see the columns.

06:57.200 --> 07:04.800
This is all auto generated by airflow and you can find the guides, the if it's active, if the Dag

07:04.800 --> 07:10.680
is paused and all of these other parameters in terms of seeing the actual data, we will probably get

07:10.720 --> 07:11.800
a permission denied.

07:11.840 --> 07:19.400
As you can see here, because we need to reconnect using the airflow metadata database credentials that

07:19.400 --> 07:21.800
we have for the specific connection.

07:24.480 --> 07:32.480
So in reality, what we can do here is to change this connection name to only be for database and right

07:32.480 --> 07:37.800
the other two connections to the other two databases, and that is what I will be doing next.

07:39.400 --> 07:41.000
Let's remove this from here.

07:43.560 --> 07:48.800
So as you can see now I have each connection set up with the appropriate credentials.

07:48.840 --> 07:51.960
And now we will have to repeat the same exercise as before.

07:52.120 --> 07:57.960
Going inside the public schema and going inside the deck table, we should be able to see the actual

07:57.960 --> 08:00.040
contents of the table.

08:00.080 --> 08:06.760
And as I was explaining before, here you can see the data which we produced ourselves, and other data

08:07.400 --> 08:13.880
and parameters related to the particular diving question like the description when it is set to run,

08:14.320 --> 08:17.920
and many more parameters you can continue to explore in your own time.

08:17.920 --> 08:23.440
But this gives you a good starting point in terms of setting up connections and viewing data in the

08:23.440 --> 08:25.280
tables inside your databases.

08:26.080 --> 08:26.840
Congratulations!

08:26.840 --> 08:29.760
You made it to the end of Data Warehouse section.

08:29.760 --> 08:34.440
In the next section we will explore testing and we will start off with data quality tests.

08:34.720 --> 08:35.720
I will see you then.
