WEBVTT

00:04.240 --> 00:09.680
We have already gone over the .env file in the first section of this course, and why we need it

00:09.680 --> 00:11.360
to handle sensitive variables.

00:11.600 --> 00:17.520
Apart from sensitive variables, we also discussed how we can store non-sensitive variables in

00:17.560 --> 00:21.640
the .env file that relate to specific environment setups.

00:21.960 --> 00:27.800
For example, a production scenario will have different configuration settings when compared to a development

00:27.800 --> 00:28.640
environment.

00:29.040 --> 00:32.840
So you will have one .env for dev and one for prod.

00:33.680 --> 00:40.240
In this section we will build upon the .env file we already have and use it to reference variables like

00:40.280 --> 00:44.880
database connection credentials, Airflow parameters and some other variables.

00:45.280 --> 00:50.840
I have already made the necessary changes to the .env file, as you can see here.

00:51.480 --> 00:55.880
The .env you see here is also available in the previous slide document.

00:56.360 --> 01:01.840
Let's go over these environment variables, and I will also tell you which ones you need to change and which

01:01.840 --> 01:02.640
you can keep.

01:02.680 --> 01:10.530
As you see here, for the environment variable names, like the Docker Hub namespace, repository,

01:10.850 --> 01:12.130
PostgreSQL username and so on,

01:12.290 --> 01:19.410
it is imperative you don't change any of these names, as they will be referenced by other files like

01:19.410 --> 01:20.370
Docker Compose.

01:20.650 --> 01:26.930
What you can and in some instances should change are the values of these variables, and we'll cover

01:26.930 --> 01:27.730
this shortly.
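
NOTE
Since other files look these variables up by name, a quick check that the .env still defines every expected name can catch an accidental rename.
The following is a minimal sketch, not part of the course files. The names in REQUIRED_NAMES and both function names are hypothetical placeholders; substitute the names from the course .env.
```python
# Hypothetical variable names for illustration; use the names from the course .env.
REQUIRED_NAMES = {"DOCKERHUB_NAMESPACE", "DOCKERHUB_REPOSITORY", "POSTGRES_CONN_USERNAME"}
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines, # comments and lines without '='."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values
def missing_names(text, required=REQUIRED_NAMES):
    """Return the required names that the .env text does not define, sorted."""
    return sorted(set(required) - set(parse_env(text)))
# The second name was renamed by mistake, so it is reported as missing.
sample = "DOCKERHUB_NAMESPACE=my_user\nDOCKERHUB_REPO=my_repo\n"
print(missing_names(sample))
```
Changing a value leaves the result untouched; only a changed or removed name shows up in the list.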

01:28.010 --> 01:33.210
The first set of variables relates to the Docker Hub credentials, which are these ones.

01:33.450 --> 01:40.450
The namespace is your Docker Hub username, so make sure you change it to your username.

01:40.490 --> 01:44.770
The Docker Hub repository, in turn, is the name you gave to the repository.

01:44.930 --> 01:49.330
In my case, I gave it the name YouTube underscore API underscore erd.

01:49.930 --> 01:52.810
Moving on to database connection credentials.

01:53.450 --> 01:57.810
As we have already mentioned, we will use Postgres for all our databases.

01:58.410 --> 02:00.650
And in terms of databases we have three.

02:01.130 --> 02:07.570
The first one is the metadata backend, which will store the metadata

02:07.610 --> 02:08.650
used by Airflow.

02:08.970 --> 02:16.550
The second is the database that will store the results of tasks executed by the executor.

02:17.590 --> 02:25.110
And the final Postgres database is the database where we will store the data extracted from the YouTube

02:25.110 --> 02:25.590
API.

02:25.910 --> 02:33.190
So what you see here under Postgres connection are the connection parameters for the Postgres instance that

02:33.190 --> 02:35.990
will contain the above three databases.

02:36.710 --> 02:45.750
And for this connection you will need the username, password, hostname and port, which is 5432,

02:45.790 --> 02:47.150
the Postgres default.

02:47.350 --> 02:49.870
You can keep these values the same as you see them here.

02:50.190 --> 02:53.390
This is especially important for the hostname. For the host,

02:53.390 --> 02:59.030
it is important that you keep it as postgres, because the service name for the Postgres container will

02:59.030 --> 03:00.110
also be postgres

03:00.150 --> 03:05.550
in the Docker Compose file. If it's not, Airflow will complain about host resolution. For the three

03:05.550 --> 03:09.190
different databases, which are these configurations here,

03:09.590 --> 03:14.910
the variables specified are the database name, username and password.

03:14.950 --> 03:19.960
I would recommend you keep these the same as I have them, especially the database names,

03:20.120 --> 03:23.320
so it is clear which database will store what.

03:23.360 --> 03:29.080
If you decide to change any of these credentials, for example, one of the database connection

03:29.080 --> 03:35.880
passwords, make sure that they do not contain special characters such as the dollar sign, the at

03:35.880 --> 03:38.320
symbol, or any other special character.

03:38.360 --> 03:43.800
When I was testing, I had a password that contained one of these special characters, and I was getting

03:43.800 --> 03:47.840
a parsing issue when I integrated these credentials into Docker Compose.
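
NOTE
The advice about special characters can be turned into a small check to run before putting a new password in the .env.
This is a minimal sketch under one assumption: only ASCII letters and digits count as safe, which is stricter than strictly necessary but matches the advice to avoid any special character. The function name is hypothetical.
```python
import string
# Assumption: treat only ASCII letters and digits as safe.
SAFE_CHARS = set(string.ascii_letters + string.digits)
def risky_characters(password):
    """Return the distinct characters in password that are not ASCII letters or digits, sorted."""
    return sorted(set(password) - SAFE_CHARS)
print(risky_characters("abc123"))     # prints []
print(risky_characters("p@ss$word"))  # prints ['$', '@']
```
An empty list means the password contains only letters and digits.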

03:48.040 --> 03:53.520
So again, feel free to use different passwords which you can create with, for example, online password

03:53.520 --> 03:57.000
generators, but be cautious of these possible issues.

03:57.040 --> 04:00.640
Again, I would recommend using the same credentials that I have.

04:00.680 --> 04:07.960
The remaining Airflow parameters start with the Airflow UID, or user ID, which we set to the default

04:07.960 --> 04:09.000
of 50000.

04:09.240 --> 04:13.800
This means that the Airflow containers will run with the user ID 50000.

04:13.840 --> 04:20.080
Adding the Airflow UID to the .env is also recommended in the Airflow documentation,

04:20.080 --> 04:26.090
to avoid a warning about the AIRFLOW_UID variable not being set.

04:26.290 --> 04:32.410
The username and password you see here are the credentials you will use to access the Airflow web server

04:32.410 --> 04:32.930
UI.

04:32.970 --> 04:35.890
Feel free to keep these or change them as you wish.

04:35.930 --> 04:39.370
The final Airflow parameter is the Fernet key. Fernet

04:39.410 --> 04:45.170
keys are used for encrypting and decrypting sensitive data stored in the metadata database.

04:45.490 --> 04:51.610
The Airflow docs have a code snippet showing how we can create a Fernet key, as shown in the course slides.

04:51.650 --> 04:57.370
This is the piece of Python code that the Airflow documentation recommends for creating a Fernet key.

04:57.410 --> 05:03.290
This requires us to install the cryptography library using pip, and it is the approach you should

05:03.290 --> 05:04.930
take for production scenarios.

05:05.410 --> 05:10.730
Since we are not in a production scenario and the passwords we create are not intended to store sensitive

05:10.730 --> 05:14.610
information, there are other simpler ways of generating a key.

05:15.130 --> 05:18.490
One of them is using a Fernet Key generator website.

05:18.730 --> 05:24.770
In my case, I am using this website called Fernet Key Generator, which I will add to the appendix

05:24.770 --> 05:25.610
of this section.

05:25.730 --> 05:30.070
You can press Generate key and use the result in your .env.

05:30.430 --> 05:35.670
If for some reason at the time you're viewing this course, the website is not working, you can

05:35.670 --> 05:37.790
use the Fernet key in my .env.

05:37.830 --> 05:43.750
You can also try searching for another website that does the same thing, or, as a last resort, create one

05:43.750 --> 05:47.070
using the Python code as detailed in the docs.
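
NOTE
For reference, the snippet in the docs relies on Fernet.generate_key() from the cryptography library, which needs a pip install.
The sketch below is a standard-library stand-in rather than the documented approach. It relies on the fact that a Fernet key is 32 random bytes encoded as URL-safe base64. The function name is hypothetical.
```python
import base64
import os
def generate_fernet_style_key():
    """Return 32 random bytes encoded as URL-safe base64, the format of a Fernet key."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode()
key = generate_fernet_style_key()
print(len(key))  # prints 44
```
For production scenarios, prefer the cryptography-based snippet from the Airflow documentation.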

05:47.470 --> 05:54.950
Finally, the last two parameters in the .env begin with the API key, which we are very familiar with at this

05:54.950 --> 05:55.470
point.

05:55.990 --> 06:00.710
Make sure that you use your own API key and that you don't share it with anyone.

06:01.230 --> 06:05.950
My personal key won't work for you since I will have deactivated it by the time you are viewing this

06:05.950 --> 06:06.470
course.

06:06.510 --> 06:11.270
Here I also include the channel handle, which, as we have seen, is MrBeast.

06:11.590 --> 06:12.990
And that's it for this lecture.

06:13.430 --> 06:19.110
In these lectures, we have built the foundation that will help us understand how Docker Compose will

06:19.110 --> 06:21.190
be structured for our Airflow setup.

06:21.430 --> 06:27.110
I will push the Dockerfile and requirements.txt to GitHub, and I will see you in the next lecture, where

06:27.110 --> 06:29.110
we go over the Docker Compose file.
