WEBVTT

00:04.240 --> 00:09.600
Now that we have a better understanding of the Airflow architecture in terms of components, let's shift

00:09.600 --> 00:12.160
our focus to the Airflow directories.

00:12.200 --> 00:16.040
What we will do with Docker is write our DAG code on our laptops.

00:16.440 --> 00:22.840
Then we will use what are known as Docker volumes to mount the files from our local directories to the

00:22.840 --> 00:24.600
paths inside the containers.

00:24.760 --> 00:30.800
Once we have these volumes set up, any changes we make in our local directories will be reflected in

00:30.800 --> 00:32.600
the Airflow Docker containers.

00:32.760 --> 00:38.200
There are four common directories in Airflow, the two most important being the DAGs and the

00:38.200 --> 00:39.320
logs folders.

00:39.680 --> 00:45.000
Let me just get a notepad and write these folder names down so we are all on the same page.

00:47.920 --> 00:54.280
So I mentioned the DAGs folder, which we will use heavily, and also the logs folder. In the DAGs folder

00:54.280 --> 01:01.120
the DAG Python code will be written, while the logs folder will contain logs from task executions and

01:01.120 --> 01:01.880
the scheduler.

01:02.240 --> 01:05.000
The other two directories are the config folder,

00:05.360 --> 01:12.640
so let me write this down, which can be used for custom Airflow configuration, and the plugins

01:12.640 --> 01:19.170
folder, where you can put your custom plugins, such as custom operators, sensors, hooks or other Airflow

01:19.170 --> 01:20.010
components.

01:20.890 --> 01:27.090
Now in our case, we'll use the DAGs and logs folders, but we won't need to use the config and plugins

01:27.090 --> 01:27.570
folders.

01:28.130 --> 01:31.130
However, we will need to create some other directories.

01:31.410 --> 01:37.250
One of these directories is a tests folder where we will store the code for the functional tests.

01:37.490 --> 01:43.210
The data directory, which we have already created, will contain the data files that we save as JSON

01:43.210 --> 01:48.330
in the extract part of our pipeline; we have already seen this file being generated in the previous

01:48.330 --> 01:48.890
section.

01:49.090 --> 01:52.250
And finally, we will also have an include folder,

01:55.010 --> 01:59.970
which is normally used for any additional files or resources that your DAGs might need.

02:00.010 --> 02:06.810
These are usually SQL scripts, JAR files, SSL certificates, or any other resources, but in our case

02:06.810 --> 02:11.090
it will be used for YAML files relating to the data quality tests.

02:11.450 --> 02:15.850
We will discuss these YAML files when we get to the data quality section of the course.

02:16.050 --> 02:21.650
And with that, we have covered the main theoretical aspects of the Airflow architecture and directories.

02:21.770 --> 02:24.890
In the next lecture, we will touch again on the .env file.

02:25.090 --> 02:26.010
I will see you then.
