WEBVTT

00:04.280 --> 00:06.600
Airflow is made up of multiple components.

00:07.040 --> 00:11.120
This diagram here shows the main components that make up Airflow.

00:11.520 --> 00:17.600
Let's start by going over the Airflow scheduler, which is one of the most important components.

00:17.840 --> 00:22.280
The scheduler determines what tasks need to be run and when they need to be run.

00:22.520 --> 00:28.400
Within the scheduler, there is also the executor, which determines where and how tasks are run.

00:29.120 --> 00:34.880
There are a handful of executors in Airflow, the best known being the Local, Sequential, Celery

00:34.880 --> 00:36.600
and Kubernetes executors.

00:36.960 --> 00:41.720
They work in different ways and are suited to different environments.

00:42.120 --> 00:49.320
For example, the Sequential executor runs only one task at a time, while the Local executor runs

00:49.320 --> 00:51.080
tasks in parallel on a single machine.

00:52.000 --> 00:56.520
These two executors are mostly used for testing or development purposes.

00:57.480 --> 01:02.800
Then there are executors like the Celery and Kubernetes executors, which are better suited to production

01:02.800 --> 01:06.120
scenarios where tasks are distributed across multiple machines.

01:06.280 --> 01:11.520
In our case, we will stick with the default executor of the official Airflow Docker Compose file,

01:11.880 --> 01:13.560
which is the Celery executor.

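NOTE
A minimal sketch, assuming Python 3.9+. The official Docker Compose file selects the executor through the environment variable AIRFLOW__CORE__EXECUTOR (set to CeleryExecutor there), following Airflow's AIRFLOW__{SECTION}__{KEY} naming rule for configuration overrides. The helper parse_airflow_env_var is hypothetical: it only mimics that naming rule and is not part of Airflow.
```python
import os
# Hypothetical helper: splits an AIRFLOW__{SECTION}__{KEY} variable name
# into the config section and key it overrides (mimics Airflow's naming rule).
def parse_airflow_env_var(name: str) -> tuple[str, str]:
    prefix = "AIRFLOW__"
    if not name.startswith(prefix):
        raise ValueError(f"not an Airflow config variable: {name}")
    section, sep, key = name[len(prefix):].partition("__")
    if not sep or not section or not key:
        raise ValueError(f"expected AIRFLOW__SECTION__KEY, got: {name}")
    return section.lower(), key.lower()
# Use the value the official Compose file sets, unless one is already present.
os.environ.setdefault("AIRFLOW__CORE__EXECUTOR", "CeleryExecutor")
section, key = parse_airflow_env_var("AIRFLOW__CORE__EXECUTOR")
print(f"[{section}] {key} = {os.environ['AIRFLOW__CORE__EXECUTOR']}")
```
The same rule maps AIRFLOW__CELERY__BROKER_URL, which the Compose file also sets, to the broker_url key of the [celery] section.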
01:13.600 --> 01:15.760
We will go over this Docker Compose file

01:15.760 --> 01:21.320
in a later lecture of this section. The executor then directs the workers to execute the tasks.

01:21.720 --> 01:28.480
In our case, we will only define one worker, but in a production environment, where you are working

01:28.480 --> 01:29.240
at scale,

01:29.240 --> 01:32.240
you will probably need to increase the number of workers.

01:32.680 --> 01:38.360
The Airflow webserver provides the Airflow UI, where we can monitor and trigger our DAGs.

01:38.640 --> 01:44.080
In our case, this will be available at localhost on port 8080.

01:44.280 --> 01:49.760
As a quick side note, I just mentioned the term DAG, which you can also see in the image.

01:51.200 --> 01:58.760
DAG stands for directed acyclic graph, and we can break this three-word definition down as follows.

01:59.120 --> 02:02.960
It is directed because the Airflow tasks run in a specific order, set by their dependencies.

02:03.600 --> 02:08.160
It is acyclic because there are no cycles, meaning a task cannot depend on itself, directly or indirectly.

02:08.720 --> 02:13.800
And graph refers to the fact that you can visually represent the tasks and their dependencies.

02:14.080 --> 02:18.960
We will see how to build an Airflow DAG in the coming lectures, but for now, let's come back to the

02:18.960 --> 02:20.200
Airflow components.

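NOTE
A small sketch of the three DAG properties, using only Python's standard-library graphlib (3.9+) rather than Airflow itself. The task names are hypothetical. In a real Airflow DAG file, the same chain would be declared with the >> operator between tasks (extract >> transform >> load >> notify).
```python
from graphlib import CycleError, TopologicalSorter
# Hypothetical tasks: each key maps a task to the set of tasks it depends on.
deps = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}
# Directed: dependencies point one way, so a valid run order exists.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['extract', 'transform', 'load', 'notify']
# Acyclic: making "extract" depend on "load" closes a loop, so no order exists.
cyclic = {**deps, "extract": {"load"}}
try:
    list(TopologicalSorter(cyclic).static_order())
except CycleError as err:
    print("cycle detected:", err.args[1])
```
static_order() only succeeds when the dependencies form a DAG; Airflow likewise rejects DAG files whose tasks form a cycle.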
02:20.320 --> 02:23.280
In this diagram, you can also see the metadata database.

02:23.280 --> 02:30.490
This is where the DAG code, scheduler metadata, logs, variables, connections and much more

02:30.530 --> 02:31.210
are stored.

02:31.210 --> 02:37.970
Usually this is a PostgreSQL database, but it can also be MySQL or, for development purposes, even

02:37.970 --> 02:38.330
SQLite.

02:38.370 --> 02:44.290
We have gone over the core components, but there are some other, let's call them, situational components

02:44.530 --> 02:45.730
that we should discuss.

02:45.770 --> 02:47.890
The first one is the Airflow triggerer.

02:48.010 --> 02:51.090
The triggerer supports deferrable operators.

02:51.610 --> 02:57.490
An example of when this component would be used is a sensor that waits for a certain file

02:57.530 --> 03:03.810
to be uploaded to an S3 bucket. With a deferrable sensor, the triggerer monitors the S3 bucket until the file is uploaded.

03:04.170 --> 03:09.970
Once the file is detected, the triggerer signals the scheduler to resume the deferred task, so processing can continue once the file has landed

03:09.970 --> 03:10.890
in the bucket.

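NOTE
A toy model of why deferrable operators help, assuming Python 3.9+: the waiting happens in an asyncio coroutine, as Airflow triggers do inside the triggerer's event loop, instead of a worker sitting idle. The bucket is just a Python set, and wait_for_key, upload_later and the file key are hypothetical; this is not Airflow's trigger API and makes no AWS calls.
```python
import asyncio
# Toy stand-in for an S3 bucket: a set of object keys (hypothetical, no AWS calls).
bucket: set[str] = set()
async def wait_for_key(key: str, poll_seconds: float = 0.01) -> str:
    # Like a trigger: poll cheaply on the event loop without holding a worker slot.
    while key not in bucket:
        await asyncio.sleep(poll_seconds)
    return f"file {key} landed"
async def upload_later(key: str, delay: float) -> None:
    # Simulates someone uploading the file after a short delay.
    await asyncio.sleep(delay)
    bucket.add(key)
async def main() -> str:
    # Many such waits can share one event loop, which is why a single
    # triggerer process can watch many deferred tasks at once.
    waiter = asyncio.create_task(wait_for_key("data/2024-01-01.csv"))
    await upload_later("data/2024-01-01.csv", delay=0.05)
    event = await waiter
    # At this point Airflow would resume the deferred task on a worker.
    return event
print(asyncio.run(main()))
```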
03:10.890 --> 03:16.130
In our case, we won't be using deferrable operators, so in the Docker Compose file we will switch the triggerer

03:16.130 --> 03:16.410
off.

03:16.610 --> 03:22.010
Another situational component, which we will use in our case, is a message broker.

03:22.410 --> 03:23.490
By default, this is

03:23.490 --> 03:28.090
Redis, and you can see it here in the queue part of the diagram.

03:28.090 --> 03:32.090
It is needed by the Celery executor, which is the one we will use,

03:32.090 --> 03:35.650
and its role is to forward messages from the scheduler to the workers.

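NOTE
A toy model of the message flow described above, assuming Python 3.9+: a queue.Queue stands in for the Redis broker, the scheduler function publishes task messages, and one worker thread consumes them. All names are hypothetical, and the sketch ignores task dependencies, which the real scheduler checks before queueing a task.
```python
import queue
import threading
# Toy model: a Queue stands in for the Redis broker; names are hypothetical.
broker: "queue.Queue[str | None]" = queue.Queue()
completed: list[str] = []
def scheduler(tasks: list[str]) -> None:
    # The scheduler (through the Celery executor) publishes one message per task.
    for task_id in tasks:
        broker.put(task_id)
    broker.put(None)  # sentinel: no more work in this toy run
def worker() -> None:
    # A worker consumes messages from the broker and "executes" each task.
    while (task_id := broker.get()) is not None:
        completed.append(task_id)
w = threading.Thread(target=worker)
w.start()
scheduler(["extract", "transform", "load"])
w.join()
print(completed)
```
Adding more worker threads that read from the same queue mirrors scaling out Celery workers, although completion order is then no longer guaranteed.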
03:35.730 --> 03:37.610
That's it in terms of Airflow architecture.

03:37.850 --> 03:41.090
In the next lecture, we will go over the Airflow directories.
