WEBVTT

00:04.280 --> 00:05.000
The Dockerfile,

00:05.000 --> 00:10.120
at its core, is a script containing a series of instructions on how to build your custom Docker image.

00:10.800 --> 00:13.120
The first step is to actually create the Dockerfile.

00:15.720 --> 00:20.680
In our case, the Docker image that we will build on is the Airflow image, which you can find on Docker

00:20.680 --> 00:21.000
Hub.

00:22.480 --> 00:25.760
This is the official Airflow image you see here.

00:26.880 --> 00:28.280
If we were to click on Tags,

00:30.640 --> 00:35.440
we would see that there are many different tags depending on the Airflow version and the type of Docker

00:35.440 --> 00:36.560
image we want to use.

00:37.040 --> 00:43.280
For example, this Docker image is the slim variant with Python 3.9.

00:43.280 --> 00:49.080
In our case, we will use Airflow version 2.9.2, which is a more recent Airflow version.

00:49.240 --> 00:56.080
And with that, we also want to specify the Python version we want to use, since Airflow is built using

00:56.080 --> 00:56.600
Python.

00:57.320 --> 01:02.680
In our case, we'll use a more recent version of Python as of the recording of this lecture, which is Python

01:02.680 --> 01:03.480
3.10.

01:04.680 --> 01:07.610
If we were to write what we've said so far, we would have the following.

01:07.770 --> 01:13.810
So let's go back to VS Code and define the Airflow and Python versions as arguments.

01:14.570 --> 01:17.970
So first we define the Airflow version argument.

01:19.330 --> 01:23.010
As we said, we will use Airflow 2.9.2.

01:24.490 --> 01:30.090
We also define the Python version, which we said will be Python 3.10.

01:31.010 --> 01:34.690
Now we can reference these arguments in the FROM instruction of the Dockerfile.

01:35.890 --> 01:44.690
So FROM apache/airflow. This is the official Airflow image we just saw on Docker Hub.

01:45.410 --> 01:48.530
And here we specify the Airflow version,

01:51.370 --> 01:54.370
along with the Python version.
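
NOTE
A minimal sketch of what has been written so far. The argument names AIRFLOW_VERSION and PYTHON_VERSION are assumptions, since the lecture does not spell them out; the version values are the ones stated in the lecture.
```dockerfile
# Build arguments declared before FROM can be referenced in the FROM line.
ARG AIRFLOW_VERSION=2.9.2
ARG PYTHON_VERSION=3.10
# With the defaults above this resolves to apache/airflow:2.9.2-python3.10.
FROM apache/airflow:${AIRFLOW_VERSION}-python${PYTHON_VERSION}
```
Arguments declared before FROM are only visible to FROM itself. To use one in a later instruction, it would need to be redeclared after FROM.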

01:57.370 --> 02:02.930
Next we will define the AIRFLOW_HOME environment variable, which points to the directory that will contain

02:02.930 --> 02:10.050
the main Airflow folders and files, like the dags and logs folders and, of course, the Airflow

02:10.170 --> 02:10.970
config file.

02:11.010 --> 02:12.970
The default is as follows.

02:13.250 --> 02:22.730
So we specify the environment variable AIRFLOW_HOME and set it to /opt/airflow.
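
NOTE
As a rough sketch, the step above corresponds to a single ENV instruction. The FROM line is included only to keep the fragment self-contained; /opt/airflow is the value stated in the lecture.
```dockerfile
FROM apache/airflow:2.9.2-python3.10
# Directory holding the dags and logs folders and the Airflow config file.
ENV AIRFLOW_HOME=/opt/airflow
```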

02:22.770 --> 02:27.970
So notice how so far we have not defined anything that is custom in terms of the actual image we are

02:27.970 --> 02:28.490
building.

02:28.970 --> 02:34.610
We are still using the base image with Airflow 2.9.2 and Python 3.10.

02:34.650 --> 02:41.130
Now at this stage we can define the requirements.txt, which will contain all the extra packages that

02:41.130 --> 02:42.610
we want the image to have

02:42.850 --> 02:45.170
and that don't come with the base Airflow image.

02:45.330 --> 02:48.530
So let's define this text file as well.

02:48.570 --> 02:51.130
requirements.txt.

02:52.450 --> 02:55.610
In this text file we will list the libraries that we want to install.

02:55.770 --> 02:57.010
One such library is

02:57.010 --> 03:00.010
the library that we will use for data quality tests

03:00.130 --> 03:05.690
in later lectures. We will get more familiar with requirements.txt at a later stage of the course.

03:05.930 --> 03:07.370
For now it will be empty.

03:08.090 --> 03:17.130
Just note that the instruction we will use in the Dockerfile to bring in the requirements is COPY

03:17.650 --> 03:19.290
requirements.txt.

03:19.970 --> 03:25.930
Here we are essentially saying to copy the requirements.txt file from our local directory to the root

03:25.930 --> 03:28.410
directory of the Docker image's file system.
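
NOTE
The copy step described above might look like the following sketch. It assumes requirements.txt sits next to the Dockerfile in the build context; the FROM line is included only to keep the fragment self-contained.
```dockerfile
FROM apache/airflow:2.9.2-python3.10
# Copy requirements.txt from the local build context to the image's root directory.
COPY requirements.txt /
```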

03:28.570 --> 03:34.690
Finally, we use the RUN instruction, which installs the specified version of Airflow and the packages in

03:34.690 --> 03:37.530
requirements.txt using pip install.

03:37.730 --> 03:40.290
So let's actually write the RUN command.

03:40.530 --> 03:46.770
I'll write the full command, and then we can go over the specific parts of the RUN command.

03:46.930 --> 03:47.890
So --no-cache-dir.

03:56.250 --> 03:58.450
So this is the full RUN command.

03:58.690 --> 04:05.530
And the --no-cache-dir option ensures that pip does not cache the packages, which helps keep the image size

04:05.530 --> 04:06.130
smaller.
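
NOTE
The full RUN command is typed on screen and not read out, so the following is only a sketch of what it might look like, based on the description in the lecture. It assumes the base image exposes an AIRFLOW_VERSION environment variable and that requirements.txt was copied to the image root.
```dockerfile
FROM apache/airflow:2.9.2-python3.10
COPY requirements.txt /
# --no-cache-dir stops pip from keeping downloaded packages in its cache.
# Pinning apache-airflow keeps pip from changing the Airflow version (assumed form of the command).
RUN pip install --no-cache-dir "apache-airflow==${AIRFLOW_VERSION}" -r /requirements.txt
```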

04:07.010 --> 04:09.610
And with that we have the Dockerfile script.

04:10.050 --> 04:11.170
That's it for this lecture.

04:11.170 --> 04:14.610
In the next lecture I will show you how to actually build the image.

04:15.210 --> 04:16.090
I will see you then.
