WEBVTT

00:04.360 --> 00:05.840
Similar to the first job,

00:05.880 --> 00:10.520
the second job will only need to run if certain files within our codebase change.

00:10.640 --> 00:14.680
Here we will also introduce the concept of job dependencies.

00:15.200 --> 00:22.000
So let's follow the same structure as the first job and give this job an appropriate name such as the

00:22.000 --> 00:22.560
following.

00:23.880 --> 00:29.440
Also, same as in the previous job, we need to select the runner on which to run this job, which will

00:29.440 --> 00:31.880
also be the latest Ubuntu image.

00:34.200 --> 00:35.480
So nothing new so far.

00:35.680 --> 00:42.720
But now we introduce the concept of dependencies, which in GitHub Actions you specify using the needs

00:42.720 --> 00:43.440
keyword.

00:46.080 --> 00:48.800
We would like that, before we run the tests,

00:48.800 --> 00:51.440
we have the latest updated Docker image.

00:51.600 --> 00:58.240
So we need to ensure that the first job, which is the build and push image, runs before this job,

00:58.240 --> 00:59.680
which we are defining right now.

00:59.800 --> 01:07.980
So we specify needs, as we said, and also the job name, which we can copy from here

01:09.100 --> 01:10.140
and paste it here.

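NOTE
A minimal sketch of how the job header described above might look. The job keys (build-and-push-image, unit-integration-e2e-tests) and the display name are assumptions for illustration; the transcript only gives the spoken names.
```yaml
jobs:
  # First job, defined earlier in the workflow (key assumed, steps shortened)
  build-and-push-image:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
  # Second job: waits for the first job to finish before it starts
  unit-integration-e2e-tests:
    name: Unit, integration and end-to-end tests
    runs-on: ubuntu-latest
    needs: build-and-push-image
    steps:
      - uses: actions/checkout@v4
```
The value of needs has to match the key of the first job exactly, which is why the name is copied rather than retyped.
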
01:11.180 --> 01:12.300
As we have said already,

01:12.340 --> 01:17.780
the Docker Compose file at the moment uses the .env file, which we have saved locally.

01:18.220 --> 01:21.580
However, this is not committed to GitHub for obvious reasons.

01:21.580 --> 01:28.220
With this GitHub workflow, we need to find a way to specify the environment variables defined in .env

01:28.380 --> 01:37.460
inside this YAML workflow file, and we do this by specifying the env parameter, which references the

01:37.500 --> 01:43.180
GitHub Actions secrets and variables for all the variables in Docker Compose.

01:43.220 --> 01:50.540
If I list out all of the variables and secrets that we have in the .env file, this looks as follows.

01:52.580 --> 01:56.220
By my count, I have 24 secrets and variables.

01:56.500 --> 02:02.660
Notice how sensitive variables are defined as secrets and non-sensitive as variables.

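NOTE
A sketch of the env parameter at job level. The variable names (POSTGRES_USER, POSTGRES_PASSWORD, AIRFLOW_UID) and the job key are hypothetical; the real workflow lists 24 entries that are not shown in the transcript.
```yaml
jobs:
  unit-integration-e2e-tests:   # job key assumed
    runs-on: ubuntu-latest
    env:
      # Sensitive values are read from the secrets context
      POSTGRES_PASSWORD: ${{ secrets.POSTGRES_PASSWORD }}
      # Non-sensitive values are read from the vars context
      POSTGRES_USER: ${{ vars.POSTGRES_USER }}
      AIRFLOW_UID: ${{ vars.AIRFLOW_UID }}
    steps:
      - uses: actions/checkout@v4
```
Each name on the right-hand side has to exist under the repository's Actions secrets or variables, otherwise it resolves to an empty string.
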
02:02.700 --> 02:09.350
The secrets and variables we specify in the YAML file need to exist in GitHub Actions as either

02:09.350 --> 02:10.750
secrets or variables.

02:10.750 --> 02:12.870
So we have to go back to GitHub Actions and

02:12.910 --> 02:20.110
make sure that the secrets and variables that we have just defined in the YAML file align with what we

02:20.110 --> 02:24.190
have in the GitHub Actions secrets and variables.

02:24.590 --> 02:31.310
What this means for Docker Compose is that we need to remove these env_file parameter

02:31.310 --> 02:36.870
references, since these variables will not be provided by .env at this stage.

02:36.870 --> 02:42.190
So we can comment out the code in the Airflow common section and also in the Postgres section.

02:43.230 --> 02:45.550
This takes care of the Airflow common section.

02:46.110 --> 02:52.990
Now let's go down to the Postgres service and also comment it out. For the Postgres container,

02:53.030 --> 02:58.510
an additional change that is needed is to include the following in the environment section.

02:58.510 --> 03:00.230
So let's uncomment these.

03:01.270 --> 03:07.970
This is done since the database initialization script needs these variables, and GitHub Actions

03:07.970 --> 03:14.570
doesn't automatically load environment variables into Compose, so we need to ensure the variables

03:14.570 --> 03:18.010
are passed inside the Postgres service itself.

03:18.250 --> 03:22.490
With that, we have made the required changes in Docker Compose.

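NOTE
A sketch of what the Postgres service could look like after these changes. The image tag and the variable names are assumptions; only the commented-out env_file and the added environment section come from the transcript.
```yaml
services:
  postgres:
    image: postgres:13          # tag assumed
    # env_file is commented out because .env is not committed to the repository
    # env_file:
    #   - .env
    environment:
      # Compose substitutes these from the environment of the shell that runs it,
      # which in CI is populated by the env block of the workflow job
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
```
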
03:23.450 --> 03:30.410
Going back to the YAML file, now that we have introduced the job dependencies and environment variables, we can,

03:30.450 --> 03:36.570
same as before, write the steps of the job, starting with the code checkout from the repository.

03:38.770 --> 03:41.530
So let's go up and copy this code.

03:45.250 --> 03:49.930
Having checked out the code, we now define the files that will trigger this job,

03:49.970 --> 03:52.170
should there be a change in them.

03:52.650 --> 03:59.130
For this step, it makes sense to include any file that is inside the Airflow DAGs folder,

03:59.570 --> 04:03.410
the include folder, which holds the configs for data quality tests,

04:03.650 --> 04:05.860
and finally the Docker Compose file.

04:06.220 --> 04:12.740
We don't need to define the Dockerfile and requirements.txt because the build

04:12.740 --> 04:17.020
and push image job will always be triggered and run before this job.

04:17.060 --> 04:21.020
This is handled by the job defined here under needs.

04:22.660 --> 04:28.460
Putting this all together, we can simply copy and paste what we have here.

04:31.500 --> 04:37.780
We need to change the ID, since this needs to be unique. In our case,

04:38.260 --> 04:41.820
this was for the build step, but now we are working with the tests,

04:41.820 --> 04:44.180
so let's change this to change files tests.

04:44.220 --> 04:51.180
The action will be the same, and the files that we will be checking for any changes will be under the

04:51.220 --> 04:52.260
DAGs folder.

04:56.620 --> 05:01.620
And finally the docker compose YAML.

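NOTE
A sketch of the changed-files step. The transcript does not name the action, so tj-actions/changed-files and its version tag are assumptions, as are the step ID spelling and the exact paths.
```yaml
steps:
  - name: Checkout code
    uses: actions/checkout@v4
  - name: Get changed files for tests
    id: changed-files-tests      # must differ from the ID used in the build job
    uses: tj-actions/changed-files@v45
    with:
      # Any change under these paths marks the step output any_changed as true
      files: |
        dags/**
        include/**
        docker-compose.yaml
```
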
05:02.380 --> 05:08.200
Now that we have our setup, we need to think about how the job will be executed in order to continue writing

05:08.200 --> 05:08.840
the YAML file.

05:08.840 --> 05:11.440
If we were to do this manually, we would first run

05:11.480 --> 05:15.560
docker compose up to bring up all the services related to Airflow.

05:15.880 --> 05:22.520
Once we have the containers running, for the CI part we need to run the pytest command in order to

05:22.520 --> 05:29.240
run all the tests inside this folder, and for the CD part, we would run the airflow dags test command

05:29.640 --> 05:34.360
for each DAG that we have in main.py inside the DAGs folder.

05:34.840 --> 05:41.640
Finally, after we have concluded all our tests, we simply tear down the docker compose to clean up

05:41.640 --> 05:42.600
the resources.

05:43.160 --> 05:48.600
So if we write the steps that we just mentioned in the YAML file, making sure that at each

05:48.600 --> 05:54.680
step we specify that it only runs if any of the files we defined in this job were changed,

05:54.680 --> 05:56.400
this would be as follows.

05:56.600 --> 06:00.560
So first we define the docker compose up step,

06:03.600 --> 06:05.240
taking care of the indentation.

06:07.170 --> 06:10.090
Next we run the unit and integration tests.

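NOTE
A sketch of the first two conditional steps. The service name airflow-worker, the tests path, and the step ID changed-files-tests are assumptions; the any_changed output is what the tj-actions/changed-files action exposes, and that action is itself an assumption here.
```yaml
steps:
  - name: Start Airflow services
    # Skip the step unless one of the watched files changed
    if: steps.changed-files-tests.outputs.any_changed == 'true'
    run: docker compose up -d
  - name: Run unit and integration tests
    if: steps.changed-files-tests.outputs.any_changed == 'true'
    # -T disables the pseudo-TTY, which CI runners do not provide
    run: docker compose exec -T airflow-worker pytest tests/ -v
```
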
06:12.330 --> 06:16.570
And lastly, we define the end-to-end step. In our main.py

06:16.610 --> 06:23.730
in the DAGs folder, we have three main DAGs with the IDs produce_json, update_db,

06:23.730 --> 06:25.410
and data_quality.

06:26.650 --> 06:33.330
So we can either define them explicitly here in the list, or we could generate this list dynamically.

06:33.330 --> 06:36.570
In a production scenario it would make sense to use the latter.

06:36.810 --> 06:39.610
However, we are only dealing with three DAGs,

06:39.850 --> 06:46.130
and to keep things simple, I will go with the first option of defining the DAG IDs explicitly.

06:46.810 --> 06:53.810
After we have defined a list with the IDs for each DAG, we need to run the airflow dags test command in

06:53.810 --> 07:01.050
an iterative way, making sure that we first start from produce_json, then update_db,

07:01.050 --> 07:04.730
and finally the data_quality DAG ID.

07:05.090 --> 07:07.290
To do this, we can define a for loop.

07:07.290 --> 07:10.270
And if we were to write this down, it would be as follows.

07:10.870 --> 07:18.310
As a quick side note, this @ syntax over here ensures that each item in the array is treated as a

07:18.310 --> 07:21.310
separate argument, while the command itself

07:21.350 --> 07:24.670
we have already seen in the end-to-end test section.

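NOTE
A sketch of the end-to-end step with the explicit DAG list and the for loop. The service name airflow-worker and the step ID are assumptions; the DAG IDs and their order come from the transcript. The run block uses bash, which is the default shell on Ubuntu runners.
```yaml
steps:
  - name: Run end-to-end DAG tests
    if: steps.changed-files-tests.outputs.any_changed == 'true'
    run: |
      # DAG IDs listed explicitly, in the order they have to run
      dag_ids=("produce_json" "update_db" "data_quality")
      # "${dag_ids[@]}" expands every array element as its own quoted word
      for dag_id in "${dag_ids[@]}"; do
        docker compose exec -T airflow-worker airflow dags test "$dag_id"
      done
```
Because run steps stop at the first failing command by default, a failing DAG prevents the later ones from running.
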
07:25.430 --> 07:30.670
The same goes for the command specified here for the unit and integration tests.

07:30.990 --> 07:36.550
Finally, we simply tear down Docker Compose after we have run our DAGs.

07:36.670 --> 07:39.550
This concludes the GitHub workflow YAML file.

07:39.590 --> 07:41.430
What is left is to test it out.

07:41.470 --> 07:43.990
The question is how we can trigger this workflow.

07:44.030 --> 07:49.110
Remember that we designed this GitHub workflow so that we can trigger it in three different ways.

07:49.110 --> 07:53.510
We did this in the on section of the YAML file, which you see over here.

07:54.070 --> 07:59.670
Now, since we have made changes to the Docker Compose file, this will trigger the second job in our

07:59.670 --> 08:00.310
pipeline.

08:00.590 --> 08:06.830
The first job will also be triggered since we are using this needs parameter, but its actual steps won't

08:06.830 --> 08:13.240
be run, since nothing in requirements.txt or the Dockerfile has changed since the last commit.

08:13.240 --> 08:20.040
So to also test out the first job, we will change the requirements.txt to use a more recent version

08:20.040 --> 08:20.960
of pytest.

08:21.000 --> 08:25.880
Let's say version 8.3.3, and we save. Like that,

08:25.920 --> 08:30.040
both the Docker Compose file and requirements.txt have changes inside of them.

08:30.360 --> 08:34.560
So let's add the YAML file and commit the new changes.

08:46.400 --> 08:47.520
Finally we push.

08:50.760 --> 08:55.520
So once we push the changes, if we go on GitHub we will see the execution of the pipeline.

09:04.920 --> 09:07.480
You can see it over here, and it's in a pending state.

09:09.840 --> 09:10.860
We click on Details.

09:11.420 --> 09:14.580
And here we are seeing the actual execution of the pipeline.

09:15.100 --> 09:21.580
Hopefully it will succeed, but if not, we will look at the logs and see where errors might

09:21.580 --> 09:22.100
pop up.

09:27.580 --> 09:34.460
So it seems that the build and push image job is successful as we have a green tick, and now we will

09:34.460 --> 09:38.780
wait for the unit and integration tests, along with the end-to-end tests.

09:48.740 --> 09:51.820
So we are seeing that the tests are passing.

09:52.300 --> 09:55.660
Remember, we executed these using this pytest command.

09:59.980 --> 10:03.340
Here we are executing the end-to-end DAG tests as well.

10:03.580 --> 10:06.900
And we are also returning the logs from the video IDs.

10:11.990 --> 10:16.430
So at face value it seems that both jobs have succeeded.

10:16.630 --> 10:25.110
If they hadn't, we would get a red X. So if we go briefly into the jobs themselves,

10:25.910 --> 10:29.590
you can see, for example, in the build and push image job,

10:29.590 --> 10:31.790
that here we have successfully logged in to Docker Hub.

10:33.270 --> 10:38.630
We pushed the new image with the latest tag and also the commit hash.

10:41.710 --> 10:50.230
This is some cleanup for Buildx, also some other post-checkout code, and the job is complete.

10:50.230 --> 10:54.990
One verification we can do is to go on Docker Hub and check that the image

10:54.990 --> 10:57.550
now has the latest tag with a new commit hash.

11:09.710 --> 11:15.550
So signing into Docker Hub, the first thing that I can see is that we have a push which was only five

11:15.550 --> 11:16.130
minutes ago.

11:16.930 --> 11:17.850
Let's click on this.

11:18.770 --> 11:25.050
And as you can see we have the latest image which is the following.

11:25.490 --> 11:30.810
And as you can see we have a history of the tags that we had.

11:30.810 --> 11:39.090
So first we had the initial 1.0.0 and also the point one where we did a small change.

11:40.690 --> 11:45.570
And now we have the same image but it has two different tags.

11:45.890 --> 11:51.690
It has the latest tag and also the commit hash which you see over here.

11:52.130 --> 11:58.650
And to link things, if we run git log, we will see that the latest commit has this hash.

11:58.770 --> 12:01.970
So as you can see, it starts with 12829.

12:03.410 --> 12:05.690
Let's go and write the git log command.

12:05.850 --> 12:11.490
And as you can see, the latest commit also has this commit hash,

12:11.530 --> 12:13.050
the 12829.

12:13.100 --> 12:18.420
And then it also has the commit message that we just committed.

12:22.220 --> 12:30.860
Going also to the unit, integration, and end-to-end tests job, we can see the setup procedure.

12:31.180 --> 12:36.060
Here we are running the unit and integration tests, which we saw while it was running,

12:36.460 --> 12:43.420
but here we can verify that all the tests have passed, the end-to-end DAG tests as well.

12:45.500 --> 12:48.620
These are all the logs that we get from Airflow.

12:50.540 --> 12:55.820
And what we are seeing here are the data quality tests.

13:03.060 --> 13:07.100
Here you can also see that this workflow has a workflow_dispatch event trigger.

13:07.500 --> 13:10.980
We will soon touch upon this when we go into the manual triggering.
