WEBVTT

00:04.360 --> 00:10.240
The main aim of unit tests is to test out individual parts of the code in isolation, to ensure that

00:10.240 --> 00:11.240
they work correctly.

00:11.240 --> 00:16.560
When we talk about individual parts, we are referring to the Python functions we have created in this

00:16.560 --> 00:17.160
course.

00:17.200 --> 00:21.200
A common convention for unit tests is to use mock credentials.

00:21.360 --> 00:27.240
This is the standard approach because you want to isolate the unit being tested from external dependencies

00:27.240 --> 00:29.160
like databases or APIs.

00:29.200 --> 00:34.640
For integration tests, you can use mock or real credentials, depending on the specific scenario.

00:34.680 --> 00:40.600
However, in our case we will use real credentials. For the end-to-end tests, it is necessary to

00:40.640 --> 00:46.920
use real credentials to simulate the actual process and, in our case, to ensure that the ELT process is

00:46.920 --> 00:48.280
working as expected.

00:48.320 --> 00:54.920
When performing tests using the Pytest framework, you should be aware of the conftest.py file, which

00:54.920 --> 00:56.120
we will create soon.

00:56.200 --> 01:02.170
In this script, we will put the reusable code that will provide inputs for the test functions, which

01:02.170 --> 01:04.210
we will create in the test script.

01:04.250 --> 01:09.530
We will have common setup code like database connections, API keys and more.

01:09.770 --> 01:16.370
These reusable pieces of setup code are known as fixtures, and to create a fixture we will use the pytest.fixture

01:16.410 --> 01:17.170
decorator.

01:17.290 --> 01:23.250
I will leave a link to the Pytest documentation that explains in more detail what conftest is.

01:25.370 --> 01:30.650
One last important thing to know is that, for the Pytest framework to discover the testing

01:30.650 --> 01:36.050
functions we create, we will need to define the names of the functions to start

01:36.050 --> 01:37.570
with test underscore.

01:37.690 --> 01:42.530
Here you see such a function, where the first part of the name is test underscore.

01:42.690 --> 01:48.650
So now that we have covered the basics, let's create a Python script under the tests directory and give

01:48.650 --> 01:50.330
it the name unit_test.py.

01:54.090 --> 01:58.260
Here we also create the conftest.py script which we mentioned earlier.

02:00.460 --> 02:04.380
Let's start off with our first unit test where we will mock the API key.

02:04.660 --> 02:10.900
Since the API key is something that can and will be reused by other functions, we can define the code

02:10.900 --> 02:11.900
for this function in conftest.py.

02:11.900 --> 02:16.420
We first need to import some essential libraries.

02:17.300 --> 02:24.180
These are os, pytest, the mock module from the unittest library, and Variable from the

02:24.340 --> 02:25.380
airflow.models module.

02:27.340 --> 02:34.540
And now we can start off by using the pytest.fixture decorator. As we have already said, in Pytest

02:34.580 --> 02:40.460
this indicates that the function is a fixture and will be used to provide input data for the main test

02:40.460 --> 02:41.260
functions.

02:41.300 --> 02:42.780
The syntax is as follows.

02:43.020 --> 02:47.380
We use the at sign (@) for the decorator, and that's how you define the syntax.

02:47.420 --> 02:49.180
Then we define the actual function.

02:49.180 --> 02:52.100
Let's call it api_key in our case.

02:53.980 --> 03:00.060
Next we use mock.patch.dict, which patches dictionaries, to mock the environment variable

03:00.100 --> 03:03.220
AIRFLOW_VAR_API_KEY.

03:10.500 --> 03:16.860
What this piece of code does is it temporarily updates the environment dictionary with the key value

03:16.860 --> 03:18.380
pair that we have here.

03:18.860 --> 03:24.460
Finally, we use yield to provide the value to the test that requests this fixture.

03:29.060 --> 03:34.620
As we saw in the Airflow section, Variable.get fetches the value of any Airflow variable.

03:34.620 --> 03:38.260
And in this case we want the API_KEY variable.
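
NOTE
A minimal sketch of the fixture described above, using only the standard library so that it runs without Airflow or pytest installed. In conftest.py the generator would carry the @pytest.fixture decorator and would yield Variable.get("API_KEY"); here os.environ is read directly as a stand-in, since Airflow can resolve variables from environment variables named AIRFLOW_VAR_<NAME>. The value MOCK_KEY1234 is an assumption about the exact spelling used in the course.
```python
import os
from unittest import mock
# In conftest.py this generator would be decorated with @pytest.fixture.
def api_key():
    # patch.dict temporarily adds the key to os.environ and restores it on exit.
    with mock.patch.dict(os.environ, {"AIRFLOW_VAR_API_KEY": "MOCK_KEY1234"}):
        # Stand-in for Variable.get("API_KEY") (assumption: Airflow is not installed here).
        yield os.environ["AIRFLOW_VAR_API_KEY"]
# Drive the generator by hand, the way pytest drives a yield fixture.
gen = api_key()
value = next(gen)                # setup: runs up to the yield
assert value == "MOCK_KEY1234"   # this is the body of the unit test for the API key
next(gen, None)                  # teardown: leaves the patch.dict context
```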

03:41.820 --> 03:46.620
Going back to the unit_test.py script, first we create the function, which, as we said, needs to

03:46.660 --> 03:53.100
start with test_, and then give it an appropriate name, which is api_key.

03:53.300 --> 03:59.830
This function will need to take as an argument the api_key fixture we defined in conftest.py.

04:00.190 --> 04:09.670
So let's start by writing test_api_key, which takes the api_key argument.

04:10.630 --> 04:17.990
Then to verify the API key, we simply use the Python assert keyword to test if api_key is

04:17.990 --> 04:24.350
in fact equal to what we defined in the fixture, which should be mock underscore key 1234.

04:27.630 --> 04:30.230
Before we check if the test passes or fails,

04:30.270 --> 04:36.790
a quick note: apart from naming conventions for functions, Pytest also has

04:36.790 --> 04:39.030
naming conventions for the script names.

04:39.270 --> 04:46.790
So we named the script unit_test.py because Pytest requires that the Python script's name starts with

04:46.790 --> 04:50.350
test_ or ends in _test.

04:50.350 --> 04:53.120
And here we are using the latter syntax.
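
NOTE
As an illustration of the file naming rule above, the sketch below checks file names against Pytest's default file patterns, test_*.py and *_test.py. It only mimics the pattern matching with the standard library; it is not Pytest's own collection code, and is_collected is a hypothetical helper name.
```python
from fnmatch import fnmatch
# Pytest's default file name patterns (the python_files setting).
DEFAULT_PATTERNS = ("test_*.py", "*_test.py")
def is_collected(filename):
    """Return True if the file name matches one of the default patterns."""
    return any(fnmatch(filename, pattern) for pattern in DEFAULT_PATTERNS)
print(is_collected("unit_test.py"))   # True: ends in _test.py
print(is_collected("test_unit.py"))   # True: starts with test_
print(is_collected("unit_tests.py"))  # False: matches neither pattern
```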

04:53.120 --> 04:56.120
So going back to the results of our test.

04:56.160 --> 04:58.800
Let's go inside one of the airflow containers.

04:59.080 --> 05:02.920
Here we already are inside the airflow worker container.

05:02.920 --> 05:04.680
And let's run our test.

05:04.840 --> 05:07.840
So we write pytest -v.

05:08.280 --> 05:13.960
We use -v for more verbose output, and then we define the path of where the unit test script

05:13.960 --> 05:20.240
is, which is tests/unit_test.py.

05:20.440 --> 05:24.600
For now, running this script would give the output for the single function.

05:24.600 --> 05:27.160
However, we will continue to build more tests.

05:27.160 --> 05:32.680
So to specify a particular function we use the -k option with the function name.

05:35.880 --> 05:36.600
Let's run it.

05:38.640 --> 05:44.200
So the issue is that we named the directory tests and not test.

05:45.760 --> 05:46.920
So let's go back here.

05:48.240 --> 05:51.930
Make sure that this is saved both here and here.

05:53.810 --> 05:55.010
And let's run this again.

05:56.610 --> 05:57.650
And there we have it.

05:57.690 --> 05:59.050
Our test has passed.

05:59.570 --> 06:04.290
We can do the exact same thing for the channel handle, which, as you might remember, is MrBeast.

06:04.810 --> 06:10.130
We just need to change the function name and use the appropriate variable name in airflow, so we can

06:10.130 --> 06:19.570
copy and paste the previous function in conftest.py and replace the appropriate variables for the channel

06:19.570 --> 06:20.010
handle.

06:27.850 --> 06:32.890
Again, we can set the channel handle to what we want, even something silly like Mr. Cheese.

06:33.370 --> 06:35.570
So let's go back to the unit test script.

06:35.810 --> 06:36.690
We copy this.

06:40.250 --> 06:41.570
We do our changes.

06:47.130 --> 06:50.420
So now that all the changes have been done, let's run this test again.

06:51.740 --> 06:53.180
We should also get a pass.

06:56.780 --> 07:01.620
Notice how now, since we have two tests and we only specified the channel

07:01.660 --> 07:02.660
handle test,

07:02.900 --> 07:06.620
the API key test is shown as deselected.

07:07.220 --> 07:12.340
Now that we have created unit tests for variable mocking, let's do the same for database connections.

07:12.580 --> 07:17.060
Specifically, the connection to the database where we are storing the API data.

07:17.220 --> 07:22.700
And in order to do this we need to import the Connection class from airflow.models.

07:22.780 --> 07:25.660
So after Variable we can add Connection.

07:26.740 --> 07:30.420
We can also create the fixture with the function name as before.

07:33.460 --> 07:36.460
Changing this to our next target function.

07:38.700 --> 07:44.740
To create our mock Postgres connection, we can define a mock connection variable named conn and set

07:44.740 --> 07:51.110
it equal to a Connection object, which is composed of the login, password, host, port and the

07:51.110 --> 07:51.550
schema.

07:51.590 --> 07:55.270
The schema is the database name in an Airflow Connection object.

08:00.710 --> 08:08.150
We can then use the get_uri method to create the connection URI for this Postgres connection.

08:08.670 --> 08:15.230
And finally we use the mock.patch.dict function,

08:15.430 --> 08:19.350
same as before, but this time we are mocking the connection URI.

08:19.390 --> 08:23.870
So we use AIRFLOW_CONN_ followed by the connection ID.

08:23.990 --> 08:25.990
So it comes out as follows.

08:38.950 --> 08:44.470
Finally, for the main function to get the connection, you can use the get_connection_from_secrets

08:44.470 --> 08:50.590
method provided by Airflow to fetch the connection details stored in Airflow's secrets backend.

08:50.750 --> 08:57.230
Here we just need to specify the connection ID, which is the Postgres underscore db underscore YouTube

08:57.270 --> 08:58.310
underscore elt.
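
NOTE
A rough sketch of the connection mocking described above, again using only the standard library so that it runs without Airflow. The connection ID my_postgres_conn, the credentials, and the hand-written URI are all hypothetical; in conftest.py the URI would come from Connection(...).get_uri() and the lookup would be Connection.get_connection_from_secrets(conn_id=...). urlsplit is used here only as a stand-in parser for a URI of the shape scheme://login:password@host:port/schema.
```python
import os
from unittest import mock
from urllib.parse import urlsplit
# Hypothetical connection ID and details, for illustration only.
CONN_ID = "my_postgres_conn"
URI = "postgres://mock_user:mock_password@mock_host:5432/mock_db"
ENV_NAME = f"AIRFLOW_CONN_{CONN_ID.upper()}"
# In conftest.py this generator would be decorated with @pytest.fixture.
def mock_postgres_conn():
    with mock.patch.dict(os.environ, {ENV_NAME: URI}):
        # Stand-in for Connection.get_connection_from_secrets(conn_id=CONN_ID).
        yield urlsplit(os.environ[ENV_NAME])
gen = mock_postgres_conn()
conn = next(gen)                            # setup
assert conn.username == "mock_user"         # an Airflow Connection calls this login
assert conn.password == "mock_password"
assert conn.hostname == "mock_host"
assert conn.port == 5432
assert conn.path.lstrip("/") == "mock_db"   # an Airflow Connection calls this schema
next(gen, None)                             # teardown removes the environment variable
```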

09:02.750 --> 09:09.230
Now all that is left is to go back to the unit_test.py script and write our assert statements,

09:09.270 --> 09:10.990
same as we did for the variables.

09:17.190 --> 09:20.990
So we define the function name with test underscore at the beginning.

09:21.430 --> 09:25.190
So we can write test_postgres_conn.

09:25.190 --> 09:32.310
And we need as an argument the mock Postgres connection fixture, which we defined in conftest.

09:35.630 --> 09:43.310
Here we simply assign the Postgres connection provided by the test fixture to a variable conn to make

09:43.310 --> 09:49.080
the assertions easier to read and write, and from here we simply define the assertions as follows.

09:51.120 --> 09:53.440
And like that, our function is complete.

09:53.720 --> 10:00.320
To test this out using Pytest, we can run the same command that we have used for the previous two tests.

10:00.360 --> 10:04.240
Obviously specifying the function name that we just created.

10:07.200 --> 10:07.880
Press enter.

10:09.160 --> 10:12.040
And as you can see, all the tests have passed.

10:13.120 --> 10:19.040
One final unit test that we can do involves testing whether our DAGs are structured as we expect them to

10:19.080 --> 10:20.240
be. To interact with the DAGs,

10:20.280 --> 10:28.120
we use a DagBag instance, which collects all our DAGs' information. So we can extend the

10:28.120 --> 10:28.840
airflow.models

10:28.880 --> 10:31.160
import by adding DagBag.

10:33.240 --> 10:39.880
Now that we have the relevant import in the conftest script, we can create a very simple Pytest fixture

10:39.880 --> 10:43.210
that returns the DagBag, which will be as follows.

10:44.290 --> 10:48.450
So that's all there is to it for the conftest script.

10:48.490 --> 10:53.370
Now let's shift our focus to the unit test script and start writing our code.

10:55.250 --> 10:59.810
We first define the function and as an argument use the fixture we just created.

11:03.770 --> 11:07.610
Now this test, as we said, will test the integrity of our DAGs.

11:07.650 --> 11:11.730
We can split this integrity testing into four parts.

11:11.770 --> 11:15.410
The first one being that there will be no import errors.

11:15.650 --> 11:18.970
Then that all the DAGs we defined are being loaded.

11:19.210 --> 11:21.170
Then that the number of DAGs is correct.

11:21.410 --> 11:24.970
And then finally that each DAG has the number of tasks we expect.

11:25.010 --> 11:27.650
Let's start off with the no import errors check.

11:27.650 --> 11:31.130
And in this test we will also include print statements.

11:31.130 --> 11:36.410
So when we come to run the Pytest command we can actually see what is being tested under the hood.

11:36.450 --> 11:43.860
This is a simple assertion where we assert that the import_errors attribute of the DagBag instance is equal

11:43.860 --> 11:44.940
to an empty dictionary.

11:44.980 --> 11:47.060
This means that we have no import errors.

11:47.140 --> 11:53.540
If, however, our assertion fails, we can use the assert statement to return a descriptive message

11:53.540 --> 11:54.700
showing the errors.

11:54.740 --> 12:00.620
As we said, we will print the object as a way of debugging our code, but also seeing what is happening

12:00.660 --> 12:01.540
under the hood.

12:03.140 --> 12:10.820
The separator print we have here is simply to separate the prints that we will have from all of these test

12:10.820 --> 12:12.180
integrity functions.

12:12.380 --> 12:16.340
The next subtest is to test that all expected DAGs are loaded.

12:16.580 --> 12:21.780
So first we can write all the expected Dag IDs in a list.

12:23.620 --> 12:28.780
Then we get the actual DAG IDs from the DagBag keys using this command.

12:29.380 --> 12:30.980
And then we can print these out.

12:31.420 --> 12:38.420
Remember that these DagBag keys will be the DAG IDs we defined in the main.py script under the dags directory.

12:40.510 --> 12:47.310
Now using a for loop, we go through each DAG ID in our expected DAG IDs list, asserting that this

12:47.310 --> 12:50.390
DAG ID is indeed found in the DagBag keys.

12:54.510 --> 12:59.870
Before we move on to the third subtest, let me just split this so it's more readable.

13:07.550 --> 13:10.270
The third test is a very simple count of DAGs.

13:10.310 --> 13:12.990
We know that so far we have three DAGs.

13:13.270 --> 13:15.670
So the count in this assertion should be three.

13:15.830 --> 13:17.510
We therefore write the following.

13:20.670 --> 13:22.910
And also we need to print out what we are asserting.

13:23.950 --> 13:28.390
Finally we test that all loaded DAGs have the expected number of tasks.

13:28.590 --> 13:35.790
So first we define a dictionary with key value pairs being the dag id and the expected task count for

13:35.790 --> 13:36.390
each Dag.

13:36.910 --> 13:43.360
So just to make sure we're all on the same page, the produce_json DAG, we expect it to have

13:43.720 --> 13:45.000
four tasks.

13:45.320 --> 13:55.320
So if we go in main.py, for the produce_json DAG ID, we expect that we will find four tasks.

13:57.880 --> 13:59.680
Next we print the separator.

14:00.320 --> 14:03.400
And here we define the for loop that goes through each Dag.

14:10.520 --> 14:15.200
We define the expected count variable from the dictionary we just constructed.

14:23.360 --> 14:30.400
And the actual count variable which we create by getting the number of tasks for the Dag using the length

14:30.400 --> 14:31.080
function.

14:37.800 --> 14:42.440
We finally do an assertion, and if this fails, we can print out a message as follows.

14:46.480 --> 14:47.880
For debugging's sake,

14:47.920 --> 14:53.360
we finally print out the dag_id and the number of tasks for each DAG.
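
NOTE
The four integrity checks walked through above can be sketched as follows. To keep the example runnable without Airflow, a SimpleNamespace stands in for the DagBag, mirroring only the two attributes used here: import_errors (a dictionary) and dags (a dictionary of DAG ID to DAG, each with a tasks list). The names check_dags_integrity, make_fake_dagbag and other_dag are hypothetical; produce_json with four tasks follows the walkthrough.
```python
from types import SimpleNamespace
# Stand-in for DagBag() (assumption: Airflow is not installed here).
def make_fake_dagbag():
    def dag(n):
        return SimpleNamespace(tasks=[f"task_{i}" for i in range(n)])
    return SimpleNamespace(import_errors={}, dags={"produce_json": dag(4), "other_dag": dag(2)})
def check_dags_integrity(dagbag, expected_task_counts):
    # 1. No import errors; the message shows the errors if the assertion fails.
    assert dagbag.import_errors == {}, f"Import errors found: {dagbag.import_errors}"
    # 2. Every expected DAG is loaded.
    loaded = list(dagbag.dags.keys())
    print("loaded DAG IDs:", loaded)
    for dag_id in expected_task_counts:
        assert dag_id in loaded, f"DAG {dag_id} is missing"
    # 3. The number of DAGs is as expected.
    assert len(loaded) == len(expected_task_counts), f"Expected {len(expected_task_counts)} DAGs, got {len(loaded)}"
    # 4. Each DAG has the expected number of tasks.
    for dag_id, dag in dagbag.dags.items():
        expected_count = expected_task_counts[dag_id]
        actual_count = len(dag.tasks)
        assert expected_count == actual_count, f"DAG {dag_id} has {actual_count} tasks, expected {expected_count}"
        print(dag_id, actual_count)
check_dags_integrity(make_fake_dagbag(), {"produce_json": 4, "other_dag": 2})
```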

14:56.120 --> 14:59.440
So this test encapsulates a number of subtests.

14:59.440 --> 15:01.600
If we were to run it with Pytest,

15:01.640 --> 15:08.080
using the command we have been using so far, it would tell us whether it passed or failed, but

15:08.080 --> 15:10.280
we wouldn't get the prints we defined.

15:10.800 --> 15:17.200
This is because by default Pytest does output capturing, meaning that outputs from prints are by default

15:17.200 --> 15:17.880
not shown.

15:17.920 --> 15:23.400
To disable this option, we can use the -s flag, so we can write the following.

15:25.000 --> 15:31.320
So after -v, we can write -s, and we simply change the function name here.

15:37.370 --> 15:41.730
So as you can see, all of these subtests that we have have passed.

15:42.090 --> 15:46.130
So you might ask, what would it look like if one of our subtests failed?

15:46.810 --> 15:54.130
Let's create an erroneous assertion and say that the produce_json DAG has only three tasks.

15:54.170 --> 15:55.530
Let's run this test again.

15:57.970 --> 16:05.610
And as you can see, the assertion error here is telling you exactly how many tasks the

16:05.610 --> 16:06.490
produce_json DAG has.

16:06.730 --> 16:08.330
Here we are saying that it has three,

16:08.370 --> 16:11.130
but in fact it has four tasks.

16:11.570 --> 16:13.450
So this assertion will fail.

16:14.530 --> 16:17.250
So now I will simply change this back to four.

16:17.610 --> 16:21.810
So at this point we have gone over some basic but important unit tests.

16:21.970 --> 16:24.930
Feel free to add more tests to keep on practicing.

16:25.410 --> 16:28.130
Next we will move on to the integration testing.

16:28.250 --> 16:29.210
I will see you then.