WEBVTT

00:04.280 --> 00:09.360
Now that we have the tables set up, we need to populate them with the API data that we have saved as

00:09.360 --> 00:11.680
JSON files under the data directory.

00:11.680 --> 00:14.640
So let's create a data loading script.

00:14.880 --> 00:20.920
Let's do this under data warehouse, in data loading dot py.

00:21.320 --> 00:27.440
And let's create a function that will be responsible for opening the JSON, reading the JSON data,

00:27.440 --> 00:30.000
and parsing it into a Python object.

00:30.000 --> 00:34.360
So let's create the function named load_data.

00:34.400 --> 00:39.720
And before we actually start coding this up, we need to do some imports.

00:40.040 --> 00:45.960
First one is import json. From datetime we need to import date.

00:46.800 --> 00:49.240
And finally we need to import logging.

00:49.280 --> 00:53.400
Now, the json and datetime modules we have already used in previous lectures.

00:53.400 --> 00:56.240
However, the logging Python library we have not.

00:56.480 --> 01:03.520
I will leave the link to the logging documentation in the appendix of this section, under the logging module

01:03.520 --> 01:04.040
docs.

01:04.240 --> 01:10.600
Once we have imported logging, we can define a logger object and this will look as follows.

01:10.960 --> 01:19.000
So the logger object will come from logging.getLogger, and we pass it __name__.

01:19.320 --> 01:24.880
This is the preferred way of handling logging in your scripts, and it is a best practice to use this

01:24.880 --> 01:28.280
module and not the Python print function.

01:28.280 --> 01:33.360
In my personal experience, I have found that I use print for troubleshooting and debugging, while

01:33.360 --> 01:38.000
I use logger in production code to output logs from Python functions.

01:38.160 --> 01:46.200
So if we are to continue building the load_data function, first we specify the file path to point to

01:46.240 --> 01:51.000
the data directory where we saved the JSON in the extraction section of this course.

01:51.440 --> 01:53.800
So the file path would be.

02:02.930 --> 02:10.130
Next we can open a try except block and log the action which we are performing using logger.info.

02:14.370 --> 02:20.210
In this particular case, we are processing the file so we can write the following.

02:20.290 --> 02:24.370
YT underscore data, and then we add today's date.
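
NOTE
A hedged sketch of building a dated file path. The directory "./data", the
prefix "YT_data" and the helper name build_file_path are hypothetical names
chosen for illustration; use whatever names your extraction step wrote.
```python
from datetime import date
# Hypothetical directory and file-name prefix; adjust to match your own project.
DATA_DIR = "./data"
FILE_PREFIX = "YT_data"
def build_file_path(day: date) -> str:
    """Return the path of the JSON file saved for the given day."""
    # date.isoformat() yields YYYY-MM-DD, the same text that str(day) produces.
    return f"{DATA_DIR}/{FILE_PREFIX}_{day.isoformat()}.json"
file_path = build_file_path(date.today())
```
Taking the day as a parameter keeps the helper easy to test with a fixed date.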

02:26.770 --> 02:31.130
Same as when we saved the JSON, we use the with context manager.

02:31.130 --> 02:33.770
And in this instance we want to get the data.

02:33.770 --> 02:39.210
So we open the file in read mode using the "r" mode, as we will see here.

02:43.970 --> 02:46.970
And then going with the UTF-8 encoding.

02:54.610 --> 02:59.410
Next we define the data variable that will contain the JSON data of the API.

03:00.090 --> 03:03.170
We can simply define this data and return it.

03:03.170 --> 03:03.370
It's.

03:07.570 --> 03:13.250
One thing that I will point out here is that when we define the variable data, we are effectively loading

03:13.250 --> 03:16.370
the entire contents of the JSON file into memory.

03:16.570 --> 03:20.290
If the JSON file is small, like in our case, this won't be an issue.

03:20.650 --> 03:25.330
However, if the file is on the order of gigabytes, then we will have performance issues.

03:25.650 --> 03:31.130
And if the system you are running on doesn't have enough memory, you can run into out-of-memory errors.

03:31.370 --> 03:39.490
Possible solutions for this would be to stream the JSON using a library like ijson, or to read the JSON file line

03:39.490 --> 03:40.050
by line.
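
NOTE
A sketch of the line-by-line idea using only the standard library. It assumes
the file holds one JSON document per line (JSON Lines), which is not the
layout used in this course; iter_json_lines is a hypothetical helper name.
```python
import json
from typing import Any, Iterator
def iter_json_lines(file_path: str) -> Iterator[Any]:
    """Yield one parsed JSON value per non-empty line of the file."""
    with open(file_path, "r", encoding="utf-8") as json_file:
        # Iterating over the file object reads one line at a time.
        for line in json_file:
            line = line.strip()
            if line:
                yield json.loads(line)
```
Because this is a generator, only the current line is held in memory while
the caller loops over the records.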

03:40.770 --> 03:47.210
Now coming back to the code, if any of the code in the try statement fails, we use the except clause

03:48.290 --> 03:55.610
and in our case, a possible failure could be due to either the file not being found or the JSON data being invalid.

03:56.170 --> 03:59.570
In either case, an error will be logged and raised.

04:02.370 --> 04:09.380
And we can use logger.error. This is a nice feature of the logging module where we can log a message

04:09.380 --> 04:10.860
with level error.

04:13.620 --> 04:16.820
We can handle the file not found error.

04:17.460 --> 04:22.260
We can also handle the invalid JSON data exception.
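
NOTE
A minimal sketch of the function walked through in this lecture, not the
author's exact code. Passing file_path as a parameter and the wording of the
log messages are assumptions made so the example is self-contained.
```python
import json
import logging
logger = logging.getLogger(__name__)
def load_data(file_path: str):
    """Open the JSON file at file_path and return its parsed contents."""
    try:
        logger.info("Processing file: %s", file_path)
        with open(file_path, "r", encoding="utf-8") as raw_data:
            # json.load reads and parses the whole file in memory at once.
            data = json.load(raw_data)
        return data
    except FileNotFoundError:
        logger.error("File not found: %s", file_path)
        raise
    except json.JSONDecodeError:
        logger.error("Invalid JSON in file: %s", file_path)
        raise
```
The bare raise re-raises the original exception after logging it, so callers
still see the failure.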

04:31.660 --> 04:35.700
So now we have a function that will store the loaded data as a Python variable.

04:36.660 --> 04:39.940
So before we wrap up here, I'm just seeing two errors.

04:40.820 --> 04:42.620
This T here needs to be capital.

04:42.660 --> 04:46.860
Since if we look under the data directory, the T is also capital.

04:46.860 --> 04:52.020
Otherwise the file wouldn't have been found and we would have gotten the file not found error.

04:52.060 --> 04:52.780
Similarly,

04:52.820 --> 04:58.580
we can also add an underscore here to keep the same file name standard.

04:58.620 --> 05:02.540
In the next lecture we will look into the inserts, updates and deletes.
