WEBVTT

00:01.020 --> 00:04.910
Hello again! In this video, we are going to look at shallow and deep copying.

00:06.840 --> 00:12.390
We are building up this String class with a capital "S", which is going to replicate some of the functionality

00:12.390 --> 00:13.740
of the library string class.

00:14.640 --> 00:20.640
So far we have taken the easy option and used a library string member. Now we are going to make it a bit more

00:20.640 --> 00:21.210
realistic.

00:21.540 --> 00:23.340
We are going to provide our own storage.

00:23.940 --> 00:26.340
We need to have an array to store the characters.

00:26.610 --> 00:28.590
And this array is going to be stored on the heap.

00:30.320 --> 00:36.110
So our class will now have a member which is a pointer to char, so that is going to be a pointer to

00:36.110 --> 00:42.920
the memory, an array. We are going to have a size member which has the number of elements in the array,

00:42.920 --> 00:44.610
the number of characters in the data.

00:45.200 --> 00:47.690
And then we have some member functions for the interface.

00:49.130 --> 00:56.270
This is going to be written to use the RAII idiom for managing resources. So it will allocate the memory for

00:56.270 --> 01:00.710
the array in the class's constructor, and the memory will be released in

01:00.710 --> 01:02.000
the class's destructor.
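
NOTE
The RAII pattern described here can be sketched roughly as follows. This is a minimal
illustration, not the code shown in the video: the Buffer class, its live counter and the
live_inside_scope() function are hypothetical names, and copying is disabled only to keep
the sketch safe.
```cpp
#include <cassert>
#include <cstddef>
// Hypothetical class, not the String from the video: it only shows the RAII pattern.
class Buffer {
public:
    // Acquire the resource in the constructor.
    explicit Buffer(std::size_t n) : data(new char[n]()), size(n) { ++live; }
    // Release the resource in the destructor.
    ~Buffer() { delete[] data; --live; }
    // Copying is disabled in this sketch so the example stays safe.
    Buffer(const Buffer&) = delete;
    Buffer& operator=(const Buffer&) = delete;
    std::size_t length() const { return size; }
    static int live; // number of Buffer objects currently holding memory
private:
    char* data;
    std::size_t size;
};
int Buffer::live = 0;
// Returns the number of live buffers seen inside the inner scope.
int live_inside_scope() {
    int seen = 0;
    {
        Buffer b(16);            // memory acquired here
        assert(b.length() == 16);
        seen = Buffer::live;
    }                            // memory released here, when b goes out of scope
    return seen;
}
```
Calling live_inside_scope() should report one live buffer inside the scope, and the counter
should be back at zero afterwards, because the destructor runs automatically at the closing brace.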

01:05.600 --> 01:09.590
So this is the typical pattern for managing resources in C++.

01:09.830 --> 01:12.590
Here the resource is the allocated memory.

01:13.070 --> 01:17.090
The String class is responsible for managing the lifetime of this resource.

01:18.140 --> 01:23.870
It needs to make sure that the resource is correctly acquired before any member functions in the class

01:23.870 --> 01:24.350
use it.

01:25.670 --> 01:30.980
It needs to make sure that the resource is correctly released when the class no longer requires it.

01:32.030 --> 01:36.320
If there are any objects being copied, it needs to make sure that the copying of the resource is handled

01:36.320 --> 01:39.680
correctly. And, looking ahead to move semantics,

01:40.010 --> 01:45.590
if there is any transfer of the resource from one object to another, then that needs to be handled correctly

01:45.590 --> 01:45.980
as well.

01:46.760 --> 01:48.620
We are going to leave that to a later section.

01:48.890 --> 01:51.500
In this video, we are going to concentrate on the copying.

01:56.370 --> 02:03.060
So here is our String class with these two members, one for the data and one for the number of elements

02:03.060 --> 02:06.960
in the array. In the constructor, we need to allocate the memory.

02:07.530 --> 02:10.200
So we need to know how many elements we need to allocate.

02:10.800 --> 02:13.140
We are passed a standard string as argument.

02:13.830 --> 02:19.350
So if we call the size member function of that string, that will give us the number of elements, then

02:19.350 --> 02:22.080
we can initialize our own size member from that.

02:23.400 --> 02:27.450
And then in the body of the constructor, we can allocate the memory.

02:27.960 --> 02:32.120
In fact, you could actually move this into the initializer list, if you prefer, but I think it is clearer

02:32.160 --> 02:34.800
this way. It will not really make any difference.

02:36.000 --> 02:41.700
So we allocate a memory buffer, which contains "that many" elements, and then we just iterate through

02:41.700 --> 02:46.530
it, and we populate the data with the characters from the argument string.
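
NOTE
A possible shape for the constructor just described, as a sketch under assumptions: the member
names data and size, and the at() accessor, are guesses made for illustration and may differ
from the code on screen. Copying is disabled because this sketch does not handle it yet.
```cpp
#include <cassert>
#include <cstddef>
#include <string>
class String {
public:
    // size is initialized from the argument; the buffer is allocated in the body.
    String(const std::string& s) : size(s.size()) {
        data = new char[size];
        for (std::size_t i = 0; i < size; ++i)
            data[i] = s[i];
    }
    // Array form of delete, matching the array form of new above.
    ~String() { delete[] data; }
    // Copying is disabled here only because this sketch does not yet handle it.
    String(const String&) = delete;
    String& operator=(const String&) = delete;
    std::size_t length() const { return size; }
    // Hypothetical accessor, added only so the contents can be checked.
    char at(std::size_t i) const { assert(i < size); return data[i]; }
private:
    char* data;
    std::size_t size;
};
```
With this sketch, String s("abc") would hold three characters, and s.length() would return 3.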

02:49.550 --> 02:55.070
For the copy constructor and the assignment operator, we're going to use the ones which were synthesized

02:55.070 --> 02:56.480
by the compiler, for the time being.

02:56.870 --> 02:58.430
[In order to] see what that does.

03:00.230 --> 03:03.530
Then we have a destructor where we release the memory.

03:04.040 --> 03:05.210
So we call delete.

03:06.900 --> 03:12.060
Because we used the array form of new in the constructor, we have to use the array form of delete here.

03:12.480 --> 03:15.150
If we miss that out, the behaviour is undefined and we will probably crash the program.

03:17.070 --> 03:22.650
I have also put in a print statement so we can see what is going on. So we can see when the destructor is

03:22.650 --> 03:23.010
called.

03:25.570 --> 03:31.060
There is also a length() member function, which will return the value of this size member. So we can

03:31.060 --> 03:38.590
see how many elements there are in the string. Then, in our main function, we create a couple of objects,

03:39.640 --> 03:40.870
by calling the constructor.

03:41.380 --> 03:44.050
Then we create another object, by calling the copy constructor.

03:44.590 --> 03:46.810
We call the length member function.

03:47.620 --> 03:48.580
And then we

03:50.020 --> 03:50.360
finish.

03:51.040 --> 03:55.750
So when the program leaves the main function, that will be the end of the scope in which these String

03:55.750 --> 03:59.980
objects were defined. The destructor for these Strings will be called.

04:00.880 --> 04:02.890
These are going to be destroyed in the reverse order

04:02.920 --> 04:07.870
they were created. So string3 will be destroyed first, then string2, then this string.

04:08.830 --> 04:11.530
So we should see three destructor calls.
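
NOTE
The reverse order of destruction can be checked with a small sketch. The Tracer struct and
the destruction_order() function are hypothetical helpers, not part of the video's code; they
record names in a log instead of printing.
```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>
// Hypothetical helper: records its name in a log when it is destroyed.
struct Tracer {
    std::string name;
    std::vector<std::string>& log;
    Tracer(std::string n, std::vector<std::string>& l) : name(std::move(n)), log(l) {}
    ~Tracer() { log.push_back(name); }
};
// Returns the names in the order the destructors ran.
std::vector<std::string> destruction_order() {
    std::vector<std::string> log;
    {
        Tracer first("first", log);
        Tracer second("second", log);
        Tracer third("third", log);
        assert(log.empty()); // nothing has been destroyed yet
    } // third is destroyed first, then second, then first
    return log;
}
```
The returned log should list the objects in the opposite order to their definitions.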

04:15.440 --> 04:16.400
Were you expecting that?

04:17.000 --> 04:18.170
So we have a problem.

04:18.890 --> 04:23.000
It does not tell us anything useful about what the exact problem is, but it is something to do with the

04:23.000 --> 04:25.220
heap being invalid or a pointer being invalid.

04:26.390 --> 04:29.810
And also, we only see two destructor calls.

04:30.680 --> 04:35.150
So this destructor was called and this destructor was called, but something went wrong after that.

04:39.410 --> 04:45.440
And the problem, in fact, is in this synthesised copy constructor. This has just copied the data

04:45.440 --> 04:45.910
pointer.

04:46.520 --> 04:52.400
So the pointer in the argument object has overwritten the pointer in the new object.

04:52.580 --> 04:53.780
So they both hold the same pointer value.

04:53.870 --> 04:55.910
So we have two objects pointing to the same memory.

04:56.360 --> 05:01.040
So when their destructors call delete, the argument to delete is the same pointer.

05:01.610 --> 05:03.470
So we are trying to release the same memory twice.

05:04.220 --> 05:07.760
And the second time, that will mean that we are releasing memory that does not belong to us.

05:08.300 --> 05:10.070
So we get the memory error.
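
NOTE
What the shallow copy does can be shown without triggering the crash. This sketch uses a
hypothetical plain struct with no destructor, so the shared memory is released exactly once
by hand; it is not the String class from the video.
```cpp
#include <cassert>
#include <cstddef>
// Hypothetical plain struct with no destructor, so the sketch can be run safely.
struct Shallow {
    char* data;
    std::size_t size;
};
// Returns true if a memberwise copy ends up pointing at the same memory.
bool copy_shares_memory() {
    Shallow a{new char[3]{'a', 'b', 'c'}, 3};
    Shallow b = a;            // memberwise copy: only the pointer value is copied
    bool shared = (a.data == b.data);
    b.data[0] = 'x';          // writing through b also changes what a sees
    assert(a.data[0] == 'x');
    delete[] a.data;          // release exactly once; deleting b.data too would be a double delete
    return shared;
}
```
If both objects had a destructor calling delete[] on data, as String does, the second call
would release memory that has already been released.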

05:10.760 --> 05:13.790
So what we need to do, is to do a "deep" copy.

05:14.210 --> 05:17.240
We did a "shallow" copy before where we were just copying the pointer addresses.

05:17.630 --> 05:20.060
Now we actually need to do it properly.

05:20.090 --> 05:25.460
So we need to give the new object a new allocation of memory that belongs to it and no one else.

05:26.480 --> 05:29.210
So we allocate memory.

05:29.720 --> 05:32.330
We find out the number of elements from the argument.

05:34.440 --> 05:38.970
And then it is basically the same as the constructor, except instead of having a standard string, we

05:38.970 --> 05:41.040
have one of our own objects as the argument.

05:43.150 --> 05:47.410
And then the rest of the code is the same, so let's see what happens now.

05:48.460 --> 05:49.310
And that is much better.

05:49.330 --> 05:52.570
So we do not get a crash, and we get three destructor calls.
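
NOTE
A deep-copying copy constructor along the lines described might look like this. It is a sketch
under assumptions: the member names, the raw() accessor and the copy_is_deep() function are
hypothetical, and the print statements are left out. Assignment is disabled because it has
not been dealt with at this point.
```cpp
#include <cassert>
#include <cstddef>
#include <string>
class String {
public:
    String(const std::string& s) : size(s.size()) {
        data = new char[size];
        for (std::size_t i = 0; i < size; ++i) data[i] = s[i];
    }
    // Deep copy: the new object gets its own allocation.
    String(const String& arg) : size(arg.size) {
        data = new char[size];
        for (std::size_t i = 0; i < size; ++i) data[i] = arg.data[i];
    }
    // Assignment is not covered yet at this point, so it is disabled in this sketch.
    String& operator=(const String&) = delete;
    ~String() { delete[] data; }
    std::size_t length() const { return size; }
    const char* raw() const { return data; } // hypothetical accessor for checking
private:
    char* data;
    std::size_t size;
};
// Returns true if a copy owns a different buffer with the same contents.
bool copy_is_deep() {
    String a("abc");
    String b(a);
    assert(a.length() == b.length());
    bool same_text = true;
    for (std::size_t i = 0; i < a.length(); ++i)
        same_text = same_text && (a.raw()[i] == b.raw()[i]);
    return same_text && a.raw() != b.raw();
}
```
Each object now deletes its own buffer, so both destructors can run safely.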

05:55.680 --> 05:58.710
We also have the same problem when we perform an assignment.

05:59.790 --> 06:04.680
The compiler will synthesize an assignment operator, which does a shallow copy.

06:05.310 --> 06:10.400
So the pointer gets overwritten. And then we should get the same result as we did with the shallow

06:10.470 --> 06:11.280
copy constructor.

06:13.900 --> 06:19.260
I have added a couple of extra lines, so we now are doing an assignment and an extra print statement.

06:20.370 --> 06:21.570
So what do you think will happen?

06:23.940 --> 06:25.050
And yes, there we are.

06:25.230 --> 06:27.000
We get the crash again. And again,

06:27.000 --> 06:28.680
we only get two destructor calls.

06:30.240 --> 06:33.750
The rest of it seems to have worked, so we called the copy constructor.

06:34.740 --> 06:40.470
The two Strings have three elements, so that seems to have worked. And that still has one element.

06:40.800 --> 06:45.000
So it was just the problem in the destructor which was caused by the shallow copy.

06:46.500 --> 06:49.020
So obviously, we need to do a deep assignment.

06:49.800 --> 06:52.230
That is a little bit more complex than the copy constructor.

06:52.590 --> 06:56.910
So obviously, we need to make a new memory allocation for the target object.

06:57.480 --> 07:04.380
The one that is being assigned to. Before this, we need to release the old allocation, because that is

07:04.380 --> 07:07.350
the last time that we have access to the address of that memory.

07:08.490 --> 07:10.920
If we leave it till later, we will have lost that address.

07:11.130 --> 07:13.140
So we need to release that before we do the allocation.

07:13.830 --> 07:16.950
There is also one important thing we need to do before we do any of this.

07:17.370 --> 07:19.410
We need to check for self assignment.

07:19.920 --> 07:23.640
It is possible that the object which is the argument to the assignment

07:23.640 --> 07:27.150
operator could be the same object as the one that is the target.

07:28.110 --> 07:29.780
You may say that is a bit pointless.

07:29.790 --> 07:34.890
Nobody ever writes code like "x = x", and usually they do not. At least not directly.

07:36.300 --> 07:42.360
If you have function calls and loops and function calls and more loops and lots of complicated nested code,

07:42.840 --> 07:48.070
it is possible that you have some code that assigns the elements of an array to the elements of another

07:48.090 --> 07:50.790
array, and that the two arrays are actually the same.
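
NOTE
One way an indirect self-assignment can arise, as a small sketch. The assign_element() helper
is hypothetical, and std::string is used as the element type because its assignment is known
to be safe for self-assignment.
```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>
// Hypothetical helper: copies one element from src into dst.
// Nothing stops a caller from passing the same array for both.
bool assign_element(std::vector<std::string>& dst, std::size_t i,
                    const std::vector<std::string>& src, std::size_t j) {
    assert(i < dst.size() && j < src.size());
    bool self = (&dst[i] == &src[j]); // same address means the same object
    dst[i] = src[j];                  // this is "x = x" when self is true
    return self;
}
```
Calling assign_element(v, 0, v, 0) on a single vector v performs a self-assignment, even
though no line of the code reads "x = x".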

07:52.050 --> 07:53.550
And why is this a problem?

07:54.180 --> 07:58.560
Suppose the two objects are the same; in C++, that means they have the same address.

07:59.640 --> 08:05.100
If the two objects are the same, then the pointers to the data will hold the same address. And the pointer

08:05.110 --> 08:06.120
in the target object,

08:06.150 --> 08:09.990
"this", will have the same address as the pointer in the argument object.

08:11.190 --> 08:16.410
If we delete the pointer in "this", then the pointer in the argument object will be invalid.

08:17.130 --> 08:23.120
And then later on, when we try to copy the data from the argument array, that memory has already been released. And we

08:23.400 --> 08:24.450
are likely to get a memory violation error.

08:25.470 --> 08:26.670
So that is why we need to check.

08:28.260 --> 08:32.800
And also, if the objects are identical, then we do not need to assign them anyway, because they already

08:32.800 --> 08:33.720
have the correct values.

08:35.100 --> 08:39.480
So if you don't like the argument about crashing, perhaps you might like the argument about optimization!

08:40.110 --> 08:44.700
(If you are a true C++ programmer, anything to do with optimization or efficiency will make you prick

08:44.700 --> 08:45.210
up your ears!)

08:47.250 --> 08:48.350
So here is our code again.

08:48.360 --> 08:52.200
It is all the same, except that I have now implemented the assignment operator.

08:58.050 --> 08:59.070
So here it is.

08:59.760 --> 09:05.220
We have a print statement as usual, to help us see when it is being called, and then we check for self

09:05.220 --> 09:05.730
assignment.

09:06.450 --> 09:13.230
So if the two objects have the same addresses, so if the address of the argument object and the address

09:13.230 --> 09:18.180
of "this" object are the same, then we skip all this and we go straight to returning the object.

09:19.920 --> 09:24.540
If they are different, then it is safe to release the original memory allocation.

09:26.430 --> 09:29.010
Then we make the new memory allocation.

09:30.770 --> 09:35.030
We set the size member to the number of elements, and then we do the iteration.

09:35.510 --> 09:40.790
So this is the same as doing the destructor, and then the code from the copy constructor.
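
NOTE
Putting the steps together, the assignment operator could be sketched as below. The member
names, the str() accessor and the assign_then_self_assign() function are assumptions made for
illustration, and the print statements from the video are omitted.
```cpp
#include <cassert>
#include <cstddef>
#include <string>
class String {
public:
    String(const std::string& s) : size(s.size()) {
        data = new char[size];
        for (std::size_t i = 0; i < size; ++i) data[i] = s[i];
    }
    String(const String& arg) : size(arg.size) {
        data = new char[size];
        for (std::size_t i = 0; i < size; ++i) data[i] = arg.data[i];
    }
    String& operator=(const String& arg) {
        if (&arg != this) {            // skip everything on self-assignment
            delete[] data;             // release the old allocation first
            data = new char[arg.size]; // then make the new allocation
            size = arg.size;
            for (std::size_t i = 0; i < size; ++i) data[i] = arg.data[i];
        }
        return *this;
    }
    ~String() { delete[] data; }
    std::size_t length() const { return size; }
    std::string str() const { return std::string(data, size); } // hypothetical accessor for checking
private:
    char* data;
    std::size_t size;
};
// Returns the text held after a normal assignment followed by a self-assignment.
std::string assign_then_self_assign() {
    String a("abc"), b("x");
    b = a;               // deep assignment: b gets its own copy of "abc"
    String& alias = b;
    b = alias;           // self-assignment: the address check skips the body
    assert(b.length() == 3);
    return b.str();
}
```
This class now has all three members named by the Rule of Three: a copy constructor, an
assignment operator and a destructor.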

09:44.870 --> 09:47.480
And we have the same main function, so let's see what happens now.

09:51.690 --> 09:56.400
So there we are, that looks all right. And we get three destructor calls and no crashes.

09:59.410 --> 10:03.640
So how do we know when we need to implement the copy constructor or the assignment operator?

10:04.180 --> 10:09.730
There is something called the "Rule of Three", which says if the class needs to implement one of these -

10:09.730 --> 10:14.650
copy constructor, assignment operator or destructor, then it probably needs to implement the other

10:14.650 --> 10:19.240
two as well, because the ones that the compiler synthesizes will not give the correct behaviour.

10:19.930 --> 10:26.050
So we have our example, where we needed deep copying for the pointer, and that was the rule of three.

10:27.260 --> 10:28.220
(With an extra quotation mark!)

10:30.980 --> 10:36.290
You may be wondering about the default constructor, because the compiler will also synthesize a default constructor.

10:36.830 --> 10:41.900
Usually, that is not a concern because, if we are managing a resource, then the constructor will take some

10:41.900 --> 10:42.410
argument.

10:43.460 --> 10:48.530
So we need to know how much memory to allocate, the name of the file we are going to open, how to connect

10:48.530 --> 10:49.760
to the database and so on.

10:50.990 --> 10:56.120
And if we implement a constructor which takes an argument, then the compiler will not synthesise

10:56.120 --> 10:57.260
a default constructor.
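
NOTE
The effect on the default constructor can be checked at compile time. In this sketch, File
and Plain are hypothetical classes and "log.txt" is just a placeholder name; the check uses
std::is_default_constructible from the type_traits header.
```cpp
#include <cassert>
#include <string>
#include <type_traits>
// Hypothetical resource class with one user-declared constructor.
struct File {
    explicit File(const std::string& n) : name(n) {}
    std::string name;
};
// Hypothetical class with no user-declared constructors.
struct Plain {
    int value = 0;
};
static_assert(!std::is_default_constructible<File>::value,
              "declaring a constructor suppresses the synthesized default constructor");
static_assert(std::is_default_constructible<Plain>::value,
              "with no constructors declared, the compiler synthesizes one");
// Opens nothing; it just shows the one-argument constructor in use.
std::string make_and_name() {
    File f("log.txt");
    assert(!f.name.empty());
    return f.name;
}
```
Writing "File f;" with this sketch would fail to compile, because File has no default constructor.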

11:00.070 --> 11:04.720
Sometimes it is useful just to have a default constructor, so you can create an empty object.

11:05.260 --> 11:10.630
And in that case, you can write one which will initialize the object with an empty state, whatever

11:10.630 --> 11:11.110
that means.

11:11.650 --> 11:17.830
So for the case of our String, that means a String with no data, so the data pointer will be null and

11:17.830 --> 11:19.810
the size member will be 0.
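
NOTE
A default constructor giving the empty state described here might be sketched like this. The
empty() helper and the default_length() function are hypothetical, and the other members are
left out or disabled to keep the sketch short.
```cpp
#include <cassert>
#include <cstddef>
class String {
public:
    // Empty state: no allocation, null pointer, zero size.
    String() : data(nullptr), size(0) {}
    // Deleting a null pointer is a no-op, so the destructor needs no special case.
    ~String() { delete[] data; }
    String(const String&) = delete;            // other members omitted in this sketch
    String& operator=(const String&) = delete;
    std::size_t length() const { return size; }
    bool empty() const { return data == nullptr; } // hypothetical helper for checking
private:
    char* data;
    std::size_t size;
};
// Returns the length of a default-constructed String.
std::size_t default_length() {
    String s;
    assert(s.empty());
    return s.length();
}
```
A default-constructed String in this sketch reports a length of 0 and is destroyed without
any memory being released.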

11:20.800 --> 11:26.500
And finally, there is also the "Rule of Zero", which says, if the compiler does synthesize special

11:26.500 --> 11:31.060
member functions which give the correct behavior, then forget about it!

11:31.150 --> 11:33.520
Do not bother implementing them, because that will be fine.

11:34.210 --> 11:35.620
Just concentrate on the rest of your code.
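
NOTE
As an illustration of the Rule of Zero, here is a hypothetical variant of the class in which
a std::vector of char owns the memory. This is not the approach taken in the video; it only
shows that the synthesized members can be correct when the data member manages itself.
```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>
// Hypothetical variant of the class: std::vector<char> owns the memory,
// so no copy constructor, assignment operator or destructor is written.
class String {
public:
    String(const std::string& s) : data(s.begin(), s.end()) {}
    std::size_t length() const { return data.size(); }
    const char* raw() const { return data.data(); } // hypothetical accessor for checking
private:
    std::vector<char> data;
};
// Returns true if the synthesized copies gave each object its own storage.
bool synthesized_copy_is_deep() {
    String a("abc");
    String b(a);      // synthesized copy constructor copies the vector
    String c("x");
    c = a;            // synthesized assignment operator does the same
    assert(b.length() == 3 && c.length() == 3);
    return a.raw() != b.raw() && a.raw() != c.raw();
}
```
Here the compiler-generated copy constructor, assignment operator and destructor all do the
right thing, so none of them needs to be implemented by hand.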

11:36.370 --> 11:42.610
So this rule of three is really only for resource management and classes which need unusual copying,

11:42.610 --> 11:43.960
or assignment, or destruction.

11:44.920 --> 11:46.840
OK, so that is it for this video.

11:47.260 --> 11:50.230
I will see you next time, but until then, keep coding!
