WEBVTT

00:02.200 --> 00:06.580
Hello again! In this video, we are going to look at miscellaneous string operations.

00:08.790 --> 00:10.800
First, we are going to look at the data() member function.

00:11.370 --> 00:15.600
You will remember that the string has a pointer to a buffer containing the actual characters.

00:16.140 --> 00:22.260
And when we call the data() member function, it returns a pointer to the start of this memory buffer.

00:24.230 --> 00:30.020
You may remember that we had a function, c_str(), which did this. That returned a null-terminated

00:30.020 --> 00:35.360
string, and in C++11, data() also returns a null-terminated string.

00:35.930 --> 00:41.390
That might not necessarily be the case in older versions of C++, but in modern versions,

00:41.790 --> 00:46.160
the two do the same thing. And the data() member function is available on the vector class as well.
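
NOTE
A minimal sketch of the point above, assuming a C++11 or later compiler.
The helper names c_length and same_buffer are hypothetical and not from the video.
It passes the result of data() to the C function strlen, which relies on the terminating null character.
```cpp
#include <cstddef>
#include <cstring>
#include <string>
// Length as seen by C code, which scans for the terminating '\0'.
// Since C++11, data() is guaranteed to return a null-terminated buffer.
std::size_t c_length(const std::string& s) {
    return std::strlen(s.data());
}
// True if data() and c_str() return the same pointer for this string.
bool same_buffer(const std::string& s) {
    return s.data() == s.c_str();
}
```
For a string holding "Hello", c_length should agree with size(), and same_buffer should be true.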

00:46.730 --> 00:52.160
So if you have to work with code written in C, maybe an operating system API, or some library, or some

00:52.160 --> 00:58.100
old code that you are converting, then you do not have to use arrays and pointers in your C++ code.

00:58.520 --> 01:05.420
You can use string and vector, and then convert the results when you need to interact with C code. For

01:05.420 --> 01:11.540
example, you could use this vector of ints and then, when you have to call this C function, which takes

01:11.540 --> 01:19.760
an array as a pointer to int and an element count, then you can call data() to get a pointer to the start

01:19.760 --> 01:26.000
of the data, which will be compatible with a C array, and the size member function will return the

01:26.000 --> 01:26.840
number of elements.

01:27.680 --> 01:30.380
So that would be completely compatible with the C code.

01:33.600 --> 01:39.240
So here is that code. I have just written the C-style function that uses an array and a count and prints

01:39.240 --> 01:40.050
out all the elements.

01:40.590 --> 01:46.710
Then we have our vector. And then we pass the results of calling data() and size() as arguments to the C

01:46.710 --> 01:47.190
function.

01:49.940 --> 01:54.290
So there we are, the function prints out the elements of the array, and it all works perfectly.
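
NOTE
The code shown on screen is not part of this transcript, so what follows is only a sketch of what it might look like.
The names print_array and print_vector are assumptions, and so is the returned count, which is there only to make the sketch easy to check.
```cpp
#include <cstddef>
#include <cstdio>
#include <vector>
// C-style function: takes a pointer to the first int and an element count.
// Prints every element and returns how many were printed (an assumption for checking).
int print_array(const int* arr, std::size_t count) {
    int printed = 0;
    for (std::size_t i = 0; i < count; ++i) {
        std::printf("%d ", arr[i]);
        ++printed;
    }
    std::printf("\n");
    return printed;
}
// C++ side: the vector owns the data, and data() and size() adapt it to the C interface.
int print_vector(const std::vector<int>& numbers) {
    return print_array(numbers.data(), numbers.size());
}
```
The calling code keeps working with a vector, and only the call site needs to know about the pointer and the count.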

01:57.840 --> 02:00.840
The other string member function I want to talk about is swap().

02:01.320 --> 02:03.300
This applies to vector as well.

02:03.930 --> 02:06.120
So this will just swap two strings.

02:06.780 --> 02:08.550
We can call it as a member function.

02:08.560 --> 02:15.390
So if we call it on a string and pass another string as argument, then this is going to swap the data

02:15.390 --> 02:16.920
of the two string objects.

02:17.820 --> 02:22.500
Say we have s1 with "Hello" as its data and s2 with "Goodbye".

02:22.860 --> 02:27.450
Then after the swap operation, s1's data will be "Goodbye" and s2's data

02:27.540 --> 02:28.140
will be

02:28.590 --> 02:29.040
"Hello".

02:30.430 --> 02:35.360
There is also a non-member version of this. So we call swap(s1, s2).

02:35.590 --> 02:39.220
And that will again exchange the data of the two string objects.

02:43.300 --> 02:48.670
So here we are doing this. We are going to do the member function swap and print out the results, and

02:48.670 --> 02:51.430
then do the non-member function swap and print out the results.

02:51.880 --> 02:54.820
So in theory, we should get back to the original strings!

02:57.360 --> 02:59.490
So there we are. We start out with "Hello" and "Goodbye".

02:59.940 --> 03:06.090
Then we have "Goodbye" and "Hello" after doing the swap with the member function. And then the

03:06.300 --> 03:08.500
non-member function takes us back to the originals.
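
NOTE
A short sketch of the demonstration described above, with the member swap followed by the non-member swap.
The function name swap_twice and the returned log string are assumptions made so the result can be checked.
```cpp
#include <string>
#include <utility>
// Swaps two strings with the member function, then swaps them back with the
// non-member function, recording the contents after each step.
std::string swap_twice() {
    std::string s1{"Hello"};
    std::string s2{"Goodbye"};
    std::string log;
    s1.swap(s2);                  // member function
    log += s1 + " " + s2 + "\n";
    std::swap(s1, s2);            // non-member function
    log += s1 + " " + s2 + "\n";
    return log;
}
```
The first line of the log should read "Goodbye Hello" and the second "Hello Goodbye", matching the output described in the video.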

03:12.260 --> 03:18.890
The non-member version of this function works with all the built-in types and has overloads for the types in the C++ library.

03:19.580 --> 03:23.390
Obviously, it will not have overloads for your own types unless you provide one.

03:27.280 --> 03:32.980
So what happens in that case? If you call swap and it's not overloaded for that particular object,

03:33.550 --> 03:34.900
then the default is called.

03:35.260 --> 03:38.710
And that just does a naive swap through a temporary object, which means copying if the type cannot be moved.

03:39.280 --> 03:44.140
So it will back up one of the arguments, say the first one, and then it will overwrite that argument with the second.

03:44.620 --> 03:48.160
And then it overwrites the second argument with the backup of the first.
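
NOTE
The three steps above can be sketched as a copy-based swap.
This is an illustration, not the library's code: naive_swap and Tracked are hypothetical names, and since C++11 the standard std::swap moves its arguments where the type allows it.
Tracked simply counts how many copy operations take place.
```cpp
// Copy-based swap: one copy construction and two copy assignments.
template <typename T>
void naive_swap(T& a, T& b) {
    T temp(a);  // back up the first argument
    a = b;      // overwrite the first argument with the second
    b = temp;   // overwrite the second argument with the backup
}
// Hypothetical helper type that counts its copy operations.
struct Tracked {
    static int copies;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked& operator=(const Tracked&) { ++copies; return *this; }
};
int Tracked::copies = 0;
```
Calling naive_swap on two Tracked objects should raise Tracked::copies by three, one for each step.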

03:48.880 --> 03:52.150
And for simple classes where copying is a trivial operation,

03:52.270 --> 03:53.230
that is absolutely fine.

03:53.710 --> 03:57.640
But for something like string, this is going to be rather a slow and inefficient operation.

03:59.440 --> 04:04.600
For starters, each one of these operations will require the data in the string on the right to be copied

04:04.600 --> 04:05.680
into the string on the left.

04:06.340 --> 04:10.090
So that is one processor instruction per character

04:10.750 --> 04:16.870
in the best case. Depending on where it is in memory, it could take a lot longer. If the buffer in one

04:16.870 --> 04:22.780
of these strings is not big enough to hold the data from the other object, then it will need to allocate

04:22.780 --> 04:24.280
a new buffer and release the old one.

04:25.390 --> 04:27.890
And memory allocation operations are very slow.

04:27.910 --> 04:33.400
They can often take hundreds of instructions. And in any case, the copy constructor will need to

04:33.400 --> 04:35.590
allocate some memory because it is creating a new object.

04:36.310 --> 04:40.870
So that could quite easily take thousands of processor instructions for what looks like a simple

04:41.470 --> 04:42.070
operation.

04:43.240 --> 04:48.430
And that is time your processor spends on overhead, when it cannot execute any other code from your program.

04:49.420 --> 04:52.240
So that is going to add to the execution time of the program.

04:54.090 --> 04:56.880
So, the string overload does not do that.

04:57.330 --> 05:00.810
It is a bit more clever, it knows what a string is and what a string does.

05:02.640 --> 05:10.320
If you remember how a string is structured in memory, it has this header

05:10.320 --> 05:15.540
which has the element count and a pointer to the data, and then the actual characters are in a memory

05:15.540 --> 05:17.790
buffer, which the header points to.

05:18.480 --> 05:24.540
So instead of copying data in and out of these buffers and allocating buffers, why not just swap around

05:24.540 --> 05:25.020
the headers?

05:25.950 --> 05:31.200
So s1 will get a header which has the element count for "Goodbye" and a pointer to the data

05:31.200 --> 05:31.770
with "Goodbye".

05:32.670 --> 05:38.490
And s2 ends up with a header which has the element count for "Hello" and points to the data with

05:38.490 --> 05:38.670
"Hello" in it.

05:40.140 --> 05:43.080
So you get the same results, but with far fewer operations.

05:43.590 --> 05:45.690
All you do is just swap around these two headers.

05:52.170 --> 05:57.630
So the advantage of this is that there aren't any memory allocation or release operations, and none of the

05:57.630 --> 05:59.160
actual character data is copied.

06:00.090 --> 06:04.710
The only thing that gets copied is the count and the pointer in the header.

06:05.250 --> 06:09.030
So that's a pointer assignment and an integer assignment for each of these operations.

06:09.600 --> 06:13.110
So that is six assignments in all. And

06:14.640 --> 06:18.540
that is nothing compared to the thousands of instructions that you need with the old way.
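
NOTE
A simplified sketch of the header swap described above, assuming a string that holds only an element count and a pointer to a heap buffer.
SimpleString is a hypothetical class for illustration. Real library strings typically store more than this, for example a capacity, and may keep short strings inside the object itself.
```cpp
#include <cstddef>
#include <cstring>
#include <utility>
// Hypothetical simplified string: a header (count + pointer) that owns a heap buffer.
class SimpleString {
public:
    explicit SimpleString(const char* text)
        : count_(std::strlen(text)), data_(new char[count_ + 1]) {
        std::memcpy(data_, text, count_ + 1);  // copy the characters and the '\0'
    }
    ~SimpleString() { delete[] data_; }
    // Copying is left out to keep the sketch short.
    SimpleString(const SimpleString&) = delete;
    SimpleString& operator=(const SimpleString&) = delete;
    // Swap only the header fields; no allocation, no character data copied.
    void swap(SimpleString& other) noexcept {
        std::swap(count_, other.count_);  // integer swap
        std::swap(data_, other.data_);    // pointer swap
    }
    const char* data() const { return data_; }
    std::size_t size() const { return count_; }
private:
    std::size_t count_;  // element count
    char* data_;         // pointer to the character buffer
};
// Non-member overload, so that swap(a, b) picks the cheap version for this type.
void swap(SimpleString& a, SimpleString& b) noexcept { a.swap(b); }
```
After swap(s1, s2), each object should point at the buffer the other one owned before, which is why no characters need to move.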

06:20.280 --> 06:21.720
OK, so that is it for this video.

06:22.140 --> 06:25.230
I will see you next time, but meanwhile, keep coding!
