WEBVTT

00:00.150 --> 00:03.240
Hello again! In this video, we are going to talk about the set.

00:04.350 --> 00:08.310
A search is a collection of elements, without any particular structure.

00:09.000 --> 00:12.360
C++ has a class for this in the library, called set.

00:14.150 --> 00:18.350
This is an associative container, so it is going to be the first time we have looked at an associative

00:18.350 --> 00:18.830
container.

00:19.790 --> 00:24.470
It is a very minimal associative container, because the element just has a key and nothing else.

00:25.520 --> 00:29.360
This key has to be unique, so we cannot have two elements with the same key.

00:33.000 --> 00:39.810
The elements in a sets are stored in order using the less-than operator of the key, and the set is

00:39.810 --> 00:44.490
implemented as a tree, because there are efficiency requirements in the language, and that is the only

00:44.490 --> 00:45.090
way to meet them.

00:45.900 --> 00:47.670
Usually, they use a red-black tree.

00:51.630 --> 00:56.280
So this is what a set will look like. And you will notice it looks exactly the same as the diagram in

00:56.280 --> 00:59.970
the last video, and this is because the elements just have a key and nothing else.

01:03.180 --> 01:07.650
To add an element to the set we use insert(). To remove one, we use erase().

01:08.370 --> 01:12.570
We cannot use push_front() or push_back(), because trees do not have a front or back.

01:13.560 --> 01:16.740
These will make sure that the element is added in the right place in the tree.

01:19.630 --> 01:25.270
If we do an insert() or erase() operation, the tree will check whether it is correctly balanced. If it has

01:25.270 --> 01:30.850
become unbalanced enough, it will actually stop the operation and rebalance itself. And then come back

01:30.850 --> 01:32.140
to the operation after it has finished.

01:32.680 --> 01:38.950
So sometimes, very occasionally, an insert() or erase() operation may take much longer than it would normally.

01:42.390 --> 01:47.550
If we try to insert an element into a set which has the same key as an element that is already there,

01:47.940 --> 01:51.780
then the insert() operation will fail and you might want to think about why that is.

01:56.290 --> 02:01.390
And the answer is that elements have to have a unique key, so we cannot have two elements with the

02:01.390 --> 02:01.960
same key.

02:02.000 --> 02:03.190
So that is why it has to fail.

02:03.700 --> 02:06.790
The return value from insert() is a little involved.

02:07.090 --> 02:08.800
It actually actually returns a pair object.

02:09.580 --> 02:12.550
Usually the second member of this pair is the one that we are interested in.

02:12.940 --> 02:16.600
This is a button which will tell us whether the operation succeeded or failed.

02:17.710 --> 02:21.190
The first member is an iterator to an element in the set.

02:22.060 --> 02:27.720
If we succeeded in inserting a new element, it will be an iterator to that new element.

02:28.210 --> 02:32.650
If the operation failed, then we get an iterator to the element, which already has that key.

02:33.340 --> 02:36.340
So that will be the element which cause the insertion to fail, if you like.

02:41.110 --> 02:42.220
So let's try this out.

02:42.580 --> 02:43.950
We include the set header.

02:44.130 --> 02:47.310
We create a set object.

02:47.770 --> 02:50.140
And then we call insert(), to add some elements.

02:50.740 --> 02:54.610
So these are insert certain elements with keys 6, 7, 4, and so on.

02:55.870 --> 02:58.360
Then we try to insert an element with key 3.

02:58.930 --> 03:02.140
The set already has an element with that key, so we expect this to fail.

03:03.670 --> 03:09.610
We get a pair object returned by insert(), and the second member will tell us if this succeeded.

03:11.420 --> 03:17.120
And the first member will be an iterator to the element which caused it to fail, if it failed. Or the

03:17.120 --> 03:19.040
element which we inserted, if it succeeded.

03:21.640 --> 03:28.090
Then we remove the element with key 3 from the set. And then we try to insert it again, and this

03:28.090 --> 03:29.080
time it should succeed.

03:32.740 --> 03:33.430
So there we are.

03:33.460 --> 03:34.840
Those are the elements of the sets.

03:35.050 --> 03:37.870
And you will notice that these are displayed in order.

03:37.880 --> 03:41.440
So when we iterate through a set, the elements are returned in order.

03:41.980 --> 03:44.560
So the set does have this internal ordering of the elements.

03:45.820 --> 03:49.300
We tried to insert an element with key 3, but it already contains one.

03:50.200 --> 03:54.310
Then we removed the element, and then we were able to add another element with key 3.

03:56.560 --> 03:59.710
There are a couple of member functions which sometimes come in useful.

04:00.190 --> 04:05.440
The find() member function will return an iterator to the element which has the same key as its argument.

04:06.400 --> 04:09.520
If there is no element which has that key, then it returns last + 1.

04:10.960 --> 04:16.210
The count() member function will return the number of elements which have the same key as its argument.

04:16.960 --> 04:20.470
So because there are no duplicates, there can only be one element with that key.

04:20.830 --> 04:22.180
Or there may not be any at all.

04:22.540 --> 04:25.240
So the only possible return values are 0 or 1.

04:27.490 --> 04:31.690
So let's go back to the code we had before, and this time we are going to call find.

04:34.240 --> 04:40.060
And this will return the iterator. So if it succeeds, we can look at the value. And perhaps we can do

04:40.060 --> 04:40.750
something with it?

04:42.280 --> 04:44.770
So let's see if we can change the value of the element.

04:46.090 --> 04:46.900
What do you think will happen?

04:49.400 --> 04:51.620
Well, there was a bit of a clue, because it was commented out!

04:54.570 --> 04:57.630
So you cannot assign to a variable that is const.

04:58.080 --> 05:01.140
So it looks as though the element is const, or perhaps just the key to it.

05:01.590 --> 05:02.430
We will come back to that.

05:04.930 --> 05:07.020
And then we called the count() member function as well.

05:11.150 --> 05:15.290
So we do have an element with key 7, from dereferencing the iterator.

05:15.860 --> 05:17.570
And there is one element with key 7.

05:20.340 --> 05:24.000
So, using search with algorithms. We are actually a bit restricted.

05:24.480 --> 05:26.340
We have just seen that the elements are const.

05:27.090 --> 05:30.890
And another limitation is that the elements in a set cannot be reordered.

05:31.530 --> 05:33.180
And you might want to think about why that is.

05:34.920 --> 05:38.190
And the reason is that the set maintains its own, internal, ordering.

05:38.790 --> 05:44.000
If we had some algorithm that reorder the elements, that could conflicts with the internal ordering.

05:44.400 --> 05:47.760
And also, if the elements are not const, then we could change the keys.

05:48.090 --> 05:52.740
And that would, again, conflict with the way that the set organizes its data.

05:53.250 --> 05:54.300
So these are not allowed.

05:55.260 --> 06:00.870
So this means there a lot of algorithms that do not actually support set. But we can use ones, so

06:00.870 --> 06:05.660
long as they just reach a range of iterators and they do not attempt to modify the elements or rearrange

06:05.670 --> 06:05.880
them.

06:08.470 --> 06:15.520
So, for example, we could call find_if(). We pass const iterator, so we are just reading the elements, and

06:15.520 --> 06:18.760
then we have a lambda expression, which returns true if the element has key

06:18.790 --> 06:19.180
7.

06:19.600 --> 06:22.120
So that is really just the same as the find() call we had before.

06:24.840 --> 06:25.980
So let's try that out again.

06:26.310 --> 06:30.060
So this time we are calling find_if(), instead of the find() member function.

06:32.820 --> 06:38.040
And we are also calling count_if() as well, to do the same thing as the count() member function.

06:39.810 --> 06:40.950
And we get the same results.

06:44.610 --> 06:46.320
So what is set good for?

06:47.010 --> 06:51.000
It is very fast at locating or searching for an element.

06:51.210 --> 06:52.920
So accessing elements is very fast.

06:53.580 --> 06:56.370
Inserting and deleting elements is usually very fast.

06:56.820 --> 06:59.040
But if the tree has to rebalance, it can be slow.

06:59.910 --> 07:03.870
The main application of sets is for checking for membership of a set.

07:04.470 --> 07:10.320
For example, if you have a service that some users are allowed to access, you could have a set of

07:10.320 --> 07:10.950
user names.

07:11.430 --> 07:16.740
And then, when somebody logs on, you can check if that username is in the set. And if they are, then

07:16.740 --> 07:18.180
they can go ahead and use the service.

07:19.020 --> 07:22.080
And if they are not in the search, then - sorry.

07:22.620 --> 07:25.050
Another application is for getting rid of duplicates.

07:25.410 --> 07:29.610
So if you have some data, and you want to get rid of all the duplicates, or you are not bothered

07:29.610 --> 07:34.680
about keeping them, then you can use them to populate a search. And there will just be one element for

07:34.680 --> 07:36.480
each distinct piece of data.

07:37.080 --> 07:42.780
For example, if you have a document, and you read all the words from the document into a set, then

07:42.780 --> 07:46.080
there will be one element in the set for each distinct word in the documents.

07:47.020 --> 07:50.010
And then you can call size to get the number of distinct words in the document.

07:51.390 --> 07:52.800
OK, so that is it for this video.

07:53.190 --> 07:56.040
I'll see you next time, but until then, keep coding!
