WEBVTT

00:00.210 --> 00:03.660
Hello again! In this video, we are going to look at character functions.

00:07.590 --> 00:11.880
C has a number of functions which operate on single characters.

00:12.360 --> 00:15.170
C++ has put these in the cctype header.

00:17.150 --> 00:21.680
To start off with, there is a family of functions which tell us the properties of a character. So we can

00:21.680 --> 00:29.050
find out if it is a digit, 0 to 9, lowercase letter, a to z, uppercase letter, whitespace character, space,

00:29.090 --> 00:33.500
tab, newline etc, punctuation character, and so on.

00:34.490 --> 00:41.540
So here is a simple example of that. We have a string and we have a range for loop. So this will go

00:41.540 --> 00:43.100
through every character in the string.

00:43.520 --> 00:45.620
So c will be each character in turn.

00:46.760 --> 00:50.540
And then we call these functions with that character as argument.

00:51.350 --> 00:56.510
I suppose you could argue these should be "else if" rather than just "if", but it is only a demonstration.

00:58.970 --> 01:07.130
So 'H' is uppercase; 'e', 'l', 'l', 'o' are lowercase. Comma is punctuation, space is whitespace and so on.
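
NOTE
A minimal sketch of the loop described above, not the code shown on screen.
The names char_kind and describe are hypothetical. It uses "else if", as the
speaker suggests, and casts to unsigned char before each call (an addition
here, since passing a negative char to these functions is undefined).
```cpp
#include <cctype>
#include <iostream>
#include <string>
// Hypothetical helper: names the category of a single character.
std::string char_kind(char c) {
    // Cast first: the <cctype> functions require a value representable as unsigned char.
    const auto uc = static_cast<unsigned char>(c);
    if (std::isupper(uc)) return "uppercase";
    else if (std::islower(uc)) return "lowercase";
    else if (std::isdigit(uc)) return "a digit";
    else if (std::isspace(uc)) return "whitespace";
    else if (std::ispunct(uc)) return "punctuation";
    return "something else";
}
// Range-for loop: c is each character of the string in turn.
void describe(const std::string& str) {
    for (char c : str)
        std::cout << "'" << c << "' is " << char_kind(c) << '\n';
}
```
Calling describe("Hello, world!") would print one line per character.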

01:10.300 --> 01:17.350
If you have used Linux, or possibly some command line programs on Windows or DOS, you may have seen programs

01:17.350 --> 01:21.880
that ask you to type in anything beginning with 'y' to confirm that you want to proceed.

01:23.590 --> 01:30.640
So here is a function that can do that. It is going to use the toupper function. toupper() and tolower() will convert

01:30.640 --> 01:36.040
the arguments to upper or lower case. Or they will return the equivalent, rather. They do not actually modify

01:36.040 --> 01:36.640
the arguments.

01:37.960 --> 01:39.400
And then we have this function.

01:39.670 --> 01:44.710
So this will take the input that was entered by the user. Then we get the first character of that

01:44.710 --> 01:45.160
input.

01:45.820 --> 01:47.440
And then we get the uppercase version.

01:47.980 --> 01:54.310
So if 'c' was lowercase 'y', it will now be uppercase 'Y'. If it was uppercase 'Y', it will still be uppercase

01:54.310 --> 01:59.320
'Y'. And if it was something else, it will still be something else. Maybe uppercase something else, but

01:59.580 --> 02:00.370
it does not matter.

02:01.330 --> 02:06.810
So if the result is equal to upper case 'Y', then the user typed in something beginning with 'y' or

02:06.820 --> 02:08.800
uppercase 'Y' and we return true.

02:09.640 --> 02:14.230
And if they did not, they typed in something else and we return false.
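
NOTE
A sketch of the confirmation function as described, under stated assumptions:
the name yes_or_no is hypothetical, and treating empty input as "no" is an
added guard that the video does not mention.
```cpp
#include <cctype>
#include <string>
// Returns true if the user's input begins with 'y' or 'Y'.
bool yes_or_no(const std::string& input) {
    if (input.empty())
        return false;  // assumption: nothing typed counts as "no"
    // toupper() returns the uppercase equivalent; it does not modify its argument.
    const int c = std::toupper(static_cast<unsigned char>(input[0]));
    return c == 'Y';
}
```
With this sketch, yes_or_no("yes please") is true and yes_or_no("nope") is false.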

02:16.580 --> 02:18.920
So here is that function. We get the input.

02:19.970 --> 02:27.290
We take the first character. We see if it is equal to lowercase 'y' or uppercase 'Y', and we return accordingly.

02:29.320 --> 02:35.140
And then there is a main function. We prompt the user and read their input into a standard string.

02:36.100 --> 02:41.230
Then we call our function with that input as the argument and we print out a suitable response.

02:42.100 --> 02:47.200
We are using a raw string here, just to avoid putting backslashes in front of these double quotes.

02:50.840 --> 02:55.190
Do I want to enter a string which starts with capital 'Y' or lowercase 'y'?

02:55.610 --> 02:56.300
Yes, I do!

02:57.650 --> 02:58.490
Evidently I do!

03:01.210 --> 03:01.870
Let's try...

03:03.850 --> 03:04.360
YES.

03:08.350 --> 03:09.340
I will take that as a "no"!

03:10.600 --> 03:12.400
Okay, so that is how it works.

03:16.480 --> 03:21.760
When we are dealing with C++ strings, the library regards them as being case-sensitive.

03:22.120 --> 03:27.850
So if you have two strings, "One" with a capital 'O' and "one" with a lowercase 'o', those will be regarded

03:27.880 --> 03:29.410
as having different data.

03:30.640 --> 03:34.150
And there is no direct support for doing things which ignore case.

03:34.660 --> 03:38.620
There is no function you can call, which says that those are actually the same data, just with a different

03:38.620 --> 03:39.010
case.

03:41.560 --> 03:47.620
In C, compiler vendors provided their own functions, which did case-insensitive comparisons.

03:48.310 --> 03:53.440
Usually they are called stricmp on Windows and strcasecmp on Unix.

03:54.250 --> 03:55.990
Not the easiest things to pronounce!

03:56.950 --> 03:58.330
Obviously, these are not standard.

03:58.390 --> 03:59.650
They are also not portable.

04:00.190 --> 04:05.500
If you write a program on Windows that uses stricmp and then decide you want to run it on

04:05.500 --> 04:08.560
Linux, you have to change all these calls to strcasecmp.

04:09.910 --> 04:12.040
And besides, they only work with C-style strings.

04:12.490 --> 04:15.490
They do not support the C++ standard string.

04:19.120 --> 04:20.170
So what can we do?

04:21.370 --> 04:28.090
If we want to compare C++ strings without worrying about case, the easiest way to do it is to

04:28.090 --> 04:30.580
convert them all to the same case and then compare them.

04:32.210 --> 04:35.660
So we have a loop which goes over every character in the string.

04:35.990 --> 04:40.250
This time we are using a reference to auto, because we want to modify this character.

04:41.210 --> 04:46.070
Then we get the uppercase version of the character and then we modify the elements in the string.

04:46.490 --> 04:52.370
So after this loop completes, all the characters in the string will have been converted to uppercase.
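
NOTE
A minimal sketch of the in-place conversion loop just described. The function
name make_upper is hypothetical, and the unsigned char cast is an addition to
keep toupper() well defined for every char value.
```cpp
#include <cctype>
#include <string>
// Converts every character of str to uppercase, modifying str itself.
void make_upper(std::string& str) {
    // auto& gives a reference to each element, so the assignment changes the string.
    for (auto& c : str)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}
```
Note that the original mixed-case text is gone once this has run.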

04:53.760 --> 04:56.910
And we could just as well use lower case, provided we are consistent.

04:57.270 --> 05:03.450
I think computers have always used uppercase. And lowercase came in later, so people tend to use uppercase.

05:05.580 --> 05:08.850
So if you convert all the strings before you compare them, that will work.

05:09.320 --> 05:10.770
However, it does modify the string.

05:11.250 --> 05:17.040
It also means that you lose all the information you had about the original case. And in some applications

05:17.040 --> 05:18.240
that might not be acceptable.

05:20.190 --> 05:25.020
So the next simplest thing to do is to take a copy of the string and then convert that to a single case and

05:25.020 --> 05:27.060
then compare the copies of the strings.

05:28.630 --> 05:33.700
But that has quite a lot of overhead. As we discussed, copying strings can take a lot of processor

05:33.700 --> 05:34.120
time.
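
NOTE
A sketch of the copy-then-compare approach, assuming hypothetical helpers
named upper_copy and equal_by_copy. Taking the parameter by value is what
makes the copy, which is where the overhead mentioned above comes from.
```cpp
#include <cctype>
#include <string>
// Hypothetical helper: str is already a copy, so the caller's string is untouched.
std::string upper_copy(std::string str) {
    for (auto& c : str)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return str;
}
// Compares uppercase copies of both strings; two copies are made per call.
bool equal_by_copy(const std::string& lhs, const std::string& rhs) {
    return upper_copy(lhs) == upper_copy(rhs);
}
```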

05:35.740 --> 05:39.820
The other alternative is to have a go at writing our own function for comparing strings.

05:40.570 --> 05:41.860
So let's see how we would do that.

05:47.480 --> 05:52.310
The first thing we need to think about is the interface to our function.

05:53.120 --> 05:57.380
We should have the same interface as the built-in operator, so we can use it exactly the same way and

05:57.380 --> 05:58.700
it is not going to confuse people.

05:59.480 --> 06:04.700
Unfortunately, we cannot give it the same name as the built-in operator because that is already defined.

06:05.300 --> 06:09.770
And there is a One Definition Rule in C++, so you cannot have two definitions of the same symbol.

06:10.580 --> 06:11.810
So we have to call it something else.

06:11.840 --> 06:15.140
So let's call it equal_strings. For the sake of argument.

06:16.460 --> 06:21.470
So to have the same interface, it will take two strings by const reference. And return a bool.

06:22.280 --> 06:29.150
And if the two strings are equal, by whatever criterion we use for defining equality, this will return true.

06:29.570 --> 06:31.520
And if they are not equal, it will return false.

06:33.680 --> 06:39.200
The next thing we need to think about is what exactly this function will do. The function will go through

06:39.620 --> 06:42.770
each string, so it is going to compare the corresponding characters.

06:43.220 --> 06:48.320
So the first element from one string and the first element from another string. And it is going to compare

06:48.770 --> 06:51.740
pairs of elements from each string, until it finds a mismatch.

06:52.490 --> 06:55.880
And if it finds a mismatch, the strings are not equal and it returns false.

06:57.050 --> 07:00.110
If it does not find a mismatch, then they are equal, and it returns true.

07:02.170 --> 07:09.040
It is probably not a good idea to start optimizing code straight away, but this is a very easy "win". If

07:09.040 --> 07:13.570
the two strings have different lengths, then they must be different, so we do not need to actually do

07:13.570 --> 07:14.440
any comparisons.

07:15.040 --> 07:20.230
So we call the size member function. If the return values are different, then the strings have different

07:20.230 --> 07:23.590
lengths and they are different strings. So we can return false straightaway.

07:24.670 --> 07:28.960
And by the way, the size member function doesn't actually go through the string like the C version, strlen(), does.

07:29.380 --> 07:32.470
This will just look up the element count in the string header.

07:33.070 --> 07:34.840
So this is a very fast operation.

07:36.930 --> 07:40.920
So if we get past this, we then know that the two strings have the same length.

07:44.600 --> 07:49.460
And then we have our loop. We are going to look at the corresponding characters from each string and compare

07:49.460 --> 07:49.670
them.

07:51.000 --> 07:57.690
So we start off by getting an iterator to the first element in each string. We use cbegin()

07:57.690 --> 08:04.110
because we do not want to modify the strings. And we can use auto to avoid having to work out the type

08:04.110 --> 08:05.670
of the iterator and type it out.

08:07.290 --> 08:11.700
Incidentally, if we do change our minds later on and we find that we do need to modify the strings, we

08:11.700 --> 08:16.950
just call begin() here. Then we will not actually need to modify the type of these variables.

08:17.910 --> 08:21.840
So that is one argument for auto; it does make refactoring the code easier.

08:22.590 --> 08:28.260
Although there are risks to that. And then we do the comparison, which we will look at in a minute.

08:29.100 --> 08:34.200
Then we increment these iterators so we go on to the next element. And we keep on doing that until we

08:34.200 --> 08:35.580
reach the end of the strings.

08:37.110 --> 08:41.520
Actually, you could argue that we only need to do one of these comparisons, because we know that both

08:41.520 --> 08:42.900
strings have the same length.

08:43.470 --> 08:47.850
So they are both going to finish on the same iteration. But it is not a good idea to be too clever!

08:52.190 --> 08:57.890
And then in the comparison, we are going to convert each character to uppercase and then we compare

08:57.890 --> 08:58.130
them.

08:58.520 --> 09:04.130
And if the characters are different after being converted to uppercase, then they are actually representing different

09:04.130 --> 09:05.070
letters of the alphabet.

09:05.090 --> 09:06.110
So we have a mismatch.

09:10.640 --> 09:16.670
So we dereference each iterator to get the value of its character. The first time through, one will

09:16.670 --> 09:21.350
be the first character in the first argument, the left-hand string. And the other will be the first character

09:21.350 --> 09:24.020
in the second argument, the right-hand string.

09:24.830 --> 09:27.740
And then we call toupper() to convert them to uppercase.

09:28.160 --> 09:28.940
And then we compare.

09:29.750 --> 09:34.010
And if they are different, we know the strings are different and we can return false immediately, because

09:34.010 --> 09:34.940
we found a mismatch.

09:36.930 --> 09:41.940
On the other hand, if we keep on going and we get to the end of the loop, then we know that the two

09:41.970 --> 09:43.830
strings do not have any mismatches.

09:44.310 --> 09:46.890
So they must be the same. And then we can return true.
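
NOTE
The steps walked through above can be sketched as follows. This is a
reconstruction from the description, not the code shown on screen; the
iterator names lit and rit are hypothetical, and the unsigned char casts
are an addition to keep toupper() well defined.
```cpp
#include <cctype>
#include <string>
// Case-insensitive equality with the same interface as the library operator.
bool equal_strings(const std::string& lhs, const std::string& rhs) {
    // Easy win: strings of different lengths cannot be equal.
    if (lhs.size() != rhs.size())
        return false;
    // Const iterators, because we do not want to modify the strings.
    auto lit = lhs.cbegin();
    auto rit = rhs.cbegin();
    while (lit != lhs.cend() && rit != rhs.cend()) {
        // Compare the uppercase equivalents of the corresponding characters.
        if (std::toupper(static_cast<unsigned char>(*lit)) !=
            std::toupper(static_cast<unsigned char>(*rit)))
            return false;  // mismatch found
        ++lit;
        ++rit;
    }
    return true;  // reached the end without finding a mismatch
}
```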

09:52.820 --> 09:54.300
So here is that function.

09:54.320 --> 09:55.010
We have the

09:56.040 --> 10:01.950
signature. So we take two strings by reference to const and we return bool.

10:02.790 --> 10:09.000
We start off by comparing the lengths of the strings. And if they are different, then the strings must

10:09.000 --> 10:09.470
be different.

10:09.480 --> 10:10.620
So we return false.

10:12.800 --> 10:15.890
Then we get the iterators to the first element of each string.

10:17.900 --> 10:23.330
Then we compare the data in those elements, after converting it to uppercase.

10:24.140 --> 10:26.900
And if it is different, we have a mismatch and we return false.

10:27.980 --> 10:32.030
If they are not different, then we increment the iterators and we go on to the next character.

10:32.750 --> 10:34.880
And we keep on doing that until we reach the end of the string.

10:35.630 --> 10:37.520
You could actually write this as a for loop.

10:37.760 --> 10:40.430
And if I was doing a real program, I probably would.

10:40.730 --> 10:43.730
But for demonstration, it is easier to split everything up and do

10:43.730 --> 10:44.840
it one step at a time.
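
NOTE
One way the for loop version mentioned here might look. This is only a
sketch, not the speaker's code; the iterator names are hypothetical, and
both end checks are kept, following the earlier advice not to be too clever.
```cpp
#include <cctype>
#include <string>
// Same behaviour as the step-by-step version, with the loop written as a for statement.
bool equal_strings(const std::string& lhs, const std::string& rhs) {
    if (lhs.size() != rhs.size())
        return false;
    // Both iterators have the same type, so one auto declaration can hold them.
    for (auto lit = lhs.cbegin(), rit = rhs.cbegin();
         lit != lhs.cend() && rit != rhs.cend(); ++lit, ++rit) {
        if (std::toupper(static_cast<unsigned char>(*lit)) !=
            std::toupper(static_cast<unsigned char>(*rit)))
            return false;
    }
    return true;
}
```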

10:48.350 --> 10:53.630
And then if we get to the end of the loop without finding any mismatches, the strings must be equal.

10:53.810 --> 10:55.070
So we return true.

10:58.500 --> 11:04.360
I have written a simple main program to test this. So I am going to create a few string objects.

11:04.870 --> 11:08.890
We have "one" with all lower case and "ONe" with some mixed case.

11:09.310 --> 11:12.790
Then we have a completely different string, just as a kind of "control variable".

11:13.540 --> 11:17.610
Then first of all, I use the library equals operator.

11:19.000 --> 11:20.710
We are using the ternary operator.

11:21.100 --> 11:24.210
So if a is equal to b, then we print nothing.

11:24.520 --> 11:26.530
And if it is not equal, we print "not".

11:29.810 --> 11:35.000
Then we go through and use equal_strings. So we call equal_strings on these two strings.

11:35.510 --> 11:37.220
And again, we print out the results.

11:38.240 --> 11:39.980
So what do you think will happen?

11:44.350 --> 11:50.260
So with the library equality operator, "one" and "two" are not equal, "two" and "ONe" are not equal, of course, but

11:50.260 --> 11:57.310
"one" with lowercase and "ONe" with mixed case are also not equal because the library operator takes case

11:57.310 --> 11:57.910
into account.

11:58.600 --> 12:04.300
And it regards these as having different data. With our equal_strings function, we get the same results

12:04.300 --> 12:06.010
for those two. Which is good!

12:06.640 --> 12:07.900
We have not messed anything up!

12:08.440 --> 12:13.000
And we also find that "one" and "ONe" are equal, regardless of the case of the letters.

12:14.800 --> 12:16.330
So that all works.

12:17.500 --> 12:18.220
So that is

12:19.680 --> 12:22.400
quite a lot of code, I mean - this is mostly comments, actually.

12:22.670 --> 12:24.980
I've tried to comment this as clearly as I can.

12:26.600 --> 12:31.130
Later on in the course, we will see that we can actually write this comparison as a single statement.

12:31.790 --> 12:33.440
So it can be made much more concise.

12:34.640 --> 12:38.720
So I keep throwing all these teasers! But we will get there, I promise.

12:39.410 --> 12:40.940
So anyway, that is it for this video.

12:41.360 --> 12:42.110
I will see you next time.

12:42.110 --> 12:44.030
But meanwhile, keep coding!