WEBVTT

00:00.090 --> 00:00.630
Hello again!

00:00.990 --> 00:07.950
In this video, we are going to look at interfacing to C. There are many interfaces which are written in C,

00:07.950 --> 00:10.710
which we may want to use in our programs.

00:11.640 --> 00:17.700
For example, if we want the operating system to perform some service for us, we have to call a function

00:17.700 --> 00:23.370
that is written in C. And database APIs are, similarly, usually written in C.

00:24.360 --> 00:31.320
There are lots of useful third-party libraries and frameworks which are written in C. And also, many interpreted

00:31.380 --> 00:37.020
languages have so-called "foreign function interfaces", which means that they can call functions which

00:37.020 --> 00:43.980
are written in C. For example, if you are working in Python, and you have something which is very time

00:43.980 --> 00:50.250
critical, and Python cannot do it fast enough, then you could replace that bit of the Python script with

00:50.250 --> 00:51.750
a call to a foreign function.

00:54.960 --> 00:56.040
So how do we do this?

00:56.820 --> 01:00.210
C is almost a complete subset of C++.

01:00.660 --> 01:05.040
So this means that C programs are usually legal C++ programs.

01:06.330 --> 01:08.730
There are some things which get in the way.

01:10.140 --> 01:15.330
For example, you could have C code which uses words which are reserved in C++ but not in C.

01:15.720 --> 01:20.760
So you could have a variable called class, for example, because C does not have classes.

01:21.660 --> 01:26.820
In C99, there were some incompatible features introduced. Although one of them, which is designated

01:26.820 --> 01:31.920
initializers, is now in C++, as of C++20.

01:34.190 --> 01:39.890
So, usually, the simplest way to do this is to take the entire source code for the program, whether it is

01:39.890 --> 01:44.420
written in C or C++, and compile the whole lot with a C++ compiler.

01:45.740 --> 01:47.570
Unfortunately, that is not always possible.

01:47.900 --> 01:52.190
There may be some incompatibilities in the C code, that you cannot easily remove.

01:52.880 --> 01:58.790
The process of building the C code may be complex, and it is not really feasible to try and insert a

01:58.790 --> 02:00.680
C++ compiler into that process.

02:01.010 --> 02:03.740
And, of course, you may not have access to the source code in the first place.

02:06.720 --> 02:09.990
So, quite often you have to interface with C at the binary level.

02:10.590 --> 02:16.320
So you would get a header which will declare all the functions in the C interface. And then you will

02:16.320 --> 02:18.600
have some compiled binary code for the functions.

02:19.230 --> 02:25.020
And normally that would be some kind of library. And the library could be a static library or a dynamic library.

02:26.850 --> 02:31.740
Unfortunately, there is no standard on binary interfaces in C or C++.

02:32.460 --> 02:36.090
So this means that the compilers you use have to agree on various things.

02:36.780 --> 02:39.210
So they have to use the same object file format.

02:39.900 --> 02:45.390
They have to agree on the conventions for calling functions, and they have to agree on the word size.

02:45.930 --> 02:51.840
If you have one compiler using 32 bits, and one compiler using 64 bits, then when you combine them

02:51.840 --> 02:53.120
together, that's not going to work.

02:54.360 --> 02:59.160
Usually you would use the same compiler on the same computer, because all the main compilers can be used

02:59.160 --> 03:03.420
as either C or C++ compilers. And they are compatible with themselves.

03:04.950 --> 03:12.480
If you are using any features from the C++ library, which includes iostream, then the final step which builds

03:12.480 --> 03:18.940
the program executable needs to be done with the C++ compiler, and that will incorporate the C++

03:18.940 --> 03:21.000
binary code into the program.

03:23.850 --> 03:28.040
We need to be aware of something called "name mangling". In

03:28.050 --> 03:34.410
C, if you have a function in the source code, then it will have the same name in the binary object file.

03:35.430 --> 03:36.390
In C++,

03:36.480 --> 03:37.260
that does not happen.

03:37.380 --> 03:42.990
The compiler will do something called "mangling" of the name. Or "decoration", as Microsoft prefer to call

03:42.990 --> 03:43.170
it.

03:44.040 --> 03:47.880
The reason for that is that C++ has function overloading, which C does not.

03:48.900 --> 03:54.660
If you look at the actual symbols in an object file, and you can do that with the "nm" command on Linux

03:54.660 --> 03:59.500
or other Unixes. Windows used to provide something called "objectdump.exe".

03:59.520 --> 04:00.630
I do not know if they still do.

04:01.590 --> 04:05.640
If you do that, you will see that the function which takes int will have this name.

04:05.640 --> 04:11.190
So, "double underscore Z three func one i". That is with gcc.

04:11.190 --> 04:12.870
It could be different with different compilers.

04:13.740 --> 04:18.060
If we have another function with the same name but different arguments, then it will have a different

04:18.120 --> 04:19.890
symbol name in the binary file.

04:20.910 --> 04:25.380
And that is how you can have two functions which have the same name, without any conflicts.
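
NOTE
A minimal sketch of the overloading point above. The two "func" overloads are illustrative, not the code shown in the video. With GCC or Clang on Linux, compiling this with "-c" and running "nm" on the object file would typically list two distinct mangled symbols, such as _Z4funci and _Z4funcd. Other compilers and platforms spell them differently.
```cpp
// Two overloads that share one name in the source code.
// The compiler gives each one a different (mangled) symbol in the object
// file, so the linker can tell them apart.
int func(int) { return 1; }      // overload taking an int
int func(double) { return 2; }   // overload taking a double
```
Calling func(1) selects the first overload and func(2.5) the second, which is only possible because the two symbols differ in the binary.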

04:27.840 --> 04:34.890
So, name mangling does not occur in C. If we want to have binary objects in C++, which are compatible with C objects,

04:35.640 --> 04:43.890
we use something called "extern C". So we put extern double quote C, close quote before the function signature.

04:44.730 --> 04:46.410
And this is a compiler directive.

04:46.680 --> 04:49.830
It tells the compiler not to mangle the function name.

04:50.580 --> 04:52.950
So in this case, we do not get any underscore things.

04:53.340 --> 04:57.090
We just get the name "func" in the object file.

05:00.660 --> 05:03.450
And if we have multiple symbols, we can put them inside braces.

05:04.110 --> 05:07.490
And we can also say that everything in a header file needs to be "extern

05:07.500 --> 05:12.500
C". So this says that the compiler should not mangle anything in "cstuff"

05:12.510 --> 05:13.050
"dot h".
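
NOTE
A small sketch of the three forms just described. The function names "twice" and "square" and the function bodies are made up for illustration, and the header form is shown only as a comment because "cstuff.h" is not available here.
```cpp
// Single function: "func" keeps its plain name in the object file.
extern "C" int func(int x) { return x + 1; }
// Several symbols at once, grouped inside braces.
extern "C" {
    int twice(int x) { return 2 * x; }
    int square(int x) { return x * x; }
}
// Wrapping a whole C header works the same way:
//   extern "C" {
//   #include "cstuff.h"
//   }
```
One consequence worth keeping in mind: because the names are not mangled, functions declared this way cannot be overloaded with one another.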

05:16.920 --> 05:22.590
If we are writing a C++ function that is going to be called from C, we can only use things which C understands.

05:22.890 --> 05:24.570
So we can only use built-in types.

05:25.230 --> 05:29.670
We can also use arrays of built-in types, and pointers to built-in types.

05:30.630 --> 05:34.140
We can use structs which have members of built-in types only.

05:35.490 --> 05:40.320
The function needs to go in the global namespace, because C does not understand namespaces.

05:43.520 --> 05:46.430
We have a header which represents the interface.

05:46.430 --> 05:50.390
This is going to be seen by both the C compiler and the C++ compiler.

05:50.720 --> 05:56.510
It may contain some things which the C compiler will not understand. Such as the "extern C" directive.

05:57.560 --> 06:04.640
So there is a trick for hiding things from the C compiler. When we are compiling with the C++ compiler,

06:05.000 --> 06:13.130
this symbol will be defined in the pre-processor: "double underscore cplusplus". And then we can do a conditional

06:13.130 --> 06:16.850
compilation to exclude this from the C compiler.

06:17.540 --> 06:23.990
So if this symbol is defined, so this is a branch which occurs during the pre-processing, before the

06:23.990 --> 06:26.420
compiler starts actually compiling the code.

06:27.890 --> 06:34.280
So if this symbol is defined, then this branch will be included and compiled in the code.

06:35.000 --> 06:40.820
Otherwise, this branch will be compiled and then we have an endif to finish it off.

06:41.390 --> 06:44.960
So this is all carried out before any actual compilation takes place.
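
NOTE
A sketch of the conditional compilation described above, assuming a function called "add" with two int parameters (the parameter names are hypothetical). In a real project the first five lines would sit in a shared header and the definition in a C++ source file. They are combined here so that the example stands alone.
```cpp
// Header part: seen by both the C compiler and the C++ compiler.
#ifdef __cplusplus
extern "C" int add(int x, int y);   // C++ compiler: do not mangle the name
#else
int add(int x, int y);              // C compiler: never sees extern "C"
#endif
// Source part (C++): the definition picks up the C linkage declared above.
int add(int x, int y) { return x + y; }
```
A common alternative puts only the opening and closing lines of an extern "C" block inside the conditional sections, so that each declaration is written once.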

06:47.360 --> 06:52.850
Resource management, which C does not do at all. For memory management,

06:52.910 --> 06:54.560
C uses malloc() and free().

06:54.890 --> 06:58.430
It does not have new and delete. Resource

06:58.430 --> 07:01.060
management is done by using raw pointers.

07:01.070 --> 07:03.050
So the programmer is responsible for everything.

07:03.620 --> 07:07.070
There are no destructors, no RAII, no smart pointers.

07:08.450 --> 07:14.450
Typically, if you have a pointer to a resource, there will be some function which you pass this to, when

07:14.450 --> 07:19.130
you no longer need it. And that will release the resource and perform any tidying up that is required.

07:21.020 --> 07:26.300
Alternatively, if you have allocated memory, there may be a function, or you may be expected to release

07:26.300 --> 07:27.830
it yourself, when it is no longer needed.

07:28.640 --> 07:32.420
So it is important to be clear about what you need to do to release resources.

07:33.260 --> 07:37.040
And if you are lucky, there will be documentation, and it will be up to date, and accurate.

07:37.970 --> 07:41.570
Or, more realistically, you have to read the source code to find out what's going on.
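
NOTE
A sketch of the acquire-and-release pattern described above. The functions "make_greeting" and "free_greeting" are hypothetical stand-ins for a C library interface, not part of any real API. The point is only that the caller must hand the pointer back to the matching release function.
```cpp
#include <cstdlib>
#include <cstring>
// Hypothetical C-style interface: returns memory allocated with malloc().
// The caller owns the result and must release it with free_greeting().
extern "C" char* make_greeting(const char* name) {
    const char* prefix = "Hello, ";
    std::size_t len = std::strlen(prefix) + std::strlen(name) + 1;
    char* buf = static_cast<char*>(std::malloc(len));
    if (buf == nullptr) {
        return nullptr;   // allocation failed, nothing to release
    }
    std::strcpy(buf, prefix);
    std::strcat(buf, name);
    return buf;
}
// Matching release function: performs the tidying up (here, just free()).
extern "C" void free_greeting(char* p) { std::free(p); }
```
Usage would look like: char* g = make_greeting("C"); then use g, then free_greeting(g). Forgetting the last call leaks the memory, since nothing releases it automatically.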

07:44.530 --> 07:46.210
So let's look at an example.

07:47.240 --> 07:50.950
So here is a C main function.

07:50.970 --> 07:52.880
This is going to call some C++ code.

07:55.150 --> 07:57.440
There is the C++ code it is going to call.

07:57.540 --> 08:01.660
Nothing very C++-specific, but it is a nice, simple example.

08:03.100 --> 08:05.590
Then we have the header which represents this interface.

08:05.980 --> 08:07.750
So we have our preprocessor trick.

08:08.470 --> 08:12.850
If we are compiling this as C++, then we will have the "extern C" in front of it.

08:13.360 --> 08:18.280
And the C++ compiler will not mangle the name. So the symbol will be "add" in the binary.

08:19.300 --> 08:24.880
If this is not defined, then we are compiling in C, and the symbol will be called "add" anyway.

08:26.110 --> 08:32.740
So I am going to compile my C++ code to an object file. "minus c" means just produce the object file,

08:32.740 --> 08:34.510
and not the full program binary.

08:35.470 --> 08:38.680
So we now have add.o, which is a C++ binary.

08:40.680 --> 08:45.490
And there I compiled my C++ object file with a C source file.

08:45.900 --> 08:46.940
So we now have a program.

08:47.560 --> 08:48.420
Let's see if it works.

08:49.170 --> 08:50.010
And yes, it does!

08:52.870 --> 08:58.120
Something that is useful to know about is converting C++ library containers into arrays. So you can pass

08:58.120 --> 09:01.720
them to C functions. Both string and vector

09:01.750 --> 09:06.340
have a data() member function, which we have mentioned before, and this will return the container's

09:06.340 --> 09:10.270
internal memory buffer as a pointer. And then you can use that as an array.

09:11.560 --> 09:13.690
So this works for strings and vectors.

09:14.350 --> 09:15.940
So here is an example with vectors.

09:15.940 --> 09:22.600
This time we have a C++ program which is going to create a vector. And then it is going to use a C function

09:22.600 --> 09:24.760
for printing out the elements of this vector.

09:26.960 --> 09:28.130
There is our C function.

09:28.160 --> 09:33.320
It will take the array as a pointer, but it also needs the number of elements. Because C arrays are the same

09:33.320 --> 09:34.880
as built-in arrays in C++.

09:35.270 --> 09:37.190
They do not know how many elements they have.

09:37.670 --> 09:38.690
So we have to tell them.
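
NOTE
A sketch of passing a vector to a C-style function through data(). The video's function prints the elements. This stand-in, "sum_array", adds them up instead so that the result is easy to check, and both function names are hypothetical. In the video the C function lives in a separate C file. It is defined here so that the example stands alone.
```cpp
#include <cstddef>
#include <vector>
// Stand-in for a function from a C library. It takes the array as a pointer
// plus an element count, because the pointer alone carries no size.
extern "C" long sum_array(const int* arr, std::size_t n) {
    long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += arr[i];
    }
    return total;
}
// C++ side: data() exposes the vector's internal buffer, size() gives the count.
long sum_vector(const std::vector<int>& vec) {
    return sum_array(vec.data(), vec.size());
}
```
The pointer returned by data() is only valid while the vector is alive and has not reallocated, so the C function should not keep it after the call returns.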

09:40.770 --> 09:44.400
And then there is our array header. This time we are not using "extern C".

09:44.410 --> 09:46.380
So this is something that was provided by a C programmer.

09:46.470 --> 09:48.060
They do not know anything about "extern C".

09:48.690 --> 09:50.880
So in the main function, we have to do this for ourselves.

09:51.210 --> 09:54.780
We have to say that this include file is going to be "extern C".

09:55.200 --> 09:59.160
So that will tell the C++ compiler not to mangle the function names.

10:00.690 --> 10:01.800
So I do the same thing again.

10:01.800 --> 10:04.860
I compile my C code to an object file.

10:07.800 --> 10:11.130
And then I compile it with the C++ source code.

10:14.570 --> 10:17.210
And there we are. For associative containers,

10:17.210 --> 10:21.650
the trick is to convert them to a vector, or vectors. And then you can use the same trick again,

10:21.800 --> 10:27.770
of calling the data() member function. With a set, for example, we can call the copy() algorithm and use

10:27.770 --> 10:33.290
that to populate a vector. And then we get the array from the vector. With a map,

10:33.290 --> 10:39.170
we need two separate vectors, one for the keys and one for the values. And this C interface function

10:39.170 --> 10:42.470
should have two separate arrays, one for keys and one for values.
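
NOTE
A sketch of the set case only, assuming a set of int. The helper name "to_vector" is hypothetical. The map case follows the same idea with two vectors and is left alone here.
```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <vector>
// Copy the elements of a set into a vector, whose data() member can then
// be handed to a C function together with size().
std::vector<int> to_vector(const std::set<int>& s) {
    std::vector<int> v;
    v.reserve(s.size());                                   // one allocation up front
    std::copy(s.begin(), s.end(), std::back_inserter(v));  // elements arrive in sorted order
    return v;
}
```
Because a set iterates in sorted order, the resulting array is sorted too, which the C side may or may not rely on.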

10:43.190 --> 10:45.290
And I am going to leave you that as an assignment.

10:46.130 --> 10:46.430
Okay.

10:46.430 --> 10:47.480
So that is it for this video.

10:47.870 --> 10:48.650
I will see you next time.

10:48.650 --> 10:50.930
But until then, keep coding!
