WEBVTT

00:01.350 --> 00:05.160
Hello again! In this video, we are going to look at streams and buffering.

00:08.260 --> 00:11.950
We have already mentioned that C++ streams use buffering.

00:12.460 --> 00:16.450
The purpose of this is to reduce the number of calls to the operating system.

00:17.260 --> 00:22.360
When our program calls the operating system, it has to stop and wait. And then for input and output,

00:22.360 --> 00:27.190
the operating system has to stop and wait for the disk to do whatever it has to do.

00:29.020 --> 00:35.320
So, for example, during a write operation, the stream will store the data in a memory buffer.

00:36.130 --> 00:42.490
The size of this buffer is typically chosen to match the amount of data that the operating system can accept efficiently

00:42.490 --> 00:43.570
in a single call.

00:44.860 --> 00:50.410
When the buffer is full, that means we have the optimal amount of data. And then the stream will remove

00:50.410 --> 00:53.830
the data from the buffer and send it off to the operating system.

00:54.760 --> 00:56.850
And that process is known as "flushing"

00:56.860 --> 01:01.870
the buffer. "Flushing" basically means getting rid of something, so it is now somebody else's problem.

01:05.000 --> 01:06.650
When does this take place?

01:07.010 --> 01:12.140
It depends on the type of the stream. For ostream, like cout,

01:12.950 --> 01:18.110
it depends on the operating system, and also on how the user has set their terminal up.

01:19.520 --> 01:23.150
The usual arrangement is to have what is called "line oriented" output.

01:23.690 --> 01:29.570
So every time the terminal sees a new line, it will print all the data it has. If you only give it

01:29.570 --> 01:32.570
part of a line, it will wait until it gets the newline character.

01:35.590 --> 01:42.730
Also, C++ requires that if the program is going to read from cin, it must first flush cout. So that

01:42.730 --> 01:48.310
makes sure that the user can see all the output that the program has generated, before they are required

01:48.310 --> 01:49.440
to respond to it.

01:51.790 --> 01:58.900
For output file streams, the buffer is normally only flushed when it is full or when the stream is closed. And for input streams, there is,

01:58.930 --> 02:01.540
surprisingly, no way to actually flush them directly.

02:02.080 --> 02:05.530
If you want to remove data from a stream, you have to read it.

02:10.130 --> 02:15.050
And that is normally good enough for most programs. We do not really care too much when the output appears

02:15.050 --> 02:18.710
on the screen or gets written to the disk, so long as it is there when we need it.

02:19.880 --> 02:25.730
But there are some times when we do need to have control over it. The library provides std::flush

02:25.730 --> 02:26.150
for that.

02:26.840 --> 02:28.420
This is not a function call.

02:28.430 --> 02:32.150
It is actually something called a manipulator. A stream manipulator.

02:32.900 --> 02:39.680
So if we have some stream operation, for example, we are sending the variable i to cout, and then we put

02:39.680 --> 02:40.430
flush after

02:40.430 --> 02:44.960
that. Then that will cause the cout buffer to be flushed.

02:45.440 --> 02:48.350
And that will cause the value of i to appear on the screen immediately.

02:51.110 --> 02:56.660
So this means that the output on the screen, or the data in the file, always matches exactly what the

02:56.660 --> 02:58.910
program thinks it has sent off.

03:00.230 --> 03:04.940
On the other hand, this does have a big impact on performance, because we are doing a lot more hardware

03:04.940 --> 03:05.600
operations.

03:07.010 --> 03:10.400
So you should only use flush if the data really needs to be up to date.

03:11.450 --> 03:16.520
One example of that is if you have a program that keeps crashing, and you decide to create a log file.

03:16.520 --> 03:19.280
So you can find out what the program was doing when it crashed.

03:20.210 --> 03:25.370
So every time the program does something, it will write an entry in this file saying what it is doing,

03:25.670 --> 03:27.380
what the value of the data is, and so on.

03:28.040 --> 03:32.660
And then, when the program crashes, you can look at this file and try to work out what the problem is.

03:33.840 --> 03:39.180
If you let the stream decide when the buffer is flushed, it is possible that the program could crash

03:39.450 --> 03:41.580
when there is still a lot of data in the buffer.

03:42.120 --> 03:45.140
So the actual data in the file could be quite a bit out of date.

03:45.150 --> 03:48.300
It could be several operations before the one that causes the problem.

03:49.590 --> 03:53.790
So what you want is to make sure that the data in the file exactly matches what the program is doing.

03:54.210 --> 03:55.770
And for that, you would need to call flush.

03:56.130 --> 04:01.860
So every time the program writes to the file, the file will contain the data about what it is doing.

04:02.430 --> 04:04.350
It does not get delayed or held back.

04:08.340 --> 04:10.710
So let's look at a simple example of this.

04:11.220 --> 04:17.970
We have a loop. And on each loop iteration, we are going to display the value of the loop counter and

04:17.970 --> 04:19.410
also write it to a file.

04:19.920 --> 04:21.810
So we've done the usual thing.

04:21.810 --> 04:26.040
We create an ofstream object and check that it is valid.

04:28.590 --> 04:31.350
And then partway through the loop, we kill the program.

04:31.650 --> 04:32.640
So what will happen?

04:34.150 --> 04:41.380
cout is presumably line buffered. So this should display the correct value. Every time we send this

04:41.380 --> 04:46.630
value to cout, it will appear on the display. The output file stream is not line buffered.

04:47.380 --> 04:52.240
So this is going to be the last value from when the buffer was flushed. And that may or may not contain

04:52.240 --> 04:54.340
the right value for when the program stopped.

04:55.330 --> 04:56.260
So let's try this out.

05:01.100 --> 05:03.690
And this takes a long time to write.

05:04.590 --> 05:08.970
We need to make sure that this number is greater than the maximum possible buffer size.

05:09.960 --> 05:15.390
I think it is normally 4000 bytes on Unix, but I am not sure how big it is on Windows. But this is big

05:15.390 --> 05:16.050
enough anyway.

05:16.680 --> 05:19.860
So there we are, the program is now terminated.

05:20.340 --> 05:23.880
And on the display, we see 66,666.

05:24.300 --> 05:25.620
So that is the correct value.

05:26.100 --> 05:28.650
So the display does get updated every time we send

05:29.400 --> 05:30.720
i followed by a new line.

05:35.170 --> 05:43.930
As for the log file - well, we do not get the correct value. It got as far as 66021,

05:44.380 --> 05:49.900
and then that must be where the buffer was flushed. And then the rest of that number and all the following

05:49.900 --> 05:53.950
numbers were presumably in a buffer, and have now been lost forever.

05:58.890 --> 06:01.890
If I now put in a flush afterwards...

06:07.920 --> 06:11.640
So the terminal is going to display the value of i, followed by a new line every time we send it,

06:11.640 --> 06:12.810
but it was doing that anyway.

06:13.350 --> 06:19.840
And this means that the file stream's buffer is going to be flushed every time we send i and a new line

06:19.840 --> 06:20.010
to it.

06:20.760 --> 06:22.350
So let's see what difference that makes...

06:25.130 --> 06:26.360
And let's jump ahead a bit!

06:29.170 --> 06:30.030
So there we are.

06:30.070 --> 06:30.400
It is...

06:33.350 --> 06:36.200
It actually takes longer to run because of all these file operations.

06:37.220 --> 06:42.650
So there it is, we get 66,666 again. So that has made no difference to the

06:42.650 --> 06:43.160
terminal.

06:48.290 --> 06:52.490
And the log file now goes all the way down to 66,666.

06:53.030 --> 06:56.780
So this log file is being updated on every iteration through the loop.

06:58.340 --> 07:02.270
There is another manipulator you might have seen before, called endl.

07:02.870 --> 07:06.290
And this is equivalent to writing a newline, followed by flush.

07:08.440 --> 07:10.450
So this should give the same results.

07:15.390 --> 07:18.210
And there we are. The same results for the terminal.

07:20.210 --> 07:22.370
And the same results in the log file.

07:23.960 --> 07:25.990
Okay, so that's it for this video.

07:26.360 --> 07:27.230
I'll see you next time.

07:27.230 --> 07:29.270
But meanwhile, keep coding!
