1
00:00:00,810 --> 00:00:03,930
And now you're on the Overview page; there are several different pages.

2
00:00:03,930 --> 00:00:07,140
On the Overview page, we have the performance summary.

3
00:00:07,140 --> 00:00:13,620
We're given the average step time, broken down into All Others time, compilation time, output time, input

4
00:00:13,650 --> 00:00:13,950
time.

5
00:00:13,960 --> 00:00:20,520
Notice how this input time is much larger than the others. What you'll note

6
00:00:20,520 --> 00:00:26,430
is that the input time occupies the largest part of the step time.

7
00:00:26,430 --> 00:00:36,300
So that's why the recommendation is to focus first on reducing the input time.

8
00:00:36,780 --> 00:00:42,870
Then we have: 7.5% of the total step time sampled is spent on kernel launch.

9
00:00:42,870 --> 00:00:46,770
It could be due to CPU contention with tf.data.

10
00:00:46,770 --> 00:00:53,370
In this case, you may try setting the environment variable TF_GPU_THREAD_MODE to gpu_private.
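A minimal sketch of acting on that recommendation, assuming the training script is launched from Python. The variable has to be in the process environment before TensorFlow initializes the GPU, so the safe habit is to set it before importing TensorFlow at all. Setting it in the shell instead (for example `TF_GPU_THREAD_MODE=gpu_private python train.py`, where `train.py` is a hypothetical script name) has the same effect.

```python
import os

# Ask TensorFlow to give each GPU its own private host threads for launching
# kernels, so kernel launches don't compete with tf.data threads for CPU time.
# Set this before TensorFlow is imported so the runtime sees it when it
# initializes the GPU.
os.environ["TF_GPU_THREAD_MODE"] = "gpu_private"

# import tensorflow as tf  # import TensorFlow only after the variable is set
```

After re-running and re-profiling, the kernel-launch share in the performance summary is the number to compare.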

11
00:00:53,500 --> 00:00:59,400
Then, 6.6% of the total step time sampled is spent on All Others time.

12
00:00:59,400 --> 00:01:02,010
This could be due to Python execution overhead.

13
00:01:02,010 --> 00:01:09,720
Only 0% of device computation is 16-bit, so you might want to replace more 32-bit operations with 16-bit

14
00:01:09,720 --> 00:01:11,820
operations to improve performance.

15
00:01:12,360 --> 00:01:17,060
So this means we could make use of mixed precision training.
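To make "16-bit" concrete, here is a small standard-library sketch (not TensorFlow code) that round-trips values through IEEE half precision (`struct` format `'e'`) and single precision (`'f'`). It shows the trade-off: float16 takes half the bytes, but keeps fewer significant digits and overflows above 65504. That limited range is why mixed precision training typically keeps variables in float32 and uses loss scaling. In Keras it is usually switched on with `tf.keras.mixed_precision.set_global_policy("mixed_float16")`, which the later sections cover.

```python
import struct

def round_trip(value: float, fmt: str) -> float:
    """Pack a Python float with the given struct format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

def fits_in_float16(value: float) -> bool:
    """Return False if the value is too large for IEEE half precision."""
    try:
        struct.pack("<e", value)
        return True
    except OverflowError:
        return False

# Storage per value: float16 ('e') uses 2 bytes, float32 ('f') uses 4.
half_bytes = struct.calcsize("<e")
single_bytes = struct.calcsize("<f")

# Precision: 0.1 is not exactly representable; float16 rounds it more coarsely.
half_value = round_trip(0.1, "<e")
single_value = round_trip(0.1, "<f")

print(half_bytes, single_bytes)                             # 2 4
print(half_value, single_value)                             # 0.0999755859375 0.10000000149011612
print(fits_in_float16(65504.0), fits_in_float16(70000.0))   # True False
```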

16
00:01:17,070 --> 00:01:20,310
We're going to look at mixed precision training in subsequent sections.

17
00:01:20,790 --> 00:01:27,420
We also have other tools we could use to reduce the input time, such as the Input Pipeline Analyzer.

18
00:01:27,570 --> 00:01:30,750
You could click here to open the Input Pipeline Analyzer.

19
00:01:31,380 --> 00:01:35,550
Notice that the different tools are listed here: we have the Overview page.

20
00:01:36,000 --> 00:01:37,830
Let's go back to the Overview page.

21
00:01:39,160 --> 00:01:42,840
And then here we have the Input Pipeline Analyzer.

22
00:01:43,810 --> 00:01:46,750
Let's scroll down. Here we have the Input Pipeline Analyzer.

23
00:01:46,750 --> 00:01:50,740
We have the tf.data Bottleneck Analysis.

24
00:01:50,950 --> 00:01:52,060
You could click on this.

25
00:01:52,060 --> 00:01:58,540
You see we have the tf.data Bottleneck Analysis. Now let's get back to the Overview again.

26
00:01:58,660 --> 00:02:03,760
Scroll up, and you have the Trace Viewer right here.

27
00:02:03,760 --> 00:02:09,820
You will also find the Trace Viewer in this list of tools.

28
00:02:10,180 --> 00:02:13,780
Now let's get into the summary of the input pipeline analysis.

29
00:02:14,470 --> 00:02:20,200
We get to see exactly how the input processing time on the host breaks down, because we've already seen

30
00:02:20,200 --> 00:02:28,810
from here that input processing is taking up about 68.4% of the total step time.

31
00:02:28,810 --> 00:02:38,140
So here you can see that data preprocessing is the main reason why this input processing

32
00:02:38,170 --> 00:02:42,160
time is so large, and there are different steps we could take.

33
00:02:42,160 --> 00:02:47,740
So what can be done to reduce the above components of the host input time? Enqueuing data:

34
00:02:49,060 --> 00:02:53,800
you may want to combine small input data chunks into fewer, larger chunks.
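A plain-Python sketch of the "combine small chunks" idea, assuming the records are byte strings that can simply be concatenated. It greedily packs them into chunks of up to a target size, so each enqueue or transfer moves more data at once. `pack_chunks` and `target_size` are hypothetical names for illustration only; this is not a TensorFlow API.

```python
from typing import Iterable, List

def pack_chunks(records: Iterable[bytes], target_size: int) -> List[bytes]:
    """Greedily concatenate small records into chunks of at most target_size bytes.

    A single record larger than target_size becomes a chunk on its own.
    """
    chunks: List[bytes] = []
    current = bytearray()
    for record in records:
        # Close the current chunk if adding this record would exceed the target.
        if current and len(current) + len(record) > target_size:
            chunks.append(bytes(current))
            current = bytearray()
        current.extend(record)
    if current:
        chunks.append(bytes(current))
    return chunks

small_records = [b"x" * 10 for _ in range(25)]  # 25 records of 10 bytes each
packed = pack_chunks(small_records, target_size=100)
print(len(small_records), "->", len(packed))    # 25 -> 3
```

The total amount of data is unchanged; only the number of units the pipeline has to handle goes down.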

35
00:02:53,860 --> 00:02:55,510
Data preprocessing:

36
00:02:55,540 --> 00:03:03,430
you may increase num_parallel_calls in Dataset map() or preprocess the data offline. There are also suggestions for reading data from

37
00:03:03,430 --> 00:03:08,850
files in advance, reading data from files on demand, and other data reading or processing.
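The `num_parallel_calls` suggestion means running the per-element preprocessing function on several elements at once instead of one after another. The sketch below illustrates that idea with Python's `concurrent.futures` thread pool rather than tf.data, so it runs without TensorFlow. The `preprocess` function, the 10 ms sleep standing in for I/O-bound work, and the worker count of 8 are all assumptions for illustration. In tf.data itself, the equivalent is passing `num_parallel_calls=tf.data.AUTOTUNE` to `Dataset.map()`, usually followed by `.prefetch(tf.data.AUTOTUNE)`.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def preprocess(x: int) -> int:
    """Hypothetical per-element preprocessing; the sleep stands in for I/O-bound work."""
    time.sleep(0.01)
    return x * 2

elements = list(range(32))

# Sequential map: one element at a time (like Dataset.map without num_parallel_calls).
start = time.perf_counter()
sequential = [preprocess(x) for x in elements]
sequential_s = time.perf_counter() - start

# Parallel map: up to 8 elements in flight at once; executor.map preserves input order.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as executor:
    parallel = list(executor.map(preprocess, elements))
parallel_s = time.perf_counter() - start

print(f"sequential: {sequential_s:.3f}s, parallel: {parallel_s:.3f}s")
```

Threads help here only because `time.sleep` releases Python's GIL; CPU-heavy pure-Python work would need processes instead. A graph-traced tf.data map function runs in TensorFlow's own thread pool, so it does not have that limitation.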

38
00:03:08,890 --> 00:03:13,030
And then here we have more detailed input op statistics.

39
00:03:13,030 --> 00:03:19,420
So let's click on this and scroll up, and the statistics are shown right here.

40
00:03:19,420 --> 00:03:26,680
So as you can see, we now know exactly why our input processing is taking so much time.

41
00:03:26,680 --> 00:03:32,100
And based on these suggestions, we could reduce this time.

42
00:03:32,110 --> 00:03:35,170
Next, we have the Kernel Stats.

43
00:03:35,980 --> 00:03:45,250
Then we also have the Memory Profile, and then the Pod Viewer. After the Pod Viewer,

44
00:03:45,250 --> 00:03:47,740
we have TensorFlow Stats.

45
00:03:49,230 --> 00:03:54,750
And then here we have the tf.data Bottleneck Analysis.

46
00:03:55,410 --> 00:03:57,770
You could also look at this from here.

47
00:03:57,780 --> 00:03:59,710
You see we have the root prefetch.

48
00:03:59,730 --> 00:04:01,500
Look at the self duration here.

49
00:04:01,500 --> 00:04:04,620
Ten microseconds, 36 microseconds.

50
00:04:04,650 --> 00:04:06,840
See, this one now is very large.

51
00:04:06,840 --> 00:04:08,700
So here's our bottleneck.

52
00:04:08,850 --> 00:04:18,450
It's at the level of the mapping and batching. Shuffling is just 118, 37, 3; then prefetching, and so on and so forth.

53
00:04:18,480 --> 00:04:25,410
So we now know that the problem comes from the mapping, as we've seen previously.

54
00:04:25,410 --> 00:04:33,600
Now, selecting the Trace Viewer, we have this toolbar here, and we will make use of it.

55
00:04:33,600 --> 00:04:36,930
We can move it around by clicking and dragging here.

56
00:04:36,930 --> 00:04:42,930
Then you have the arrow, and you have this one to pull the view from place to place.

57
00:04:42,930 --> 00:04:44,250
You have the zoom.

58
00:04:44,250 --> 00:04:49,500
So you have the pan, the zoom, and the timing tool.

59
00:04:49,500 --> 00:04:55,710
So let's click on the zoom tool. You see, you click and drag upward.

60
00:04:55,710 --> 00:04:58,120
You can zoom in and zoom out.

61
00:04:58,140 --> 00:04:59,880
Now notice these train steps.

62
00:04:59,880 --> 00:05:02,360
We have values starting from 100.

63
00:05:02,370 --> 00:05:06,510
Let's get back to what we had defined previously.

64
00:05:06,930 --> 00:05:09,240
Here we have 100, 232.

65
00:05:09,240 --> 00:05:16,110
And that's why you notice here that we have train steps from 100 to 182. Getting back here,

66
00:05:16,110 --> 00:05:17,340
we can zoom.

67
00:05:17,670 --> 00:05:19,770
See, you have that.

68
00:05:19,770 --> 00:05:24,980
Then you can click here on the pan tool and pull the view to one side.

69
00:05:24,990 --> 00:05:25,800
Now stop.

70
00:05:25,800 --> 00:05:32,400
And right here you can zoom again, and then you get to see all the different operations carried out

71
00:05:32,400 --> 00:05:38,250
during a single training step.

72
00:05:38,910 --> 00:05:45,630
And with the timing tool, which you select by clicking here, you can... let's zoom

73
00:05:45,630 --> 00:05:53,850
in again, drag this here, and zoom again.

74
00:05:56,590 --> 00:06:00,190
As we were saying, you can measure the timing for a given operation.

75
00:06:00,190 --> 00:06:06,700
So once you select it here, you just click and drag, and you can measure the timing for different

76
00:06:06,700 --> 00:06:07,300
operations.

77
00:06:07,300 --> 00:06:13,810
You see the time here: 166.5 microseconds. And that's it.

78
00:06:13,810 --> 00:06:16,420
We now move on to the Distributions tab here.

79
00:06:16,420 --> 00:06:18,910
We have batch normalization.

80
00:06:18,910 --> 00:06:20,350
Batch normalization.

81
00:06:20,350 --> 00:06:28,000
Let's check this out. You see the different biases and weights; you see the kernel

82
00:06:28,000 --> 00:06:28,690
here.

83
00:06:28,690 --> 00:06:38,080
The weights have values that fall within this range: the values are between

84
00:06:38,080 --> 00:06:41,950
-0.3 and 0.34.

85
00:06:41,950 --> 00:06:48,940
The biases are between about -1.2 and 0.4.

86
00:06:48,940 --> 00:06:53,320
And then we have the dense layers.

87
00:06:53,320 --> 00:06:58,450
Here we have conv2d_2, where you can see its range of values.

88
00:06:58,450 --> 00:07:05,140
And then we have dense_1 and dense_2. We also have the different value ranges for

89
00:07:05,140 --> 00:07:07,510
both the kernels and the biases.
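The Distributions and Histograms tabs summarize each kernel and bias tensor by bucketing its values at every logged step. As a rough, plain-Python illustration of that bucketing, here is an equal-width histogram. TensorBoard's own bucket boundaries are chosen differently, and the weight values below are hypothetical, picked only to span roughly the -0.3 to 0.34 range mentioned above.

```python
from typing import List, Sequence

def equal_width_histogram(values: Sequence[float], num_bins: int) -> List[int]:
    """Count how many values fall into each of num_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * num_bins
    for v in values:
        index = int((v - low) / width)
        counts[min(index, num_bins - 1)] += 1  # the maximum value goes in the last bin
    return counts

# Hypothetical kernel weights spanning roughly -0.3 to 0.34.
weights = [-0.3, -0.21, -0.1, -0.05, 0.0, 0.03, 0.08, 0.15, 0.22, 0.34]
print(equal_width_histogram(weights, num_bins=4))  # [2, 3, 3, 2]
```

Stacking one such histogram per training step gives the kind of picture the Distributions tab shows over time.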

90
00:07:07,510 --> 00:07:11,500
So that's it for the distributions. We have the histograms too.

91
00:07:11,530 --> 00:07:20,860
You could check these out. Then there's Time Series, which is similar to the information

92
00:07:20,860 --> 00:07:21,760
we've seen already.

93
00:07:21,760 --> 00:07:25,270
So here we have "All", or we could select just scalars.

94
00:07:25,270 --> 00:07:32,230
So here we have the epoch loss, the evaluation accuracy, and so on and so forth.

95
00:07:32,260 --> 00:07:34,180
Now click on images.

96
00:07:34,180 --> 00:07:38,020
With only images selected, there is nothing to be shown.

97
00:07:38,050 --> 00:07:40,180
You see, that's why you have no information here.

98
00:07:40,240 --> 00:07:44,170
Click on histograms, and you have the histogram data we've already seen.

99
00:07:44,290 --> 00:07:45,750
That's it for this section on TensorBoard.

100
00:07:45,760 --> 00:07:50,500
TensorBoard supports other functionality, which we shall explore subsequently.

101
00:07:50,500 --> 00:07:53,650
And thank you for getting right up to this point.