0
1
00:00:02,020 --> 00:00:07,330
Iterative programming structures such as for-loops are the most valuable techniques in developing 
1

2
00:00:07,340 --> 00:00:09,520
repetitive parametric hardware structures.
2

3
00:00:10,180 --> 00:00:13,240
What is the structure of a repetitive combinational logic circuit?
3

4
00:00:14,350 --> 00:00:20,050
In this lecture, I am going to give you a big picture of a repetitive hardware structure to be used later 
4

5
00:00:20,050 --> 00:00:20,950
throughout this section.
5

6
00:00:21,920 --> 00:00:26,150
Software loop statements represent the repetitive patterns in an algorithm.
6

7
00:00:27,230 --> 00:00:33,290
As we consider only combinational circuits in this course, we should understand how these types of circuits 
7

8
00:00:33,410 --> 00:00:36,710
can run a piece of code described by a software loop statement.
8

9
00:00:38,350 --> 00:00:44,950
For the sake of simplicity, I’ll consider for-loop statements. However, some examples 
9

10
00:00:44,980 --> 00:00:49,270
that use while-loop statements will be explained later during the course.
10

11
00:00:51,580 --> 00:00:58,570
Let’s have a look at a for-loop statement in C language. According to the loop index and its condition, 
11

12
00:00:59,200 --> 00:01:07,510
this for-loop has three iterations, each of which executes two tasks named task1 and task2.
12

13
00:01:07,510 --> 00:01:07,870
The software 
13

14
00:01:07,870 --> 00:01:13,390
execution of this loop can be shown by a flow-chart graph or a control-data flow graph. 
14

15
00:01:14,840 --> 00:01:18,320
First, the loop index gets 0 as its initial value. 
15

16
00:01:19,390 --> 00:01:25,780
Then as it is less than 3, the loop body will be executed. After executing the tasks, the loop
16

17
00:01:25,810 --> 00:01:31,990
index is incremented, and as it is still less than 3, the second iteration of the loop is executed.
17

18
00:01:33,020 --> 00:01:35,360
And then the third iteration is performed.  
18

19
00:01:36,950 --> 00:01:43,060
After the third iteration, the loop index would be 3, which does not satisfy the for-loop condition; 
19

20
00:01:43,670 --> 00:01:49,240
therefore, the execution control passes to the next statement after the loop in the program.
20

21
00:01:51,260 --> 00:01:57,830
This behaviour shows the sequential execution of the loop body on a CPU. Notice that, there is one instance of 
21

22
00:01:57,830 --> 00:02:00,350
each task invoking several times.
22

23
00:02:00,900 --> 00:02:03,410
Now let’s have a look at the hardware implementation.
23

24
00:02:04,040 --> 00:02:09,530
The execution of the corresponding combinational hardware is a little bit different. Based on the dependency 
24

25
00:02:09,530 --> 00:02:10,430
among tasks, 
25

26
00:02:10,820 --> 00:02:12,400
there can be different scenarios. 
26

27
00:02:12,860 --> 00:02:17,990
First of all, the combinational circuit of this loop should instantiate separate hardware for each 
27

28
00:02:17,990 --> 00:02:19,520
task in the loop iterations. 
28

29
00:02:20,000 --> 00:02:26,780
That means, as the loop has three iterations, and each iteration has two tasks, the hardware implementation 
29

30
00:02:26,870 --> 00:02:30,860
should have 6 hardware modules implementing the tasks individually. 
30

31
00:02:31,180 --> 00:02:35,800
In the first scenario, all consecutive tasks are dependent on each other results, 
31

32
00:02:36,320 --> 00:02:39,940
that means they need the results generated by their previous one. 
32

33
00:02:40,430 --> 00:02:47,050
In this case, as shown in this figure, sequential task execution is expected. In the second scenario, 
33

34
00:02:47,210 --> 00:02:53,960
the tasks in an iteration are dependent, but the tasks in different iterations are independent as they
34

35
00:02:53,960 --> 00:02:55,430
receive different parts of data.
35

36
00:02:56,000 --> 00:03:02,390
Therefore, all task1 from different iterations can be executed in parallel and all task2
36

37
00:03:02,390 --> 00:03:05,360
instances are performed afterwards. 
37

38
00:03:05,960 --> 00:03:11,960
In the third scenario, we assume all tasks are independent as they may work on different data and don’t 
38

39
00:03:11,960 --> 00:03:13,810
use the other tasks results.
39

40
00:03:14,690 --> 00:03:17,530
In this case, all tasks run in parallel.  
40

41
00:03:18,050 --> 00:03:23,690
Please don’t be scared of the little bit complexity in hardware execution flow, as most of them will 
41

42
00:03:23,690 --> 00:03:26,090
be managed by the HLS tools automatically. 
42

43
00:03:27,790 --> 00:03:34,040
Let’s consider a real example. An N-bit binary adder is a combinational hardware circuit with
43

44
00:03:34,040 --> 00:03:40,870
a repetitive computation pattern. It uses N full-adders executed sequentially in order to add 
44

45
00:03:40,870 --> 00:03:46,630
two N-bit binary numbers. As each full-adder should use the carry-bit generated by its predecessor, 
45

46
00:03:46,840 --> 00:03:48,850
then the execution is sequential.
46

47
00:03:49,940 --> 00:03:55,910
Another example of a combinational circuit with a repetitive pattern is one-counter which count 
47

48
00:03:55,910 --> 00:04:01,880
the number of 1s in a binary representation of a given number.  This circuit can use N-adders executed 
48

49
00:04:01,880 --> 00:04:04,350
sequentially, to find the number of 1s. 
49

50
00:04:04,880 --> 00:04:10,840
In addition, there is another hardware implementation of this design that uses logN level of adders, 
50

51
00:04:10,880 --> 00:04:12,980
connected through a binary tree structure. 
51

52
00:04:13,760 --> 00:04:19,230
This implementation is preferable because it is faster than the previous implementation. In the rest of this section, 
52

53
00:04:19,280 --> 00:04:25,880
I will explain how to describe an algorithm to guide the HLS tool utilising these types of optimisations.
53

54
00:04:27,670 --> 00:04:33,760
In the next step, I will explain how to describe a combinational loop in Xilinx Vivado-HLS environment. 
54

55
00:04:35,600 --> 00:04:37,900
These are the takeaway messages of this lecture. 
55

56
00:04:39,600 --> 00:04:44,730
Using software loop statements is necessary for describing combinational circuits in HLS
56

57
00:04:45,270 --> 00:04:52,200
In hardware designs, Statements in a loop iteration can execute in parallel (intra-iteration parallelism), Statements among loop iterations 
57

58
00:04:52,200 --> 00:04:53,700
can also execute in parallel (iter-iteration parallelism)
58

59
00:04:56,710 --> 00:05:03,380
Now the first quiz: Let’s consider the following for-loop. Then draw an execution graph for a combinational 
59

60
00:05:03,380 --> 00:05:05,320
circuit that performs this loop.
60

61
00:05:06,870 --> 00:05:10,650
As the second quiz perform the same task on this for-loop.
