1
00:00:01,570 --> 00:00:07,870
Both FPGAs and CPUs can be used to implement logical and computational functions. But what are 

2
00:00:07,870 --> 00:00:09,190
their differences? Here, 

3
00:00:09,310 --> 00:00:15,190
I am going to focus on answering this question and point out the main differences that help us to distinguish 

4
00:00:15,220 --> 00:00:17,740
between FPGA and CPU coding techniques.

5
00:00:21,570 --> 00:00:27,690
A CPU is a computing platform with a fixed hardware architecture controlled by a set of commands or 

6
00:00:27,690 --> 00:00:33,540
instructions. This fixed hardware architecture is generic enough to perform any logical or algorithmic

7
00:00:33,540 --> 00:00:40,830
function, as long as it is compiled into CPU instructions.
On the other hand, an FPGA is a computing 

8
00:00:40,830 --> 00:00:44,120
hardware platform with no predefined hardware architecture. 

9
00:00:45,440 --> 00:00:50,720
Similar to the CPU platform, it can host almost any logical and mathematical function. 

10
00:00:51,750 --> 00:00:58,080
However, a synthesis tool must first translate a C/C++ program into a hardware architecture composed of

11
00:00:58,080 --> 00:00:59,450
a few hardware modules. 

12
00:01:00,060 --> 00:01:04,990
Therefore, each C/C++ program gets its own dedicated, optimised hardware architecture.

13
00:01:05,580 --> 00:01:10,980
This dedicated architecture can lead to high-performance execution of some algorithms on the FPGA.

14
00:01:14,010 --> 00:01:20,640
The main difference between CPU and FPGA in running an algorithm is how statements are executed. 

15
00:01:20,640 --> 00:01:27,210
A typical CPU compiler translates a C function into assembly language or machine code. The CPU then

16
00:01:27,210 --> 00:01:31,050
fetches these instructions from memory and executes them sequentially.

17
00:01:31,860 --> 00:01:37,440
Therefore, sequential execution is an intrinsic feature of a CPU. To add parallelism to CPU

18
00:01:37,440 --> 00:01:38,130
execution, 

19
00:01:38,130 --> 00:01:43,980
the architecture itself has to be modified. In contrast, an FPGA synthesis tool translates

20
00:01:43,990 --> 00:01:48,330
a C function into a set of connected hardware blocks that exchange data. 

21
00:01:49,340 --> 00:01:53,730
These hardware modules are activated as soon as they receive their input data.

22
00:01:54,530 --> 00:01:59,210
Therefore, they can perform operations in parallel. In other words,

23
00:02:00,100 --> 00:02:02,630
parallelism is an intrinsic feature of FPGAs. 
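A minimal sketch of this contrast, under simplifying assumptions that are not from the lecture: the CPU model executes one operation per step, while the FPGA model has unlimited hardware and lets every operation fire one step after its latest input becomes available (an as-soon-as-possible schedule). The `Op` record, the `NO_INPUT` marker and the `fpga_steps` function are hypothetical names used only for illustration.

```c
#include <assert.h>

#define MAX_OPS 16
#define NO_INPUT (-1)   /* marks a primary input, available at step 0 */

/* Hypothetical operation record: indices of the (up to two) earlier
   operations whose results this operation consumes. */
typedef struct {
    int in[2];
} Op;

/* Dataflow model: each operation completes one step after the latest of
   its inputs, so independent operations share the same step.
   Operations must be listed in program (topological) order. */
int fpga_steps(const Op *ops, int n_ops) {
    int done[MAX_OPS];   /* step at which each operation's result is ready */
    int total = 0;
    assert(n_ops >= 0 && n_ops <= MAX_OPS);
    for (int i = 0; i < n_ops; i++) {
        int ready = 0;
        for (int k = 0; k < 2; k++) {
            int src = ops[i].in[k];
            if (src != NO_INPUT) {
                assert(src >= 0 && src < i);   /* must come from an earlier op */
                if (done[src] > ready) {
                    ready = done[src];
                }
            }
        }
        done[i] = ready + 1;
        if (done[i] > total) {
            total = done[i];
        }
    }
    return total;
}
```

For s = a + b, t = c + d and u = s * t, the operations are `{{NO_INPUT, NO_INPUT}}`, `{{NO_INPUT, NO_INPUT}}` and `{{0, 1}}`. `fpga_steps` returns 2 because the two additions fire in the same step, whereas the sequential CPU model needs one step per operation, that is, 3 steps.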

24
00:02:03,710 --> 00:02:09,770
Concurrent statements in C code, that is, statements with no data, resource or control dependencies among them,

25
00:02:10,190 --> 00:02:12,140
can run in parallel on an FPGA.
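As a rough illustration of what the absence of a data dependency means, the sketch below checks two statements of the form dst = src0 op src1 for read-after-write, write-after-read and write-after-write conflicts. The `Stmt` record and the single-character variable names are hypothetical simplifications, and resource and control dependencies are deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical statement model: dst = src[0] op src[1].
   Variables are single characters; '\0' marks an unused source. */
typedef struct {
    char dst;
    char src[2];
} Stmt;

/* True if statement s reads variable v. */
static bool reads(const Stmt *s, char v) {
    return v != '\0' && (s->src[0] == v || s->src[1] == v);
}

/* True if `later` cannot run concurrently with `earlier`
   because of a data dependency between the two statements. */
bool has_data_dependency(const Stmt *earlier, const Stmt *later) {
    assert(earlier && later);
    bool raw = reads(later, earlier->dst);   /* later reads what earlier writes */
    bool war = reads(earlier, later->dst);   /* later overwrites what earlier reads */
    bool waw = earlier->dst == later->dst;   /* both write the same variable */
    return raw || war || waw;
}
```

x = a + b followed by y = c + d shares no variables, so `has_data_dependency` returns false and both additions could run in the same step. x = a + b followed by y = x + c returns true, because the second statement needs the result of the first.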

26
00:02:12,950 --> 00:02:18,380
Therefore, the concepts of concurrency and dependency among statements are very important in coding 

27
00:02:18,380 --> 00:02:21,350
an algorithm for an FPGA. Throughout this course,

28
00:02:21,650 --> 00:02:27,560
I will explain how to detect concurrency among statements and express it in C code using specific

29
00:02:27,560 --> 00:02:29,900
coding styles and compiler directives.

30
00:02:31,250 --> 00:02:37,890
Let’s consider this very simple computing architecture as a CPU that can perform two instructions: reset 

31
00:02:37,910 --> 00:02:39,990
a variable and add two numbers.

32
00:02:40,250 --> 00:02:47,540
It consists of a memory that holds data, an Arithmetic Logic Unit (ALU) that performs addition,

33
00:02:48,050 --> 00:02:53,350
a register that holds the result and, finally, fixed connections between these elements.

34
00:02:54,050 --> 00:02:56,290
The memory has two address lines to select a memory cell.

35
00:02:56,300 --> 00:03:00,680
The register has two signals: reset and load. 

36
00:03:00,980 --> 00:03:05,990
If we apply logic 1 to the reset signal, the register value becomes 0.

37
00:03:07,680 --> 00:03:11,370
Let’s assume that the CPU is going to run this simple C code. 

38
00:03:12,310 --> 00:03:16,990
For this purpose, the code must first be translated into assembly or machine code.

39
00:03:18,320 --> 00:03:24,140
The machine code represents the sequence of hardware control signals needed to carry out the original C code.

40
00:03:25,480 --> 00:03:28,420
Let’s follow the CPU execution steps. Here, 

41
00:03:28,510 --> 00:03:34,390
I have ignored instruction fetch for simplicity. In the first step, the register that holds

42
00:03:34,390 --> 00:03:40,840
the answer is reset to zero. In the next step, the first value in memory (that is, c1) is added to

43
00:03:40,840 --> 00:03:43,660
register a. In the third and fourth steps,

44
00:03:43,840 --> 00:03:48,040
the c2 and c3 values are accumulated into register a.
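The four sequential steps above can be mimicked with a small C model of the toy CPU. This is a sketch under assumptions that the lecture does not state: each machine-code word is modelled as a hypothetical `ControlWord` holding the register's reset and load signals plus a 2-bit memory address, the memory holds four words (the most that two address lines can select), reset has priority over load, and the values stored for c1, c2 and c3 are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

#define MEM_WORDS 4   /* two address lines select one of 2^2 = 4 cells */

/* Hypothetical encoding of one machine-code word as control signals. */
typedef struct {
    bool reset;      /* 1: clear register a (takes priority over load) */
    bool load;       /* 1: a = a + mem[addr] (ALU output loaded into a) */
    unsigned addr;   /* 2-bit memory address */
} ControlWord;

/* Executes the control words strictly in order, one per step,
   and returns the final contents of register a. */
int run_cpu(const int mem[MEM_WORDS], const ControlWord *prog, int n_steps) {
    int a = 0x5A5A;   /* stand-in for unknown power-up contents; cleared by reset */
    for (int step = 0; step < n_steps; step++) {
        const ControlWord *cw = &prog[step];
        assert(cw->addr < MEM_WORDS);
        if (cw->reset) {
            a = 0;
        } else if (cw->load) {
            a = a + mem[cw->addr];   /* ALU adds the memory output to a */
        }
    }
    return a;
}
```

With c1 = 3, c2 = 5 and c3 = 7 stored at addresses 0, 1 and 2, the four control words (reset, load address 0, load address 1, load address 2) leave 15 in register a after four steps. Each step has to wait for the previous one because every addition reads and writes the same register.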

45
00:03:49,830 --> 00:03:57,240
Now let’s study the execution of the same C code on an FPGA. To implement the C code on an FPGA, the 

46
00:03:57,240 --> 00:04:00,620
synthesis tool must translate it into a hardware circuit.

47
00:04:01,020 --> 00:04:04,820
This figure shows such a circuit, which consists of two adders. 

48
00:04:05,430 --> 00:04:07,380
There are two steps to find the result.

49
00:04:08,900 --> 00:04:14,750
In the first step, the c1 and c2 values are added, and in the following step, the result of

50
00:04:14,750 --> 00:04:18,410
the addition is added to the c3 value to get the final answer. 
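The two-step behaviour of this circuit can be sketched as a small step-by-step simulation. Assumptions not taken from the lecture: each adder's output is captured once per step together with a valid flag, an adder fires in the first step in which all of its inputs are valid, and the names `Wire` and `run_fpga` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STEPS 8   /* safety bound for the simulation loop */

/* A wire carrying a value plus a flag saying whether it is valid yet. */
typedef struct {
    int value;
    bool valid;
} Wire;

/* Simulates a = (c1 + c2) + c3 built from two adders. The inputs
   c1..c3 are valid from the start. Returns a and stores the number
   of steps taken in *steps_out. */
int run_fpga(int c1, int c2, int c3, int *steps_out) {
    Wire add1 = {0, false};   /* output of adder 1: c1 + c2 */
    Wire add2 = {0, false};   /* output of adder 2: add1 + c3 (final result) */
    int steps = 0;
    while (!add2.valid) {
        /* Compute next values from the current ones, then update both
           together, like registers capturing data on a clock edge. */
        Wire next1 = add1, next2 = add2;
        if (!add1.valid) {
            next1.value = c1 + c2;
            next1.valid = true;
        }
        if (!add2.valid && add1.valid) {
            next2.value = add1.value + c3;
            next2.valid = true;
        }
        add1 = next1;
        add2 = next2;
        steps++;
        assert(steps <= MAX_STEPS);
    }
    *steps_out = steps;
    return add2.value;
}
```

Calling `run_fpga(3, 5, 7, &steps)` returns 15 and sets `steps` to 2, compared with the four steps (reset plus three accumulations) of the CPU execution described earlier. The step count is not hard-coded: it emerges from adder 2 waiting for adder 1's result.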

51
00:04:21,480 --> 00:04:27,930
Now, after understanding the execution mechanisms in CPU and FPGA platforms, do we need to learn the 

52
00:04:27,930 --> 00:04:32,370
internal structure of an FPGA to implement a circuit or algorithm with HLS (high-level synthesis)?

53
00:04:33,400 --> 00:04:39,010
I will address this question in the next lecture, but in short: the internal structure

54
00:04:39,010 --> 00:04:43,360
of an FPGA is hidden from users, because the design tools take care of it.

55
00:04:43,570 --> 00:04:49,150
However, having a basic knowledge of the internal structure and different resources inside an FPGA 

56
00:04:49,300 --> 00:04:54,910
helps designers better understand the tool reports and produce efficient designs.

57
00:04:56,150 --> 00:04:57,620
These are the takeaway messages:

58
00:04:58,610 --> 00:05:05,120
Whereas CPUs typically execute instructions sequentially, FPGAs can execute independent operations in parallel.

59
00:05:05,120 --> 00:05:08,720
Understanding the dependencies among instructions is essential for exploiting parallelism on an FPGA.

60
00:05:12,280 --> 00:05:16,750
Now the quiz question. This code shows a simple computation program. 

61
00:05:17,870 --> 00:05:24,590
The hardware structures for the CPU and FPGA platforms are shown in these figures. The CPU computing structure

62
00:05:24,590 --> 00:05:26,930
consists of an ALU and a register.

63
00:05:27,170 --> 00:05:33,720
The FPGA computing structure connects three adders in series to perform the task. Show the execution

64
00:05:33,720 --> 00:05:36,500
steps of the code on the CPU and on the FPGA.
