1
00:00:00,900 --> 00:00:02,100
Hi and welcome back.

2
00:00:02,220 --> 00:00:08,100
So in this section, we'll take a look at AlexNet, which was another advanced CNN, advanced for the

3
00:00:08,100 --> 00:00:15,510
time, which was in 2012, and it was basically another classical CNN architecture.

4
00:00:15,960 --> 00:00:17,940
And there's something I want to introduce here.

5
00:00:18,120 --> 00:00:20,840
Well, firstly, let me tell you who the guys behind this are.

6
00:00:20,850 --> 00:00:23,670
It was Geoffrey Hinton's team from the University of Toronto.

7
00:00:24,090 --> 00:00:30,450
The team was called SuperVision, and the reason I have this graphic here, this image, is because it was

8
00:00:30,450 --> 00:00:32,430
the ILSVRC winner.

9
00:00:32,460 --> 00:00:35,640
That's the ImageNet competition on this dataset.

10
00:00:35,640 --> 00:00:38,230
Back then, it was the winner in 2012.

11
00:00:38,280 --> 00:00:41,490
And that's how these guys and this network got famous.

12
00:00:42,150 --> 00:00:46,450
Now I should mention that AlexNet is named after Alex Krizhevsky.

13
00:00:46,500 --> 00:00:52,500
I hope I'm doing this name justice, I'm probably butchering the pronunciation, as well as Ilya Sutskever.

14
00:00:53,250 --> 00:00:53,770
Definitely.

15
00:00:53,790 --> 00:00:56,340
Butchered that one, to be fair, but that's OK.

16
00:00:56,550 --> 00:00:57,590
And AlexNet,

17
00:00:57,600 --> 00:01:00,390
I'll talk about the architecture, but I want to mention something else.

18
00:01:00,870 --> 00:01:04,500
You would have noticed that LeNet came out in 1995.

19
00:01:04,990 --> 00:01:14,870
AlexNet, the next big CNN, took like 15 or 17 years before CNNs got back into the

20
00:01:14,940 --> 00:01:15,480
spotlight.

21
00:01:16,140 --> 00:01:16,970
So why?

22
00:01:17,270 --> 00:01:17,600
What?

23
00:01:17,640 --> 00:01:18,690
What happened?

24
00:01:18,780 --> 00:01:26,460
Well, remember AI winters? Generally, times in the industry where there's a lot of hype and then a lot

25
00:01:26,460 --> 00:01:27,510
of crap and then a crash.

26
00:01:28,320 --> 00:01:31,860
Well, AI went through one between 2000 and 2010.

27
00:01:31,860 --> 00:01:33,240
Roughly, it wasn't really a winter.

28
00:01:33,250 --> 00:01:36,990
It was just a time when things sort of stagnated somewhat.

29
00:01:37,830 --> 00:01:40,620
We weren't getting anywhere with neural nets at the time.

30
00:01:41,250 --> 00:01:46,650
And the reason for that is we just didn't have the GPU computational power to really do a lot of experimentation

31
00:01:46,650 --> 00:01:47,070
with them.

32
00:01:47,280 --> 00:01:52,260
So a lot of researchers thought, yeah, neural nets are quite novel and quite good.

33
00:01:52,590 --> 00:01:57,450
However, they're not practical in the real world because they're just too slow to train and require

34
00:01:57,450 --> 00:01:58,170
too much data.

35
00:01:58,170 --> 00:02:00,000
So it stagnated, the research back then.

36
00:02:00,540 --> 00:02:07,810
However, things started to get a kick when GPUs became a lot cheaper, a lot more powerful, and

37
00:02:08,050 --> 00:02:10,140
a lot more libraries became accessible.

38
00:02:10,500 --> 00:02:16,740
Python became more mainstream, and Python was so much easier to work with than C++ or Java when building

39
00:02:17,580 --> 00:02:23,040
neural nets because of the libraries that accumulated. Even early libraries like Caffe and Theano were

40
00:02:23,040 --> 00:02:24,480
actually quite easy to use.

41
00:02:24,630 --> 00:02:25,140
To be fair.

42
00:02:25,440 --> 00:02:32,670
So anyway, back to the architecture of AlexNet. It contained eight layers, with the first five being

43
00:02:32,670 --> 00:02:35,880
convolutional layers and the last three being FC layers.

44
00:02:35,880 --> 00:02:37,500
So let's take a look at the diagram.

45
00:02:37,830 --> 00:02:44,070
But before we do, we should mention that this was a big CNN back then: 60 million parameters,

46
00:02:44,070 --> 00:02:46,900
which was quite large, in my opinion.

47
00:02:46,900 --> 00:02:49,530
That's still quite large, even for networks today.

48
00:02:50,040 --> 00:02:54,480
So it did take two GPUs, state-of-the-art GPUs back then.

49
00:02:54,990 --> 00:03:00,990
Probably some NVIDIA Quadros, most likely. It took over a week to train, so you can see it took quite

50
00:03:00,990 --> 00:03:01,380
some time.

51
00:03:02,700 --> 00:03:05,220
And this is an illustration of the architecture.

52
00:03:05,430 --> 00:03:09,210
It's not the best illustration. It got extracted from the paper.

53
00:03:09,600 --> 00:03:14,970
However, for some reason, the authors cropped half of the image, and it looks a bit weird, in my

54
00:03:14,970 --> 00:03:15,390
opinion.

55
00:03:15,870 --> 00:03:18,990
But you can kind of see the CNN layers here.

56
00:03:18,990 --> 00:03:24,870
There's a CNN layer here, a CNN layer here, a CNN layer here, and then here, I believe.

57
00:03:24,900 --> 00:03:25,230
Yeah.

58
00:03:26,370 --> 00:03:28,980
And then there's the FC layers, the three of them here.

59
00:03:29,400 --> 00:03:31,860
So that's how we get this architecture.

60
00:03:32,010 --> 00:03:33,720
So it's not a very complicated design.

61
00:03:34,230 --> 00:03:38,010
Even though it has five convolutional layers, it's still a relatively simple CNN.

62
00:03:39,300 --> 00:03:45,150
So now let's take a look at VGG, which is one of my favorite networks because it generally works so

63
00:03:45,150 --> 00:03:45,420
well.

64
00:03:45,960 --> 00:03:49,320
But it's a big network, so it does come with a cost.

65
00:03:50,010 --> 00:03:51,810
So I'll see you in the next section.

66
00:03:51,990 --> 00:03:52,440
Thank you.