1
00:00:00,180 --> 00:00:06,200
Hello, everyone, and welcome to this new and exciting session in which we shall dive into Image Generation.

2
00:00:06,210 --> 00:00:14,970
So as you can see right here, this image was AI-generated, but back in 2014, 2015, 2016, images

3
00:00:14,970 --> 00:00:23,910
like this could not yet be generated using AI. With advances in AI like the variational autoencoders,

4
00:00:23,910 --> 00:00:33,060
the GANs, and even more sophisticated GANs like the WGANs, ProGANs, SRGANs, and CycleGANs, AI

5
00:00:33,090 --> 00:00:37,050
algorithms have been trained to produce high-quality images.

6
00:00:37,900 --> 00:00:42,130
And today we even get much better results with the diffusion models.

7
00:00:42,760 --> 00:00:51,460
That said, in this section, we shall treat the variational autoencoders and the DCGANs.

8
00:00:51,880 --> 00:00:56,560
And so at the end of the section you should be able to produce images like this.

9
00:00:56,560 --> 00:01:04,120
Back in 2014, one of the best performing image generation models was this model you have right in front

10
00:01:04,120 --> 00:01:04,600
of you.

11
00:01:04,600 --> 00:01:07,240
That is the variational autoencoder.

12
00:01:07,240 --> 00:01:10,750
And the way this model worked was quite simple.

13
00:01:11,200 --> 00:01:23,320
We had an encoder block which took in some input image and then generated these embeddings, and now these

14
00:01:23,320 --> 00:01:30,070
embeddings, which hold encoded information about the inputs, could be used by the decoder

15
00:01:31,320 --> 00:01:34,620
to generate output images.

16
00:01:34,620 --> 00:01:43,890
Nonetheless, in 2014, Ian Goodfellow came up with the idea of the GAN, where GAN stands for generative

17
00:01:43,920 --> 00:01:47,640
adversarial network.

18
00:01:47,670 --> 00:01:58,980
So here we have two neural networks, the generator G and the discriminator D, where this G and this

19
00:01:58,980 --> 00:02:09,450
D, that is, the discriminator, are both in a contest where the generator is learning how to produce

20
00:02:09,450 --> 00:02:16,440
images which look like those from the real data set or the training set.

21
00:02:16,860 --> 00:02:25,410
And on the other hand, the discriminator is learning how to differentiate between real data like this

22
00:02:25,410 --> 00:02:28,800
one and fake data produced by the generator.

23
00:02:29,670 --> 00:02:35,940
If we consider the simple example here, you could see that we pass in some input noise.

24
00:02:35,940 --> 00:02:37,710
We get this output.

25
00:02:38,340 --> 00:02:46,920
And because this output doesn't look like the real data, the discriminator considers this as fake.

26
00:02:48,000 --> 00:02:56,820
Whereas now for this other example, the output from the generator looks like the real data.

27
00:02:56,820 --> 00:03:02,670
And so the discriminator is tricked by the generator into thinking that

28
00:03:02,670 --> 00:03:03,840
this is real data.

29
00:03:03,960 --> 00:03:10,290
So after updating the parameters of the generator and the discriminator such that we get to that point

30
00:03:10,290 --> 00:03:15,900
where the discriminator no longer knows the difference between what is coming from the generator and

31
00:03:15,900 --> 00:03:17,520
what is coming from the training set,

32
00:03:17,520 --> 00:03:27,270
we now have this generator block which is able to take in random noise and generate outputs which are

33
00:03:27,270 --> 00:03:30,930
similar to those from our training set.

34
00:03:30,930 --> 00:03:41,460
And although this architecture was groundbreaking in 2014, 2015, today we have more advanced and better

35
00:03:41,460 --> 00:03:43,830
models like StyleGAN.
