Last Updated on 21 April 2020
In the news recently: the website This Person Does Not Exist. The website does nothing but show you a face. When you refresh the page, a new face appears. And another one. And another one. And so on. It is perhaps a strange title for a website, but that’s intended. In the bottom-right corner of the page we read this: produced by a (…) network. Huh, a real-looking person who doesn’t exist? …yes, seriously… every person you see on this website does not exist.
In this tech blog, we dive in deep to find out how this is possible. You will see that we will be covering a machine learning technique known as a GAN – a generative adversarial network. We’ll look into the relatively short history of this way of thinking about machine learning, and in doing so take a short side step into game theory. Finally, we will look at the specific case of This Person Does Not Exist and the building blocks that together compose the machine learning aspects of the website.
Sounds difficult? Not too difficult, if you take some time to read this blog. And don’t worry, I will do my best to discuss GANs in layman’s terms. Please let me know in the comments whether I succeeded in that – I can only learn from your responses 🙂
Game theory and zero-sum games
Some games have only one reward, which must be shared among all participants. Chess and tennis are perfect examples of such a game: one person wins, which means that the other one loses. Or, in the case of chess, it’s a tie. If you wrote down the scores for all players in those situations and added them up, you would get the following:
- 1-0: Player 1 (+1 win), player 2 (-1 win) = together 0;
- 0-1: Player 1 (-1 win), player 2 (+1 win) = together 0;
- Tie: Player 1 (0), player 2 (0) = together 0. (Chess scoring actually awards ½-½ for a draw; that is the equivalent “constant-sum” view, in which the total is always one point instead of zero.)
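The scoring rules above can be written out as a tiny sketch. The outcome names and payoff values are just illustrative labels for the chess example, not any standard notation:

```python
def game_payoffs(outcome):
    """Return the (player 1, player 2) payoffs for one game."""
    if outcome == "player_1_wins":
        return (1, -1)
    if outcome == "player_2_wins":
        return (-1, 1)
    return (0, 0)  # a tie: neither player gains anything over the other

for outcome in ("player_1_wins", "player_2_wins", "tie"):
    p1, p2 = game_payoffs(outcome)
    # The defining property of a zero-sum game: the payoffs always cancel out.
    assert p1 + p2 == 0
    print(outcome, "-> sum of payoffs:", p1 + p2)
```

Whatever one player gains, the other loses – the sum never budges from zero.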
In all cases, this type of game yields a sum of zero with respect to the distribution of scores. It therefore won’t surprise you that such a game is called a zero-sum game. It’s one of the most important concepts in the mathematical field known as game theory, because besides games like chess it can also be applied to more complex systems. Unfortunately, war, to give just one example, is often also a zero-sum game.
Generative adversarial networks
All right, let’s continue with the core topic of this blog: the website This Person Does Not Exist. We’ve seen what a zero-sum game is; now we have to apply it to machine learning. The website was made using a technique known as a generative adversarial network, or GAN. To understand what that means, we’ll break the term into its parts:
- Generative: it creates something;
- Adversarial: two parties battle each other in some kind of game;
- Network: those parties are neural networks – two of them, in this case.
In short: a GAN is composed of two neural networks which, by playing against each other and trying to make each other lose, create something.
And that ‘something’ is pretty varied these days. After modern GANs emerged in 2014, networks have been developed that can produce pictures of interiors, shoes, bags and clothing. Related networks are now also capable of predicting video ahead of time: feed the model ten seconds of video, and it predicts the next two. Another one: in 2017, a work was published that discusses GANs which can age the faces in pictures. This can be applied to cases of missing children, who have possibly grown older but whose case was never resolved.
GANs are thus a new technique in the arsenal of a machine learning engineer which spawns a wide range of new applications. Not only predictive power, like with other models, but also some creative power!
But then, how does a GAN work exactly?
Schematically, its inner workings look as follows.
It all starts with what we call a noise vector. A vector is an abstract representation of what you can also consider to be some sort of list. In machine learning, data is converted into numbers in nine out of ten cases, so a noise vector can simply be seen as a list of random numbers. The vector, or the list, is input to the first neural network, which is called the generator. Layer after layer, the generator converts the noise into an increasingly large and detailed picture. A fake one, though!
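To make the noise vector concrete, here is a minimal sketch of an untrained toy “generator”. The layer sizes (100 noise values, an 8x8 output) and the random weight matrices are my own illustrative choices; a real GAN generator uses trained (transposed-)convolution layers instead:

```python
import numpy as np

rng = np.random.default_rng(42)

# A noise vector: just a list of random numbers (here, 100 of them).
noise = rng.standard_normal(100)

# A toy, untrained "generator": two layers that map the 100 noise
# values to an 8x8 grayscale image. Real generators learn these weights.
w1 = rng.standard_normal((100, 32)) * 0.1
w2 = rng.standard_normal((32, 64)) * 0.1

hidden = np.tanh(noise @ w1)           # layer 1: mix the noise values
fake_image = np.tanh(hidden @ w2)      # layer 2: expand to 64 pixel values
fake_image = fake_image.reshape(8, 8)  # interpret the output as an 8x8 image

print(fake_image.shape)  # (8, 8)
```

Because the weights here are random, the output is meaningless noise – training is what gradually turns that noise into faces.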
The fake picture is fed into the second neural network, known as the discriminator. This network, which has been trained with real pictures, is capable of doing the opposite: breaking the image down into individual components to determine the category to which it belongs. In the case of a GAN, the categories are fake and real. In a way, you can thus see the generator as the criminal and the discriminator as the cop who has to catch the criminal.
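The cop’s job can also be sketched in a few lines. This toy discriminator (my own simplification – real ones use trained convolutional layers) flattens the image into its individual components and outputs a probability that the image is real:

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(image, weights):
    """Toy discriminator: flatten the image into components and output
    the probability that it is real (1.0) rather than fake (0.0)."""
    score = image.reshape(-1) @ weights   # weigh each pixel component
    return 1.0 / (1.0 + np.exp(-score))   # sigmoid -> probability in (0, 1)

weights = rng.standard_normal(64) * 0.1   # untrained weights for an 8x8 image
fake = rng.standard_normal((8, 8))        # stand-in for a generated image
p_real = discriminator(fake, weights)
print("probability the image is real:", p_real)
```

Training would push these weights so that real images score close to 1 and generated ones close to 0.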
GANs and their training
How well the cop catches the criminal becomes clear when we finish one epoch – a round of training. For every sample in the validation set, for which a target (fake or real) is available, we determine how much the predicted value differs from the real one. We call this difference the loss.
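One common way to measure that difference for a fake-versus-real prediction is binary cross-entropy, sketched below (the specific probability values are just illustrative):

```python
import math

def binary_cross_entropy(prediction, target):
    """Loss between the discriminator's prediction (probability that an
    image is real) and the true label (1 = real, 0 = fake)."""
    return -(target * math.log(prediction)
             + (1 - target) * math.log(1 - prediction))

# A confident, correct prediction yields a small loss...
print(round(binary_cross_entropy(0.95, 1), 4))
# ...while a confident but wrong prediction yields a large loss.
print(round(binary_cross_entropy(0.95, 0), 4))
```

The further the prediction is from the target, the larger the loss – exactly the signal the next step needs.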
Just as with any other neural network, this loss value can be used to optimize the model. The mathematics of optimization is too complex for this blog, but with a very elegant technique one essentially follows the path of steepest descent from the mountain top (a poor loss value) down into the valley (a good loss value). Based on one training epoch, the model adapts both the generator and the discriminator for improvement, after which a new epoch can start.
Perhaps you can imagine that whatever the generator produces depends on the discriminator. With every machine learning model, the goal is to maximize the gain, which also means minimizing the loss. As the discriminator becomes better and better at predicting whether an image is real or fake (yielding a higher loss for the generator), the generator must improve time after time to get away with fooling the cop (making its loss lower again). The discriminator, in turn, keeps getting better at recognizing the real pictures we feed it. Consequently, if the generator wants to keep up, it must become better and better at generating images that look like the real ones the discriminator was trained on.
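This tug-of-war can be captured in a single number, roughly in the spirit of the GAN value function: one network plays to push it up, the other to pull it down. The probability values below are purely illustrative:

```python
import math

def value(p_real_on_real, p_real_on_fake):
    """Toy GAN value for one real and one fake sample: reward for
    calling the real image real, plus reward for calling the fake fake.
    The discriminator plays to maximize this, the generator to minimize it."""
    return math.log(p_real_on_real) + math.log(1.0 - p_real_on_fake)

# A sharp discriminator that spots the fake (p_real_on_fake = 0.1)
# keeps the value high...
print(value(0.9, 0.1))
# ...while a convincing generator that fools the cop (p_real_on_fake = 0.9)
# drags the same value down.
print(value(0.9, 0.9))
```

The same number is one player’s gain and the other’s loss – the zero-sum game from earlier, now between two neural networks.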
The pictures on This Person Does Not Exist are the recent result of those developments within GANs. This also explains why we speak of an adversarial network, in which two neural networks play a zero-sum game against each other… what one wins in terms of loss is what the other loses.
How This Person Does Not Exist is unique
Yet the story does not end there. Generative adversarial networks work in some kind of cop-and-criminal relationship in order to produce very interesting results. But This Person Does Not Exist had a different goal: showing that very accurate and also very large (1024 x 1024 pixels and larger) pictures can be generated at some speed.
That was exactly the bottleneck of GANs at the time. Early GANs worked quite well, but were either not very accurate (resulting in blurry pictures) or could only produce small images. In 2018, NVIDIA’s AI research team proposed a solution: the ProGAN network, which composes the generator in a very specific way. It builds the picture layer after layer, where the layers get bigger and more detailed. For example, the first layer is 4 by 4 pixels, the second 8 by 8, and so on. The interesting part of this way of working is that every new layer can benefit from the coarser results of the previous ones; it does not have to figure out everything on its own. As we all know, extending something that already exists is much easier than starting from scratch. ProGAN was thus a small breakthrough in the field of generative adversarial networks.
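The 4x4 → 8x8 → 16x16 growth can be sketched in miniature. In this toy version the “extra detail” at each stage is just random noise; in ProGAN it comes from a trained convolutional layer:

```python
import numpy as np

rng = np.random.default_rng(1)

def grow(image):
    """Double the resolution by repeating every pixel 2x2 (nearest
    neighbour), then add a little detail on top. In ProGAN the detail
    comes from a trained layer; random noise stands in for it here."""
    upsampled = np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)
    return upsampled + 0.1 * rng.standard_normal(upsampled.shape)

image = rng.standard_normal((4, 4))   # the coarse 4x4 starting layer
for _ in range(2):
    image = grow(image)               # 4x4 -> 8x8 -> 16x16

print(image.shape)  # (16, 16)
```

Each stage only has to refine what the previous stage already produced, which is the core idea behind ProGAN’s stability at high resolutions.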
But that still doesn’t end the story. The GAN built into This Person Does Not Exist is named StyleGAN, an upgrade of ProGAN. NVIDIA’s AI team added various new elements which allow practitioners to control more aspects of the network. For example, they can better disentangle the characteristics the generator learns, which makes the generator less dependent on the training set. This allows one to, for example, reduce bias in the generated pictures. Nevertheless, separating those characteristics remains a challenge, which spawns a wide array of research opportunities for generative adversarial networks in the coming years!
All in all, we saw that GANs introduce creativity into machine learning – a totally new approach. I am very curious about the new application areas we will see in the coming period. I’ll keep you up to date… 🙂