What Is StyleGAN?

StyleGAN is a machine learning framework developed by researchers at NVIDIA in December 2018. It marked a step change in the quality and coherence of realistic images created by AI, especially in its ability to generate realistic human faces.

AI-generated images are made possible by a class of machine learning (ML) models known as Generative Adversarial Networks (GANs). First introduced in 2014, GANs have gained popularity for their ability to generate photorealistic images from scratch, and they can artificially generate faces based on the dataset of faces they are trained on. So, what is StyleGAN, and how does it differ from regular GAN models?

The Details

The Style Generative Adversarial Network, or StyleGAN for short, is an evolution of the GAN architecture. Introduced by Nvidia researchers in December 2018, it proposed substantial changes to the original generator architecture, and Nvidia released the source code in February 2019.

Like regular GAN models, StyleGAN can generate photorealistic high-quality images of faces, but the changes to the model give users the ability to control the style of the generated images. The StyleGAN AI face generator achieves this by varying the style vectors and noise.

Nvidia has put considerable effort into improving the generator models of StyleGAN, releasing the second version called StyleGAN2 in February 2020. StyleGAN2 removes some of the characteristic artefacts of the original model and greatly improves the image quality. In October 2021, Nvidia published StyleGAN3, described as an "alias-free" version.

How Does StyleGAN Work?

StyleGAN was built using an alternative generator architecture for generative adversarial networks. The key architectural design choice of Nvidia in StyleGAN is the use of adaptive instance normalisation (AdaIN). Another difference is the mapping network implemented in StyleGAN: rather than feeding a latent vector directly into the input layer like regular GANs, StyleGAN passes it through a mapping network that produces an intermediate latent vector, which then steers the image synthesis.
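The mapping network is essentially a stack of fully connected layers (eight of them in the paper). A minimal NumPy sketch of the idea, with randomly initialised weights standing in for the learned ones:

```python
import numpy as np

def mapping_network(z, weights, biases):
    """Transform a latent vector z into an intermediate latent w by
    passing it through a stack of fully connected layers with
    leaky-ReLU activations (StyleGAN uses 8 such layers)."""
    x = z
    for W, b in zip(weights, biases):
        x = x @ W + b
        x = np.where(x > 0, x, 0.2 * x)  # leaky ReLU
    return x  # intermediate latent w

rng = np.random.default_rng(0)
dim = 512  # StyleGAN uses 512-dimensional latents
weights = [rng.normal(0, 0.02, (dim, dim)) for _ in range(8)]
biases = [np.zeros(dim) for _ in range(8)]

z = rng.normal(size=dim)  # sampled latent code
w = mapping_network(z, weights, biases)
print(w.shape)  # (512,)
```

In the real model the weights are learned during training; the point of the sketch is only that z and w live in different spaces connected by a learned non-linear map.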

StyleGAN injects the intermediate vector at various levels during the image generation process. By transforming the input of each level individually, StyleGAN better controls the visual features expressed at that level, from coarse features such as pose and face shape to fine details such as hair colour, without altering other levels. This places StyleGAN amongst the top random face generators currently available.
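AdaIN is the mechanism that applies the style at each level: it normalises each feature map to zero mean and unit variance, then rescales and shifts it with per-channel parameters derived from the intermediate vector. A NumPy sketch with illustrative shapes:

```python
import numpy as np

def adain(x, style_scale, style_bias, eps=1e-8):
    """Adaptive instance normalisation: normalise each feature map
    (channel) of x, then rescale and shift it with per-channel
    style parameters derived from the intermediate latent."""
    # x has shape (channels, height, width)
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    normalised = (x - mean) / (std + eps)
    return style_scale[:, None, None] * normalised + style_bias[:, None, None]

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8, 8))  # 4 feature maps at 8x8 resolution
scale = rng.normal(size=4)      # style scale per channel
bias = rng.normal(size=4)       # style bias per channel
out = adain(x, scale, bias)
print(out.shape)  # (4, 8, 8)
```

After the operation, each channel's mean equals the style bias and its spread is set by the style scale, which is how the style vector reshapes the statistics of the features at that level.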

StyleGAN handles image production differently from regular GAN models. StyleGAN generates images progressively, starting from a very low resolution (4 × 4) and repeatedly enlarging the image until it reaches the final resolution (1024 × 1024). This is possible because of progressive growing, which adds new layers to both the generator and the discriminator as the resolution increases.
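The progression can be sketched as a simple doubling schedule from the starting resolution to the target one:

```python
def resolution_schedule(start=4, final=1024):
    """Progressive growing: the generator starts at a tiny resolution
    and doubles it at each stage until the target is reached
    (4 -> 8 -> ... -> 1024 in StyleGAN)."""
    res = start
    stages = [res]
    while res < final:
        res *= 2
        stages.append(res)
    return stages

print(resolution_schedule())  # [4, 8, 16, 32, 64, 128, 256, 512, 1024]
```

Each entry in the schedule corresponds to a stage at which new layers are added to the generator and discriminator and blended in gradually.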

GAN Versus StyleGAN Architecture

GANs are still a relatively young area of machine learning, although their applications in AI image generation have expanded greatly. However, controlling the output has been one of the major challenges of GAN models; StyleGAN addresses this problem using the mapping network explained above.

Nvidia designed StyleGAN as a combination of Progressive GAN and neural style transfer. Progressive GAN is a method for training GANs for large-scale image generation that grows the generator from small to large scale in a pyramidal fashion. The integration of this progressive growing mechanism is a key architectural difference between StyleGAN and regular GANs, and it allows StyleGAN to overcome some of their limitations.

Also, the training of StyleGAN differs in that it mixes styles from two latent vectors during training, whereas regular GANs apply a single latent. By using two sources of randomness to produce a synthetic image, StyleGAN is able to achieve better outputs than regular GANs. StyleGAN depends on Nvidia's CUDA software, GPUs, and Google's TensorFlow (or Meta AI's PyTorch).
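Style mixing can be sketched as picking a crossover point among the generator's style inputs and feeding one intermediate latent to the layers before it and a second latent to the layers after it. A NumPy sketch with illustrative names (the 1024 × 1024 generator in the paper has 18 style inputs):

```python
import numpy as np

def mix_styles(w1, w2, crossover, num_layers=18):
    """Style mixing: layers below the crossover point receive the
    first intermediate latent, layers at or above it receive the
    second, so coarse and fine styles come from different sources."""
    return [w1 if layer < crossover else w2 for layer in range(num_layers)]

rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(2, 512))
styles = mix_styles(w1, w2, crossover=8)
print(sum(s is w1 for s in styles), sum(s is w2 for s in styles))  # 8 10
```

During training the crossover point is chosen at random, which discourages the network from assuming that adjacent styles are correlated.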

While regular GANs generate images directly from stochastically sampled latent variables, StyleGAN's synthesis network starts from a fixed, learned input tensor. The stochastically sampled variables are instead transformed by an eight-layer feedforward mapping network and used as style vectors in the adaptive instance normalisation at each resolution.
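The starting point of synthesis can be sketched as a learned constant tensor to which per-pixel noise is added; in this NumPy illustration, random values stand in for the learned constant and an assumed noise-scaling value:

```python
import numpy as np

rng = np.random.default_rng(0)

# The synthesis network does not take the latent as input. It starts
# from a learned constant tensor (512 channels at 4x4 in the paper)
# and adds per-pixel noise at each layer to create stochastic detail.
const_input = rng.normal(size=(512, 4, 4))  # learned in training; random stand-in here
noise = rng.normal(size=(1, 4, 4))          # one noise map, broadcast over channels
noise_strength = np.full((512, 1, 1), 0.1)  # learned per-channel scaling; 0.1 is illustrative
x = const_input + noise_strength * noise
print(x.shape)  # (512, 4, 4)
```

The style vectors then act on this tensor through AdaIN at every resolution, so all image-to-image variation comes from the styles and the noise rather than from the input itself.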

Conclusion

By implementing a mapping network and adaptive instance normalisation (AdaIN), StyleGAN is even more proficient than regular GAN models at producing photorealistic, high-quality images of faces. StyleGAN also grants users control over the characteristics of the generated image at different levels of detail by changing the style vectors and noise. This model could well provide the foundation for future AI face generator models.