Generative AI – What is it and How Does it Work?

Generative AI enables users to quickly generate new content based on a variety of inputs. Inputs to and outputs from these models can include text, images, sounds, animation, 3D models, or other types of data.

How Does Generative AI Work?

Generative AI models use neural networks to identify the patterns and structures within existing data in order to generate new and original content. One of the breakthroughs with generative AI models is the ability to leverage different learning approaches, including unsupervised or semi-supervised learning, for training. This has given organizations the ability to more easily and quickly leverage large amounts of unlabeled data to create foundation models. As the name suggests, foundation models can be used as a base for AI systems that perform multiple tasks. Examples of foundation models include GPT-3 and Stable Diffusion, both of which let users leverage the power of language: popular applications like ChatGPT, which draws from GPT-3, allow users to generate an essay from a short text request, while Stable Diffusion generates photorealistic images from a text prompt.
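
As a minimal sketch of what calling such a model looks like in code, the snippet below uses the Hugging Face transformers library, with the small open GPT-2 model standing in for a large foundation model; the model choice and prompt are illustrative assumptions, not details from this article.

    # Minimal text-generation sketch using the Hugging Face `transformers`
    # library; the small GPT-2 model stands in for a large foundation model.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    prompt = "Write a short essay about renewable energy:"
    outputs = generator(prompt, max_new_tokens=100, do_sample=True, temperature=0.8)

    print(outputs[0]["generated_text"])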

How to Evaluate Generative AI Models?

The three key requirements of a successful generative AI model are:

  1. Quality: Especially for applications that interact directly with users, having high-quality generation outputs is key. For example, in speech generation, low-quality speech is difficult to understand. Similarly, in image generation, the desired outputs should be visually indistinguishable from natural images.
  2. Diversity: A good generative model captures the minority modes in its data distribution without sacrificing generation quality. This helps reduce undesired biases in the learned models.
  3. Speed: Many interactive applications, such as real-time image editing in content creation workflows, require fast generation. A minimal measurement sketch follows this list.
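
As a rough illustration of how two of these requirements can be measured, the sketch below times average generation latency and computes distinct-2, a simple n-gram diversity proxy, over a batch of generated texts. The generate argument is a hypothetical stand-in for any text-generation call, and distinct-n is just one of many possible diversity measures; quality itself is usually assessed separately, for example with human ratings or, for images, metrics such as FID.

    # Rough evaluation sketch: average latency (speed) and distinct-2
    # (a crude diversity proxy) over a batch of generated texts.
    # `generate` is a hypothetical placeholder for any model call.
    import time

    def distinct_n(texts, n=2):
        # Fraction of unique n-grams across all generated texts.
        ngrams = []
        for text in texts:
            tokens = text.split()
            ngrams += [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        return len(set(ngrams)) / max(len(ngrams), 1)

    def evaluate(generate, prompt, num_samples=10):
        start = time.perf_counter()
        samples = [generate(prompt) for _ in range(num_samples)]
        latency = (time.perf_counter() - start) / num_samples
        print(f"avg latency: {latency:.3f}s, distinct-2: {distinct_n(samples):.3f}")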

How to Develop Generative AI Models?

There are multiple types of generative models, and combining the positive attributes of each results in the ability to create even more powerful models.

Below is a breakdown:

  • Diffusion models: Also known as denoising diffusion probabilistic models (DDPMs), diffusion models are generative models that determine vectors in latent space through a two-step process during training: forward diffusion and reverse diffusion. The forward diffusion process slowly adds random noise to training data, while the reverse process learns to undo the noise and reconstruct the data samples. Novel data can then be generated by running the reverse denoising process starting from entirely random noise (a minimal sketch of the forward noising step appears below). A diffusion model can take longer to train than a variational autoencoder (VAE), but thanks to this two-step process, hundreds, if not an infinite number, of layers can be trained, which means that diffusion models generally offer the highest-quality output among generative AI models. Diffusion models are also categorized as foundation models, because they are large-scale, offer high-quality outputs, are flexible, and are considered best for generalized use cases. However, because of the many-step reverse sampling process, generating samples with diffusion models is a slow, lengthy process.
  • Variational autoencoders (VAEs): VAEs consist of two neural networks, typically referred to as the encoder and the decoder. Given an input, the encoder converts it into a smaller, denser representation of the data. This compressed representation preserves the information the decoder needs to reconstruct the original input while discarding anything irrelevant. The encoder and decoder work together to learn an efficient and simple latent data representation, which lets the user easily sample new latent vectors and map them through the decoder to generate novel data (see the VAE sketch below). While VAEs can generate outputs such as images faster than diffusion models, the images they generate are not as detailed.
  • Generative adversarial networks (GANs): Introduced in 2014, GANs were considered the most commonly used of the three approaches until the recent success of diffusion models. GANs pit two neural networks against each other: a generator that produces new examples and a discriminator that learns to classify content as either real (from the domain) or fake (generated).

The two models are trained together, getting smarter as the generator produces better content and the discriminator gets better at spotting generated content. This procedure repeats, pushing both to continually improve with every iteration, until the generated content is indistinguishable from the existing content. While GANs can provide high-quality samples and generate outputs quickly, their sample diversity is weak, which makes GANs better suited for domain-specific data generation. Minimal sketches of all three approaches follow.
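
To make the forward noising step of a diffusion model concrete, here is a minimal NumPy sketch; the linear variance schedule and its endpoints are common but assumed choices, and a full DDPM would additionally train a network to perform the reverse (denoising) step.

    # Minimal sketch of the forward diffusion (noising) process: each
    # timestep mixes the data with Gaussian noise per a variance schedule.
    import numpy as np

    def forward_diffusion(x0, num_steps=1000, beta_start=1e-4, beta_end=0.02):
        # Linear variance schedule; beta_t controls the noise added at step t.
        betas = np.linspace(beta_start, beta_end, num_steps)
        alpha_bars = np.cumprod(1.0 - betas)
        noised = []
        for t in range(num_steps):
            noise = np.random.randn(*x0.shape)
            # Closed form for t forward steps:
            # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise
            x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
            noised.append(x_t)
        return noised  # the last element is close to pure Gaussian noise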
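
The encoder/decoder structure of a VAE can be sketched in a few lines of PyTorch; the layer sizes and single hidden layer here are illustrative assumptions rather than a recommended architecture.

    # Minimal VAE sketch in PyTorch: the encoder compresses the input to
    # the parameters of a latent Gaussian; the decoder reconstructs from
    # a sample drawn from that Gaussian.
    import torch
    import torch.nn as nn

    class VAE(nn.Module):
        def __init__(self, input_dim=784, latent_dim=16):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
            self.to_mu = nn.Linear(256, latent_dim)
            self.to_logvar = nn.Linear(256, latent_dim)
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 256), nn.ReLU(),
                nn.Linear(256, input_dim), nn.Sigmoid(),
            )

        def forward(self, x):
            h = self.encoder(x)
            mu, logvar = self.to_mu(h), self.to_logvar(h)
            # Reparameterization trick: sample z while keeping gradients flowing.
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
            return self.decoder(z), mu, logvar

Generating novel data then amounts to decoding random latent vectors, e.g. vae.decoder(torch.randn(1, 16)).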
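
The adversarial training procedure can likewise be sketched as an alternating loop; the tiny fully connected networks and hyperparameters below are illustrative assumptions, not a production setup.

    # Sketch of one GAN training iteration: the discriminator learns to
    # tell real samples from generated ones, then the generator learns
    # to fool the discriminator.
    import torch
    import torch.nn as nn

    latent_dim, data_dim = 64, 784
    G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim))
    D = nn.Sequential(nn.Linear(data_dim, 256), nn.ReLU(), nn.Linear(256, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    loss_fn = nn.BCEWithLogitsLoss()

    def train_step(real):
        batch = real.size(0)
        # Discriminator step: push real samples toward 1, fakes toward 0.
        fake = G(torch.randn(batch, latent_dim)).detach()
        d_loss = (loss_fn(D(real), torch.ones(batch, 1)) +
                  loss_fn(D(fake), torch.zeros(batch, 1)))
        opt_d.zero_grad()
        d_loss.backward()
        opt_d.step()
        # Generator step: try to make the discriminator label fakes as real.
        fake = G(torch.randn(batch, latent_dim))
        g_loss = loss_fn(D(fake), torch.ones(batch, 1))
        opt_g.zero_grad()
        g_loss.backward()
        opt_g.step()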

Another factor in the development of generative models is the architecture underneath. One of the most popular is the transformer network, and it is important to understand how it works in the context of generative AI.

Transformer networks: Like recurrent neural networks, transformers are designed to handle sequential input data, but they process the whole sequence at once rather than step by step. Two mechanisms make transformers particularly adept for text-based generative AI applications: self-attention and positional encodings. Together, these mechanisms represent the order of the sequence and allow the algorithm to focus on how words relate to each other over long distances.

A self-attention layer assigns a weight to each part of an input, signifying the importance of that part in the context of the rest of the input. Positional encoding is a representation of the order in which input words occur. A transformer is made up of multiple transformer blocks, also known as layers. For example, a transformer has self-attention layers, feed-forward layers, and normalization layers, all working together to decipher and predict streams of tokenized data, which could include text, protein sequences, or even patches of images.
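
Here is a minimal sketch of the scaled dot-product self-attention computation described above, with positional encodings omitted for brevity; the projection matrices are assumed inputs rather than the learned parameters of a full transformer block.

    # Minimal scaled dot-product self-attention: every position attends
    # to every other position, weighted by query/key similarity.
    import torch
    import torch.nn.functional as F

    def self_attention(x, w_q, w_k, w_v):
        # x: (sequence_length, model_dim); w_q/w_k/w_v: (model_dim, head_dim)
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        scores = q @ k.T / (k.size(-1) ** 0.5)  # pairwise similarity of positions
        weights = F.softmax(scores, dim=-1)     # attention weights per position
        return weights @ v                      # weighted sum of value vectors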

What are the Applications of Generative AI?

Generative AI is a powerful tool for streamlining the workflows of creatives, engineers, researchers, scientists, and more. The use cases and possibilities span all industries and individuals. Generative AI models can take inputs such as text, images, audio, video, and code and generate new content in any of those modalities. For example, they can turn text into an image, an image into a song, or video into text.

What are the Challenges of Generative AI?

Generative AI is an evolving space, and generative models are still considered to be in their early stages, leaving room for growth in the following areas.

  1. Scale of compute infrastructure: Generative AI models can boast billions of parameters and require fast and efficient data pipelines to train. Significant capital investment, technical expertise, and large-scale compute infrastructure are necessary to maintain and develop generative models. For example, diffusion models could require millions or billions of images to train. Moreover, to train on such large datasets, massive compute power is needed, and AI practitioners must be able to procure and leverage hundreds of GPUs to train their models.
  2. Sampling speed: Due to the scale of generative models, there may be noticeable latency in the time it takes to generate an instance. Particularly for interactive use cases such as chatbots, AI voice assistants, or customer service applications, conversations must happen immediately and accurately. As diffusion models have become increasingly popular for the high-quality samples they can create, their slow sampling speeds have become increasingly apparent.
  3. Lack of high-quality data: Oftentimes, generative AI models are used to produce synthetic data for different use cases. However, while troves of data are being generated globally every day, not all data can be used to train AI models. Generative models require high-quality, unbiased data to operate. Moreover, some domains don’t have enough data to train a model. As an example, few 3D assets exist and they’re expensive to develop. Such areas will require significant resources to evolve and mature.
  4. Data licenses: Further compounding the issue of a lack of high-quality data, many organizations struggle to get a commercial license to use existing datasets or to build bespoke datasets to train generative models. This is an extremely important process and key to avoiding intellectual property infringement issues.

What are the Benefits of Generative AI?

Generative AI is important for a number of reasons. Some of the key benefits of generative AI include:

  1. Generative AI algorithms can be used to create new, original content, such as images, videos, and text, that’s indistinguishable from content created by humans. This can be useful for applications such as entertainment, advertising, and creative arts.
  2. Generative AI algorithms can be used to improve the efficiency and accuracy of existing AI systems, such as natural language processing and computer vision. For example, generative AI algorithms can be used to create synthetic data that can be used to train and evaluate other AI algorithms.
  3. Generative AI algorithms can be used to explore and analyze complex data in new ways, allowing businesses and researchers to uncover hidden patterns and trends that may not be apparent from the raw data alone.
  4. Generative AI algorithms can help automate and accelerate a variety of tasks and processes, saving time and resources for businesses and organizations.

Overall, generative AI has the potential to significantly impact a wide range of industries and applications and is an important area of AI research and development.

Source: Nvidia
