AI Image generation 1
Diffusion models
This session is meant as an introduction to the exciting and fast-evolving world of generative AI. Specifically, we will look at using Stable Diffusion, a popular open-source text-to-image AI model developed by Stability AI. We will combine Rhino/Grasshopper and Stable Diffusion in an example workflow to produce visualisations of a speculative Martian settlement, whose base form is generated using the form-finding methods covered in earlier sessions. Images are a great way to communicate design intent to an audience, but they need to be of a certain quality to be compelling and convincing. Typically, designers use specialised software to render such images, but this requires time to set up the model and to compute the final result. AI tools have, in a sense, short-circuited this process, allowing designers to batch-produce quality images in a short amount of time.
You may download the stable-diffusion-webui repository as a ZIP file. Extract the files into a folder in a convenient location, such as your desktop, for the purpose of this session. Alternatively, we can use git to clone the repository, i.e. make a local copy of it. On Windows, type "cmd" in the Windows search bar to open a command prompt and enter the clone command for the repository.
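Assuming the AUTOMATIC1111 stable-diffusion-webui repository on GitHub (the repository referred to later in this guide), the command looks like this:

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git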
Afterwards, there will be a 'stable-diffusion-webui' folder created in your home directory, e.g. C:\Users\YOUR_USER_NAME. You may change the location of the folder and place it on your desktop instead. At this point, you have a local cloned repository. In this folder, locate a file called webui-user.bat and run it. This process will take a while to complete if you are doing it for the first time, as missing Python dependencies and packages will be downloaded and installed within a virtual environment.
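As a side note, webui-user.bat is also where launch options can be set. A sketch of its typical contents, assuming the standard template shipped with the repository (the file may differ slightly between versions):

@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=

call webui.bat

If your graphics card has limited memory, a commonly used tweak is to add a flag such as --medvram to the COMMANDLINE_ARGS line, i.e. set COMMANDLINE_ARGS=--medvram.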
At the end of this step, the webui should launch automatically in a browser tab. If not, open your browser and enter the following URL: http://127.0.0.1:7860/
The image above shows the interface that we will be working with. However, we cannot generate any images yet, as we are still missing the AI models that need to be downloaded next.
The original Stable Diffusion text-to-image AI model was released by Stability AI in 2022. Numerous official versions have been released since, with v3 being the latest. Since it is an open-source project, numerous fine-tuned models have also been released by the wider community, expanding the choice of models available to us. For this session, we recommend starting with the older version 1 Stable Diffusion models, primarily because they will be less demanding to run on our local machines.
Finally, open your stable-diffusion-webui folder in File Explorer and navigate to the models and then Stable-diffusion sub-folders. Place your downloaded checkpoint or safetensors files in there.
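As a quick orientation, the resulting folder layout should look roughly like this (the file name below is only an example; yours will match whatever you downloaded):

stable-diffusion-webui/
└── models/
    └── Stable-diffusion/
        └── v1-5-pruned-emaonly.safetensors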
We are finally ready to test whether we have set everything up correctly by generating our first image from a text prompt. In the webui, select the downloaded model in the 'Stable Diffusion checkpoint' drop-down menu and enter a short text prompt. Let's leave the settings at their default values and hit the 'Generate' button. If all goes well, you should see a 512 x 512 image produced after a short while.
There are many settings that you can adjust to generate an image, but let us introduce the key ones, starting with the text prompt. The prompt is essentially your instruction to the AI and should be descriptive and specific in order to steer the outcome in a desired direction. For example, the earlier prompt "Mars settlement, astronaut" is overly terse and hence vague.
An improved version describes the subject (inflatable structure) and its features (white membrane), specifies the format (one-point perspective) and style (sci-fi realistic) of the desired image, and provides secondary details (rocky landscape in foreground). It also includes filters (cinematic lighting with red hue) that affect the atmosphere and mood of the image.
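Putting those elements together, a prompt in this spirit might read: "One-point perspective of a Mars settlement, inflatable structure with white membrane, astronaut, rocky landscape in the foreground, sci-fi realistic, cinematic lighting with red hue".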
There are numerous sampling methods to choose from, and each has a rather cryptic name. Explaining the mechanisms of these samplers, which are essentially discretised differential equation solvers, is beyond the scope of this session. In general though, the recommendation here is to choose between DPM++ 2M (default), Euler a and DPM++ 2M Karras, which are good generic samplers because they converge quickly. The number of sampling steps should be set in conjunction with the sampler, and it can be left at its default value (20). Increasing the number of steps can help add detail but can also introduce unwanted artefacts.
The other important parameter is the CFG (Classifier-Free Guidance) scale. The higher the number, the more closely the model will try to match the prompt. Low values (<5) allow the AI to be more 'creative', i.e. to deviate from the prompt. The general sweet spot is between 7 and 14. The following image tabulates sampler choice (X axis) against CFG values (Y axis) as a reference, for the prompt "a squirrel wearing a bucket hat. Pixar".
Working iteratively is a good strategy for image generation. Make small changes to a prompt rather than wholesale ones, and adjust parameter values and settings one at a time. Two other features are useful to support this iterative way of working. First, you can increase the batch value to generate several images from the same input prompt and parameters for comparison. Second, you can set a seed value, which is randomised by default (a value of -1). The seed determines a specific region in the latent space of the diffusion model. Given a fixed seed value and identical inputs, the same image will be generated, i.e. the process is deterministic. You can make use of this by fixing the seed and iteratively adjusting your prompt or parameters to understand the effect of each change.
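If you prefer to script such parameter sweeps rather than click through them, the webui can also be driven programmatically. The following is a minimal Python sketch, assuming the webui was launched with the --api command-line argument so that it exposes an HTTP API at http://127.0.0.1:7860 (the sdapi/v1/txt2img endpoint and field names follow the AUTOMATIC1111 project and may change between versions). It fixes the seed and sweeps the CFG scale, saving one image per value:

import base64
import requests  # third-party HTTP library: pip install requests

URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"  # webui text-to-image endpoint (requires --api)

payload = {
    "prompt": "Mars settlement, inflatable structure with white membrane, astronaut",
    "negative_prompt": "dome",
    "seed": 1234,               # fixed seed so that only the CFG change affects the result
    "steps": 20,
    "sampler_name": "Euler a",
    "width": 512,
    "height": 512,
}

for cfg in [3, 5, 7, 10, 14]:   # sweep the CFG scale while everything else stays fixed
    payload["cfg_scale"] = cfg
    response = requests.post(URL, json=payload, timeout=600)
    response.raise_for_status()
    image_b64 = response.json()["images"][0]    # images are returned as base64-encoded strings
    with open(f"mars_cfg_{cfg}.png", "wb") as out_file:
        out_file.write(base64.b64decode(image_b64))

The same approach can be used to sweep samplers or step counts by varying the corresponding field instead.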
Start out with a prompt that is perhaps overly concise, then keep adding detail to it. Is there a point where overly detailed prompts start to produce undesired results?
If you start to notice consistent characteristics shared by the generated images, try writing negative prompts and assess the results. For example, does the negative prompt 'dome' help to steer the AI away from generating a preponderance of symmetrical domes?
Systematically increase the CFG Scale parameter value while keeping the seed constant. What do you begin to notice?
While the simplest way to get started is to use online AI generators like Tensor.Art [] or NightCafe, we will instead set up our machines to run Stable Diffusion locally. This involves a couple of steps.
First, we need to install Python. Search for version 3.10.6 and download the corresponding installer []. Take note that newer versions of Python do not support the Torch machine learning library, which is needed to run Stable Diffusion. Install Python using the default options, making sure to check the option 'Add Python to PATH'.
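To confirm that Python is available on your PATH, you can open a new command prompt and run:

python --version

which should report version 3.10.6.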
Next, we need to install Git, which is a version control system. Developers often use a version control system to manage a repository, which is a collection of source code, and to collaborate on software development. Navigate to git [], then download and run the installer. You can accept the default installation settings.
For those installing git on macOS or Linux, you may follow the guide here [].
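As a quick check that git is installed, the following should print a version number in any terminal:

git --version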
Having installed Python and git, we now turn our attention to installing a user interface (UI) to make running Stable Diffusion more accessible. We will be using the Stable Diffusion webui developed by AUTOMATIC1111, which can be found here on GitHub [].
For students using Macs, you may refer to the webui installation guide here [].
Many AI models are hosted on Hugging Face []. First register on the platform, then navigate to the "Models" tab and, using the search bar at the top, look for stable-diffusion-v1-4 [] or stable-diffusion-v1-5 []. In the model repository, click on the "Files and versions" tab, then look for files with either the .ckpt or the newer .safetensors extension. The smaller inference models, for example a pruned fp16 .safetensors model, are sufficient for our purposes, so download one of those by clicking the download button on the right.