AI Image generation 1
Diffusion models
This session is meant as an introduction to the exciting and fast-evolving world of generative AI. Specifically, we will look at using Stable Diffusion, a popular open-source text-to-image AI model developed by Stability AI. We will combine Rhino/Grasshopper and Stable Diffusion in an example workflow to produce visualisations of a speculative Martian settlement, whose base form is generated using the form-finding methods covered in earlier sessions. Images are a great way to communicate design intent to an audience, but they need to be of a certain quality to be compelling and convincing. Typically, designers use specialised software to render such images, but this requires time to set up the model and to compute the final result. AI tools have, in a sense, short-circuited this process, allowing designers to batch-produce quality images in a short amount of time.
You may download the stable-diffusion-webui repository as a ZIP file. Extract the files into a folder in a convenient location, such as your desktop, for the purpose of this session. Alternatively, we can use git to clone the repository, i.e. make a local copy of it. On Windows, type "cmd" in the Windows search bar to open a command prompt and enter the clone command for the repository.
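Assuming the AUTOMATIC1111 stable-diffusion-webui repository on GitHub (the repository referred to later in this guide), the command looks like this:

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git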
Afterwards, there will be a 'stable-diffusion-webui' folder created in your home directory, e.g. C:\Users\YOUR_USER_NAME. You may change the location of the folder and place it on your desktop instead. At this point, you have a local cloned repository. In this folder, locate a file called webui-user.bat and run it. This process will take a while to complete if you are doing it for the first time, as missing Python dependencies and packages will be downloaded and installed within a virtual environment.
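As a side note, webui-user.bat is also where launch options can be set. A sketch of its typical contents, assuming the standard template shipped with the repository (the file may differ slightly between versions):

@echo off

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=

call webui.bat

If your graphics card has limited memory, a commonly used tweak is to add a flag such as --medvram to the COMMANDLINE_ARGS line, i.e. set COMMANDLINE_ARGS=--medvram.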
At the end of this step, the webui should launch automatically in a browser tab. If not, open your browser and enter the following URL: http://127.0.0.1:7860/
The image above shows the interface that we will be working with. However, we cannot generate any images yet, as we are still missing the AI models that need to be downloaded next.
The original Stable Diffusion text-to-image AI model was released by Stability AI in 2022. Numerous official versions have been released since, with v3 being the latest. Since it is an open-source project, numerous fine-tuned models have also been released by the wider community, expanding the choice of models available to us. For this session, we recommend starting with the older version 1 Stable Diffusion models, primarily because they will be less demanding to run on our local machines.
Finally, open your stable-diffusion-webui folder in File Explorer and navigate to the models and then Stable-diffusion sub-folders. Place your downloaded checkpoint or safetensors files in there.
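As a quick orientation, the resulting folder layout should look roughly like this (the file name below is only an example; yours will match whatever you downloaded):

stable-diffusion-webui/
└── models/
    └── Stable-diffusion/
        └── v1-5-pruned-emaonly.safetensors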
We are finally ready to test whether we have set everything up correctly by generating our first image from a text prompt. In the webui, select the downloaded model in the 'Stable Diffusion checkpoint' drop-down menu and enter a short text prompt. Let's leave the settings at their default values and hit the 'Generate' button. If all goes well, you should see a 512 x 512 image produced after a short while.
There are many settings that you can adjust to generate an image, but let us introduce the key ones, starting with the text prompt. The prompt is essentially your instruction to the AI and should be descriptive and specific in order to steer the outcome in a desired direction. For example, the earlier prompt "Mars settlement, astronaut" is overly terse and hence vague.
An improved version describes the subject (inflatable structure) and its features (white membrane), specifies the format (one-point perspective) and style (sci-fi realistic) of the desired image, and provides secondary details (rocky landscape in foreground). It also includes filters (cinematic lighting with red hue) that affect the atmosphere and mood of the image.
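Putting those elements together, a prompt in this spirit might read: "One-point perspective of a Mars settlement, inflatable structure with white membrane, astronaut, rocky landscape in the foreground, sci-fi realistic, cinematic lighting with red hue".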
There are numerous sampling methods to choose from, and each has a rather cryptic name. Explaining the mechanisms of these samplers, which are essentially discretised differential equation solvers, is beyond the scope of this session. In general though, the recommendation here is to choose between DPM++ 2M (default), Euler a and DPM++ 2M Karras, which are good generic samplers because they converge quickly. The number of sampling steps should be set in conjunction with the sampler, and it can be left at its default value (20). Increasing the number of steps can help add detail but can also introduce unwanted artefacts.
The other important parameter is the CFG (Classifier-Free Guidance) scale. The higher the number, the more closely the model will try to match the prompt. Low values (<5) allow the AI to be more 'creative', i.e. to deviate from the prompt. The general sweet spot is between 7 and 14. The following image tabulates sampler choice (X axis) against CFG values (Y axis) as a reference, for the prompt "a squirrel wearing a bucket hat. Pixar".
Working iteratively is a good strategy for image generation. Make small changes to a prompt rather than wholesale ones, and adjust parameter values and settings one at a time. Two other features are useful to support this iterative way of working. First, you can increase the batch value to generate several images from the same input prompt and parameters for comparison. Second, you can set a seed value, which is randomised by default (a value of -1). The seed determines a specific region in the latent space of the diffusion model. Given a fixed seed value and identical inputs, the same image will be generated, i.e. the process is deterministic. You can make use of this by fixing the seed and iteratively adjusting your prompt or parameters to understand the effect of each change.
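If you prefer to script such parameter sweeps rather than click through them, the webui can also be driven programmatically. The following is a minimal Python sketch, assuming the webui was launched with the --api command-line argument so that it exposes an HTTP API at http://127.0.0.1:7860 (the sdapi/v1/txt2img endpoint and field names follow the AUTOMATIC1111 project and may change between versions). It fixes the seed and sweeps the CFG scale, saving one image per value:

import base64
import requests  # third-party HTTP library: pip install requests

URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"  # webui text-to-image endpoint (requires --api)

payload = {
    "prompt": "Mars settlement, inflatable structure with white membrane, astronaut",
    "negative_prompt": "dome",
    "seed": 1234,               # fixed seed so that only the CFG change affects the result
    "steps": 20,
    "sampler_name": "Euler a",
    "width": 512,
    "height": 512,
}

for cfg in [3, 5, 7, 10, 14]:   # sweep the CFG scale while everything else stays fixed
    payload["cfg_scale"] = cfg
    response = requests.post(URL, json=payload, timeout=600)
    response.raise_for_status()
    image_b64 = response.json()["images"][0]    # images are returned as base64-encoded strings
    with open(f"mars_cfg_{cfg}.png", "wb") as out_file:
        out_file.write(base64.b64decode(image_b64))

The same approach can be used to sweep samplers or step counts by varying the corresponding field instead.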
Start out with a prompt that is perhaps overly concise, then keep adding detail to it. Is there a point where overly detailed prompts start to produce undesired results?
If you start to notice consistent characteristics shared by the generated images, try writing negative prompts and assess the results. For example, does the negative prompt 'dome' help to steer the AI away from generating a preponderance of symmetrical domes?
Systematically increase the CFG Scale parameter value while keeping the seed constant. What do you begin to notice?
While the simplest way to get started is to use online AI generators like Tensor.Art [] or NightCafe, we will instead set up our machines to run Stable Diffusion locally. This involves a couple of steps.
First, we need to install Python. Search for version 3.10.6 and download the corresponding installer []. Take note that newer versions of Python do not support the Torch machine learning library, which is needed to run Stable Diffusion. Install Python using the default options, making sure to check the option 'Add Python to PATH'.
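To confirm that Python is available on your PATH, you can open a new command prompt and run:

python --version

which should report version 3.10.6.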
Next, we need to install Git, which is a version control system. Developers often use a version control system to manage a repository, which is a collection of source code, and to collaborate on software development. Navigate to git [], then download and run the installer. You can accept the default installation settings.
For those installing git on macOS or Linux, you may follow the guide here [].
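As a quick check that git is installed, the following should print a version number in any terminal:

git --version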
Having installed Python and git, we now turn our attention to installing a user interface (UI) to make running Stable Diffusion more accessible. We will be using the Stable Diffusion webui developed by AUTOMATIC1111, which can be found here on GitHub [].
For students using Macs, you may refer to the webui installation guide here [].
Many AI models are hosted on Hugging Face []. First register on the platform, then navigate to the "Models" tab and, using the search bar at the top, look for stable-diffusion-v1-4 [] or stable-diffusion-v1-5 []. In the model repository, click on the "Files and versions" tab, then look for files with either the .ckpt or the newer .safetensors extension. The smaller inference models, for example a pruned fp16 .safetensors model, are sufficient for our purposes, so download one of those by clicking the download button on the right.