
AI Image generation 1

Diffusion models


This session is meant as an introduction to the exciting and fast-evolving world of generative AI. Specifically, we will look at Stable Diffusion, a popular open-source text-to-image AI model developed by Stability AI. We will combine Rhino/Grasshopper and Stable Diffusion in an example workflow to produce visualisations of a speculative Martian settlement, whose base form is generated using the form-finding methods covered in earlier sessions. Images are a great way to communicate design intent to an audience, but they need to be of a certain quality to be compelling and convincing. Typically, designers use specialised rendering software to produce such images, but it takes time to set up the model and to compute the final result. AI tools have, in a sense, short-circuited this process, allowing designers to batch-produce quality images in a short amount of time.

Getting started

While the simplest way to get started is to use online AI generators such as Tensor.Art [] or NightCafe, we will instead set up our machines to run Stable Diffusion locally. This involves a few steps.

Python and Git

First, we need to install Python. Search for version 3.10.6 and download the corresponding installer []. Take note that newer versions of Python are not supported by the Torch machine learning library, which is needed to run Stable Diffusion. Install Python using the default options and check the option 'Add Python to PATH'.

Next, we need to install Git, a version control system. Developers use version control systems to manage a repository, which is a collection of source code, and to collaborate on software development. Navigate to the git download page [], then download and run the installer. You can accept the default installation settings.

For those installing git on macOS or Linux, you may follow the guide here [].

User Interface

Having installed Python and Git, we now turn our attention to installing a user interface (UI) to make running Stable Diffusion more accessible. We will be using the Stable Diffusion webui developed by AUTOMATIC1111, which can be found on GitHub here [].

For students using Macs, you may refer to the webui installation guide here [].

You may download the repository as a ZIP file and extract the files into a folder in a convenient location, such as your desktop, for the purpose of this session. Alternatively, we will use git to clone, i.e. make a local copy of, the repository. On Windows, type "cmd" in the Windows search bar to open a command prompt and enter the following command:

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git

Afterwards, a "stable-diffusion-webui" folder will be created in your home directory, e.g. C:\Users\YOUR_USER_NAME. You may change the location of the folder and place it on your desktop instead. At this point, you have a local clone of the repository. In this folder, locate a file called webui-user.bat and run it. This process will take a while to complete the first time, as missing Python dependencies and packages are downloaded and installed within a virtual environment.

At the end of this step, webui will automatically launch in a browser tab. If not, open your browser and enter the following URL: http://127.0.0.1:7860/

The image above shows the interface that we will be working with. However, we cannot generate any images yet, as we are still missing the AI models that need to be downloaded next.

Stable diffusion model

The original Stable Diffusion text-to-image AI model was released by Stability AI in early 2022. Numerous official versions have been released since, with v3 being the latest at the time of writing. Since it is an open-source project, numerous fine-tuned models have also been released by the wider community, expanding the choice of models available to us. For this session, we recommend starting with the older version 1 Stable Diffusion models, primarily because they are less demanding to run on our local machines.

Many AI models are hosted on Hugging Face []. First register on the platform, then navigate to the "Models" tab and, in the search bar at the top, look for stable-diffusion-v1-4 [] or stable-diffusion-v1-5 []. In the model repository, click on the "Files and versions" tab, then look for files with either a .ckpt or the newer .safetensors extension. The smaller inference models are sufficient for our purposes, so download those by clicking the download buttons on the right.

Finally, open your stable-diffusion-webui folder in File Explorer and navigate to the models, then Stable-diffusion, sub-folder. Place your downloaded .ckpt or .safetensors files in there.
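If you prefer to script this step, the huggingface_hub Python package provides hf_hub_download, as sketched below. The repository id and filename here are illustrative placeholders: copy the exact values from the "Files and versions" tab of the model page you chose, and note that some model repositories are gated, so you may need to log in first (huggingface-cli login) and accept the model licence on the website.

# Sketch: download a Stable Diffusion v1 checkpoint programmatically.
# repo_id and filename are examples; use the values shown on the model page.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="CompVis/stable-diffusion-v1-4",   # example repository id
    filename="sd-v1-4.ckpt",                   # example checkpoint filename
    local_dir=r"C:\Users\YOUR_USER_NAME\stable-diffusion-webui\models\Stable-diffusion",
)
print("Checkpoint saved to", path)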

Text to Image

We are finally ready to test whether we have set everything up correctly by generating our first image from a text prompt. In the webui, select the downloaded model in the 'Stable Diffusion checkpoint' drop-down menu and enter a short text prompt. Let's leave the settings at their default values and hit the 'Generate' button. If all goes well, you should see a 512 x 512 image produced after a while.

There are many settings that you can adjust when generating an image, but let us introduce the key ones, starting with the text prompt. The prompt is essentially your instruction to the AI and should be descriptive and specific in order to steer the outcome in the desired direction. For example, the earlier prompt "Mars settlement, astronaut" is overly terse and hence vague. The next version:

One point perspective of inflatable shelter with white membrane anchored to the surface of Mars, astronauts walking around, rocky landscape in foreground, mountains in the background, sci-fi realistic, cinematic lighting with red hue

improves on the former by describing the subject (inflatable structure) and its features (white membrane). It describes the format (one-point perspective) and style (sci-fi realistic) of the desired image and provides secondary details (rocky landscape in foreground). It also includes filters (cinematic lighting with red hue) that affect the atmosphere/mood of the image.

There are numerous sampling methods to choose from, each with a rather cryptic name. Explaining the mechanisms of these samplers, which are essentially discretised differential equations, is beyond the scope of this session. In general, though, the recommendation here is to choose between DPM++ 2M (default), Euler a and DPM++ 2M Karras, which are good generic samplers due to their speed of convergence. The number of sampling steps should be set in conjunction with the sampler, and it can be left at its default value (20). Increasing the number of steps can add detail but can also introduce unwanted artefacts.

The other important parameter is the CFG (Classifier-Free Guidance) scale. The higher the number, the more closely the model will try to match the prompt. Low values (<5) allow the AI to be more 'creative', i.e. to deviate from the prompt. The general sweet spot is between 7 and 14. The following image tabulates sampler choice (x-axis) against CFG value (y-axis) as a reference, for the prompt "a squirrel wearing a bucket hat. Pixar".
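Beyond the UI, the same settings can also be driven programmatically, which is useful when pairing Stable Diffusion with a Grasshopper workflow. The following is a minimal sketch using the webui's built-in REST API; it assumes the webui was launched with the --api command-line argument (for example by adding it to COMMANDLINE_ARGS in webui-user.bat) and is listening at the default local address. The payload fields mirror the UI parameters discussed above: prompt, steps, CFG scale, sampler and seed.

# Minimal sketch: request one image from a locally running webui.
# Assumes the webui was started with the --api command-line argument.
import base64
import requests

payload = {
    "prompt": ("One point perspective of inflatable shelter with white membrane "
               "anchored to the surface of Mars, astronauts walking around, "
               "rocky landscape in foreground, mountains in the background, "
               "sci-fi realistic, cinematic lighting with red hue"),
    "negative_prompt": "dome",
    "steps": 20,                      # sampling steps
    "cfg_scale": 7,                   # Classifier-Free Guidance scale
    "sampler_name": "DPM++ 2M Karras",
    "seed": -1,                       # -1 picks a random seed
    "width": 512,
    "height": 512,
}

response = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
response.raise_for_status()

# Images are returned as base64-encoded strings.
for i, image_b64 in enumerate(response.json()["images"]):
    with open(f"txt2img_{i}.png", "wb") as handle:
        handle.write(base64.b64decode(image_b64))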

Iterative generation

Working iteratively is a good strategy for image generation. Make small changes to a prompt rather than wholesale ones, and adjust parameter values/settings one at a time. Two other features are useful for supporting this iterative way of working. First, you can increase the batch value to generate several images from the same input prompt and parameters for comparison. Second, you can set a seed value, which by default is randomised (i.e. -1). The seed determines a specific region in the latent space of the diffusion model; given a fixed seed value and fixed inputs, the same image will be generated, i.e. the process is deterministic. You can make use of this by fixing a seed and iteratively adjusting your prompt or parameters to understand the effect of each change.
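As a sketch of this workflow in code, and assuming the same local webui API used above, the loop below fixes the seed, keeps every other input constant and sweeps the CFG scale, so the effect of that single parameter can be compared across the saved images (see also the practice exercises below).

# Sketch: fixed seed, sweep the CFG scale to compare its effect in isolation.
# Assumes the webui is running locally with the --api flag.
import base64
import requests

URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"
base_payload = {
    "prompt": "a squirrel wearing a bucket hat. Pixar",
    "steps": 20,
    "sampler_name": "DPM++ 2M Karras",
    "seed": 42,                       # fixed seed: only the CFG scale changes
    "width": 512,
    "height": 512,
}

for cfg in range(3, 16, 3):           # CFG values 3, 6, 9, 12, 15
    payload = dict(base_payload, cfg_scale=cfg)
    result = requests.post(URL, json=payload).json()
    with open(f"cfg_{cfg:02d}.png", "wb") as handle:
        handle.write(base64.b64decode(result["images"][0]))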

Practice

  • Start out with a prompt that is perhaps overly concise, then keep adding detail to it. Is there a point at which overly detailed prompts start to produce undesired results?

  • If you start to notice consistent characteristics shared by the generated images, try writing negative prompts and assess the results. For example, does the negative prompt 'dome' help to steer the AI away from generating a preponderance of symmetrical domes?

  • Systematically increase the CFG Scale parameter value while keeping the seed constant. What do you begin to notice?


Figures: A Martian station with a minimal surface structure; Select Python 3.10.6; Cloning the webui; Running webui-user.bat; The webui interface; Stable Diffusion v1-4; Stable Diffusion v1-5 (select the smaller pruned fp16 safetensors model); Place inside the models > Stable-diffusion subfolder; The txt2img process; Result of a more descriptive prompt; Sample batch of images. Source: https://diffusion-news.org/stable-diffusion-settings-parameters