AI Image generation 2
Diffusion models
To exert more control over the output, we can use an image prompt in conjunction with a textual one. In this approach, we first create an image that functions as a low-fidelity description of the intended outcome.
We return to Rhino/Grasshopper and run one of the earlier form-finding definitions, e.g. the minimal surface. Bake the result from Grasshopper (you may have to do it from inside the cluster). Thereafter, use the ViewCaptureToFile command to save the viewport as an image. Customise its size and set it to 512 x 512 pixels. We will use this image as our prompt later.
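If the capture does not come out at exactly 512 x 512 pixels, it can be resized before use. The short Python sketch below is optional and not part of the webui workflow itself; it assumes the Pillow library is installed, and the file names are placeholders.

from PIL import Image

# Open the viewport capture saved by ViewCaptureToFile (placeholder file name)
capture = Image.open("minimal_surface_capture.png")
# Resample to the 512 x 512 size expected by the v1 Stable Diffusion models
capture = capture.resize((512, 512), Image.LANCZOS)
capture.save("minimal_surface_512.png")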
Switch to the img2img tab in webui. We will enter a text prompt as usual, but at the same time drag our saved image into the Generation > img2img area to act as an image prompt. Scrolling down, we find a new slider below CFG scale called Denoising strength. This value determines how much of the input image to preserve, with higher numbers causing more change and lower values leaving the original largely intact. As seen in the batch images below, we are now able to achieve results that resemble the reference instead of the arbitrary dome-shaped, curved or tent-like structures from before.
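For readers who prefer scripting, the same image-plus-text workflow can be reproduced outside webui with the Hugging Face diffusers library. The sketch below is only an illustration: the model ID, file names, prompt and parameter values are assumptions, and diffusers' strength and guidance_scale parameters play the roles of webui's Denoising strength and CFG scale sliders.

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# Load a v1 Stable Diffusion checkpoint (illustrative model ID)
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The baked-and-captured Rhino image acts as the image prompt
init_image = Image.open("minimal_surface_512.png").convert("RGB")

result = pipe(
    prompt="a lightweight tensile canopy structure in a park, photorealistic",
    image=init_image,
    strength=0.6,          # counterpart of webui's Denoising strength slider
    guidance_scale=7.5,    # counterpart of the CFG scale slider
    num_inference_steps=30,
).images[0]
result.save("img2img_result.png")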
Experiment with adjusting the relative values of CFG scale and denoising strength. What do you observe if you make one value high and the other low?
Since our intended output is a building/structure, we may use more specialised AI models than the general Stable Diffusion v1 models. For example, you may search for "architecturerealmix" on Hugging Face. Click on the "Clone repository" option and copy the command:
git clone https://huggingface.co/stablediffusionapi/architecturerealmix
Using git as before, clone the repository to a local folder.
Once the local repository is created, move the architecturerealmix .safetensors file to your local \stable-diffusion-webui\models\Stable-diffusion subfolder. Afterwards, you will have the option of choosing the "architecturerealmix" model in webui. Since it is trained on a dataset of architecture-related images, the model produces more realistic-looking results with building elements like railings, panelling and mullions.
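If you would rather not clone the full repository, a single checkpoint can also be fetched with the huggingface_hub library and copied into the same folder. The sketch below uses a hypothetical file name and destination path; check the repository's file listing and your own webui install location before running it.

import shutil
from huggingface_hub import hf_hub_download

# Download only the checkpoint file (hypothetical file name; check the repo listing)
checkpoint = hf_hub_download(
    repo_id="stablediffusionapi/architecturerealmix",
    filename="architecturerealmix.safetensors",
)
# Copy it into webui's model folder (adjust the path to your installation)
shutil.copy(checkpoint, r"C:\stable-diffusion-webui\models\Stable-diffusion")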
To improve the results even further, we can select the best image from a batch, for example the bottom-right one in the batch above, as the next image prompt. Use the picture-frame icon (circled in red below) to send the image and generation parameters back to the img2img tab. Repeating this selection-and-generation process leads to 'fitter' results in an evolutionary sense, as the images come to match your intentions and aesthetic preferences more closely.
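The same select-and-regenerate loop can be sketched in code. The example below again uses diffusers rather than webui, keeps the prompt and parameters purely illustrative, and assumes the architecturerealmix repository can be loaded as a diffusers pipeline; picking the best image is a manual judgement that the sketch only approximates by saving every candidate for inspection.

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "stablediffusionapi/architecturerealmix", torch_dtype=torch.float16
).to("cuda")

reference = Image.open("minimal_surface_512.png").convert("RGB")
prompt = "a gridshell pavilion on a rocky landscape, photorealistic"

for generation in range(3):                  # three rounds of refinement
    batch = pipe(
        prompt=prompt,
        image=reference,
        strength=0.5,
        guidance_scale=7.0,
        num_images_per_prompt=4,             # a small batch per round
    ).images
    for i, candidate in enumerate(batch):
        candidate.save(f"gen{generation}_candidate{i}.png")
    # Inspect the saved candidates and carry your favourite forward;
    # the sketch simply takes the first one so it runs unattended.
    reference = batch[0]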
The results look compelling so far, but it appears that the AI model has mistakenly generated a tiled boulevard rather than a naturalistic rocky landscape in the foreground. We can make use of inpainting to fix this region of the image. Click on the painting icon to send the image to the "Inpaint" tab. At the top right of the tab, click on the brush icon and use the slider to specify its size. Thereafter, paint over the foreground of the image that we wish to change. This creates a mask, represented as a semi-transparent white region.
The parts of the image outside the white region will be preserved. You may leave the default inpaint parameters unchanged for a start. However, we should now edit the text prompt, because it applies only to the masked region. For example, we may change it to describe the naturalistic rocky foreground we want.
There are four mask options: fill, original, latent noise and latent nothing. Try the different options and observe the results.
Try adjusting the "Mask blur" slider values. What effect does this have?
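Inpainting also has a scripted counterpart in diffusers. In the sketch below the mask is a black-and-white image in which white marks the region to regenerate, mirroring the mask painted in the Inpaint tab; the model ID, file names and prompt are illustrative assumptions rather than part of the webui workflow.

import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("best_candidate.png").convert("RGB").resize((512, 512))
mask = Image.open("foreground_mask.png").convert("L").resize((512, 512))   # white = repaint

result = pipe(
    prompt="a naturalistic rocky landscape in the foreground",
    image=image,
    mask_image=mask,
    guidance_scale=7.5,
).images[0]
result.save("inpainted.png")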
Try resizing your favourite image. Place it in the img2img tab and make sure you use the same seed. Select the "Resize by" tab to scale the image up uniformly. Should you specify a high or low Denoising strength value?
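As a scripted version of this experiment, the sketch below upscales a favourite image, fixes the seed with a torch.Generator and re-runs img2img on the larger copy. The seed value, file names, prompt and strength are placeholders for you to vary while answering the question above.

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

favourite = Image.open("best_candidate.png").convert("RGB")
# Scale the image up uniformly, like webui's "Resize by" option
upscaled = favourite.resize((favourite.width * 2, favourite.height * 2), Image.LANCZOS)

generator = torch.Generator("cuda").manual_seed(1234)   # reuse the seed of the original image
result = pipe(
    prompt="a gridshell pavilion on a rocky landscape, photorealistic",
    image=upscaled,
    strength=0.3,            # experiment with high versus low values here
    guidance_scale=7.0,
    generator=generator,
).images[0]
result.save("best_candidate_2x.png")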
If the architecturerealmix safetensors file cannot be found in the repository, use this alternative.