AI Image generation 3
Diffusion models
The Martian shelters created thus far using img2img generally match the silhouette of the input image, but on closer inspection they do not read as minimal surface structures. What if we want the final result to conform strictly to the original image? In this last section, we will experiment with ControlNet, a neural network that enables more precise control over the generated image outputs.
We will need to install ControlNet first. Thankfully, this process is straightforward in webui. Go to the Extensions tab, enter the URL https://github.com/Mikubill/sd-webui-controlnet and hit the "Install" button. Once complete, switch to the "Installed" tab and press the "Check for updates" button.
Next, we have to download models for the ControlNet extension. Download the .safetensors files and then place them in your models/ControlNet subfolder. We will only make use of the "canny" and "depth" models for this example, but you can go ahead and download all of them.
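If you prefer to script this step, the models can also be fetched with the huggingface_hub Python library. The snippet below is only a sketch: the repository id and filenames are assumptions based on the commonly used ControlNet 1.1 release, and the target folder depends on where your webui is installed, so check the extension's documentation for the links that match your setup.

```python
# Optional: fetch the ControlNet models with the huggingface_hub library instead
# of the browser. Repository id and filenames are assumptions based on the
# commonly used ControlNet 1.1 release; check the extension's wiki for the links
# that match your Stable Diffusion version, and adjust MODELS_DIR to your install.
from huggingface_hub import hf_hub_download

MODELS_DIR = "stable-diffusion-webui/models/ControlNet"  # assumed webui location

for filename in [
    "control_v11p_sd15_canny.pth",    # canny model (assumed filename)
    "control_v11f1p_sd15_depth.pth",  # depth model (assumed filename)
]:
    # The extension accepts .pth as well as .safetensors files.
    path = hf_hub_download(
        repo_id="lllyasviel/ControlNet-v1-1",  # assumed model repository
        filename=filename,
        local_dir=MODELS_DIR,
    )
    print("downloaded", path)
```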
In "img2img"tab in webui, load the original Rhino screenshot image and provide a text prompt like the one above. Scroll down and expand the ControlNet panel. At the top toggle "Enable". Optionally you may also check "Low VRAM" if you do not have a high end GPU and "Pixel Perfect" which will choose the optimal resolution for ControlNet. Select "Canny pre-processor first and webui will automatically display it under "Model" menu.
Canny detects outlines from pixel areas with high contrast. Go ahead and generate images using the default parameter values. These outlines, which correspond to the mesh edges, are well preserved in the results, and the structure consequently resembles the original minimal surface.
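For those curious about what happens behind the webui interface, here is a rough sketch of the same canny workflow written with the diffusers library. The model ids, prompt and parameter values are illustrative assumptions rather than the exact webui defaults.

```python
# Rough diffusers equivalent of the webui canny workflow, to show what the
# pre-processor and ControlNet do under the hood. Model ids, the prompt and
# the parameter values below are illustrative assumptions, not webui defaults.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

init_image = Image.open("rhino_screenshot.png").convert("RGB")  # assumed filename

# The "canny" pre-processor: detect outlines from high-contrast pixel areas.
edges = cv2.Canny(np.array(init_image), 100, 200)              # typical thresholds
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 1-channel map -> RGB

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny", torch_dtype=torch.float16
)
pipe_canny = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

result = pipe_canny(
    prompt="a minimal surface shelter on Mars, highly detailed",  # illustrative prompt
    image=init_image,           # the img2img input, as in the webui img2img tab
    control_image=canny_image,  # the canny map that steers the layout
    strength=0.75,              # denoising strength
    num_inference_steps=30,
).images[0]
result.save("canny_controlnet_result.png")
```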
Next, we will try out the depth pre-processor. Instead of detecting outlines, it creates a map that describes which objects are near the picture plane and which are further in the background. Depth maps are useful for controlling spatial positions. In this case, we want to preserve the minimal surface form and let Stable Diffusion fill in the background.
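A comparable sketch for the depth route is given below, again using diffusers. Here a monocular depth estimator from the transformers library stands in for the webui depth pre-processor; the estimator, model ids and parameter values are assumptions.

```python
# Sketch of the depth route, mirroring the canny example above. The depth
# estimator and model ids are assumptions; the webui pre-processor relies on
# a comparable monocular depth network.
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from transformers import pipeline

init_image = Image.open("rhino_screenshot.png").convert("RGB")  # assumed filename

# Depth pre-processor: estimate what is near the picture plane and what is far.
depth_estimator = pipeline("depth-estimation")        # downloads a default depth model
depth = np.array(depth_estimator(init_image)["depth"])
depth_image = Image.fromarray(np.stack([depth] * 3, axis=-1))  # greyscale -> RGB map

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16
)
pipe_depth = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

result = pipe_depth(
    prompt="a minimal surface shelter on Mars, highly detailed",
    image=init_image,           # preserve the minimal surface form...
    control_image=depth_image,  # ...while the depth map constrains spatial layout
    strength=0.75,
    num_inference_steps=30,
).images[0]
result.save("depth_controlnet_result.png")
```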
What are some differences between the images generated with the canny and depth pre-processors?
For the depth pre-processor, try out and assess the different control mode options (Balanced / My prompt is more important / ControlNet is more important).
Try using the images generated with the canny pre-processor as the inputs for another run with the depth pre-processor; a scripted sketch of this chaining is shown below.
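As a sketch of how that chaining might be scripted, the snippet below reuses the objects built in the two diffusers sketches above (pipe_canny, pipe_depth, depth_estimator, canny_image and init_image are the names assumed there), feeding the canny-controlled result into a depth-controlled pass.

```python
# Scripted version of the chaining exercise, assuming the objects built in the
# two sketches above are still in scope: pipe_canny, pipe_depth, depth_estimator,
# canny_image and init_image are placeholders for those earlier variables.
import numpy as np
from PIL import Image

prompt = "a minimal surface shelter on Mars, highly detailed"  # illustrative prompt

# Pass 1: canny-controlled img2img on the original Rhino screenshot.
stage_one = pipe_canny(
    prompt=prompt, image=init_image, control_image=canny_image, strength=0.75
).images[0]

# Pass 2: estimate depth on the pass-1 result and use it for a depth-controlled run.
depth = np.array(depth_estimator(stage_one)["depth"])
depth_image = Image.fromarray(np.stack([depth] * 3, axis=-1))
stage_two = pipe_depth(
    prompt=prompt,
    image=stage_one,            # the previous output becomes the new img2img input
    control_image=depth_image,
    strength=0.5,               # a lower strength keeps more of the first pass
).images[0]
stage_two.save("canny_then_depth.png")
```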
We learnt the basics of image generation using Stable Diffusion, which is one amongst many text-to-image AI models (DALL-E, Midjourney, etc.) that are now widely available. Even with this brief introduction, we managed to generate detailed and visually convincing images of a fictitious Martian shelter, starting with text prompts before transitioning to image prompts. Consider the amount of expertise and time required to create equivalent output by more conventional methods! Nonetheless, we have only scratched the surface of Stable Diffusion in this session, and we leave it to you to try out the myriad other parameters that were not discussed. Those interested may also explore ComfyUI, an alternative user interface for Stable Diffusion, which is graphical and node-based just like Grasshopper.
The focus of this session was on how to use Stable Diffusion. It worked like a black box, taking in inputs and parameter settings and generating image outputs. However, we did not address the technical and conceptual underpinnings of the AI model; in other words, how these models actually work under the hood. Beyond the technical, the ethics and cultural implications of using AI are also critical to discuss. This lies beyond the scope of this introductory tutorial, but rest assured that subsequent courses at SUTD will tackle these topics in further depth.