Case #4: Using Stable Diffusion XL and ComfyUI for Synthetic Dataset Augmentation
Overview
Synthetic datasets require many variations, and producing those variations traditionally takes considerable visual expertise and time. Generative augmentations make it possible to create massive numbers of variations via prompts and to close the appearance domain gap, producing photorealistic results with little effort.
This example demonstrates a basic workflow for modifying an existing synthetic dataset using ComfyUI and a generative diffusion model. We use techniques suited to robust synthetic data generation, such as ‘regional prompting’ and ‘inpainting’. This prepares you to build your own augmentation pipeline.
Preparing a Synthetic Dataset
We provide a sample dataset of images for the workflow below. If you’d like to make your own dataset, this Replicator script demonstrates a custom writer that generates stable semantic color IDs, along with direct output of depth map images for use within the ComfyUI graph.
The outputs of the Replicator script from your own scene should closely match the dataset images below. These four image outputs (RGB, Semantic Segmentation, Normals, and Depth) will all be used as inputs in the ComfyUI workflow.
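If you write your own custom writer for these four outputs, the sketch below shows the general pattern, assuming the standard Replicator custom-writer API. The annotator names, the colorize flag, and the file naming are assumptions to verify against your Replicator version, and the depth and normals outputs typically need conversion or normalization before they can be written as PNGs.

# Minimal sketch of a Replicator custom writer that requests the four images
# used by the ComfyUI graph (RGB, semantic segmentation, normals, depth).
# Annotator names, the colorize flag, and file naming are assumptions.
import omni.replicator.core as rep
from omni.replicator.core import AnnotatorRegistry, BackendDispatch, Writer

class AugmentationWriter(Writer):
    def __init__(self, output_dir: str):
        self._frame_id = 0
        self.backend = BackendDispatch({"paths": {"out_dir": output_dir}})
        # Request one annotator per image the ComfyUI graph expects.
        self.annotators = [
            AnnotatorRegistry.get_annotator("rgb"),
            AnnotatorRegistry.get_annotator("semantic_segmentation",
                                            init_params={"colorize": True}),
            AnnotatorRegistry.get_annotator("normals"),
            AnnotatorRegistry.get_annotator("distance_to_camera"),  # depth
        ]

    def write(self, data: dict):
        # Each key in `data` corresponds to a requested annotator. RGB can be
        # written directly; colorized segmentation is already uint8, while
        # normals ([-1, 1] floats) and depth (meters) need normalization to
        # 8-bit images first (omitted here for brevity).
        if "rgb" in data:
            self.backend.write_image(f"rgb_{self._frame_id}.png", data["rgb"])
        self._frame_id += 1

# Register so the writer can be attached to a render product in your script.
rep.WriterRegistry.register(AugmentationWriter)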
One key part of this workflow is the use of 3D assets. If you are missing assets in your library, you can create them with generative AI. NVIDIA has made a NIM available to partners, Edify3D, for this purpose. As an enterprise, you can simply contact one of the companies that have built models using NVIDIA Edify 3D to augment your dataset. With Edify3D powered NIMs you can quickly make simple objects that you might not have in your asset library. These assets can then be treated almost like a 3D ControlNet: they serve as a base for augmentation rather than as the final 3D asset itself. The generated 3D assets provide the rough 3D data and mask conditioning the models need to generate photoreal results in their place.
Requirements and Installation
Before you get started, review the following requirements, then follow the ComfyUI installation instructions below.
Check that your hardware matches the Technical Requirements for the Omniverse Platform.
Install ComfyUI by following the installation instructions on their GitHub page.
Windows - It is easiest to install the portable standalone build for Windows. However, a manual install will also work.
Linux - Use the manual install method.
Install ComfyUI-Manager by following the installation instructions on their GitHub page. Pay special attention to the different methods.
Download and install the required models.
Download the SDXL base model file ‘sd_xl_base_1.0.safetensors’. Move the file to the ComfyUI installation folder under ComfyUI\models\checkpoints.
Download the two required ControlNet models:
The ControlNet Union model ‘diffusion_pytorch_model.safetensors’.
The Stability Control Lora Canny ‘control-lora-canny-rank128.safetensors’.
Move both files to the ComfyUI installation folder under ComfyUI\models\controlnet.
Download the example files provided for this workflow from the synthetic-data-examples repository on GitHub.
A small sample dataset of images to be augmented by the ComfyUI graph.
Set Up and Open the ComfyUI Augmentation Graph Example
ComfyUI is a powerful backend and GUI for building and executing pipelines using diffusion models and more.
Run ComfyUI.
Windows Portable - Run ‘ComfyUI_windows_portable\run_nvidia_gpu.bat’.
Windows and Linux Manual Install - From the ComfyUI folder, run ‘python main.py’ on the command line.
The ComfyUI server process will run in a command line window. There you can view what ComfyUI is doing if you need to check for errors, warnings, or the progress of image generations.
When fully loaded, ComfyUI will open a window in your internet browser at the local address http://127.0.0.1:8188/. This connects the visual frontend to the ComfyUI server now running in the background.
Next, we want to load the file ‘sdg_SDXL_augmentation_example.json’ that we downloaded earlier from the example links. Click the Load button on the ComfyUI menu, navigate to the location of the downloaded .json graph, and open it.
An error window will display the missing dependencies, along with red “missing” nodes behind it. This is expected.
We can fix this through the following steps:
Click the Manager button at the bottom of the ComfyUI menu.
Note
If you do not see a Manager button, see the ComfyUI-Manager instructions in the “Requirements and Installation” section of this page.
Select Install Missing Custom Nodes.
Select Install for each of the listed node dependencies. Alternatively, you can select the checkbox next to ID (upper left) to select them all and then select Install. These may take several minutes to install, depending on your internet speed and hardware. You can check the progress of these downloads in ComfyUI’s command line window.
Once all have been installed, click the red Restart button at the bottom. This restarts the ComfyUI local server to enable the newly added dependencies. Note that you do not need to close the browser window, but a Reconnecting… dialog will be displayed until the server has been restarted.
Note
This first restart may take several minutes, downloading more dependencies as needed. You can always view the progress in the command line window.
After installing the dependencies, your graph should no longer display any bright red missing nodes.
At this stage your ComfyUI graph should look similar to the following image. Next, we will assign the models and image references in the marked sections A, B and C.
In section A, assign the sd_xl_base_1.0.safetensors model in the Load Checkpoint node. Click directly on the highlighted section and select the model from the pop-up list. This must be done to refresh the assignment of the model: even if it already appears correct, queueing a prompt later will error unless the model is reassigned here.
In section B, assign the diffusion_pytorch_model.safetensors and control-lora-canny-rank128.safetensors models to each of the Load Advanced ControlNet Model nodes. As before, click directly on the highlighted sections and select the model from the pop-up list. The same reassignment requirement applies here.
In section C, assign the sample dataset images to the four nodes.
Click Choose file to upload on each of the nodes and select the images listed below. These images are found in the small sample dataset downloaded previously in the Requirements and Installation section above.
- ‘Semantic Segmentation Image’ Node: ‘dataset/semantic_2.png’
- ‘Depth Image’ Node: ‘dataset/depth_2.png’
- ‘Normals Image’ Node: ‘dataset/normals_2.png’
- ‘RGB Image’ Node: ‘dataset/rgb_2.png’
In the ComfyUI menu, click the Queue Prompt button, featured prominently at the top. A green progress bar will display and update at the top of the browser window. When complete, the generated image will be displayed in the Save Image node at the far right edge of the graph.
Saved images are written by default to the ComfyUI installation folder under ComfyUI\output.
Troubleshooting
If you encounter an error similar to Failed to find .../comfyui_controlnet_aux/ckpts/lllyasviel/Annotators/ControlNetHED.pth in the ComfyUI command line, it is likely a Windows path length issue. Try reinstalling ComfyUI in a location with a shorter path.
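Once the graph runs correctly from the browser, you can also queue it programmatically, which is useful for batch augmentation over a whole dataset. The sketch below POSTs a workflow to the running ComfyUI server’s /prompt endpoint using only the Python standard library. It assumes the graph has been re-exported in ComfyUI’s API format (the regular saved .json is not accepted by this endpoint), and the file name used here is a placeholder.

# Minimal sketch: queue a ComfyUI workflow over HTTP (standard library only).
# Assumes the graph was exported via "Save (API Format)" in ComfyUI; the
# file name 'sdg_SDXL_augmentation_example_api.json' is a placeholder.
import json
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"

def queue_workflow(workflow: dict) -> dict:
    """POST a workflow (API format) to the ComfyUI /prompt endpoint."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{COMFYUI_URL}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())

if __name__ == "__main__":
    with open("sdg_SDXL_augmentation_example_api.json") as f:
        workflow = json.load(f)
    result = queue_workflow(workflow)
    print("Queued prompt:", result.get("prompt_id"))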
Augmentation Graph Breakdown
The ComfyUI graph itself is a developer tool for building and iterating on pipelines; another way to think about it is ‘programming with models’. Below is an image of the example graph, showing its different sections and their purposes.
Overall, the graph uses ‘regional prompting’ with the masks from the semantic segmentation image. This allows us to describe the visual properties of the different parts of the image.
Let’s explore the different parts of the graph for a more complete understanding of how it works.
Load Checkpoint
This node loads the base SDXL model.
Regional Prompting
The ‘Conditioning (SetMask)’ nodes take a prompt and a mask to determine the conditioning of the different parts of the image. These nodes and prompts form the heart of the graph and are the most important piece influencing the output generations.
For example, we can change the prompt so that the generated forklift is yellow, red, rusty, old, new, and so on.
Load Images
These nodes load the Dataset images previously created using Omniverse and Replicator.
Masks, ControlNet Images, Denoising Weightmap
In this section, we create masks from the segmentation image for each of the regions we want to inpaint/outpaint.
We also create an outline image using the ‘HED’ preprocessor to be used on the ControlNet in section 5.
Lastly, we create a mask that tells the inpainting how much denoising to apply to different sections of the image. We do this because sometimes we only wish to lightly denoise some objects, changing them slightly, instead of a full “reimagining” with a denoise of 100% (white in the mask).
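Outside of ComfyUI, the same idea can be reproduced in a few lines. The sketch below, using NumPy and Pillow, pulls a binary mask for one semantic class out of the colorized segmentation image and scales it into a grayscale denoise weight map. The class color and the 0.4 denoise weight are assumptions you would replace with your own stable color IDs and per-class strengths.

# Sketch: build a per-class mask and a denoise weight map from the colorized
# semantic segmentation image. The class color and weight are assumptions.
import numpy as np
from PIL import Image

seg = np.array(Image.open("dataset/semantic_2.png").convert("RGB"))

# Binary mask for one semantic class, identified by its stable color ID.
forklift_color = (140, 25, 255)  # placeholder RGB value for illustration
mask = np.all(seg == forklift_color, axis=-1)

# Denoise weight map: 0.0 (black) keeps pixels untouched, 1.0 (white)
# fully reimagines them. Here the class region is only lightly denoised.
weight_map = np.zeros(seg.shape[:2], dtype=np.float32)
weight_map[mask] = 0.4

Image.fromarray((mask * 255).astype(np.uint8)).save("forklift_mask.png")
Image.fromarray((weight_map * 255).astype(np.uint8)).save("denoise_weights.png")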
ControlNets
The ControlNet models constrain and guide the generation of the outputs. They are critical for controlling the structure of the outputs. Without them, the model would be free to imagine anything loosely based on the prompts, with no adherence to the exact placement of the objects in the scene.
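As an illustration of what a ControlNet conditioning image looks like, the sketch below produces a Canny edge map from the RGB render with OpenCV, roughly analogous to the input consumed by the Canny control LoRA in this graph. The graph itself uses its own preprocessor nodes, and the thresholds here are assumptions to tune per scene.

# Sketch: create a Canny edge conditioning image from the RGB render.
# Threshold values are assumptions to tune per scene.
import cv2

rgb = cv2.imread("dataset/rgb_2.png")
edges = cv2.Canny(rgb, threshold1=100, threshold2=200)
cv2.imwrite("rgb_2_canny.png", edges)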
KSampler
This is the node that does the main work in the graph. It iteratively denoises the image over the configured number of steps until the final image is produced.
Alter Properties of the Graph
Many properties of this graph can be altered. To name a few:
Prompts
Change the local and global prompts to create variation in the generated output images.
KSampler
- control_after_generate - Ensure this is set to random if you want different generations each time.
- steps - 20-30 is a good range.
- cfg - 2-4.
- sampler_name - Try different ones. Some are slower but have higher quality.
- scheduler - Similar to the sampler, different scheduling can have different effects on outputs.
ControlNets
Try adjusting the strength or start_percent to see how they guide the image. You could also try different ControlNet types, such as normals.
Denoise Strength
The denoise strength for each of the mask classes can be adjusted. For example, a low denoise strength on the forklift will lightly alter the forklift in the image. A high denoise of ‘1.0’ will completely reimagine the forklift according to the provided prompt for that class.
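These properties can also be varied programmatically when generating at scale. Building on the /prompt sketch above, the snippet below randomizes the seed and sweeps the cfg value by editing node inputs in the API-format workflow before each queue. The node ID "3" and the input names are assumptions; inspect your own exported API-format JSON to find the KSampler node’s actual ID.

# Sketch: sweep the KSampler seed and cfg across several queued generations.
# Node ID "3" and its input names are assumptions taken from a typical export.
import json
import random
import urllib.request

with open("sdg_SDXL_augmentation_example_api.json") as f:
    workflow = json.load(f)

for cfg in (2.0, 3.0, 4.0):
    ksampler = workflow["3"]["inputs"]
    ksampler["seed"] = random.randint(0, 2**32 - 1)  # new seed per variation
    ksampler["cfg"] = cfg                            # vary guidance strength
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    urllib.request.urlopen(urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    ))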
Augmentation Results and Analysis
Shown below is an example augmentation from the sample dataset with the SDXL base model.
The quality of image outputs can be considerably improved through the use of fine-tuned base models, or other techniques such as LoRAs. Shown below are some examples using more advanced models and techniques.
➤ Next Steps: Review the Known Issues