Synthetic Data Generation

Synthetic Data Generation (SDG) is the process by which a researcher creates completely artificial but accurately annotated datasets to use as the basis for training AI algorithms. Synthetic datasets are often produced as an alternative to capturing and measuring similar kinds of data in the real world.

In many cases, capturing real-world data can be difficult, impractical, or dangerous. Consider the variety of driving scenarios needed to teach an autonomous vehicle how to navigate roads safely. If all training data had to be captured from the real world, how would a researcher record unpredictable driver behavior and pedestrian movement without endangering anyone? This is where SDG makes the most sense.

Why is this important for SimReady?

SimReady 3D art assets and SDG are designed to be used in concert to create and randomize a practically unlimited variety of scenarios that meet specific training goals, and to do so safely in a virtual simulation. An added benefit is that, unlike real-world datasets, which have to be manually annotated before use, SDG data is cleanly annotated by default. It can be fed to AI models repeatedly until the researcher gets predictable and consistent results from their algorithms.

Training Goals

That variety of data means that a researcher can apply “domain randomization” across many facets of a scene. For instance, a researcher can quickly change weather and lighting conditions to see how a self-driving vehicle handles the same route in morning sun, on a foggy night, and in snowy whiteout conditions. Using SDG, a researcher can change the materials and wear patterns applied to barriers, floor markings, and forklifts to ensure industrial robots can operate safely within a factory or busy warehouse. Other factors, such as human poses, partial occlusion of objects, distractors (objects designed to pull attention away from the target), and physical properties, can all be manipulated within the scenario editor so that a sufficient dataset can be produced to train a particular model.
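As a rough illustration of the idea, the sketch below samples randomized scenario descriptions from a few of the facets mentioned above. It is not a SimReady or scenario-editor API: the `Scenario` class, the option lists, the value ranges, and the function names are all hypothetical and chosen only to show the sampling pattern.

```python
import random
from dataclasses import dataclass

# Hypothetical option lists. The weather values come from the examples in
# the text; the material names are assumptions made for illustration.
WEATHER = ["morning sun", "foggy night", "snowy whiteout"]
MATERIALS = ["painted steel", "bare concrete", "yellow plastic"]


@dataclass(frozen=True)
class Scenario:
    weather: str
    barrier_material: str
    wear: float           # 0.0 = new, 1.0 = heavily worn
    num_distractors: int  # extra objects meant to pull attention
    occlusion: float      # fraction of the target object hidden


def sample_scenario(rng: random.Random) -> Scenario:
    """Draw one scenario by sampling each facet independently."""
    return Scenario(
        weather=rng.choice(WEATHER),
        barrier_material=rng.choice(MATERIALS),
        wear=rng.uniform(0.0, 1.0),
        num_distractors=rng.randint(0, 5),
        occlusion=rng.uniform(0.0, 0.5),
    )


def generate_scenarios(count: int, seed: int = 0) -> list[Scenario]:
    """Generate a reproducible batch of randomized scenarios."""
    rng = random.Random(seed)
    return [sample_scenario(rng) for _ in range(count)]


if __name__ == "__main__":
    for scenario in generate_scenarios(3):
        print(scenario)
```

Seeding the generator keeps a batch reproducible, which makes it easier to regenerate the same dataset when a training run needs to be repeated.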

SimReady 3D art assets are designed to support these kinds of SDG use cases and, when combined with semantic labeling, can help generate a multitude of scenarios for researchers.
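To show why semantic labels on assets matter, here is a minimal sketch of how annotations can fall out of generation itself: because the generator knows which asset it placed and where, the label record can be written without manual work. The `Asset`, `Placement`, and `annotate` names, the bounding-box format, and the output layout are assumptions for illustration, not the SimReady schema.

```python
import json
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    semantic_label: str  # hypothetical label carried by the asset


@dataclass
class Placement:
    asset: Asset
    bbox: tuple[int, int, int, int]  # x_min, y_min, x_max, y_max in pixels


def annotate(frame_id: int, placements: list[Placement]) -> dict:
    """Build an annotation record directly from what the generator placed."""
    return {
        "frame": frame_id,
        "objects": [
            {"label": p.asset.semantic_label, "bbox": list(p.bbox)}
            for p in placements
        ],
    }


# Example values below are made up for the sketch.
forklift = Asset("forklift_01", "forklift")
barrier = Asset("barrier_03", "barrier")
record = annotate(
    0,
    [
        Placement(forklift, (40, 60, 220, 300)),
        Placement(barrier, (300, 180, 420, 260)),
    ],
)
print(json.dumps(record, indent=2))
```

In a real pipeline the bounding boxes would come from the renderer rather than being typed in by hand; the point of the sketch is only that the label travels with the asset.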