Synthetic Data Generation

Synthetic Data Generation (SDG) is the process of creating fully artificial but accurately annotated datasets that serve as the basis for training AI algorithms. SDG datasets are often produced as an alternative to capturing and measuring similar data in the real world.

In many cases, capturing real-world data is difficult, impractical, or dangerous. Consider the variety of driving scenarios needed to teach an autonomous vehicle to navigate roads safely. If all training data had to be captured in the real world, how would a researcher record unpredictable driver behavior and pedestrian movement without endangering anyone? This is where SDG makes the most sense.

Why is this important for SimReady?

SimReady 3D assets and SDG are designed to be used together to create and randomize a practically unlimited variety of scenarios that meet specific training goals, and to do so safely in a virtual simulation. An added benefit is that, unlike real-world datasets, which must be manually annotated before use, SDG data is cleanly annotated by default (using semantic labeling). It can be fed to AI models repeatedly until your algorithms produce predictable and consistent results.

Training Goals

That variety of data means you can apply “domain randomization” across many facets of a scene. For instance, you can quickly change both weather and lighting conditions to see how a self-driving vehicle handles the same route in morning sun, on a foggy night, and in snowy whiteout conditions. Using SDG, you can change the materials and wear patterns applied to barriers, floor markings, and forklifts to ensure industrial robots can operate safely within a factory or busy warehouse. Other factors, such as human poses, partial occlusion of objects, distractors (objects designed to draw the attention of the viewer), and physical properties, can all be manipulated within a scenario editor until the dataset is sufficient to train a particular model.
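The idea of domain randomization can be illustrated with a small, tool-agnostic sketch. It is not the API of any SDG tool or scenario editor: the parameter names, value ranges, and the `Scenario`, `sample_scenario`, and `generate_scenarios` helpers are all hypothetical, chosen only to mirror the facets mentioned above (weather, lighting, material wear, occlusion, distractors). Each sampled scenario would then be handed to a renderer or simulator, which is outside the scope of this example.

```python
import random
from dataclasses import dataclass

# Hypothetical parameter ranges for illustration only; these names and
# values are assumptions and do not come from SimReady or any SDG tool.
WEATHER = ["clear", "fog", "snow"]
TIME_OF_DAY = ["morning", "noon", "night"]
MATERIAL_WEAR = (0.0, 1.0)  # 0 = new surface, 1 = heavily worn
OCCLUSION = (0.0, 0.6)      # fraction of the target object that is hidden
MAX_DISTRACTORS = 5


@dataclass(frozen=True)
class Scenario:
    """One randomized combination of scene parameters."""
    weather: str
    time_of_day: str
    material_wear: float
    occlusion: float
    distractor_count: int


def sample_scenario(rng: random.Random) -> Scenario:
    """Draw every facet independently from its allowed range."""
    return Scenario(
        weather=rng.choice(WEATHER),
        time_of_day=rng.choice(TIME_OF_DAY),
        material_wear=rng.uniform(*MATERIAL_WEAR),
        occlusion=rng.uniform(*OCCLUSION),
        distractor_count=rng.randint(0, MAX_DISTRACTORS),
    )


def generate_scenarios(count: int, seed: int = 0) -> list[Scenario]:
    """Return `count` scenarios; the same seed reproduces the same set."""
    rng = random.Random(seed)
    return [sample_scenario(rng) for _ in range(count)]


if __name__ == "__main__":
    for scenario in generate_scenarios(3, seed=42):
        print(scenario)
```

Seeding the generator is a deliberate choice in this sketch: it lets the same set of scenarios be regenerated later, which is useful when a model needs to be retrained or compared against the same data.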

SimReady 3D assets are designed to support these kinds of SDG use cases and, when combined with semantic labeling, can help you generate a multitude of computer vision scenarios.