About Synthetic Data Augmentation #32

Open
mrabiabrn opened this issue Dec 4, 2024 · 1 comment

Comments

@mrabiabrn

Hi,
Could you explain in detail how you augment training with synthetic data (for both Panacea and Panacea+)?
From https://github.com/wenyuqing/panacea/tree/main/metrics/StreamPETR, I see that you first train on only the generated data and then fine-tune on the real data.
Was there a specific reason for doing it this way instead of training on both real and synthetic data from scratch? Could you provide details?

@wenyuqing
Owner

Hi, sorry for the late reply. We chose to first pre-train on generated data and then fine-tune on real data, rather than mixing the two for joint training, or pre-training on real data and fine-tuning on generated data. The reason is that generated samples often contain noise, which creates a domain gap between real and synthetic data, so mixing them for training is not ideal. Likewise, pre-training on real data followed by fine-tuning on generated data could hurt performance because of that noise. Pre-training on synthetic data, by contrast, gives the model a better starting point than training from scratch: it can acquire some initial perception ability from the generated samples before fine-tuning on clean real data. That is why we chose this scheme.
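To make the schedule concrete, here is a minimal toy sketch (hypothetical, not the authors' actual StreamPETR pipeline) of the two-stage recipe described above: pre-train a model from scratch on noisy synthetic samples, then fine-tune the same weights on real samples, typically with a smaller learning rate. A one-feature linear regressor stands in for the detector, and label noise stands in for the generation noise mentioned in the reply.

```python
import random

random.seed(0)


def sgd_fit(w, b, data, lr, epochs):
    """Train a one-feature linear model y = w*x + b with plain SGD."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


# "Real" data follows y = 2x + 1; "synthetic" data is the same
# distribution plus label noise, mimicking imperfect generated samples.
real = [(x / 10, 2 * (x / 10) + 1) for x in range(50)]
synthetic = [(x, y + random.gauss(0, 0.3)) for x, y in real]

# Stage 1: pre-train from scratch on synthetic data.
w, b = sgd_fit(0.0, 0.0, synthetic, lr=0.05, epochs=30)

# Stage 2: fine-tune on real data at a lower learning rate,
# starting from the synthetic-pretrained weights (not from scratch).
w, b = sgd_fit(w, b, real, lr=0.01, epochs=30)

print(round(w, 2), round(b, 2))  # should approach w ≈ 2, b ≈ 1
```

The key point mirrored here is that the synthetic stage only provides initialization; the final weights are determined by the clean real data, so the synthetic noise never contaminates the last stage of training.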
