About Synthetic Data Augmentation #32

Open
mrabiabrn opened this issue Dec 4, 2024 · 1 comment

Comments

@mrabiabrn

Hi,
Could you explain in detail how you augment training with synthetic data (for both Panacea and Panacea+)?
From https://github.com/wenyuqing/panacea/tree/main/metrics/StreamPETR, I see that you first train on only the generated data and then fine-tune on the real data.
Was there a specific reason for doing it this way instead of training on both real and synthetic data from scratch? Could you provide details?

@wenyuqing
Owner

Hi, sorry for the late reply. We chose to first pre-train on generated data and then fine-tune on real data, rather than mixing the two for joint training, or pre-training on real data and fine-tuning on generated data. The reason is that generated samples often contain noise, which creates a domain gap between real and synthetic data, so mixing them for training is not ideal. Likewise, pre-training on real data followed by fine-tuning on generated data could hurt performance because of that noise. Pre-training on synthetic data, by contrast, gives the model a better starting point than training from scratch: it can acquire some initial perception ability from the generated samples before fine-tuning on clean real data. That is why we chose this scheme.
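To make the schedule concrete, here is a minimal toy sketch (hypothetical, not the authors' actual StreamPETR pipeline) of the two-stage recipe described above: pre-train a model from scratch on noisy synthetic samples, then fine-tune the same weights on real samples, typically with a smaller learning rate. A one-feature linear regressor stands in for the detector, and label noise stands in for the generation noise mentioned in the reply.

```python
import random

random.seed(0)


def sgd_fit(w, b, data, lr, epochs):
    """Train a one-feature linear model y = w*x + b with plain SGD."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


# "Real" data follows y = 2x + 1; "synthetic" data is the same
# distribution plus label noise, mimicking imperfect generated samples.
real = [(x / 10, 2 * (x / 10) + 1) for x in range(50)]
synthetic = [(x, y + random.gauss(0, 0.3)) for x, y in real]

# Stage 1: pre-train from scratch on synthetic data.
w, b = sgd_fit(0.0, 0.0, synthetic, lr=0.05, epochs=30)

# Stage 2: fine-tune on real data at a lower learning rate,
# starting from the synthetic-pretrained weights (not from scratch).
w, b = sgd_fit(w, b, real, lr=0.01, epochs=30)

print(round(w, 2), round(b, 2))  # should approach w ≈ 2, b ≈ 1
```

The key point mirrored here is that the synthetic stage only provides initialization; the final weights are determined by the clean real data, so the synthetic noise never contaminates the last stage of training.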
