Pretraining data question #13

Open
lumiere-ml opened this issue Mar 18, 2024 · 2 comments

Comments

@lumiere-ml

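The sky dataset is very large, roughly 500 GB to download in full. Could you describe which data was used to train the model, and how many tokens that is in total?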

@jiahe7ay
Owner

I downloaded the first 20 files.

@lumiere-ml
Author

Could that bias the data? How much difference does it make whether you take the first 20 or a random 20?
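
For comparing the two options, here is a minimal sketch of drawing a reproducible random subset of files instead of the first 20. It assumes the dataset is split into files whose names can be listed up front. The file names and the `pick_shards` helper are hypothetical and are not taken from this repository or from the dataset's own tooling.

```python
import random


def pick_shards(shard_names, k=20, seed=None):
    """Pick k shard names from a list.

    With seed=None, return the first k names in sorted order.
    With an integer seed, return a reproducible random sample of k names.
    Raises ValueError if a random sample is requested with k > len(shard_names).
    """
    ordered = sorted(shard_names)
    if seed is None:
        return ordered[:k]
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return sorted(rng.sample(ordered, k))


# Hypothetical file names, used only for illustration.
all_shards = [f"shard_{i:04d}.jsonl" for i in range(100)]

first_20 = pick_shards(all_shards, k=20)
random_20 = pick_shards(all_shards, k=20, seed=0)
```

Whether the first 20 files are actually biased depends on how the dataset was sharded. If the files are ordered by crawl date or by source, a seeded random sample like the one above is the safer choice; if they were shuffled before sharding, the two should behave similarly.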
