Could you please share a data sample for reference? #5
Comments
"text":xxxxx<Im_end>xxxx (maximum length 512). The im_end token separates two texts, and I try to fill each sample as close to the maximum length as possible.
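What I mean is: given an article, for example, how do I process it into data that can be used to train the model? I didn't really follow the code.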
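The sample format described in this thread (texts joined by a separator token and filled up to 512 tokens) could be sketched roughly as follows. This is only an illustration, not the repository's actual preprocessing code: `pack_token_ids` and `sep_id` are hypothetical names, and the token ids are assumed to come from whatever tokenizer the project uses.

```python
from typing import Iterable, List


def pack_token_ids(docs: Iterable[List[int]], sep_id: int, max_len: int = 512) -> List[List[int]]:
    """Concatenate tokenized texts, separated by sep_id, and cut into samples.

    Hypothetical helper: sep_id stands in for the id of the im_end token.
    """
    stream: List[int] = []
    for ids in docs:
        stream.extend(ids)      # tokens of one text
        stream.append(sep_id)   # separator marking the end of that text
    # Cut the long stream into samples of at most max_len tokens;
    # only the last sample can be shorter than max_len.
    return [stream[i:i + max_len] for i in range(0, len(stream), max_len)]


# Toy data: three "articles" that are already tokenized, with max_len=4 for readability.
docs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
samples = pack_token_ids(docs, sep_id=0, max_len=4)
print(samples)
```

With real data, `max_len` would be 512 and each article would first be converted to token ids by the tokenizer. A long article simply spills over into the next sample.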
I don't understand why this line is there.
Why convert to np.array?
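If the vocabulary size is less than 65535, the token ids are stored as uint16 (which holds values from 0 to 65535) to save disk space; otherwise they are stored as uint32.
Oh, I see. So it is basically the same as input_batch = [] followed by input_batch.append(input_ids), except that specifying the data type saves disk space.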
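A minimal sketch of the dtype choice discussed above, assuming NumPy and the threshold stated in the reply. `choose_token_dtype` is a hypothetical helper name, not a function from the repository.

```python
import numpy as np


def choose_token_dtype(vocab_size: int):
    """Pick the narrowest unsigned dtype, using the threshold from the thread."""
    # uint16 covers ids 0..65535; larger vocabularies need uint32.
    return np.uint16 if vocab_size < 65535 else np.uint32


input_ids = [17, 42, 65000, 3]  # made-up token ids for illustration

arr16 = np.array(input_ids, dtype=choose_token_dtype(65001))
arr32 = np.array(input_ids, dtype=np.uint32)

# Same values, half the bytes per token.
print(arr16.dtype, arr16.nbytes)
print(arr32.dtype, arr32.nbytes)
```

A plain Python list carries no fixed element width, so converting to an array with an explicit dtype is what fixes the on-disk size at 2 or 4 bytes per token.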
Could you please share a data sample for reference? I would like to understand the data-processing part.