-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sync the whole Meg-LM fused_kernels sub-tree #260
Conversation
this is not good, the performance is worse and then it OOMed after 4 iterations:
it's possible I missed some other changes outside of these folders. Probably need to do it properly and track and replay each change |
and attn_batches % 4 == 0 # np * b must be divisor of 4 | ||
): | ||
if 0 <= sk <= 2048: | ||
if 0 <= sk <= 4096: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That test becomes useless no? unless we need to test sq
now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have just copied the code from Megatron-LM verbatim and only re-added back any changes we added.
i.e. I haven't added any code of my own. I only changed the outdated comment to match 4096
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to use a smaller model and restart the testing, as it was on brink of OOM already. So it's very likely the issues I'm seeing are unrelated.
I will report back when I get new numbers.
OK, I was testing with a broken merge of the deepspeed branch which introduced a memory leak. Found the issue now and will re-test anew with this and your PRs. |
OK, after finding an issue elsewhere this PR works just fine. Except it makes no difference whatsoever to the outcome. Perhaps we aren't impacted since we don't hit the constraints that weren't done well originally. I haven't investigated. But the numbers are telling:
the small fluctuation is fine - they are identical throughputs. |
Tested it some more w/ and w/o this change and they track very close |
As flagged by @thomasw21 in #259 - in we have only synced part of the fused_kernels fixes applied to Megatron-LM here #151.
I tried to track all the changes since then, but there are too many and often are mixed with other unrelated PRs, so how about we just sync the whole folder and other related files.
this PR is trying just that.
I have no idea how to track all the individual contributors across many PRs, but I think it was primarily @hyunwoongko so it should be easy to push him in as a contributor:
and it will be so once this is squash-merged.