Add initial how_to_scale_op notebook #65

Merged: 4 commits, Aug 19, 2024

DouglasOrr
Collaborator

View notebook.

My thoughts for the blog post were to keep it pretty short, with some key figures & results, rather than repeating all of this there.

Feedback of any sort is very welcome - thanks in advance!

@DouglasOrr DouglasOrr self-assigned this Aug 14, 2024
Contributor

@thecharlieblake thecharlieblake left a comment

This looks great - thanks so much! A few things came to mind, though I appreciate the blog version might address some of it anyway:

  1. A bit more context for the uninitiated might be nice up-front - i.e. a para on what u-µP is for someone who's never heard of it
  2. I had never heard of hardtanh before this example - a sentence introducing it and why you chose it might help the reader
  3. Also a sentence explaining that our lib has these lovely scale_fwd and scale_bwd fns
  4. Generally I'm finding the maths a bit hard to follow - particularly the jump to E[y^2]. I think we might benefit from spelling this out quite a bit more explicitly, even if it is more verbose. I'm guessing most ML people won't have done much of this kind of analysis before
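
To illustrate the kind of spelled-out step point 4 asks for, here is a minimal sketch, not taken from the notebook. It assumes the input is unit-normal, x ~ N(0, 1), and that hardtanh clips to [-1, 1] (PyTorch's default range). Under those assumptions, E[y^2] splits into the region where y = x and the two tails where y^2 = 1, which works out to 1 - 2φ(1), roughly 0.516. The function names below are hypothetical helpers for illustration only; they are not part of the library.

```python
import math
import random


def hardtanh(x, lo=-1.0, hi=1.0):
    """Clip x to [lo, hi]; the default range matches PyTorch's hardtanh."""
    return max(lo, min(hi, x))


def second_moment_closed_form():
    """E[hardtanh(x)^2] for x ~ N(0, 1) (assumed input distribution).

    Inside [-1, 1]: integral of x^2 * pdf(x) = P(|x| <= 1) - 2 * pdf(1).
    Outside:        y^2 = 1, contributing P(|x| > 1).
    """
    pdf_at_1 = math.exp(-0.5) / math.sqrt(2 * math.pi)
    p_inside = math.erf(1 / math.sqrt(2))  # P(|x| <= 1)
    return (p_inside - 2 * pdf_at_1) + (1 - p_inside)


def second_moment_monte_carlo(n=200_000, seed=0):
    """Empirical check of the same quantity by sampling."""
    rng = random.Random(seed)
    return sum(hardtanh(rng.gauss(0, 1)) ** 2 for _ in range(n)) / n


exact = second_moment_closed_form()
estimate = second_moment_monte_carlo()

# Multiplying the output by 1 / sqrt(E[y^2]) gives it unit second moment,
# which is the usual aim of a forward-pass scale in this kind of analysis.
fwd_scale = 1 / math.sqrt(exact)

print(f"closed form E[y^2]: {exact:.4f}")
print(f"Monte Carlo E[y^2]: {estimate:.4f}")
print(f"forward scale:      {fwd_scale:.4f}")
```

The Monte Carlo estimate is only there as a sanity check on the closed form; the backward-pass scale would need a separate, analogous argument on the gradient.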

@DouglasOrr DouglasOrr force-pushed the how-to-scale-notebook branch from 3452219 to a91e833 Compare August 15, 2024 14:59
@thecharlieblake thecharlieblake merged commit a85d806 into main Aug 19, 2024
1 check passed

2 participants