This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

Implement RedPajama as a NeoX Specialization #204

Merged: 15 commits, May 14, 2023


danforbes
Contributor

@danforbes danforbes changed the title from "Implement Red Pajama as a NeoX Specialization" to "Implement RedPajama as a NeoX Specialization" May 10, 2023
@philpax
Collaborator

philpax commented May 10, 2023

Sorry about asking this again 😅 Are the changes between NeoX and RedPajama small enough to be covered by use_parallel_residual, or are there other changes? Because if it's just use_parallel_residual, we should figure out how to specify parameters for the model.
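For context on what the flag changes, here is a minimal sketch of the two residual layouts that use_parallel_residual selects between in GPT-NeoX-style layers. Everything here is illustrative: `layer_forward`, `add`, and the closure parameters are hypothetical names, not code from this PR, and the layer norms are assumed to be folded into the `attn` and `mlp` closures.

```rust
/// Element-wise sum of two equally sized slices (hypothetical helper).
fn add(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.iter().zip(b).map(|(p, q)| p + q).collect()
}

/// One transformer layer over a plain vector (hypothetical, not the PR's code).
/// `attn` and `mlp` stand in for "layer norm + attention" and "layer norm + MLP".
fn layer_forward(
    x: &[f32],
    use_parallel_residual: bool,
    attn: impl Fn(&[f32]) -> Vec<f32>,
    mlp: impl Fn(&[f32]) -> Vec<f32>,
) -> Vec<f32> {
    if use_parallel_residual {
        // Parallel: both sub-blocks read the same input.
        // out = x + attn(x) + mlp(x)
        add(&add(x, &attn(x)), &mlp(x))
    } else {
        // Sequential: the MLP reads the output of the attention residual.
        // h = x + attn(x); out = h + mlp(h)
        let h = add(x, &attn(x));
        add(&h, &mlp(&h))
    }
}
```

If this is the only difference between the two models, a single boolean on the model is enough to cover both, which is the motivation for the question above.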

@danforbes
Contributor Author

Are the changes between NeoX and RedPajama small enough to be covered by use_parallel_residual

I think that may be true, but I'm not sure I understand how to use that insight to create a better implementation.

@philpax
Collaborator

philpax commented May 12, 2023

My changes move use_parallel_residual to the model's hyperparameters - now the problem to figure out is how to actually set those hyperparameters before/during load. (Changing them manually confirms that both implementations work.)

Will sleep on it 💤
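One possible shape for the open question above (setting hyperparameters before or during load) is an optional override that the caller supplies and the loader applies after reading the values stored in the model file. This is only a sketch under stated assumptions: `Hyperparameters`, `Overrides`, `apply_overrides`, and the `n_layer` field are hypothetical names for illustration, not the API this PR or the crate settled on.

```rust
/// Hypothetical hyperparameters; only `use_parallel_residual` comes from the discussion.
#[derive(Debug, Clone, PartialEq)]
struct Hyperparameters {
    n_layer: usize,
    use_parallel_residual: bool,
}

/// Hypothetical caller-supplied overrides. `None` means "keep the loaded value".
#[derive(Debug, Default, Clone)]
struct Overrides {
    use_parallel_residual: Option<bool>,
}

/// Apply overrides on top of the hyperparameters read from the model file.
fn apply_overrides(mut loaded: Hyperparameters, overrides: &Overrides) -> Hyperparameters {
    if let Some(value) = overrides.use_parallel_residual {
        loaded.use_parallel_residual = value;
    }
    loaded
}
```

With this layout, a NeoX load would pass `Overrides::default()` and a RedPajama load would pass `Overrides { use_parallel_residual: Some(false) }`, assuming the file itself does not record the flag.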

@philpax philpax merged commit 4139e0e into rustformers:main May 14, 2023
@danforbes danforbes deleted the dfo/neox/share branch May 14, 2023 08:00
@hhamud hhamud mentioned this pull request Aug 7, 2023