NameError: name 'DeepSpeedCPUAdam' is not defined #488

Closed
lipiji opened this issue Oct 27, 2020 · 3 comments

Comments


lipiji commented Oct 27, 2020

I have installed cpu-adam, but I still hit the following issue:

Successfully installed deepspeed-0.3.0+d720fdb
Removed build tracker: '/tmp/pip-req-tracker-iqs3vipd'
[SUCCESS] deepspeed successfully imported.
[INFO] torch install path: ['/dockerdata/xx/anaconda3/lib/python3.7/site-packages/torch']
[INFO] torch version: 1.5.1+cu101, torch.cuda: 10.1
[INFO] deepspeed install path: ['/dockerdata/xx/anaconda3/lib/python3.7/site-packages/deepspeed']
[INFO] deepspeed info: 0.3.0+d720fdb, d720fdb, master
[SUCCESS] apex extensions successfully installed
[INFO] using new-style apex
[SUCCESS] fused lamb successfully installed.
[SUCCESS] transformer kernels successfully installed.
[WARNING] sparse attention is NOT installed.
[SUCCESS] cpu-adam (used by ZeRO-offload) successfully installed.
Installation is successful

=======

And when I run this script:
sh ./scripts/ds_zero-offload_10B_pretrain_gpt2_model_parallel.sh

Adam Optimizer #0 is created with AVX512 arithmetic capability.
Optimizer = DeepSpeedCPUAdam
Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=<class 'deepspeed.ops.adam.cpu_adam.DeepSpeedCPUAdam'>
[2020-10-28 00:06:32,670] [INFO] [engine.py:613:_configure_zero_optimizer] Creating fp16 ZeRO stage 2 optimizer
...python3.7/site-packages/deepspeed/runtime/zero/stage2.py", line 161, in __init__
    and type(init_optimizer) == DeepSpeedCPUAdam)
NameError: name 'DeepSpeedCPUAdam' is not defined
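
Note that the install report and the traceback are consistent with each other: the cpu-adam extension builds and the class exists (its full path even appears in the "Checking ZeRO support" line), but deepspeed/runtime/zero/stage2.py compares against the name DeepSpeedCPUAdam without that module ever importing it, so the name lookup fails at line 161. A minimal sketch of the failure mode (a hypothetical stand-in, not DeepSpeed's verbatim source):

```python
# Sketch of why the NameError fires even though cpu-adam installed cleanly.
# This module mimics the check in deepspeed/runtime/zero/stage2.py but, like
# the buggy module, never imports DeepSpeedCPUAdam.

import torch

def check_cpu_offload(init_optimizer, cpu_offload):
    return (cpu_offload
            and type(init_optimizer) == DeepSpeedCPUAdam)  # NameError raised here

opt = torch.optim.Adam([torch.nn.Parameter(torch.zeros(1))])
check_cpu_offload(opt, cpu_offload=True)
# NameError: name 'DeepSpeedCPUAdam' is not defined

# The class itself is importable (its path appears in the log above), so the
# obvious fix is a one-line import in the offending module:
#   from deepspeed.ops.adam.cpu_adam import DeepSpeedCPUAdam
```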

lipiji (Author) commented Oct 28, 2020

lipiji closed this as completed Oct 28, 2020
lipiji reopened this Oct 28, 2020
jeffra (Collaborator) commented Oct 28, 2020

Hi @lipiji, sorry you're running into this issue. We have since fixed it, but the PR (#476) isn't merged yet (it should be merged by tomorrow). Feel free to try it out before it's merged if you'd like. Check out the notes at the top of the PR; there are a lot of changes coming to the install process.

lipiji closed this as completed Oct 29, 2020
linehammer commented:

Python knows the purposes of certain names (e.g. built-in functions); other names are defined within the program (e.g. variables). If Python encounters a name it doesn't recognize, you'll get a NameError: name 'xx' is not defined. In most cases this happens when Python sees a variable name (global or local) and doesn't know what it refers to: you forgot to initialize the variable, you misspelled it, or you misspelled a reserved word such as True. Before a global variable can be read inside a function, it must first be initialized somewhere, either outside the function or inside it.
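
A few self-contained examples of those cases (generic Python, unrelated to DeepSpeed):

```python
# Common ways to trigger NameError in plain Python.

greeting = "hello"      # a global, initialized at module level

def ok():
    print(greeting)     # fine: the global name exists by the time this is called

def misspelled():
    print(greting)      # NameError: name 'greting' is not defined

def too_early():
    print(farewell)     # NameError if called before 'farewell' is assigned below

ok()          # prints "hello"
too_early()   # raises NameError: name 'farewell' is not defined
farewell = "goodbye"
```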

jeffra added a commit that referenced this issue on Apr 11, 2023

* Merge chatgpt v2 to v3 - finalized (#484)

* [squash] staging chatgpt v1 (#463)

Co-authored-by: Reza Yazdani
Co-authored-by: yaozhewei
Co-authored-by: Tunji Ruwase

* [partial] formatting fixes

* quantizer fixes

* fix for bert tests

* formatting fixes

* re-enable _param_slice_mappings in z2

* Enable the QKV requires_grad when in training mode (#466)

Co-authored-by: Jeff Rasley

* fixes for attention enable_training flag

* commit to trigger CI

* fix for distil-bert param

* fixes for training context errors

* remove reza's qkv-optimization (#469)

Co-authored-by: Jeff Rasley

* Chatgpt - Fuse lora params at HybridEngine (#472)

Co-authored-by: Jeff Rasley

* add option to enable non-pin mode (#473)

* Chatgpt - fuse lora non pinned case (#474)

* Fix fuse/unfuse lora for Z3 and non-pinned parameter

* unfuse_lora_weight for non-pinned case

* fix the multiple issue for lora parameters

* formatting

* fuse lora only when available

---------

Co-authored-by: Jeff Rasley

* Chatgpt/release inference cache (#475)

* Fix fuse/unfuse lora for Z3 and non-pinned parameter

* unfuse_lora_weight for non-pinned case

* release/retake the inference cache after/before generate

* remove duplicated _fuse_lora function

* fix formatting

* fix hybrid-engine config issue

* update formatting

* Chatgpt - fuse qkv v2 (#478)

Co-authored-by: Jeff Rasley

* ChatGPT: Refactor Hybrid Engine Config (#477)

Co-authored-by: Lok Chand Koppaka

* Inference Workspace Tweaks (#481)

* Safety checks around inference workspace allocation, extra flushing

* Formatting fixes

* Merge fix

* Chatgpt/inference tp (#480)

* Update the merged-QKV weights only if there is a difference with the model parameter

* remove the hard-coded size

* always reset qkv params to updated ones after running step

* Add the inference-tp group and tensor sharding to run inference in model-parallel mode

* optimize the gather/mp-sharding part

* Add hybrid_engine changes

* fix config issue

* Formatting fixes. Reset_qkv duplicate removal.

* fix bloom container.

* fix format.

---------

Co-authored-by: Ammar Ahmad Awan
Co-authored-by: Lok Chand Koppaka

* fix formatting

* more clean-up

---------

Co-authored-by: Jeff Rasley
Co-authored-by: yaozhewei
Co-authored-by: Tunji Ruwase
Co-authored-by: Masahiro Tanaka
Co-authored-by: Michael Wyatt
Co-authored-by: Lok Chand Koppaka
Co-authored-by: Connor Holmes
Co-authored-by: Ammar Ahmad Awan

* fix a bug on lora-fusion (#487)

* Cholmes/v3 workspace bugfixes (#488)

* Miscellaneous workspace fixes, new config param

* Fix typo

---------

Co-authored-by: Reza Yazdani
Co-authored-by: Jeff Rasley
Co-authored-by: yaozhewei
Co-authored-by: Tunji Ruwase
Co-authored-by: Masahiro Tanaka
Co-authored-by: Michael Wyatt
Co-authored-by: Lok Chand Koppaka
Co-authored-by: Connor Holmes