
Can't deepcopy an xformer model with triton 2 update #290

Closed
jramapuram opened this issue May 1, 2022 · 6 comments · Fixed by #309

@jramapuram

🐛 Bug

Since the recent triton 2 update, an xFormer model cannot be deep-copied. Deep-copying is an important requirement for numerous tasks, including EMA (which clones a model without knowledge of the generating class / hyper-params).
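For context, this is roughly the EMA pattern that needs it (a minimal sketch: a plain `torch.nn.Linear` stands in for the model here, but with the xFormer built below and triton 2 installed, the `deepcopy` call is what raises the error):

```python
from copy import deepcopy

import torch


@torch.no_grad()
def ema_update(ema_model: torch.nn.Module, model: torch.nn.Module, decay: float = 0.999):
    # in-place exponential moving average of the online model's parameters
    for ema_param, param in zip(ema_model.parameters(), model.parameters()):
        ema_param.mul_(decay).add_(param, alpha=1.0 - decay)


model = torch.nn.Linear(8, 8)  # stand-in; in practice this is the xFormer built from the config below
ema_model = deepcopy(model)    # <- the call that breaks once triton 2 is installed
ema_update(ema_model, model)
```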

xformers ViT-B Config

```yaml
reversible: False
block_type: "encoder"
num_layers: 12
dim_model: 768
layer_norm_style: "pre"

multi_head_config:
  num_heads: 12
  residual_dropout: 0.1  # (1) tried without this, (2) swapping this for DropPath, (3) with regular dropout
  use_rotary_embeddings: False

  attention:
    name: "scaled_dot_product"
    dropout: 0.0
    causal: False

feedforward_config:
  name: "MLP"
  dropout: 0.0
  activation: "gelu"
  hidden_layer_multiplier: 4
```

To reproduce

```python
from copy import deepcopy

import yaml
from xformers.factory.model_factory import xFormer, xFormerConfig

# transformer_config_file points at the ViT-B config above
with open(transformer_config_file, "rb") as fileptr:
    model_config = yaml.load(fileptr, Loader=yaml.FullLoader)

model = xFormer.from_config(xFormerConfig([model_config]))
deepcopy(model)  # raises
```

Error is:

```
TypeError: cannot pickle 'PyCapsule' object
```
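The failure mode itself is generic: `deepcopy` falls back to the pickle protocol (`__reduce_ex__`), which dies on any object graph holding a `PyCapsule`, as the compiled triton kernel handles apparently do. A stdlib-only sketch that reproduces the same message, using the capsule exposed by `datetime` as a stand-in:

```python
from copy import deepcopy
import datetime

capsule = datetime.datetime_CAPI  # a PyCapsule that ships with the stdlib
print(type(capsule).__name__)     # PyCapsule
deepcopy(capsule)                 # TypeError: cannot pickle 'PyCapsule' object
```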
@blefaudeux

Oh jeez... OK, you can remove triton from your env and that should unblock you short term; I'll have a look in the meantime :)

@jramapuram

For reference, even the SWA implementation in vanilla PyTorch relies on deepcopy: https://github.com/pytorch/pytorch/blob/master/torch/optim/swa_utils.py#L100 -- not being able to EMA costs roughly 1-2% in overall performance 😬
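e.g. this is enough to hit the error (a minimal sketch; swap the `Linear` stand-in for the xFormer from the repro above), since `AveragedModel` deepcopies the wrapped module:

```python
import torch
from torch.optim.swa_utils import AveragedModel

model = torch.nn.Linear(8, 8)     # stand-in for the xFormer built above
swa_model = AveragedModel(model)  # deepcopies `model` internally -> same TypeError with triton 2
```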

@blefaudeux

It should be fine with the attached PR, @jramapuram -- if you see other issues they're easy to fix, and we can augment the unit test. Basically, lazily initializing the triton parts fixes this case.
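To sketch the idea (not the actual PR, just the pattern, assuming the offending attribute is a compiled triton handle held by a module): build the handle on first forward only, and drop it from the copied/pickled state so a copy rebuilds it lazily.

```python
import torch


class FusedLayerSketch(torch.nn.Module):
    """Hypothetical module holding an unpicklable compiled-kernel handle."""

    def __init__(self):
        super().__init__()
        self._kernel = None  # nothing compiled yet, so a fresh module deep-copies fine

    def _build_kernel(self):
        # stand-in for compiling / loading the triton kernel (which holds PyCapsules)
        return object()

    def forward(self, x):
        if self._kernel is None:  # lazy init: the first call builds the kernel
            self._kernel = self._build_kernel()
        return x  # a real layer would launch the kernel here

    def __getstate__(self):
        # deepcopy and pickle both route through __getstate__ via __reduce_ex__;
        # nulling the handle keeps the PyCapsule out of the copied state
        state = self.__dict__.copy()
        state["_kernel"] = None
        return state
```

With that, `deepcopy(model)` works whether or not a forward pass has already happened; a copy taken after compilation simply recompiles on its next call.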

@blefaudeux

sorry for the delay, I should have seen that before

blefaudeux self-assigned this May 24, 2022
blefaudeux added a commit that referenced this issue May 25, 2022: "Tentatively fixing pickling issues, lazy init"
@jramapuram

Awesome @blefaudeux ! Testing now, thanks 🙏

@blefaudeux

> Awesome @blefaudeux ! Testing now, thanks 🙏

Let me know if another part fails this way; I should be able to fix it in a similar fashion. I'm also still on #219, trying to come up with a repro less expensive than full-blown IN. My current lead hypothesis is to provide means to handle various inits out of the box: right now deepnorm sets the distribution to a scaled uniform init, and that's probably not the best for all problems. It can always be done from the outside, but that kind of negates the benefit of having deepnorm out of the box.
