Squeezeformer #1447

Merged (29 commits), Oct 18, 2022
Conversation

@yygle (Contributor) commented Sep 15, 2022

Development Record

squeezeformer
├── attention.py                        # relative multi-head attention module
├── conv2d.py                           # self-defined conv2d valid-padding module
├── convolution.py                      # convolution module in squeezeformer block
├── encoder_layer.py                    # squeezeformer encoder layer
├── encoder.py                          # squeezeformer encoder class 
├── positionwise_feed_forward.py        # feed forward layer 
├── subsampling.py                      # sub-sampling layer, time reduction layer
└── utils.py                            # residual connection module
  • Implementation Details
    • Squeezeformer Encoder
      • add a pre layer norm before each squeezeformer block
      • derive the time reduction layer from the TensorFlow version
      • enable the adaptive scale operation (a minimal sketch follows after this list)
      • enable weight initialization for deep model training
      • add training config and results
      • enable dynamic chunk training and JIT export
    • Training
      • enable the NoamHoldAnnealing scheduler
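For reference, a minimal sketch of the adaptive scale operation mentioned above: a learnable per-channel scale and bias applied to a block's input, as described in the Squeezeformer paper. This is an illustrative PyTorch rewrite, not necessarily the exact module merged in this PR.

import torch
import torch.nn as nn

class AdaptiveScale(nn.Module):
    """Learnable per-channel scale and bias applied to the module input."""
    def __init__(self, dim: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(dim))
        self.bias = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim); scale and bias broadcast over batch and time
        return x * self.scale + self.bias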



class ResidualModule(nn.Module):
"""
Collaborator:
You can move it to encoder.py, utils.py is not a proper name.

Collaborator:
What if we just do it directly in the forward function?

Contributor Author:
I moved the residual connection to encoder.py and removed utils.py.

@@ -0,0 +1,172 @@
# Copyright (c) 2020 Mobvoi Inc. (authors: Binbin Zhang, Di Wu)
Collaborator:
Please note that you should change the company and author.

Collaborator:
Or append

Contributor Author:
The copyright has been added.

@@ -0,0 +1 @@
../s0/local
Collaborator:
A separate squeezeformer directory in examples/librispeech is not required here; we can just do it in s0 via config, since it shares the same training and decoding recipe.

Contributor Author:
Okay, how about removing it after the squeezeformer part of the README.md is fully updated?

Collaborator:
ok!


# dataset related
dataset_conf:
  syncbn: true
Member:
The code related to this does not seem to have been submitted? The syncbn conversion can apparently be done in a single call to the torch API in Train.py:

model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)

What is the reasoning behind putting it in the dataset_conf field?
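For context, the standard conversion plus DDP wrapping looks roughly like the following. This is a generic PyTorch sketch assuming torch.distributed has already been initialized; it is not the wenet training code.

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_model_for_ddp(model: torch.nn.Module, local_rank: int) -> torch.nn.Module:
    # Replace every BatchNorm layer with SyncBatchNorm so batch statistics
    # are synchronized across processes, then wrap the model with DDP.
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = model.cuda(local_rank)
    return DDP(model, device_ids=[local_rank])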

Contributor Author:
syncbn cannot be enabled directly in wenet mainly because data imbalance makes processes wait for each other. In the complete implementation I considered two cases: 1. DDP without data partitioning (each process iterates over the full dataset), and 2. partitioning the dataset and dropping the leftover part. That is why, in this version of the implementation, I tied this option to the dataset. Since this code is unrelated to the Squeezeformer algorithm itself and falls under engineering optimization, it will be submitted in a separate PR.

Collaborator:
OK, I'll merge it first; you can keep optimizing and iterating.

@robin1001 (Collaborator):
Great job, looking forward to your future work and SOTA result.

@robin1001 merged commit bbf844a into wenet-e2e:main on Oct 18, 2022
)
self.input_proj = nn.Sequential(
    nn.Linear(
        encoder_dim * (((input_size - 1) // 2 - 1) // 2), encoder_dim),
Member:
  1. Why not put this into DepthwiseConv2dSubsampling4? Off the top of my head, a more aggressive U-Net-like structure might be worth trying later (e.g. going from 10ms straight to 80ms, getting more aggressive to 160ms after the 5th layer, and coming back to 40ms or 80ms in the last layer). That would require a DepthwiseConv2dSubsampling8, and line 145 would need the subsampling output dimension adjusted accordingly. From that angle, input_proj is best coupled into the subsampling module. :)
  2. forward_chunk does not call input_proj; is the plan to submit the streaming feature in a follow-up PR, with this just a placeholder?

Contributor Author:
  1. Most of this is indeed the same as conv2d4; it was not merged because we had discussed not affecting the existing conformer usage. I also agree that coupling input_proj into the subsampling module is more reasonable, and I will update it later (a rough sketch of this coupling follows after this reply).
  2. This part, including complete streaming inference, is not finished yet and will be updated soon.
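As a rough illustration of the coupling suggested above, here is a subsampling module that owns its output projection, so the encoder never needs to know the flattened feature width. The module and parameter names are hypothetical; this is not the merged implementation.

import torch
import torch.nn as nn

class Conv2dSubsampling4WithProj(nn.Module):
    """4x time subsampling that projects directly to encoder_dim."""
    def __init__(self, input_size: int, encoder_dim: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, encoder_dim, 3, stride=2),
            nn.ReLU(),
            nn.Conv2d(encoder_dim, encoder_dim, 3, stride=2),
            nn.ReLU(),
        )
        # frequency width after two stride-2 valid convolutions
        freq = ((input_size - 1) // 2 - 1) // 2
        self.input_proj = nn.Linear(encoder_dim * freq, encoder_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, input_size)
        x = self.conv(x.unsqueeze(1))          # (batch, encoder_dim, time', freq)
        b, c, t, f = x.size()
        x = x.transpose(1, 2).contiguous().view(b, t, c * f)
        return self.input_proj(x)              # (batch, time', encoder_dim)

With this layout, a more aggressive variant (e.g. 8x subsampling) would only change this module, and the encoder would stay untouched.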

@xingchensong (Member):
Great Job! THX!

if self.reduce_idx <= idx < self.recover_idx:
residual = xs
xs, chunk_masks, _, _ = layer(xs, chunk_masks, pos_emb, mask_pad)
xs += residual
Contributor:
Is the residual here different from the one in the paper? In the paper, it seems that only the reduce layer and the recover layer have a residual connection.

Contributor Author (@yygle), Oct 20, 2022:
This part may need to be double-checked, i.e. whether having the residual only at those two places affects model quality; the different implementations I have seen differ slightly on this point.
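For comparison, a rough sketch of the paper-style variant being discussed, in which the features saved right before time reduction are added back only once, at the recover layer. Names like time_reduction_layer, time_recover_layer, reduce_idx and recover_idx are illustrative placeholders, not the merged code.

# inside the encoder's forward loop (illustrative pseudocode)
for idx, layer in enumerate(self.encoders):
    if idx == self.reduce_idx:
        recover_residual = xs        # keep the full-rate features
        xs, chunk_masks = self.time_reduction_layer(xs, chunk_masks)
    xs, chunk_masks, _, _ = layer(xs, chunk_masks, pos_emb, mask_pad)
    if idx == self.recover_idx:
        xs = self.time_recover_layer(xs)
        # single skip connection from the reduce point to the recover point
        xs = xs + recover_residual[:, :xs.size(1), :]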

xs = self.final_proj(xs)
return xs, masks

def forward_chunk(
Contributor:
For the downsampled layers between reduce and recover, the chunk is halved during streaming; is the attention cache also halved (e.g. chunk_size = 16, num_left_chunks = 4)?

Contributor Author:
Very good suggestion! Yes, it needs to be halved there. The streaming feature is not complete yet; I will update it soon.

Contributor:
Since the chunk is halved during streaming, does training also need to restrict the chunk size to be even? Also, I keep the cache of the layers between reduce and recover separate from the other layers, which gives three caches: atten_cache, reduce_atten_cache and conv_cache. Looking forward to a more elegant approach.

Contributor Author:
Restricting the chunk to an even size during training does look more reasonable from the standpoint of training/inference consistency, but in my experiments so far the impact seems small.
The cache does currently need three tensors. I am also thinking about this part; better solutions are welcome.

Member:
Using external padding plus internal slicing, it should be possible to merge the caches of the layers running at different frame rates into one.
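One possible reading of that suggestion, sketched very roughly: keep a single cache tensor allocated at the full-rate size for every layer (external padding), and let each layer slice out only the frames it actually needs (internal slicing). Everything below is a hypothetical illustration, not code from this PR.

import torch

def slice_layer_cache(att_cache: torch.Tensor,
                      layer_idx: int,
                      required_cache_size: int,
                      reduced: bool) -> torch.Tensor:
    """att_cache: (num_layers, head, full_cache_size, d_k * 2), left-padded with zeros.

    Layers between reduce and recover run at half the frame rate, so they
    only read the most recent half of the full-rate cache window.
    """
    size = required_cache_size // 2 if reduced else required_cache_size
    layer_cache = att_cache[layer_idx]
    # take the last `size` frames; the zero padding on the left is ignored
    return layer_cache[:, layer_cache.size(1) - size:, :]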
