
How to use it in transformer? #1

Open
kpmokpmo opened this issue Sep 15, 2021 · 3 comments

Comments

@kpmokpmo

Hi, thanks for your work.

Just several quick questions here:

  1. When embedding the S/T norm blocks into the transformer baseline, should I discard or keep the original layer/group norm?
  2. Your paper and 'Data Normalization for Bilinear Structures in High-Frequency Financial Time-series' seem quite similar. Just curious whether there is any main difference I didn't notice.

Thank you very much!

@JLDeng
Owner

JLDeng commented Sep 15, 2021

Hi, thanks for your interest.

  1. In my experience, you can keep the original layer, but it may depend on your task.
  2. Thanks for the pointer. I have just checked this paper, and I think the basic idea is similar. One of the major differences is that the normalized features should be combined with the original features and then fed into the following operations (a rough sketch is below); otherwise, the forecasting results would not be good.
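
For concreteness, here is a rough sketch of that combination step (a hypothetical example, not the exact code in this repo; snorm/tnorm and the 1x1 conv are placeholders for the spatial/temporal normalization modules):

import torch
import torch.nn as nn

class STNormMix(nn.Module):
    # Hypothetical sketch: snorm / tnorm stand in for the S-norm / T-norm modules.
    def __init__(self, channels, snorm, tnorm):
        super().__init__()
        self.snorm = snorm
        self.tnorm = tnorm
        # A 1x1 conv mixes the concatenated (original + S-normed + T-normed)
        # channels back to the original width for the following operations.
        self.mix = nn.Conv2d(3 * channels, channels, kernel_size=1)

    def forward(self, x):
        # x: (batch, channels, nodes, time) features (assumed layout for this sketch)
        x_s = self.snorm(x)  # spatially normalized view
        x_t = self.tnorm(x)  # temporally normalized view
        # Keep the original features alongside the normalized views;
        # feeding only the normalized features hurts the forecasts.
        out = torch.cat([x, x_s, x_t], dim=1)
        return self.mix(out)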

JLDeng closed this as completed Sep 15, 2021
JLDeng reopened this Sep 15, 2021
@JLDeng
Owner

JLDeng commented Sep 15, 2021

In addition, I noticed that they only applied normalization to the input data. Our work demonstrates that this operation can be generalized to the latent space.
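
Roughly, the difference looks like this (again a hypothetical sketch; STNormMix is the placeholder module above, and make_layer could build any conv or attention layer). Instead of normalizing the raw input once, the normalization block is interleaved with the layers so that it also acts on hidden features:

import torch.nn as nn

class LatentSTNet(nn.Module):
    # Hypothetical sketch: S/T normalization interleaved with the layers
    # (latent space), rather than applied once to the raw input.
    def __init__(self, channels, num_layers, make_st_block, make_layer):
        super().__init__()
        self.st_blocks = nn.ModuleList([make_st_block(channels) for _ in range(num_layers)])
        self.layers = nn.ModuleList([make_layer(channels) for _ in range(num_layers)])

    def forward(self, x):
        for st, layer in zip(self.st_blocks, self.layers):
            x = layer(st(x))  # normalize the current hidden features, then transform them
        return x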

@kpmokpmo
Author

Thank you for the quick reply! I still want to double-check the design.
If the attention block has the following structure:

# S+T norm & concat+conv (proposed insertion point)
x = x + self.drop_path(self.attn(self.norm1(x)))  # attention sub-block with pre-norm (norm1)
x = x + self.drop_path(self.mlp(self.norm2(x)))   # MLP sub-block with pre-norm (norm2)

I think self.norm1, at least, plays a role that duplicates the S/T norm layer. Please correct me if I shouldn't insert the ST norm here at all. Many thanks.
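
To make my question concrete, the wiring I have in mind is roughly this (st_norm_mix is a hypothetical name for the S+T norm & concat+conv block; my question is whether norm1 inside the block then becomes redundant):

import torch.nn as nn

class BlockWithSTNorm(nn.Module):
    # Hypothetical wiring: st_norm_mix is the S+T norm & concat+conv block,
    # inner is the existing attention block (its norm1/norm2 left untouched).
    def __init__(self, st_norm_mix, inner):
        super().__init__()
        self.st_norm_mix = st_norm_mix
        self.inner = inner

    def forward(self, x):
        x = self.st_norm_mix(x)  # S+T norm & concat+conv before the attention block
        return self.inner(x)     # inner still applies norm1 before attn and norm2 before mlp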
