
Perceiver IO #479

Open
Dynamedion opened this issue Feb 9, 2025 · 0 comments

Not sure how to address this, but I am using the base PerceiverModel() (not pretrained) to train on a bunch of IMU data and other dynamics signals, and have modified the model for a regression task so that it outputs via an nn.Linear with output_dim=3. I am also using a cosine-based learning rate scheduler with warmup (found in another GitHub repo) that is updated at batch level, as well as nn_utils.clip_grad_norm_(reg_model.parameters(), max_norm=1.0) after the loss.backward() step and before the optimizer.step().
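For reference, my per-batch update looks roughly like this (a minimal sketch with stand-in `reg_model`, `optimizer`, and `scheduler` objects, since the real scheduler comes from another repo):

```python
import torch
import torch.nn as nn
import torch.nn.utils as nn_utils

# Stand-ins so the sketch runs; the real model and scheduler differ.
reg_model = nn.Linear(10, 3)
optimizer = torch.optim.AdamW(reg_model.parameters(), lr=1e-3, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: 1.0)
criterion = nn.MSELoss()

inputs = torch.randn(16, 10)   # one batch of features
targets = torch.randn(16, 3)   # the 3 regression targets

optimizer.zero_grad()
preds = reg_model(inputs)
loss = criterion(preds, targets)
loss.backward()
# Clip gradients after backward() and before step().
nn_utils.clip_grad_norm_(reg_model.parameters(), max_norm=1.0)
optimizer.step()
scheduler.step()  # batch-level scheduler update
```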

I am seeing that the model's average RMSE loss on both the training and validation sets drops to ~0.22-0.24 within the first 3 epochs, and after that only changes marginally, eventually triggering early stopping. Please see the attached graph. I have tried different batch sizes and experimented with the weight decay of the AdamW optimizer (though not extensively), yet it pretty much converges to this loss level every time. I have also removed data points that could be classified as outliers and tried different train/val/test splits, but the losses remain in the same range.
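By RMSE I mean the square root of the batch mean squared error; in case it matters, here is roughly how I compute it (the exact values below are just toy data for illustration):

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()

# Toy predictions/targets in the same [0, 1] range as my normalized outputs.
preds = torch.tensor([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
targets = torch.tensor([[0.0, 0.2, 0.3], [0.4, 0.5, 0.7]])

rmse = torch.sqrt(mse(preds, targets))
```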

The data have been normalized by loading all batches from the train_dataloader and obtaining their global min and max (this is intentional, as opposed to standardization). When loading batches in the train/val/test loops, I use these global values to normalize the batch data before passing them to the PerceiverRegressor. I have applied the same logic to the 3 output values, all of which therefore range between 0 and 1.
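The normalization scheme is sketched below; the toy `train_dataloader` yielding `(x, y)` pairs is a stand-in for my real IMU dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the real IMU dataset (27 input features, 3 targets).
xs = torch.randn(100, 27)
ys = torch.rand(100, 3)
train_dataloader = DataLoader(TensorDataset(xs, ys), batch_size=16)

# Pass 1: global per-feature min/max over the whole training set.
x_min = torch.full((27,), float("inf"))
x_max = torch.full((27,), float("-inf"))
for x, _ in train_dataloader:
    x_min = torch.minimum(x_min, x.min(dim=0).values)
    x_max = torch.maximum(x_max, x.max(dim=0).values)

def normalize(x, lo, hi, eps=1e-8):
    # Scale to [0, 1] using the global training statistics.
    return (x - lo) / (hi - lo + eps)

# Pass 2: applied per batch in the train/val/test loops.
for x, y in train_dataloader:
    x_n = normalize(x, x_min, x_max)
```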

[Attached image: training/validation RMSE loss curves]

class PerceiverRegressor(nn.Module):
    def __init__(self, base_model, d_latents, output_dim=3):
        super().__init__()
        self.base_model = base_model
        self.regression_out1 = nn.Linear(d_latents, output_dim)

    def forward(self, inputs):
        outputs = self.base_model(inputs)
        latents = outputs.last_hidden_state
        # Mean-pool over the latent index, then project to the 3 targets.
        pooled_latents = latents.mean(dim=1)
        preds = self.regression_out1(pooled_latents)
        return preds

config = PerceiverConfig(
    num_latents=256,
    d_latents=255,
    num_labels=1,
    d_model=27,
    num_self_attention_heads=3,
    num_cross_attention_heads=3,
    problem_type='regression'
)

model = PerceiverModel(config=config)
reg_model = PerceiverRegressor(base_model=model, d_latents=config.d_latents, output_dim=3).to(device)

# Split dataset into train, validation, and test
total_dataset_size = len(dataset)
train_size = int(0.9 * total_dataset_size)
val_size = int(0.05 * total_dataset_size)
test_size = total_dataset_size - train_size - val_size

torch.manual_seed(2809)
train_dataset, val_dataset, test_dataset = random_split(dataset, [train_size, val_size, test_size])

train_dataloader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=16, shuffle=False)
test_dataloader = DataLoader(test_dataset, batch_size=16, shuffle=False)
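(As a side note on the split: instead of relying on the global torch.manual_seed call, random_split also accepts a generator argument, which keeps the split reproducible regardless of other RNG use elsewhere. A minor point, not related to the loss plateau; toy dataset below.)

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset standing in for the real one.
dataset = TensorDataset(torch.randn(200, 27), torch.rand(200, 3))

train_size = int(0.9 * len(dataset))
val_size = int(0.05 * len(dataset))
test_size = len(dataset) - train_size - val_size

# Seed only the split, via a dedicated generator.
gen = torch.Generator().manual_seed(2809)
train_ds, val_ds, test_ds = random_split(
    dataset, [train_size, val_size, test_size], generator=gen
)
```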

I would highly appreciate any help or insight on this! It is probably something obvious that I am failing to notice, so your expert eyes could prove useful.
