
Perceiver IO #479

Open
Dynamedion opened this issue Feb 9, 2025 · 0 comments

Not sure how to address this, but I am using the base PerceiverModel() (not pretrained) to train on a bunch of IMU data and other dynamics signals, and have modified the model for a regression task so that it outputs via an nn.Linear with output_dim=3. I am also using a cosine-based learning rate scheduler with warmup (found in another GitHub repo) that is updated at batch level, as well as nn_utils.clip_grad_norm_(reg_model.parameters(), max_norm=1.0) after the loss.backward() step and before the optimizer.step().
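For reference, my per-batch update looks roughly like this (a minimal sketch with stand-in `reg_model`, `optimizer`, and `scheduler` objects, since the real scheduler comes from another repo):

```python
import torch
import torch.nn as nn
import torch.nn.utils as nn_utils

# Stand-ins so the sketch runs; the real model and scheduler differ.
reg_model = nn.Linear(10, 3)
optimizer = torch.optim.AdamW(reg_model.parameters(), lr=1e-3, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: 1.0)
criterion = nn.MSELoss()

inputs = torch.randn(16, 10)   # one batch of features
targets = torch.randn(16, 3)   # the 3 regression targets

optimizer.zero_grad()
preds = reg_model(inputs)
loss = criterion(preds, targets)
loss.backward()
# Clip gradients after backward() and before step().
nn_utils.clip_grad_norm_(reg_model.parameters(), max_norm=1.0)
optimizer.step()
scheduler.step()  # batch-level scheduler update
```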

I am seeing that the model's average RMSE loss on both the training and validation sets drops to ~0.22-0.24 within the first 3 epochs, and after that only changes marginally, eventually triggering early stopping. Please see the attached graph. I have tried different batch sizes and experimented with the weight decay of the AdamW optimizer (though not extensively), yet it pretty much converges to this loss level every time. I have also removed data points that could be classified as outliers and tried different train/val/test splits, but the losses remain in the same range.
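By RMSE I mean the square root of the batch mean squared error; in case it matters, here is roughly how I compute it (the exact values below are just toy data for illustration):

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()

# Toy predictions/targets in the same [0, 1] range as my normalized outputs.
preds = torch.tensor([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
targets = torch.tensor([[0.0, 0.2, 0.3], [0.4, 0.5, 0.7]])

rmse = torch.sqrt(mse(preds, targets))
```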

The data have been normalized by loading all batches from the train_dataloader and obtaining their global min and max (this is intentional, as opposed to standardization). When loading batches in the train/val/test loops, I use these global values to normalize the batch data before passing them to the PerceiverRegressor. I have applied the same logic to the 3 output values, all of which therefore range between 0 and 1.
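The normalization scheme is sketched below; the toy `train_dataloader` yielding `(x, y)` pairs is a stand-in for my real IMU dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the real IMU dataset (27 input features, 3 targets).
xs = torch.randn(100, 27)
ys = torch.rand(100, 3)
train_dataloader = DataLoader(TensorDataset(xs, ys), batch_size=16)

# Pass 1: global per-feature min/max over the whole training set.
x_min = torch.full((27,), float("inf"))
x_max = torch.full((27,), float("-inf"))
for x, _ in train_dataloader:
    x_min = torch.minimum(x_min, x.min(dim=0).values)
    x_max = torch.maximum(x_max, x.max(dim=0).values)

def normalize(x, lo, hi, eps=1e-8):
    # Scale to [0, 1] using the global training statistics.
    return (x - lo) / (hi - lo + eps)

# Pass 2: applied per batch in the train/val/test loops.
for x, y in train_dataloader:
    x_n = normalize(x, x_min, x_max)
```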

[Attached image: training/validation RMSE loss curves]

class PerceiverRegressor(nn.Module):
    def __init__(self, base_model, d_latents, output_dim=3):
        super().__init__()
        self.base_model = base_model
        self.regression_out1 = nn.Linear(d_latents, output_dim)

    def forward(self, inputs):
        outputs = self.base_model(inputs)
        latents = outputs.last_hidden_state
        # Mean-pool over the latent index, then project to the 3 targets.
        pooled_latents = latents.mean(dim=1)
        preds = self.regression_out1(pooled_latents)
        return preds

config = PerceiverConfig(
    num_latents=256,
    d_latents=255,
    num_labels=1,
    d_model=27,
    num_self_attention_heads=3,
    num_cross_attention_heads=3,
    problem_type='regression'
)

model = PerceiverModel(config=config)
reg_model = PerceiverRegressor(base_model=model, d_latents=config.d_latents, output_dim=3).to(device)

# Split dataset into train, validation, and test
total_dataset_size = len(dataset)
train_size = int(0.9 * total_dataset_size)
val_size = int(0.05 * total_dataset_size)
test_size = total_dataset_size - train_size - val_size

torch.manual_seed(2809)
train_dataset, val_dataset, test_dataset = random_split(dataset, [train_size, val_size, test_size])

train_dataloader = DataLoader(train_dataset, batch_size=16, shuffle=True)
val_dataloader = DataLoader(val_dataset, batch_size=16, shuffle=False)
test_dataloader = DataLoader(test_dataset, batch_size=16, shuffle=False)
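(As a side note on the split: instead of relying on the global torch.manual_seed call, random_split also accepts a generator argument, which keeps the split reproducible regardless of other RNG use elsewhere. A minor point, not related to the loss plateau; toy dataset below.)

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset standing in for the real one.
dataset = TensorDataset(torch.randn(200, 27), torch.rand(200, 3))

train_size = int(0.9 * len(dataset))
val_size = int(0.05 * len(dataset))
test_size = len(dataset) - train_size - val_size

# Seed only the split, via a dedicated generator.
gen = torch.Generator().manual_seed(2809)
train_ds, val_ds, test_ds = random_split(
    dataset, [train_size, val_size, test_size], generator=gen
)
```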

I would highly appreciate any help or insight on this! It is probably something obvious that I am failing to notice, so your expert eyes could prove useful.
