
Batch consumed during symbolic build is not processed during training. #20394

Closed
nicolaspi opened this issue Oct 22, 2024 · 4 comments · Fixed by #20396

Comments

@nicolaspi
Contributor

When training with steps_per_epoch, the dataset iterator is not reinitialized after the symbolic build of the model, leading to one batch being consumed outside the training loop.

import os
os.environ["KERAS_BACKEND"] = "tensorflow"  # fails also on jax, torch
import numpy as np
import keras
from keras import Layer

print(keras.__version__)

x = np.ones((10, 4))
y = np.ones((10, 1))

class BatchCounter(Layer):
    """Layer that counts how many batches it sees during training."""

    def __init__(self):
        super().__init__()
        self.total = self.add_weight(
            shape=(),
            dtype="int32",
            initializer="zeros",
            trainable=False,
            name="total",
        )

    def call(self, inputs, training=None):
        if training:
            self.total.assign(self.total + 1)
        return inputs

def run_model(steps_per_epoch):
    inputs = keras.Input(shape=[4])
    counter_layer = BatchCounter()
    outputs = counter_layer(inputs)
    model = keras.Model(inputs=inputs, outputs=outputs)

    model.compile(loss="mse", optimizer="adam")

    model.fit(
        x=x,
        y=y,
        batch_size=5,
        steps_per_epoch=steps_per_epoch,
        epochs=1,
        verbose=0,
    )
    print(f"Total batches seen during training with (steps_per_epochs={steps_per_epoch}): {counter_layer.total.numpy()}")

run_model(2)
run_model(None)
3.6.0.dev2024101603
Total batches seen during training with (steps_per_epochs=2): 1
Total batches seen during training with (steps_per_epochs=None): 2
/usr/local/lib/python3.10/dist-packages/keras/src/backend/jax/trainer.py:408: UserWarning: Your input ran out of data; interrupting epoch. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches. You may need to use the `.repeat()` function when building your dataset.
  for step, data in epoch_iterator.enumerate_epoch():
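The mechanics behind this can be illustrated without Keras: if a build step pulls one batch from the same iterator the training loop later uses, and the iterator is not recreated afterwards, the loop sees one batch fewer. A minimal stdlib sketch of that pattern (the names `train` and `reinitialize_after_build` are illustrative, not Keras API):

```python
# Minimal stand-in for the bug: a "build" step consumes one batch
# from the shared iterator, so the training loop sees one fewer.
def make_iterator(num_batches):
    return iter(range(num_batches))

def train(num_batches, steps_per_epoch, reinitialize_after_build):
    it = make_iterator(num_batches)
    _probe = next(it)  # symbolic build peeks at the first batch
    if reinitialize_after_build:
        it = make_iterator(num_batches)  # the fix: start over with a fresh iterator
    seen = 0
    for _ in range(steps_per_epoch):
        try:
            next(it)
            seen += 1
        except StopIteration:  # input ran out of data; epoch is interrupted
            break
    return seen

print(train(2, 2, False))  # 1: one batch was lost to the build step
print(train(2, 2, True))   # 2: all batches reach the training loop
```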
nicolaspi added a commit to nicolaspi/keras that referenced this issue Oct 22, 2024
Tensorflow trainer's train and test functions refactoring
Fix keras-team#20394
Fix keras-team#20390
Fix keras-team#20344
@mehtamansi29
Collaborator

Hi @nicolaspi -

Thanks for reporting the issue. I can reproduce it on the JAX and Torch backends only. On the TensorFlow backend I get Total batches seen during training with (steps_per_epochs=2): 2, because steps_per_epoch works together with epochs when fitting the model.
For Torch, the issue appears because a DataLoader defines steps per epoch in a slightly different way; more details can be found here.

Attached a gist here for reference.

@nicolaspi
Contributor Author

Thanks for your insight. I have proposed a PR to fix the issue.
