
module 'tensorflow_addons' has no attribute 'optimizers' (tfa-nightly) #2578

Closed
asapsmc opened this issue Sep 27, 2021 · 65 comments

@asapsmc

asapsmc commented Sep 27, 2021

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOs M1 (12.0 Beta Monterey)
  • TensorFlow version and how it was installed (source or binary): 2.5.0 (pip)
  • TensorFlow-Addons version and how it was installed (source or binary): tfa-nightly 0.15.0 (pip)
  • Python version: 3.8
  • Is GPU used? (yes/no): NA

Describe the bug

After installing the nightly version, I got the error: module 'tensorflow_addons' has no attribute 'optimizers'

Code to reproduce the issue

import tensorflow_addons as tfa
...
radam = tfa.optimizers.RectifiedAdam(learning_rate=cf["lr"], clipnorm=clipnorm)


@bhack
Contributor

bhack commented Sep 27, 2021

/cc @szutenberg

@szutenberg
Contributor

Hi @MR-T77

I tried to reproduce it:

  • tensorflow-cpu 2.5.0
  • tfa-nightly 0.15.0.dev20210922190150
  • Ubuntu 20.04.1 LTS

Everything seems to be ok.

Please check if you can run the following code:

import tensorflow_addons as tfa
print(tfa)
print(tfa.optimizers)
print(tfa.optimizers.RectifiedAdam)

What does it return?

If you're still getting an error then please attach the output from pip freeze.
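A stdlib-only sketch of the same check, which also reports which file tensorflow_addons was imported from and which of the two pip distributions are installed. The helper names are hypothetical. The second helper rests on an assumption: tensorflow-addons and tfa-nightly both install the same tensorflow_addons import package, so having both in one environment can leave a mixed or stale copy on the path.

```python
import importlib
import importlib.metadata as metadata


def inspect_module(module_name, attr):
    """Import `module_name` and report its file path and whether it exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # a broken install may fail with more than ImportError
        return {"imported": False, "error": repr(exc)}
    return {
        "imported": True,
        "file": getattr(module, "__file__", None),
        "has_attr": hasattr(module, attr),
    }


def installed_versions(dist_names):
    """Map each pip distribution name to its installed version, or None if absent."""
    versions = {}
    for name in dist_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print(inspect_module("tensorflow_addons", "optimizers"))
    found = {k: v for k, v in installed_versions(["tensorflow-addons", "tfa-nightly"]).items() if v}
    if len(found) > 1:
        # Assumption: both distributions share the tensorflow_addons directory.
        print("Both distributions are installed; uninstall one of them:", found)
```

If "file" points somewhere unexpected, or both distributions show up, a clean uninstall followed by a single reinstall is the usual first step.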

@asapsmc
Author

asapsmc commented Sep 29, 2021

Hi @szutenberg!
It seems something was broken in my conda environment (I tried so many things...). I uninstalled tensorflow-addons (0.14) and reinstalled tfa-nightly, and now I can import tfa without error. Just to be sure: whenever I update something in a conda environment, I immediately run code on top of it without restarting the terminal or VS Code (I'm not sure this is the best process, or whether I should restart something).
Nevertheless, although I can import addons, I cannot use them: I always get the following error (I tried other optimizers too and got the same error):

Traceback (most recent call last):
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/machine/.vscode/extensions/ms-python.python-2021.9.1246542782/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/Users/machine/.vscode/extensions/ms-python.python-2021.9.1246542782/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
    run()
  File "/Users/machine/.vscode/extensions/ms-python.python-2021.9.1246542782/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/machine/Projects/finetune-asp/src/ISMIR2020_v2.py", line 471, in <module>
    main()
  File "/Users/machine/Projects/finetune-asp/src/ISMIR2020_v2.py", line 466, in main
    new_train(dataset, 'TCNv2', cpu=False, addons=True)
  File "/Users/machine/Projects/finetune-asp/src/ISMIR2020_v2.py", line 329, in new_train
    history = model.fit(train, steps_per_epoch=len(train), epochs=cf["num_epochs"], shuffle=True,
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/keras/engine/training.py", line 1183, in fit
    tmp_logs = self.train_function(iterator)
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
    result = self._call(*args, **kwds)
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3023, in __call__
    return graph_function._call_flat(
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 1960, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 591, in call
    outputs = execute.execute(
  File "/Users/machine/miniforge3/envs/tf/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot assign a device for operation model/conv_1_3x3_conv/Conv2D/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node model/conv_1_3x3_conv/Conv2D/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Equal: CPU 
AssignSubVariableOp: GPU CPU 
AssignVariableOp: GPU CPU 
GreaterEqual: GPU CPU 
FloorDiv: CPU 
Sqrt: GPU CPU 
NoOp: GPU CPU 
Pow: GPU CPU 
Mul: CPU 
Cast: GPU CPU 
Identity: GPU CPU 
SelectV2: GPU CPU 
ReadVariableOp: GPU CPU 
RealDiv: GPU CPU 
Sub: GPU CPU 
AddV2: GPU CPU 
Const: GPU CPU 
Square: GPU CPU 
_Arg: GPU CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  model_conv_1_3x3_conv_conv2d_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  lookahead_lookahead_update_mul_5_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  lookahead_lookahead_update_mul_8_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  lookahead_lookahead_update_sub_10_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  model/conv_1_3x3_conv/Conv2D/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/Identity (Identity) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Cast_5 (Cast) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Pow (Pow) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Pow_1 (Pow) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_1/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_1 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_2/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_2 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_3/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_3 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_1/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_1 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_2/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_2 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_4/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_4 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_1 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_2 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_5 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_6/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_6 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_7/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_7 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_3 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_8/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_8 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_3 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_9/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_9 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_4 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_4 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_5 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Sqrt (Sqrt) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/GreaterEqual (GreaterEqual) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Const (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_5/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/mul_5 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_6 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_1 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignVariableOp (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/ReadVariableOp_1 (ReadVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_7 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_8/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/mul_8 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Square (Square) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_9 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_2 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignVariableOp_1 (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/ReadVariableOp_2 (ReadVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_10 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Sqrt_1 (Sqrt) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_11 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_3 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_6 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/SelectV2 (SelectV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_12 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignSubVariableOp (AssignSubVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/group_deps (NoOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_4/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_4 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Cast_7/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Cast_7 (Cast) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Cast_8/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/ReadVariableOp_4 (ReadVariableOp) 
  Lookahead/Lookahead/update/sub_10/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/sub_10 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_13 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/ReadVariableOp_5 (ReadVariableOp) 
  Lookahead/Lookahead/update/add_5 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/floordiv (FloorDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_14 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Equal (Equal) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/SelectV2_1/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/SelectV2_1 (SelectV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignVariableOp_2 (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/SelectV2_2/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/SelectV2_2 (SelectV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignVariableOp_3 (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/group_deps_1 (NoOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/group_deps_2 (NoOp) /job:localhost/replica:0/task:0/device:GPU:0

         [[{{node model/conv_1_3x3_conv/Conv2D/ReadVariableOp}}]] [Op:__inference_train_function_30042]  

@bhack
Contributor

bhack commented Sep 29, 2021

/cc @lgeiger Can you replicate this on your M1?

@szutenberg
Contributor

@MR-T77 It looks like there are no issues with TFA itself; rather, some GPU kernels are missing, which breaks colocation.

You need to check which dtypes you use with Equal, Mul and FloorDiv. You can dump the graphs (set TF_DUMP_GRAPH_PREFIX and turn on vlog) and check them in the pbtxt files.
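To make the colocation argument concrete, here is a deliberately simplified model, not TensorFlow's actual placer: every op in a colocation group must land on one device, so the group can only use device types that all of its members support. The op table is a subset copied from the "Colocation Debug Info" in the traceback above; check_colocation is a hypothetical helper.

```python
from functools import reduce


def check_colocation(supported, required):
    """Simplified colocation check.

    supported: op type -> device types it has kernels for.
    required: device type the group is pinned to (e.g. the resource's device).
    Returns (device types feasible for the whole group, ops that rule out `required`).
    """
    device_sets = [set(devices) for devices in supported.values()]
    feasible = reduce(set.intersection, device_sets) if device_sets else set()
    blockers = [op for op, devices in supported.items() if required not in devices]
    return feasible, blockers


# Subset of "Colocation group had the following types and supported devices".
SUPPORTED = {
    "Equal": ["CPU"],
    "FloorDiv": ["CPU"],
    "Mul": ["CPU"],
    "ReadVariableOp": ["GPU", "CPU"],
    "AssignVariableOp": ["GPU", "CPU"],
    "SelectV2": ["GPU", "CPU"],
    "_Arg": ["GPU", "CPU"],
}

feasible, blockers = check_colocation(SUPPORTED, required="GPU")
print(sorted(feasible), blockers)  # ['CPU'] ['Equal', 'FloorDiv', 'Mul']
```

In the real error the resource variables were already assigned to GPU:0, so a group whose only feasible device type is CPU cannot be satisfied. Kernels are registered per dtype, which is why the dtypes used with Equal, Mul and FloorDiv matter.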

@asapsmc
Author

asapsmc commented Sep 30, 2021

@szutenberg I'm a newbie at this kind of thing, sorry. Could you please point me to a more detailed description of what I need to do? Thanks in advance

@asapsmc
Author

asapsmc commented Sep 30, 2021

Also, I'm not using any custom code here. It's a simple BLSTM (3-layer Keras Bidirectional(SimpleRNN) with a dense output).

@szutenberg
Contributor

@MR-T77 Maybe the easiest thing would be to provide the code so that we can reproduce it locally.

If you want to debug it on your own, set TF_CPP_MAX_VLOG_LEVEL to 10 and TF_DUMP_GRAPH_PREFIX=tmp. You should see a tmp dir after running the script. Reading placer_input.pbtxt will answer my question (simply grep -A 20 -rn FloorDiv); you'll see the dtypes.
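For readers not comfortable with grep, here is a rough Python sketch of the same workflow. The two environment variables are the ones named above and must be set before tensorflow is imported. enable_graph_dumps and find_op_dtypes are hypothetical helpers. The parser assumes the usual text-proto layout, where each node starts with a "node {" (or "node_def {") line and its dtypes appear as DT_* values.

```python
import os
import re
from pathlib import Path


def enable_graph_dumps(prefix="tmp", vlog_level="10"):
    """Set the dump/vlog env vars; must run BEFORE `import tensorflow`."""
    os.environ["TF_CPP_MAX_VLOG_LEVEL"] = vlog_level
    os.environ["TF_DUMP_GRAPH_PREFIX"] = prefix


def find_op_dtypes(dump_dir, op_name):
    """Rough stand-in for `grep -A 20 -rn <op_name>` over the *.pbtxt dumps.

    Returns (file, 1-based line number, DT_* values) for every node whose op is
    `op_name`, scanning until the next node definition or the end of the file.
    """
    marker = f'op: "{op_name}"'
    hits = []
    for path in sorted(Path(dump_dir).rglob("*.pbtxt")):
        lines = path.read_text().splitlines()
        for i, line in enumerate(lines):
            if line.strip() != marker:
                continue
            dtypes = []
            for following in lines[i + 1:]:
                if following.strip() in ("node {", "node_def {"):
                    break  # reached the next node definition
                dtypes += re.findall(r"\bDT_[A-Z0-9_]+", following)
            hits.append((str(path), i + 1, dtypes))
    return hits
```

Call enable_graph_dumps() at the very top of the training script, run it, then print find_op_dtypes("tmp", "FloorDiv") (and likewise for "Equal" and "Mul").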

Anyway I'm traveling now so I'm not able to help you further until ~12th October.

@asapsmc
Author

asapsmc commented Sep 30, 2021

@szutenberg thank you so much for offering to help! I'll try to figure out what's wrong before 12th October. If I'm unsuccessful, I'll get back to you.

@asapsmc
Author

asapsmc commented Sep 30, 2021

@szutenberg anyway, here is the code I'm using, just in case it's something easy you can spot:

import os
import pickle
import numpy as np
from tensorflow.keras.utils import Sequence
from tensorflow.keras.layers import (Dense, Input)
from tensorflow.keras.layers import SimpleRNN, Bidirectional, Masking, LSTM  # For BLSTM
from tensorflow.keras.models import Sequential, Model
import madmom

import tensorflow.keras.backend as K
import tensorflow as tf

import tensorflow_addons as tfa

from modules.utils import PKL_PATH

tf.config.set_soft_device_placement(True)
# GENERAL CONSTANTS
FPS = 100  # set the frame rate as FPS frames per second
MASK_VALUE = -1

lr = 0.05
num_epochs = 50


class Fold(object):

    def __init__(self, folds, fold):
        self.folds = folds
        self.fold = fold

    @property
    def test(self):
        # fold N for testing
        return np.unique(self.folds[self.fold])

    @property
    def val(self):
        # fold N+1 for validation
        return np.unique(self.folds[(self.fold + 1) % len(self.folds)])

    @property
    def train(self):
        # all remaining folds for training
        train = np.hstack(self.folds)
        train = np.setdiff1d(train, self.val)
        train = np.setdiff1d(train, self.test)
        return train


class DataSequence_BLSTM(Sequence):

    mask_value = -999  # only needed for batch sizes > 1

    def __init__(self, x, y, batch_size=1, max_seq_length=None, fps=FPS):
        self.x = x
        self.y = [madmom.utils.quantize_events(o, fps=fps, length=len(d))
                  for o, d in zip(y, self.x)]
        self.batch_size = batch_size
        self.max_seq_length = max_seq_length

    def __len__(self):
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        # determine which sequence(s) to use
        x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        # pad them if needed
        if self.batch_size > 1:
            x = tf.keras.preprocessing.sequence.pad_sequences(
                x, maxlen=self.max_seq_length, dtype=np.float32, truncating='post', value=self.mask_value)
            y = tf.keras.preprocessing.sequence.pad_sequences(
                y, maxlen=self.max_seq_length, dtype=np.int32, truncating='post', value=self.mask_value)
        return np.array(x), np.array(y)[..., np.newaxis]


def simple_BLSTM(dataset, cpu=False):
    with open('%s/%s.pkl' % (PKL_PATH, dataset), 'rb') as f:
        train_db = pickle.load(f)
    num_fold = 0
    fold = Fold(train_db.folds, num_fold)
    train = DataSequence_BLSTM([train_db.x[i] for i in fold.train],
                               [train_db.annotations[i] for i in fold.train],
                               batch_size=1, max_seq_length=60 * FPS)
    val = DataSequence_BLSTM([train_db.x[i] for i in fold.val],
                             [train_db.annotations[i] for i in fold.val],
                             batch_size=1, max_seq_length=60 * FPS)
    input_layer = Input((None, train[0][0].shape[-1]))
    masked = Masking(mask_value=-999)(input_layer)
    blstm_1 = Bidirectional(SimpleRNN(units=25, return_sequences=True))(masked)
    blstm_2 = Bidirectional(SimpleRNN(units=25, return_sequences=True))(blstm_1)
    blstm_3 = Bidirectional(SimpleRNN(units=25, return_sequences=True))(blstm_2)
    output_layer = Dense(1, name='output', activation='sigmoid')(blstm_3)
    model = Model(input_layer, output_layer)
    radam = tfa.optimizers.RectifiedAdam(learning_rate=lr, clipnorm=0.5)
    ranger = tfa.optimizers.Lookahead(radam, sync_period=6, slow_step_size=0.5)
    model.compile(optimizer=ranger, loss=K.binary_crossentropy, metrics=['binary_accuracy'])
    history = model.fit(train, steps_per_epoch=len(train), epochs=num_epochs, shuffle=True,
                        validation_data=val, validation_steps=len(val),
                        verbose=True)
    return True


def main():
    tf.config.set_soft_device_placement(True)
    dataset = "traintest_smallsmc"
    simple_BLSTM(dataset)


if __name__ == "__main__":

    main()
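The script wraps RectifiedAdam in tfa.optimizers.Lookahead(sync_period=6, slow_step_size=0.5). Below is a plain-Python sketch of the Lookahead rule as commonly described: every k steps the slow weights move a fraction alpha toward the fast weights, and the fast weights are reset to them. It is an illustration, not the tfa source. The integer "is step a multiple of k" test (floordiv, multiply, compare) appears to correspond to the FloorDiv, Mul and Equal nodes listed under Lookahead/Lookahead/update in the earlier traceback. Treat that mapping as an assumption. lookahead_sync is a hypothetical helper.

```python
def lookahead_sync(fast, slow, step, sync_period=6, slow_step_size=0.5):
    """Apply the Lookahead sync rule after the inner optimizer's update at `step`.

    fast, slow: lists of floats (fast weights and slow weights).
    Returns the new (fast, slow) pair.
    """
    # Integer test "is step a multiple of sync_period?": floordiv, multiply, compare.
    synced = (step // sync_period) * sync_period == step
    if not synced:
        return list(fast), list(slow)
    # Slow weights step toward the fast weights; fast weights are reset to them.
    new_slow = [s + slow_step_size * (f - s) for f, s in zip(fast, slow)]
    return list(new_slow), new_slow


# Toy run: the "inner optimizer" just adds 1.0 per step.
fast, slow = [0.0], [0.0]
for step in range(1, 7):
    fast = [w + 1.0 for w in fast]  # stand-in for one RectifiedAdam update
    fast, slow = lookahead_sync(fast, slow, step)
print(fast, slow)  # [3.0] [3.0]
```

After five plain updates the fast weight reaches 6.0 at step 6; the sync then moves the slow weight halfway (0.0 to 3.0) and resets the fast weight to match.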

@bhack
Contributor

bhack commented Sep 30, 2021

@bhack
Contributor

bhack commented Sep 30, 2021

@szutenberg Could it be a side effect of the var.device change you introduced?

@asapsmc
Author

asapsmc commented Sep 30, 2021

Have you already tried with https://www.tensorflow.org/api_docs/python/tf/config/set_soft_device_placement ?

Yes, it's there in the code: "tf.config.set_soft_device_placement(True)"

@bhack
Contributor

bhack commented Sep 30, 2021

@MR-T77 Yes, sorry, I missed it; your code is not well formatted.

@szutenberg
Contributor

Hi @MR-T77

I'm back. Have you solved the issue?

Today I tried to reproduce your problem, but unfortunately the code requires a pickle file, which is not attached. I created dummy training data and everything works fine: FloorDiv (T=INT64) is placed on GPU.

The graphs don't contain the name "conv_1_3x3_conv", so the code I received probably isn't what produced the error message you attached.

Please provide full code and all required files together with frozen pip list (pip freeze).

@asapsmc
Author

asapsmc commented Oct 14, 2021

Hi @szutenberg!
Unfortunately no, I have not solved the issue, although I have tried everything I could.
But right now, with this short snippet (from 30 Sep), the behaviour is different from before:
training now seems to start, but it just stalls after "Epoch 1/50" appears.
I've attached the missing pickle, as well as the frozen pip list and an export of "conda list --explicit".
traintest_smallsmc.pkl.zip
condaenv.txt
pipfreeze.txt

@asapsmc
Author

asapsmc commented Oct 14, 2021

And to be complete: with my original code (the one with conv layers) I get the same error as initially reported ("Cannot assign a device for operation..."). However, if I replace the optimizer with the plain keras.optimizers.Adam, I can train the model.
Here is the code:
problem_TCN.py.zip
Thanks in advance.

@szutenberg
Contributor

Hi @MR-T77 ,

I'm sorry but it seems that one more file is missing: definitions.py
Traceback (most recent call last):
  File "problem_TCN.py", line 183, in <module>
    simple_TCN(dataset)
  File "problem_TCN.py", line 135, in simple_TCN
    train_db = pickle.load(open('%s.pkl' % (dataset), 'rb'))
ModuleNotFoundError: No module named 'definitions'

Could you make sure that it reproduces on Google Colab?

@asapsmc
Author

asapsmc commented Oct 18, 2021

Hi @szutenberg,
I'm so sorry about that: the pickle file was saved from a module, definitions.py, which is why loading it requests that file even though the code doesn't need it.
I re-saved the pickle from "main" and have attached it, along with a more complete problem_TCN.py.
traintest_smallsmc.pkl.zip (https://github.com/tensorflow/addons/files/7363832/traintest_smallsmc.pkl.zip)
problem_TCN.py.zip

@szutenberg
Contributor

Hi @MR-T77 ,

Unfortunately now I have another error: AttributeError: Can't get attribute 'Dataset' on <module '__main__' from 'problem_TCN.py'>

Could you please prepare a Google Colab notebook that demonstrates your problem? Note that you don't need to provide real data; dummy training data is enough, as long as the shapes match.

Thanks!
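Here is a minimal NumPy sketch of dummy data shaped like what DataSequence_BLSTM and Fold in the earlier script consume. It produces a list of 2-D float32 feature arrays (frames x features), a list of sorted event times in seconds (the input madmom.utils.quantize_events expects), and a list of fold index arrays. The feature count of 81, the track lengths, and the two-events-per-second density are arbitrary assumptions. make_dummy_dataset is a hypothetical helper. Keeping the data in memory also sidesteps the pickle's dependency on the original Dataset class.

```python
import numpy as np

FPS = 100  # frame rate used by the original script


def make_dummy_dataset(num_tracks=8, num_folds=4, num_features=81,
                       min_seconds=5, max_seconds=10, seed=0):
    """Random features, sorted event times and folds with matching shapes.

    num_features=81 is an arbitrary assumption; use your real feature size.
    """
    rng = np.random.default_rng(seed)
    x, annotations = [], []
    for _ in range(num_tracks):
        n_frames = int(rng.integers(min_seconds * FPS, max_seconds * FPS + 1))
        # (frames, features) float32 input for one track
        x.append(rng.standard_normal((n_frames, num_features)).astype(np.float32))
        duration = n_frames / FPS
        # roughly two events per second, sorted, inside the track
        annotations.append(np.sort(rng.uniform(0.0, duration, size=int(duration * 2))))
    # split track indices into folds, as Fold expects
    folds = np.array_split(np.arange(num_tracks), num_folds)
    return x, annotations, folds


x, annotations, folds = make_dummy_dataset()
print(len(x), x[0].shape[1], len(folds))  # 8 81 4
```

These lists should be usable in place of train_db.x, train_db.annotations and train_db.folds when building DataSequence_BLSTM and Fold.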

@bhack
Contributor

bhack commented Oct 18, 2021

Yes, a Colab with dummy input data is the best thing to share, so we can verify whether this is specific to your macOS M1 platform or a more general issue.

@asapsmc
Author

asapsmc commented Oct 19, 2021

Hi @szutenberg and @bhack: sorry for my late reply. I was trying to generate dummy data, but I couldn't do it without running into further errors (I'm a newbie).
I really hope you can test everything with this Google Colab (otherwise, please tell me what to do). You'll just have to upload the pkl file into your runtime.
In Colab I don't get any errors; I can run this exact code.
But there I'm using a whole different set of libraries (e.g. no tf-metal) and versions (tf, tf-addons).

@szutenberg
Contributor

Hi @MR-T77

I took the code from colab and was able to run it in my virtual env (Ubuntu 20.04):
tensorflow-gpu 2.5.0
tfa-nightly 0.15.0.dev20210922190150

All Equal ops are placed on GPU and everything works fine.

@asapsmc
Copy link
Author

asapsmc commented Oct 19, 2021

Hi @szutenberg,

So, do you think it is a clash between libraries in my environment, or something else?

@bhack
Contributor

bhack commented Oct 19, 2021

@lgeiger Can you try to see if you can reproduce this on your M1 ?

@asapsmc
Author

asapsmc commented Oct 20, 2021

I just want to clarify that I can run this exact code with no problems if I use tf.keras.optimizers.Adam. If I start using tf-addons optimizers (e.g. RAdam), I get the errors above.
Following @szutenberg's request, I already sent the pip freeze list; I'm pasting it here again. Do you think any of these libraries/versions could be causing this clash with tf-addons?
pipfreeze.txt

@bhack
Contributor

bhack commented Oct 20, 2021

@asapsmc
Author

asapsmc commented Oct 21, 2021

Hi @bhack, I did as you suggested and got this:

warnings.warn(
args_0: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
GeneratorDataset: (GeneratorDataset): /job:localhost/replica:0/task:0/device:CPU:0
NoOp: (NoOp): /job:localhost/replica:0/task:0/device:CPU:0
Identity: (Identity): /job:localhost/replica:0/task:0/device:CPU:0
FakeSink0: (Identity): /job:localhost/replica:0/task:0/device:CPU:0
identity_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:CPU:0
args_0: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
GeneratorDataset: (GeneratorDataset): /job:localhost/replica:0/task:0/device:CPU:0
NoOp: (NoOp): /job:localhost/replica:0/task:0/device:CPU:0
Identity: (Identity): /job:localhost/replica:0/task:0/device:CPU:0
FakeSink0: (Identity): /job:localhost/replica:0/task:0/device:CPU:0
identity_RetVal: (_Retval): /job:localhost/replica:0/task:0/device:CPU:0
Epoch 1/50
assignvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
AssignVariableOp: (AssignVariableOp): /job:localhost/replica:0/task:0/device:GPU:0
iter/Initializer/zeros: (Const): /job:localhost/replica:0/task:0/device:GPU:0
iterator: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
iterator_1: (_Arg): /job:localhost/replica:0/task:0/device:CPU:0
model_conv_1_convolution_conv2d_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_conv_1_convolution_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_conv_2_convolution_conv2d_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_conv_2_convolution_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_conv_3_convolution_conv2d_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_conv_3_convolution_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_1_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_1_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_1_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_1_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_2_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_2_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_2_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_2_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_4_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_4_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_4_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_4_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_8_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_8_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_8_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_8_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_16_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_16_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_16_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_16_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_32_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_32_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_32_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_32_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_64_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_64_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_64_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_64_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_128_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_128_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_128_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_128_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_256_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_256_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_256_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_256_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_512_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_512_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_512_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_512_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_1024_dilated_conv_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_1024_dilated_conv_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_1024_conv_1x1_conv1d_expanddims_1_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_tcn_1024_conv_1x1_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_output_tensordot_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
model_output_biasadd_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
assignaddvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
assignaddvariableop_1_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_cast_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_cast_2_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_cast_3_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_cast_4_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_cast_6_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_1_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_1_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_1_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_2_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_2_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_2_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_3_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_3_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_3_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_4_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_4_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_4_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_5_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_5_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0
lookahead_lookahead_update_5_sub_10_readvariableop_resource: (_Arg): 
/job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_6_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_6_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_6_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_7_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_7_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_7_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_8_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_8_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_8_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_9_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_9_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_9_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_10_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_10_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_10_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_11_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_11_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 
lookahead_lookahead_update_11_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_12_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_12_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_12_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_13_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_13_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_13_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_14_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_14_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_14_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_15_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_15_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_15_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_16_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_16_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_16_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_17_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 
lookahead_lookahead_update_17_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_17_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_18_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_18_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_18_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_19_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_19_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_19_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_20_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_20_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_20_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_21_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_21_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_21_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_22_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_22_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_22_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 
lookahead_lookahead_update_23_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_23_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_23_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_24_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_24_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_24_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_25_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_25_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_25_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_26_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_26_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_26_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_27_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_27_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_27_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_28_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_28_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 
lookahead_lookahead_update_28_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_29_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_29_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_29_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_30_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_30_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_30_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_31_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_31_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_31_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_32_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_32_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_32_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_33_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_33_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_33_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_34_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 
lookahead_lookahead_update_34_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_34_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_35_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_35_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_35_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_36_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_36_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_36_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_37_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_37_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_37_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_38_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_38_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_38_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_39_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_39_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_39_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 
lookahead_lookahead_update_40_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_40_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_40_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_41_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_41_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_41_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_42_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_42_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_42_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_43_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_43_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_43_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_44_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_44_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_44_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_45_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_45_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 
lookahead_lookahead_update_45_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_46_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_46_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_46_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_47_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_47_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_47_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_48_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_48_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_48_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_49_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_49_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_49_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_50_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_50_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_50_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_51_mul_5_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 
lookahead_lookahead_update_51_mul_8_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 lookahead_lookahead_update_51_sub_10_readvariableop_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 assignaddvariableop_2_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 assignaddvariableop_3_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 assignaddvariableop_4_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 IteratorGetNext: (IteratorGetNext): /job:localhost/replica:0/task:0/device:CPU:0
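The placement dump above lists each node as `name: (OpType): device`, and nearly every entry lands on GPU:0 except `IteratorGetNext` on CPU:0. The sketch below uses only the standard library to tally nodes per device type from such a dump. It assumes node names contain no whitespace and that devices start with `/job:`. The helper name `summarize_placement` is hypothetical and not part of TensorFlow.

```python
import re
from collections import Counter

# Each entry has the shape "<node>: (<OpType>): <device>".
# Assumption: node names contain no whitespace and devices start with "/job:".
ENTRY = re.compile(r"(\S+): \((\w+)\): (/job:\S+)")


def summarize_placement(log_text):
    """Return a Counter mapping device type (e.g. 'GPU', 'CPU') to node count."""
    counts = Counter()
    for _node, _op_type, device in ENTRY.findall(log_text):
        # ".../device:GPU:0" -> "GPU:0" -> "GPU"
        device_type = device.rsplit("device:", 1)[-1].split(":")[0]
        counts[device_type] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "a_resource: (_Arg): /job:localhost/replica:0/task:0/device:GPU:0 "
        "IteratorGetNext: (IteratorGetNext): /job:localhost/replica:0/task:0/device:CPU:0"
    )
    print(summarize_placement(sample))
```

A summary like this only shows where nodes were assigned. It does not explain why placement failed, which is what the colocation section of the error below is for.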

After this I get the error message:

Exception has occurred: InvalidArgumentError       (note: full exception trace is shown but execution is paused at: <module>)
Cannot assign a device for operation model/conv_1_convolution/Conv2D/ReadVariableOp: Could not satisfy explicit device specification '' because the node {{colocation_node model/conv_1_convolution/Conv2D/ReadVariableOp}} was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/device:GPU:0'. All available devices [/job:localhost/replica:0/task:0/device:CPU:0, /job:localhost/replica:0/task:0/device:GPU:0]. 
Colocation Debug Info:
Colocation group had the following types and supported devices: 
Root Member(assigned_device_name_index_=2 requested_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' assigned_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' resource_device_name_='/job:localhost/replica:0/task:0/device:GPU:0' supported_device_types_=[CPU] possible_devices_=[]
Equal: CPU 
AssignSubVariableOp: GPU CPU 
AssignVariableOp: GPU CPU 
GreaterEqual: GPU CPU 
FloorDiv: CPU 
Sqrt: GPU CPU 
NoOp: GPU CPU 
Pow: GPU CPU 
Mul: CPU 
Cast: GPU CPU 
Identity: GPU CPU 
SelectV2: GPU CPU 
ReadVariableOp: GPU CPU 
RealDiv: GPU CPU 
Sub: GPU CPU 
AddV2: GPU CPU 
Const: GPU CPU 
Square: GPU CPU 
_Arg: GPU CPU 

Colocation members, user-requested devices, and framework assigned devices, if any:
  model_conv_1_convolution_conv2d_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  lookahead_lookahead_update_mul_5_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  lookahead_lookahead_update_mul_8_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  lookahead_lookahead_update_sub_10_readvariableop_resource (_Arg)  framework assigned device=/job:localhost/replica:0/task:0/device:GPU:0
  model/conv_1_convolution/Conv2D/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/Identity (Identity) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Cast_5 (Cast) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Pow (Pow) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Pow_1 (Pow) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_1/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_1 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_2/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_2 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_3/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_3 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_1/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_1 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_2/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_2 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_4/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_4 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_1 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_2 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_5 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_6/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_6 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_7/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_7 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_3 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_8/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_8 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_3 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_9/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/sub_9 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_4 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_4 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_5 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Sqrt (Sqrt) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/GreaterEqual (GreaterEqual) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Const (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_5/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/mul_5 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_6 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_1 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignVariableOp (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/ReadVariableOp_1 (ReadVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_7 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_8/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/mul_8 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Square (Square) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_9 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_2 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignVariableOp_1 (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/ReadVariableOp_2 (ReadVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_10 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Sqrt_1 (Sqrt) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_11 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_3 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/truediv_6 (RealDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/SelectV2 (SelectV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_12 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignSubVariableOp (AssignSubVariableOp)/job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/group_deps (NoOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_4/y (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/add_4 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Cast_7/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Cast_7 (Cast) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Cast_8/x (Const) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/ReadVariableOp_4 (ReadVariableOp) 
  Lookahead/Lookahead/update/sub_10/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/sub_10 (Sub) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_13 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/ReadVariableOp_5 (ReadVariableOp) 
  Lookahead/Lookahead/update/add_5 (AddV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/floordiv (FloorDiv) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/mul_14 (Mul) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/Equal (Equal) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/SelectV2_1/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/SelectV2_1 (SelectV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignVariableOp_2 (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/SelectV2_2/ReadVariableOp (ReadVariableOp) 
  Lookahead/Lookahead/update/SelectV2_2 (SelectV2) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/AssignVariableOp_3 (AssignVariableOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/group_deps_1 (NoOp) /job:localhost/replica:0/task:0/device:GPU:0
  Lookahead/Lookahead/update/group_deps_2 (NoOp) /job:localhost/replica:0/task:0/device:GPU:0

	 [[{{node model/conv_1_convolution/Conv2D/ReadVariableOp}}]] [Op:__inference_train_function_15035]
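The colocation dump above can be reduced to two facts. First, a colocation group can only be placed on a device type that every member supports, which appears to be why the root member ends up with `supported_device_types_=[CPU]` even though most listed ops have GPU kernels. Second, the `ReadVariableOp` members carry no device at all, matching the empty explicit specification `''` in the error. The sketch below extracts both from the pasted text using only the standard library. The helper names are hypothetical. The idea that the CPU-only `Equal`, `FloorDiv` and `Mul` come from the integer step-counter arithmetic Lookahead uses to decide when to sync its slow weights is an assumption that cannot be confirmed from the log alone.

```python
import re

# Rows of the "supported devices" table look like "Equal: CPU" or "Sqrt: GPU CPU".
DEVICE_TYPES = {"CPU", "GPU"}
# Member rows look like "  <node> (<OpType>) <device, or nothing>".
MEMBER = re.compile(r"^\s*(\S+) \((\w+)\)\s*(.*?)\s*$")


def cpu_only_ops(dump):
    """Op types in the supported-devices table that list no GPU kernel."""
    found = []
    for line in dump.splitlines():
        name, sep, rest = line.partition(":")
        tokens = rest.split()
        # Keep only rows whose right-hand side is purely device-type names.
        if sep and tokens and set(tokens) <= DEVICE_TYPES and "GPU" not in tokens:
            found.append(name.strip())
    return found


def unplaced_members(dump):
    """(node, op_type) pairs in the members list that carry no device at all."""
    result = []
    for line in dump.splitlines():
        m = MEMBER.match(line)
        if m and not m.group(3):
            result.append((m.group(1), m.group(2)))
    return result
```

Fed the full error text, the first helper should return `Equal`, `FloorDiv` and `Mul`, and the second should return the `ReadVariableOp` nodes shown without a device.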

@lgeiger
Contributor

lgeiger commented Oct 25, 2021

Do you have tensorflow-metal installed? If so, could you try uninstalling it, and if not, could you try installing it? Just to make sure that this has nothing to do with some ops not being supported by Metal.
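Before toggling the plugin, it helps to confirm which of the relevant distributions are actually installed in the active environment, since the thread involves several conda environments. The sketch below uses only the standard library and assumes Python 3.8+ for `importlib.metadata`. The helper name `installed_version` is hypothetical. The distribution names are the PyPI names mentioned in this thread. Both `tensorflow-addons` and `tfa-nightly` provide the `tensorflow_addons` module.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("tensorflow-macos", "tensorflow-metal", "tensorflow-addons", "tfa-nightly"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Having both `tensorflow-addons` and `tfa-nightly` installed at once is worth ruling out, because they install into the same module directory.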

@asapsmc
Author

asapsmc commented Oct 25, 2021

@lgeiger: you are right. When I uninstalled tensorflow-metal, I stopped getting the error. But of course, everything now runs only on the CPU.
Do you think this is my only option, i.e., running everything on the CPU?

@lgeiger
Contributor

lgeiger commented Oct 25, 2021

Do you think this is my only option, i.e., running everything on the CPU?

For now, unfortunately yes. It seems like some operation is not yet supported by the Metal device, but I am not sure whether the TFA optimizer could be rewritten either to not use this op or to not require it to be placed on the same device as the other ops in the group.

@asapsmc
Author

asapsmc commented Oct 25, 2021

Thank you for your answer. Nevertheless, given the inability of Apple to provide support for developers, I hope you find some ingenious solution on your side.

@bhack
Contributor

bhack commented Oct 25, 2021

The official Apple support is at https://developer.apple.com/forums/tags/tensorflow-metal

@asapsmc
Copy link
Author

asapsmc commented Oct 25, 2021

@bhack I know, I've been trying but they just don't give support.

@bhack
Copy link
Contributor

bhack commented Oct 25, 2021

What is your thread there?

@bhack
Copy link
Contributor

bhack commented Oct 25, 2021

If it's this one, https://developer.apple.com/forums/thread/692818, it's only 4 days old and that includes a weekend, so it isn't too long a wait yet.

@asapsmc
Copy link
Author

asapsmc commented Oct 25, 2021

That's not mine. But you can check this one (very similar to my problem), which was posted 3 months ago. I have two other threads (under the same username, so you can search for them) posted almost a month ago, also not answered by Apple. I also submitted the issue through Feedback Assistant, but I haven't received any feedback.
So, 1 month or even 3 months seem more than sufficient time for a company like Apple to solve these issues, or at least to provide some feedback.
Wouldn't you agree?

@bhack
Contributor

bhack commented Oct 25, 2021

tensorflow-metal is a closed-source plugin package, so we don't have many alternatives other than asking Apple.

@asapsmc
Author

asapsmc commented Oct 25, 2021

@bhack I understand that, and I sincerely thank you for your help (which I requested because I thought it was something related to tensorflow-addons). Switching to a different computer would also solve it, but I can't do that.
Your support has been awesome (as opposed to Apple support) and it allowed me to identify the source of the problem.

@bhack bhack added the macos label Oct 25, 2021
@bhack
Contributor

bhack commented Oct 25, 2021

Have you tried with tensorflow-macos 2.6?

Edit:
We are going to have a release with #2583

@asapsmc
Author

asapsmc commented Oct 25, 2021

Yes, I've been trying with 2.6.

@bhack
Contributor

bhack commented Oct 25, 2021

OK, I'm going to close this. Please add a comment later if you have any news.

@bhack bhack closed this as completed Oct 25, 2021
@asapsmc
Author

asapsmc commented May 27, 2022

Ok, I'm back to this after a while.
So, I've been getting intermittent errors during model.fit with the Lookahead optimizer. I'm fine-tuning on big datasets, and my code breaks while fitting different files in a non-reproducible way: each time I run it, it breaks on a different file and at a different operation.
I can see that these errors are undoubtedly related to the Lookahead optimizer.
Let me try to explain this new info in a clear manner.
I've tried two different tf + tf-addons version combinations (conda environments) and got the same type of errors with both, possibly more often with the pylast environment:

  • pylast: tensorflow-macos 2.9.0, tensorflow-metal 0.5.0, tensorflow-addons 0.17.0
  • py39deps26-source: tensorflow-macos 2.6.0, tensorflow-metal 0.2.0, tensorflow-addons 0.15.0.dev0

The base code is always the same: I use tf.config.set_soft_device_placement(True), and I also run every TensorFlow call inside a with tf.device('/cpu:0'): block; otherwise I get errors. As explained before, my code just loads a model and fine-tunes it on each file of a dataset.

Here are a pair of example error outputs (obtained with the pylast conda environment):

File "/Users/machine/Projects/finetune-asp/src/finetune_IMR2020.py", line 138, in finetune_dataset_db
    history = model.fit(ft, steps_per_epoch=len(ft), epochs=ft_cfg["num_epochs"], shuffle=True,
  File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

Detected at node 'Lookahead/Lookahead/update_64/mul_11' defined at (most recent call last):
    
    File "/Users/machine/Projects/finetune-asp/src/finetune_IMR2020.py", line 138, in finetune_dataset_db
      history = model.fit(ft, steps_per_epoch=len(ft), epochs=ft_cfg["num_epochs"], shuffle=True,
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/engine/training.py", line 1409, in fit
      tmp_logs = self.train_function(iterator)
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/engine/training.py", line 1051, in train_function
      return step_function(self, iterator)
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/engine/training.py", line 1040, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/engine/training.py", line 1030, in run_step
      outputs = model.train_step(data)
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/engine/training.py", line 893, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 539, in minimize
      return self.apply_gradients(grads_and_vars, name=name)
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/tensorflow_addons/optimizers/lookahead.py", line 104, in apply_gradients
      return super().apply_gradients(grads_and_vars, name, **kwargs)
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 678, in apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 723, in _distributed_apply
      update_op = distribution.extended.update(
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 706, in apply_grad_to_update_var
      update_op = self._resource_apply_dense(grad, var, **apply_kwargs)
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/tensorflow_addons/optimizers/lookahead.py", line 130, in _resource_apply_dense
      train_op = self._optimizer._resource_apply_dense(
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/tensorflow_addons/optimizers/rectified_adam.py", line 249, in _resource_apply_dense
      coef["r_t"] * m_corr_t / (v_corr_t + coef["epsilon_t"]),
Node: 'Lookahead/Lookahead/update_64/mul_11'
Incompatible shapes: [0] vs. [5,40,20]
	 [[{{node Lookahead/Lookahead/update_64/mul_11}}]] [Op:__inference_train_function_30821]   

and

File "/Users/machine/Projects/finetune-asp/src/finetune_IMR2020.py", line 138, in finetune_dataset_db
    history = model.fit(ft, steps_per_epoch=len(ft), epochs=ft_cfg["num_epochs"], shuffle=True,
  File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

Detected at node 'Lookahead/Lookahead/update_26/mul_11' defined at (most recent call last):

    File "/Users/machine/Projects/finetune-asp/src/finetune_IMR2020.py", line 138, in finetune_dataset_db
      history = model.fit(ft, steps_per_epoch=len(ft), epochs=ft_cfg["num_epochs"], shuffle=True,
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
      return fn(*args, **kwargs)
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/engine/training.py", line 1409, in fit
      tmp_logs = self.train_function(iterator)
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/engine/training.py", line 1051, in train_function
      return step_function(self, iterator)
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/engine/training.py", line 1040, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/engine/training.py", line 1030, in run_step
      outputs = model.train_step(data)
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/engine/training.py", line 893, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 539, in minimize
      return self.apply_gradients(grads_and_vars, name=name)
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/tensorflow_addons/optimizers/lookahead.py", line 104, in apply_gradients
      return super().apply_gradients(grads_and_vars, name, **kwargs)
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 678, in apply_gradients
      return tf.__internal__.distribute.interim.maybe_merge_call(
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 723, in _distributed_apply
      update_op = distribution.extended.update(
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/keras/optimizers/optimizer_v2/optimizer_v2.py", line 706, in apply_grad_to_update_var
      update_op = self._resource_apply_dense(grad, var, **apply_kwargs)
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/tensorflow_addons/optimizers/lookahead.py", line 130, in _resource_apply_dense
      train_op = self._optimizer._resource_apply_dense(
    File "/Users/machine/miniforge3/envs/pylast/lib/python3.9/site-packages/tensorflow_addons/optimizers/rectified_adam.py", line 249, in _resource_apply_dense
      coef["r_t"] * m_corr_t / (v_corr_t + coef["epsilon_t"]),
Node: 'Lookahead/Lookahead/update_26/mul_11'
Incompatible shapes: [0] vs. [1,40,20]
	 [[{{node Lookahead/Lookahead/update_26/mul_11}}]] [Op:__inference_train_function_1406468]
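Both tracebacks fail in a multiplication traced to rectified_adam.py, line 249 (reached through Lookahead's _resource_apply_dense), and in both cases one operand has shape [0], i.e. it is empty. TensorFlow's elementwise ops follow NumPy-style broadcasting: shapes are aligned from the trailing dimension, and two sizes are compatible only if they are equal or one of them is 1, so a size-0 dimension can never be paired with 20. The stdlib-only sketch below applies those rules to the shapes from the logs. broadcast_shape is a hypothetical helper, not a TensorFlow API, and it only shows why the multiplication is rejected, not why one operand ended up empty.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Broadcast two shapes with NumPy-style rules, or raise ValueError."""
    result = []
    # Walk both shapes from the trailing dimension; missing leading dims act as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(f"Incompatible shapes: {list(a)} vs. {list(b)}")
    return tuple(reversed(result))


# The two failing cases from the logs: an empty [0] operand vs. a 3-D tensor.
for other in [(5, 40, 20), (1, 40, 20)]:
    try:
        broadcast_shape((0,), other)
    except ValueError as err:
        print(err)

# For contrast, equal shapes and size-1 dimensions broadcast fine.
print(broadcast_shape((5, 40, 20), (5, 40, 20)))  # (5, 40, 20)
print(broadcast_shape((1,), (1, 40, 20)))          # (1, 40, 20)
```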

@bhack
Contributor

bhack commented May 27, 2022

@MR-T77 To be sure that it is reproducible in a controlled Linux environment, can you test the same with Docker + TFA pip:

https://www.tensorflow.org/install/docker

That way we could rule out that it is related only to tensorflow-macos.

@bhack bhack reopened this May 27, 2022
@asapsmc
Author

asapsmc commented May 27, 2022

@MR-T77 Just to be sure that it is reproducible on an environment under control with linux can you test the same with Docker + TFA pip:

https://www.tensorflow.org/install/docker

So that we could exclude that is related only to tensorflow-macos.

I'm afraid I can't really help. I installed Docker, tried to pull and run the latest tensorflow image, and got this error:
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.
qemu: uncaught target signal 6 (Aborted) - core dumped

@bhack
Contributor

bhack commented May 27, 2022

Yes, probably they have not yet added AVX emulation support in QEMU for M1, or it is a Docker-specific issue on M1 like docker/for-mac#6111.

If you can isolate the case and reproduce your error with minimal code (e.g. a test), we could try to run it on Linux.

As tensorflow-macos and tensorflow-metal are closed-source packages, we cannot do anything here if we cannot reproduce the issue on another platform.

@asapsmc
Author

asapsmc commented May 27, 2022

Well, I was just trying to provide further details, to see if it would help.
The code I sent in October triggers this bug, but it seems to happen only when using Lookahead on M1.

@bhack
Contributor

bhack commented May 27, 2022

OK, try posting on https://developer.apple.com/forums/tags/tensorflow-metal as well.

@asapsmc
Author

asapsmc commented May 31, 2022

I submitted it to https://developer.apple.com/forums/thread/706952

@asapsmc
Author

asapsmc commented Jun 23, 2022

Just to keep this info updated: I have a GitHub repo with a stripped-down version of my code (with the needed audio data) that does reproduce the issue (on a Mac M1). I also shared this with Apple, but unfortunately they're not very responsive.

@asapsmc
Author

asapsmc commented Jun 23, 2022

@MR-T77 maybe the easiest would be to provide the code so that we can reproduce it locally.

If you want to debug it on your own, set TF_CPP_MAX_VLOG_LEVEL to 10 and TF_DUMP_GRAPH_PREFIX=tmp. You should see a tmp dir after running the script. Reading placer_input.pbtxt will answer my question (simply grep -A 20 -rn FloorDiv). You'll see the dtypes.

Anyway I'm traveling now so I'm not able to help you further until ~12th October.
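The grep step in the advice above can also be done from Python when a shell is inconvenient. This is a stdlib-only sketch that assumes the graph dumps are .pbtxt files somewhere under the TF_DUMP_GRAPH_PREFIX directory (tmp in the advice); grep_after is a hypothetical helper name, not part of TensorFlow. Unlike grep -A, overlapping matches are reported as separate, overlapping windows.

```python
import os


def grep_after(root, pattern, after=20):
    """Yield (path, line_no, context) for every line containing `pattern` in the
    *.pbtxt files under `root`; `context` is the matching line plus up to
    `after` following lines (roughly `grep -A 20 -rn pattern root`)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".pbtxt"):
                continue  # only look at the text-format graph dumps
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                lines = fh.read().splitlines()
            for i, line in enumerate(lines):
                if pattern in line:
                    yield path, i + 1, lines[i:i + 1 + after]


# Hypothetical usage: "tmp" is the TF_DUMP_GRAPH_PREFIX directory from the advice.
for path, line_no, context in grep_after("tmp", "FloorDiv"):
    print(f"{path}:{line_no}")
    print("\n".join(context))
```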

Hi again, I'm back at this as the problem remains (and it's even more frequent after I updated the conda environment), and Apple Developer Forums are not responding.
I was trying to generate some debug info (to see if I could figure out some way out of this), but I can't generate the pbtxt file.

I'm using this code:

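import os

# Set the TF_* environment variables before TensorFlow is imported
# (network_definitions may import it as well), so the runtime picks them up.
# Note: TF_CPP_MIN_LOG_LEVEL only filters C++ log messages by severity; the
# advice quoted above uses TF_CPP_MAX_VLOG_LEVEL to enable verbose logging.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "10"
os.environ["TF_DUMP_GRAPH_PREFIX"] = 'tbdump'
# os.environ["XLA_FLAGS"] = "--xla_dump_to=/tbdump/generated"

from network_definitions import *
import tensorflow as tf

tf.debugging.set_log_device_placement(False)
tf.config.set_soft_device_placement(True)
# Write tfdbg (Debugger V2) data for TensorBoard; -1 disables the circular buffer limit.
tf.debugging.experimental.enable_dump_debug_info(
    './tbdump',
    tensor_debug_mode="FULL_HEALTH",
    circular_buffer_size=-1)


if __name__ == "__main__":

    big_dataset = 'gtzan'
    small_dataset = 'traintest_smallsmc'
    dataset = big_dataset
    # data_aug='NODAUG'  # to run without data augmentation
    finetune_db(dataset, data_aug='DAUG', load_pkl=True)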

But after starting tensorboard --logdir /tbdump and opening TensorBoard on localhost, I always get the message "Debugger V2 is inactive because no data is available."

I can see that the following types of files are being created in the tbdump folder, but no pbtxt file:

  • tfdbg_events.xxx...xxx.graphs
  • tfdbg_events.xxx...xxx.source_files
  • tfdbg_events.xxx...xxx.execution
  • tfdbg_events.xxx...xxx.stack_frames
  • tfdbg_events.xxx...xxx.graph_execution_traces
  • tfdbg_events.xxx...xxx.metadata

Any idea on how to make this work?
This code and the needed data (as well as the pip freeze info) are available at this GitHub repo.

@seanpmorgan
Member

TensorFlow Addons is transitioning to a minimal maintenance and release mode. New features will not be added to this repository. For more information, please see our public messaging on this decision:
TensorFlow Addons Wind Down

Please consider sending feature requests / contributions to other repositories in the TF community with charters similar to TFA's:
Keras
Keras-CV
Keras-NLP
