while trying dowhy_causal_prediction_demo.ipynb: TypeError: can't convert cuda:0 device type tensor to numpy #1306

Open
JPZ4-5 opened this issue Mar 20, 2025 · 1 comment
Labels: bug (Something isn't working)

JPZ4-5 commented Mar 20, 2025

Describe the bug
When trying to run dowhy_causal_prediction_demo.ipynb, I encountered the following error while running this cell:

trainer = pl.Trainer(devices=1, max_epochs=5) 

trainer.fit(algorithm, loaders['train_loaders'], loaders['val_loaders'])

The output:

You are using the plain ModelCheckpoint callback. Consider using LitModelCheckpoint which with seamless uploading to Model registry.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2]

  | Name  | Type       | Params | Mode 
---------------------------------------------
0 | model | Sequential | 306 K  | train
---------------------------------------------
306 K     Trainable params
0         Non-trainable params
306 K     Total params
1.226     Total estimated model params size (MB)
8         Modules in train mode
0         Modules in eval mode

Epoch 0:   0%
 0/312 [00:00<?, ?it/s]

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[15], line 3
      1 trainer = pl.Trainer(devices=1, max_epochs=5) 
----> 3 trainer.fit(algorithm, loaders['train_loaders'], loaders['val_loaders'])

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py:561, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    559 self.training = True
    560 self.should_stop = False
--> 561 call._call_and_handle_interrupt(
    562     self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
    563 )

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py:48, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     46     if trainer.strategy.launcher is not None:
     47         return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
---> 48     return trainer_fn(*args, **kwargs)
     50 except _TunerExitException:
     51     _call_teardown_hook(trainer)

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py:599, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    592     download_model_from_registry(ckpt_path, self)
    593 ckpt_path = self._checkpoint_connector._select_ckpt_path(
    594     self.state.fn,
    595     ckpt_path,
    596     model_provided=True,
    597     model_connected=self.lightning_module is not None,
    598 )
--> 599 self._run(model, ckpt_path=ckpt_path)
    601 assert self.state.stopped
    602 self.training = False

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py:1012, in Trainer._run(self, model, ckpt_path)
   1007 self._signal_connector.register_signal_handlers()
   1009 # ----------------------------
   1010 # RUN THE TRAINER
   1011 # ----------------------------
-> 1012 results = self._run_stage()
   1014 # ----------------------------
   1015 # POST-Training CLEAN UP
   1016 # ----------------------------
   1017 log.debug(f"{self.__class__.__name__}: trainer tearing down")

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/trainer.py:1056, in Trainer._run_stage(self)
   1054         self._run_sanity_check()
   1055     with torch.autograd.set_detect_anomaly(self._detect_anomaly):
-> 1056         self.fit_loop.run()
   1057     return None
   1058 raise RuntimeError(f"Unexpected state {self.state}")

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/fit_loop.py:216, in _FitLoop.run(self)
    214 try:
    215     self.on_advance_start()
--> 216     self.advance()
    217     self.on_advance_end()
    218 except StopIteration:

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/fit_loop.py:455, in _FitLoop.advance(self)
    453 with self.trainer.profiler.profile("run_training_epoch"):
    454     assert self._data_fetcher is not None
--> 455     self.epoch_loop.run(self._data_fetcher)

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/training_epoch_loop.py:150, in _TrainingEpochLoop.run(self, data_fetcher)
    148 while not self.done:
    149     try:
--> 150         self.advance(data_fetcher)
    151         self.on_advance_end(data_fetcher)
    152     except StopIteration:

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/training_epoch_loop.py:320, in _TrainingEpochLoop.advance(self, data_fetcher)
    317 with trainer.profiler.profile("run_training_batch"):
    318     if trainer.lightning_module.automatic_optimization:
    319         # in automatic optimization, there can only be one optimizer
--> 320         batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
    321     else:
    322         batch_output = self.manual_optimization.run(kwargs)

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py:192, in _AutomaticOptimization.run(self, optimizer, batch_idx, kwargs)
    185         closure()
    187 # ------------------------------
    188 # BACKWARD PASS
    189 # ------------------------------
    190 # gradient update with accumulated gradients
    191 else:
--> 192     self._optimizer_step(batch_idx, closure)
    194 result = closure.consume_result()
    195 if result.loss is None:

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py:270, in _AutomaticOptimization._optimizer_step(self, batch_idx, train_step_and_backward_closure)
    267     self.optim_progress.optimizer.step.increment_ready()
    269 # model hook
--> 270 call._call_lightning_module_hook(
    271     trainer,
    272     "optimizer_step",
    273     trainer.current_epoch,
    274     batch_idx,
    275     optimizer,
    276     train_step_and_backward_closure,
    277 )
    279 if not should_accumulate:
    280     self.optim_progress.optimizer.step.increment_completed()

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py:176, in _call_lightning_module_hook(trainer, hook_name, pl_module, *args, **kwargs)
    173 pl_module._current_fx_name = hook_name
    175 with trainer.profiler.profile(f"[LightningModule]{pl_module.__class__.__name__}.{hook_name}"):
--> 176     output = fn(*args, **kwargs)
    178 # restore current_fx when nested context
    179 pl_module._current_fx_name = prev_fx_name

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/core/module.py:1302, in LightningModule.optimizer_step(self, epoch, batch_idx, optimizer, optimizer_closure)
   1271 def optimizer_step(
   1272     self,
   1273     epoch: int,
   (...)
   1276     optimizer_closure: Optional[Callable[[], Any]] = None,
   1277 ) -> None:
   1278     r"""Override this method to adjust the default way the :class:`~pytorch_lightning.trainer.trainer.Trainer` calls
   1279     the optimizer.
   1280 
   (...)
   1300 
   1301     """
-> 1302     optimizer.step(closure=optimizer_closure)

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/core/optimizer.py:154, in LightningOptimizer.step(self, closure, **kwargs)
    151     raise MisconfigurationException("When `optimizer.step(closure)` is called, the closure should be callable")
    153 assert self._strategy is not None
--> 154 step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
    156 self._on_after_step()
    158 return step_output

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/strategies/strategy.py:239, in Strategy.optimizer_step(self, optimizer, closure, model, **kwargs)
    237 # TODO(fabric): remove assertion once strategy's optimizer_step typing is fixed
    238 assert isinstance(model, pl.LightningModule)
--> 239 return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/plugins/precision/precision.py:123, in Precision.optimizer_step(self, optimizer, model, closure, **kwargs)
    121 """Hook to run the optimizer step."""
    122 closure = partial(self._wrap_closure, model, optimizer, closure)
--> 123 return optimizer.step(closure=closure, **kwargs)

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/optim/optimizer.py:493, in Optimizer.profile_hook_step.<locals>.wrapper(*args, **kwargs)
    488         else:
    489             raise RuntimeError(
    490                 f"{func} must return None or a tuple of (new_args, new_kwargs), but got {result}."
    491             )
--> 493 out = func(*args, **kwargs)
    494 self._optimizer_step_code()
    496 # call optimizer step post hooks

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/optim/optimizer.py:91, in _use_grad_for_differentiable.<locals>._use_grad(self, *args, **kwargs)
     89     torch.set_grad_enabled(self.defaults["differentiable"])
     90     torch._dynamo.graph_break()
---> 91     ret = func(self, *args, **kwargs)
     92 finally:
     93     torch._dynamo.graph_break()

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/optim/adam.py:223, in Adam.step(self, closure)
    221 if closure is not None:
    222     with torch.enable_grad():
--> 223         loss = closure()
    225 for group in self.param_groups:
    226     params_with_grad: List[Tensor] = []

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/plugins/precision/precision.py:109, in Precision._wrap_closure(self, model, optimizer, closure)
     96 def _wrap_closure(
     97     self,
     98     model: "pl.LightningModule",
     99     optimizer: Steppable,
    100     closure: Callable[[], Any],
    101 ) -> Any:
    102     """This double-closure allows makes sure the ``closure`` is executed before the ``on_before_optimizer_step``
    103     hook is called.
    104 
   (...)
    107 
    108     """
--> 109     closure_result = closure()
    110     self._after_closure(model, optimizer)
    111     return closure_result

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py:146, in Closure.__call__(self, *args, **kwargs)
    144 @override
    145 def __call__(self, *args: Any, **kwargs: Any) -> Optional[Tensor]:
--> 146     self._result = self.closure(*args, **kwargs)
    147     return self._result.loss

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py:131, in Closure.closure(self, *args, **kwargs)
    128 @override
    129 @torch.enable_grad()
    130 def closure(self, *args: Any, **kwargs: Any) -> ClosureResult:
--> 131     step_output = self._step_fn()
    133     if step_output.closure_loss is None:
    134         self.warning_cache.warn("`training_step` returned `None`. If this was on purpose, ignore this warning...")

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/loops/optimization/automatic.py:319, in _AutomaticOptimization._training_step(self, kwargs)
    308 """Performs the actual train step with the tied hooks.
    309 
    310 Args:
   (...)
    315 
    316 """
    317 trainer = self.trainer
--> 319 training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
    320 self.trainer.strategy.post_training_step()  # unused hook - call anyway for backward compatibility
    322 if training_step_output is None and trainer.world_size > 1:

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/trainer/call.py:328, in _call_strategy_hook(trainer, hook_name, *args, **kwargs)
    325     return None
    327 with trainer.profiler.profile(f"[Strategy]{trainer.strategy.__class__.__name__}.{hook_name}"):
--> 328     output = fn(*args, **kwargs)
    330 # restore current_fx when nested context
    331 pl_module._current_fx_name = prev_fx_name

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/pytorch_lightning/strategies/strategy.py:391, in Strategy.training_step(self, *args, **kwargs)
    389 if self.model != self.lightning_module:
    390     return self._forward_redirection(self.model, self.lightning_module, "training_step", *args, **kwargs)
--> 391 return self.lightning_module.training_step(*args, **kwargs)

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/dowhy/causal_prediction/algorithms/cacm.py:110, in CACM.training_step(self, train_batch, batch_idx)
    108 # Acause regularization
    109 if attr_type == "causal":
--> 110     penalty_causal += self.CACMRegularizer.conditional_reg(
    111         classifs, [a[:, attr_type_idx] for a in attribute_labels], [targets], nmb, E_eq_A_attr
    112     )
    114 # Aconf regularization
    115 elif attr_type == "conf":

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/dowhy/causal_prediction/algorithms/regularization.py:209, in Regularizer.conditional_reg(self, classifs, attribute_labels, conditioning_subset, num_envs, E_eq_A)
    207 cumprod = torch.cumprod(cardinality, dim=0)
    208 n_groups = cumprod[-1].item()
--> 209 factors_np = np.concatenate(([1], cumprod[:-1]))
    210 factors = torch.from_numpy(factors_np)
    211 group_indices = grouping_data @ factors

File ~/miniforge3/envs/dowhy-py312/lib/python3.12/site-packages/torch/_tensor.py:1194, in Tensor.__array__(self, dtype)
   1192     return handle_torch_function(Tensor.__array__, (self,), self, dtype=dtype)
   1193 if dtype is None:
-> 1194     return self.numpy()
   1195 else:
   1196     return self.numpy().astype(dtype, copy=False)

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
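
For context (this snippet is not from the notebook), the conversion failure can be reproduced in isolation: np.concatenate implicitly calls Tensor.__array__ on the CUDA tensor, which is what raises the TypeError. A minimal sketch, assuming a CUDA-enabled torch install:

import numpy as np
import torch

# mimic the tensor used at regularization.py line 209, but on the GPU
cumprod = torch.cumprod(torch.tensor([2, 3, 4], device="cuda"), dim=0)

try:
    # np.concatenate converts each piece to an array, triggering Tensor.__array__
    np.concatenate(([1], cumprod[:-1]))
except TypeError as e:
    print(e)  # can't convert cuda:0 device type tensor to numpy ...

# copying to host memory first avoids the error
factors_np = np.concatenate(([1], cumprod[:-1].cpu().numpy()))
print(factors_np)  # [1 2 6]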

Steps to reproduce the behavior
Run all cells in dowhy_causal_prediction_demo.ipynb.

Expected behavior
The program runs normally and produces results.

Version information:

  • DoWhy: 0.12
  • pytorch_lightning: 2.5.1
  • torch: 2.6.0
  • python: 3.12.9

Additional context
There was no issue when running the ERM in the previous cell.

JPZ4-5 added the bug label on Mar 20, 2025

JPZ4-5 (Author) commented Mar 21, 2025

Here is a temporary workaround.
Change lines 199-211 of dowhy/causal_prediction/algorithms/regularization.py to:

for i in range(num_envs):
    conditioning_subset_i = [subset_var[i] for subset_var in conditioning_subset]
    conditioning_subset_i_uniform = [
        ele.unsqueeze(1) if ele.dim() == 1 else ele for ele in conditioning_subset_i
    ]
    grouping_data = torch.cat(conditioning_subset_i_uniform, 1).to(device='cpu')
    assert grouping_data.min() >= 0, "Group numbers cannot be negative."
    cardinality = 1 + torch.max(grouping_data, dim=0)[0]
    cumprod = torch.cumprod(cardinality, dim=0)
    n_groups = cumprod[-1].item()
    factors_np = np.concatenate(([1], cumprod[:-1].cpu().numpy()))
    factors = torch.from_numpy(factors_np)
    group_indices = grouping_data @ factors

This works on my device. The changes are adding .to(device='cpu') when building grouping_data and calling .cpu().numpy() on cumprod[:-1] before np.concatenate.

To compute grouping_data @ factors, both tensors have to be on the same device; on my CUDA device (a 2080 Ti) there seems to be an error related to missing instruction sets, whereas running it on the CPU works fine.
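
An alternative sketch would be to avoid the numpy round-trip entirely by building factors with torch on whatever device grouping_data already lives on, and to replace the integer matmul with an elementwise multiply-and-sum (which may be what the CUDA error above is about). This is untested against the DoWhy code base, and compute_group_indices is just a hypothetical helper for illustration:

import torch

def compute_group_indices(grouping_data: torch.Tensor) -> torch.Tensor:
    # mixed-radix encode the attribute columns into single group ids,
    # staying on grouping_data's device throughout
    assert grouping_data.min() >= 0, "Group numbers cannot be negative."
    cardinality = 1 + torch.max(grouping_data, dim=0)[0]
    cumprod = torch.cumprod(cardinality, dim=0)
    # torch equivalent of np.concatenate(([1], cumprod[:-1])), no host copy needed
    factors = torch.cat([torch.ones(1, dtype=cumprod.dtype, device=cumprod.device), cumprod[:-1]])
    # elementwise multiply + sum instead of an integer matmul
    return (grouping_data * factors).sum(dim=1)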
