About "Init tensors using type_as" #2585

Closed
ShomyLiu opened this issue Jul 11, 2020 · 12 comments
Closed

About "Init tensors using type_as" #2585

ShomyLiu opened this issue Jul 11, 2020 · 12 comments

Comments

@ShomyLiu
Contributor

ShomyLiu commented Jul 11, 2020

As described in the doc: https://pytorch-lightning.readthedocs.io/en/stable/multi_gpu.html#init-tensors-using-type-as

When you need to create a new tensor, use type_as. This will make your code scale to any arbitrary number of GPUs or TPUs with Lightning.

# with lightning
def forward(self, x):
    z = torch.Tensor(2, 3)
    z = z.type_as(x, device=self.device)

However, this is not convenient when there is no suitable x, for example:

import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl
from transformers import AutoModel, AutoTokenizer


class Model(pl.LightningModule):
    def __init__(self, opt):
        super().__init__()
        self.opt = opt
        self.net = Net(opt)

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_id):
        x, labels = batch
        ### labels: a tuple (0, 0, 0, ..., 1, 0)
        ### .to(self.device) is right here
        labels = torch.LongTensor(labels).to(self.device)
        output = self(x)
        loss = F.cross_entropy(output, labels)
        return {"loss": loss}


class Net(nn.Module):
    def __init__(self, opt):
        super().__init__()
        self.opt = opt
        self.init_bert()

    def forward(self, x):
        x = self.get_bert(x)
        return x

    def init_bert(self):
        self.bert_tokenizer = AutoTokenizer.from_pretrained(self.opt.bert_path)
        self.bert = AutoModel.from_pretrained(self.opt.bert_path)

    def get_bert(self, sentence_lists):
        sentence_lists = [' '.join(x) for x in sentence_lists]
        ids = self.bert_tokenizer(sentence_lists, padding=True, return_tensors="pt")
        ###
        #  how to put ids['input_ids'] onto the right device???
        #  .to(self.bert.device) is right
        #  .to(self.device) is wrong
        ###
        inputs = ids['input_ids'].to(self.bert.device)
        print("**************")
        print(self.device)
        print(self.bert.device)
        print("**************")
        embeddings = self.bert(inputs)
        return embeddings[0]

The problem spots are marked in the code. My confusion is that when I run this code on multiple GPUs:

(1) Why is labels = torch.LongTensor(labels).to(self.device) right, while inputs = ids['input_ids'].to(self.device) is wrong? In other words, during multi-GPU training, self.device is not consistent with self.bert.device.

(2) For this case, is there any other way to move ids['input_ids'] to the right device during multi-GPU training, other than inputs = ids['input_ids'].to(self.bert.device), given that the Lightning docs say to "Delete .cuda() or .to() calls"?

@github-actions
Contributor

Hi! Thanks for your contribution, great first issue!

@rohitgr7
Contributor

It will work since self(x) is called within training_step, self.forward is then called within the same function, and then get_bert is called, so all of these functions run on the same device within the same process. You can simply use self.device to transfer tensors to the device; it won't break. Just a suggestion: you should tokenize within a Dataset's __getitem__ method rather than in forward, and return the tokenized outputs from the dataset so forward can use them directly (see the sketch below).

Just curious, does self.bert.device give you anything? AFAIK a PyTorch model doesn't have a device attribute.
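
For example, a minimal sketch of that Dataset-based approach (the class name, max_length, and label handling are just assumptions for illustration):

import torch
from torch.utils.data import Dataset
from transformers import AutoTokenizer

class SentenceDataset(Dataset):
    """Tokenize in __getitem__ so the DataLoader returns ready-made tensors."""

    def __init__(self, sentence_lists, labels, bert_path, max_length=128):
        self.sentences = [' '.join(x) for x in sentence_lists]
        self.labels = labels
        self.tokenizer = AutoTokenizer.from_pretrained(bert_path)
        self.max_length = max_length

    def __len__(self):
        return len(self.sentences)

    def __getitem__(self, idx):
        encoded = self.tokenizer(
            self.sentences[idx],
            padding="max_length",
            truncation=True,
            max_length=self.max_length,
            return_tensors="pt",
        )
        # drop the batch dimension added by return_tensors="pt"
        input_ids = encoded["input_ids"].squeeze(0)
        label = torch.tensor(self.labels[idx], dtype=torch.long)
        return input_ids, label

The DataLoader then collates these into batches and Lightning moves the whole batch to the right device before training_step, so forward never needs a manual .to() call.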

@awaelchli
Contributor

In the first part of the code, if labels is a tensor, it should already be on the right device, since it is passed in to training_step.
I am also curious what self.bert.device gives you, and why you claim in the comment that .to(self.device) is wrong.

@ShomyLiu
Contributor Author

ShomyLiu commented Jul 12, 2020

Thanks for the prompt replies and suggestions. This is a toy experiment to try Lightning, and it is indeed a nice framework~

First, the bert from AutoModel in Hugging Face transformers has a device property that returns the torch.device of the module (Ref: https://huggingface.co/transformers/_modules/transformers/modeling_utils.html#PreTrainedModel). So self.bert.device should always be the right device, reflecting which device the model is on.
@awaelchli @rohitgr7

The code runs successfully after passing self.device into the call: self.net(x, self.device).
I think I found the reason: I had read the device in __init__, while Lightning only sets the correct device later, so it is right by the time training_step runs.

So in summary, tensor.to(self.device) is one of the correct ways to move tensors onto the right device in Lightning, besides using type_as.
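
To make it concrete, the workaround looks roughly like this (a simplified sketch; the extra device argument on Net.forward is the change):

import pytorch_lightning as pl
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class Net(nn.Module):
    def __init__(self, opt):
        super().__init__()
        self.bert_tokenizer = AutoTokenizer.from_pretrained(opt.bert_path)
        self.bert = AutoModel.from_pretrained(opt.bert_path)

    def forward(self, sentence_lists, device):
        sentence_lists = [' '.join(x) for x in sentence_lists]
        ids = self.bert_tokenizer(sentence_lists, padding=True, return_tensors="pt")
        inputs = ids['input_ids'].to(device)  # device passed down from the LightningModule
        return self.bert(inputs)[0]

class Model(pl.LightningModule):
    def __init__(self, opt):
        super().__init__()
        self.net = Net(opt)

    def forward(self, x):
        # read self.device at call time: it is only correct after Lightning
        # has moved the module, not at __init__ time
        return self.net(x, self.device)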

@rohitgr7
Contributor

rohitgr7 commented Jul 12, 2020

type_as by itself has nothing to do with the device; it is just used to map a tensor to a dtype. If you also pass a device, it will do .to(self.device) for you so that you don't have to do it explicitly yourself.

@ShomyLiu
Contributor Author

Yeah, thanks very much.
My point is that when there is no appropriate variable to call type_as on, as shown in the docs, it is necessary to call .to(self.device) explicitly to put a new tensor on the right device.
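
For example, a minimal sketch of what I mean inside training_step (creating the new tensor directly on self.device, since there is no reference tensor to call type_as on):

import torch
import torch.nn.functional as F

# inside the LightningModule shown above
def training_step(self, batch, batch_id):
    x, labels = batch  # labels arrives as a plain tuple of ints
    # no suitable reference tensor exists here, so create the label tensor
    # explicitly on the module's current device
    labels = torch.tensor(labels, dtype=torch.long, device=self.device)
    output = self(x)
    loss = F.cross_entropy(output, labels)
    return {"loss": loss}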

@awaelchli
Contributor

Yes, that's correct. If it is the only way, then you need to do that.
The documentation is advice on best practice, not a fixed rule you need to follow to use Lightning. So what you are doing is fine.
Does this clarification resolve the issue?

@ShomyLiu
Contributor Author

Yes! Thanks for your suggestions.

@nsarang
Contributor

nsarang commented Jul 22, 2020

@awaelchli
It seems that PyTorch is going to deprecate type_as for changing devices. Have a look at:
type_as() method change device too #33662.
They've already started replacing it in their codebase:
remove uses of type() and type_as() part 1. #38029

Perhaps we shouldn't encourage its use in the documentation.
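
For reference, a minimal sketch of the pattern those issues move toward: Tensor.to also accepts another tensor and matches both its dtype and its device, so it covers what type_as was being used for here (the shapes and dtypes below are arbitrary):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(4, 3, dtype=torch.float16, device=device)

# Instead of torch.zeros(2, 3).type_as(x), Tensor.to(other) matches
# both the dtype and the device of `other` in one call.
z = torch.zeros(2, 3).to(x)
assert z.dtype == x.dtype and z.device == x.device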

@awaelchli
Contributor

@nsarang I didn't know that. Are you interested in making these doc updates?

@yipliu

yipliu commented Sep 6, 2022

I also think the docs need to show a recommended way to do this in PL, because type_as really doesn't work well.

@awaelchli
Contributor

@yipliu I opened #14554 to check if we can make the documentation better. As mentioned before, contributions there are also welcome 😃
