About "Init tensors using type_as" #2585

Closed
ShomyLiu opened this issue Jul 11, 2020 · 12 comments
Closed

About "Init tensors using type_as" #2585

ShomyLiu opened this issue Jul 11, 2020 · 12 comments

Comments

@ShomyLiu
Contributor

ShomyLiu commented Jul 11, 2020

As described in the doc: https://pytorch-lightning.readthedocs.io/en/stable/multi_gpu.html#init-tensors-using-type-as

When you need to create a new tensor, use type_as. This will make your code scale to any arbitrary number of GPUs or TPUs with Lightning.

# with lightning
def forward(self, x):
    z = torch.Tensor(2, 3)
    z = z.type_as(x, device=self.device)

However, this is not convenient when there is no suitable x, for example:

import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl
from transformers import AutoModel, AutoTokenizer


class Model(pl.LightningModule):
    def __init__(self, opt):
        super().__init__()
        self.opt = opt
        self.net = Net(opt)

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_id):
        x, labels = batch
        ### labels: a tuple (0, 0, 0, ..., 1, 0)
        ### .to(self.device) is right here
        labels = torch.LongTensor(labels).to(self.device)
        output = self(x)
        loss = F.cross_entropy(output, labels)
        return {"loss": loss}


class Net(nn.Module):
    def __init__(self, opt):
        super().__init__()
        self.opt = opt
        self.init_bert()

    def forward(self, x):
        x = self.get_bert(x)
        return x

    def init_bert(self):
        self.bert_tokenizer = AutoTokenizer.from_pretrained(self.opt.bert_path)
        self.bert = AutoModel.from_pretrained(self.opt.bert_path)

    def get_bert(self, sentence_lists):
        sentence_lists = [' '.join(x) for x in sentence_lists]
        ids = self.bert_tokenizer(sentence_lists, padding=True, return_tensors="pt")
        ###
        #  how to put ids['input_ids'] onto the right device???
        #  .to(self.bert.device) is right
        #  .to(self.device) is wrong
        ###
        inputs = ids['input_ids'].to(self.bert.device)
        print("**************")
        print(self.device)
        print(self.bert.device)
        print("**************")
        embeddings = self.bert(inputs)
        return embeddings[0]

The problem spots are marked in the code. My confusion is that when I run this code on multiple GPUs:

(1) Why is labels = torch.LongTensor(labels).to(self.device) right, while inputs = ids['input_ids'].to(self.device) is wrong? In other words, during multi-GPU training, self.device is not consistent with self.bert.device.

(2) For this case, is there any other way to move ids['input_ids'] to the right device during multi-GPU training, other than inputs = ids['input_ids'].to(self.bert.device), given that the Lightning docs say to "Delete .cuda() or .to() calls"?

@github-actions
Contributor

Hi! Thanks for your contribution, great first issue!

@rohitgr7
Contributor

It will work since self(x) is called within training_step, self.forward is then called within the same function, and then get_bert is called, so all of these functions run on the same device within the same process. You can simply use self.device to transfer tensors to the device; it won't break. Just a suggestion: you should tokenize within a Dataset's __getitem__ method rather than in forward, and return the tokenized outputs from the dataset so forward can use them directly (see the sketch below).

Just curious, does self.bert.device give you anything? AFAIK a PyTorch model doesn't have a device attribute.
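
For example, a minimal sketch of that Dataset-based approach (the class name, max_length, and label handling are just assumptions for illustration):

import torch
from torch.utils.data import Dataset
from transformers import AutoTokenizer

class SentenceDataset(Dataset):
    """Tokenize in __getitem__ so the DataLoader returns ready-made tensors."""

    def __init__(self, sentence_lists, labels, bert_path, max_length=128):
        self.sentences = [' '.join(x) for x in sentence_lists]
        self.labels = labels
        self.tokenizer = AutoTokenizer.from_pretrained(bert_path)
        self.max_length = max_length

    def __len__(self):
        return len(self.sentences)

    def __getitem__(self, idx):
        encoded = self.tokenizer(
            self.sentences[idx],
            padding="max_length",
            truncation=True,
            max_length=self.max_length,
            return_tensors="pt",
        )
        # drop the batch dimension added by return_tensors="pt"
        input_ids = encoded["input_ids"].squeeze(0)
        label = torch.tensor(self.labels[idx], dtype=torch.long)
        return input_ids, label

The DataLoader then collates these into batches and Lightning moves the whole batch to the right device before training_step, so forward never needs a manual .to() call.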

@awaelchli
Contributor

In the first part of the code, if labels is a tensor, it should already be on the right device, since it is passed in to training_step.
I am also curious what self.bert.device gives you, and why you claim in the comment that .to(self.device) is wrong.

@ShomyLiu
Contributor Author

ShomyLiu commented Jul 12, 2020

Thanks for the prompt replies and suggestions. This is a toy experiment to try Lightning, and it is indeed a nice framework~

First, the bert from AutoModel in Hugging Face transformers has a device property that returns the torch.device of the module (Ref: https://huggingface.co/transformers/_modules/transformers/modeling_utils.html#PreTrainedModel). So self.bert.device should always be the right device, reflecting which device the model is on.
@awaelchli @rohitgr7

The code runs successfully after passing self.device into the call: self.net(x, self.device).
I think I found the reason: I had read the device in __init__, while Lightning only sets the correct device later, so it is right by the time training_step runs.

So in summary, tensor.to(self.device) is one of the correct ways to move tensors onto the right device in Lightning, besides using type_as.
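
To make it concrete, the workaround looks roughly like this (a simplified sketch; the extra device argument on Net.forward is the change):

import pytorch_lightning as pl
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class Net(nn.Module):
    def __init__(self, opt):
        super().__init__()
        self.bert_tokenizer = AutoTokenizer.from_pretrained(opt.bert_path)
        self.bert = AutoModel.from_pretrained(opt.bert_path)

    def forward(self, sentence_lists, device):
        sentence_lists = [' '.join(x) for x in sentence_lists]
        ids = self.bert_tokenizer(sentence_lists, padding=True, return_tensors="pt")
        inputs = ids['input_ids'].to(device)  # device passed down from the LightningModule
        return self.bert(inputs)[0]

class Model(pl.LightningModule):
    def __init__(self, opt):
        super().__init__()
        self.net = Net(opt)

    def forward(self, x):
        # read self.device at call time: it is only correct after Lightning
        # has moved the module, not at __init__ time
        return self.net(x, self.device)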

@rohitgr7
Contributor

rohitgr7 commented Jul 12, 2020

type_as by itself has nothing to do with the device; it is just used to map a tensor to a dtype. If you also pass a device, it will do .to(self.device) for you so that you don't have to do it explicitly yourself.

@ShomyLiu
Contributor Author

Yeah, thanks very much.
My point is that when there is no appropriate variable to call type_as on, as shown in the docs, it is necessary to call .to(self.device) explicitly to put a new tensor on the right device.
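
For example, a minimal sketch of what I mean inside training_step (creating the new tensor directly on self.device, since there is no reference tensor to call type_as on):

import torch
import torch.nn.functional as F

# inside the LightningModule shown above
def training_step(self, batch, batch_id):
    x, labels = batch  # labels arrives as a plain tuple of ints
    # no suitable reference tensor exists here, so create the label tensor
    # explicitly on the module's current device
    labels = torch.tensor(labels, dtype=torch.long, device=self.device)
    output = self(x)
    loss = F.cross_entropy(output, labels)
    return {"loss": loss}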

@awaelchli
Contributor

Yes, that's correct. If it is the only way, then you need to do that.
The documentation is advice on best practice, not a fixed rule you need to follow to use Lightning. So what you are doing is fine.
Does this clarification resolve the issue?

@ShomyLiu
Contributor Author

Yes! Thanks for your suggestions.

@nsarang
Contributor

nsarang commented Jul 22, 2020

@awaelchli
It seems that PyTorch is going to deprecate type_as for changing devices. Have a look at:
type_as() method change device too #33662.
They've already started replacing it in their codebase:
remove uses of type() and type_as() part 1. #38029

Perhaps we shouldn't encourage its use in the documentation.
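
For reference, a minimal sketch of the pattern those issues move toward: Tensor.to also accepts another tensor and matches both its dtype and its device, so it covers what type_as was being used for here (the shapes and dtypes below are arbitrary):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(4, 3, dtype=torch.float16, device=device)

# Instead of torch.zeros(2, 3).type_as(x), Tensor.to(other) matches
# both the dtype and the device of `other` in one call.
z = torch.zeros(2, 3).to(x)
assert z.dtype == x.dtype and z.device == x.device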

@awaelchli
Contributor

@nsarang I didn't know that. Are you interested in making these doc updates?

@yipliu

yipliu commented Sep 6, 2022

I also think the docs need to show a recommended way to do this in PL, because type_as really doesn't work well.

@awaelchli
Contributor

@yipliu I opened #14554 to check if we can make the documentation better. As mentioned before, contributions there are also welcome 😃
