
RTX2080 GPU is only being used at 30% - whilst a single CPU core is maxed out #1080

Closed
tombburnell opened this issue Sep 9, 2019 · 9 comments
Labels: bug (Something isn't working), wontfix (This will not be worked on)

@tombburnell

I am trying to train a model and I find that the GPU (RTX 2080) is only being used at approx. 30%, whilst one of my CPU cores (9900K @ 5GHz) appears to be maxed out.

I'm trying to train a corpus of roughly 100k records tagged with approx 100 keywords.

  • ubuntu18
  • cuda10
  • python 3.7.3
  • 'en' embeddings
  • embeddings_storage_mode=gpu

Is it expected that the GPU is not fully utilised?

Can it not use more than 1 cpu core?

What other info should I provide?

@tombburnell added the bug label on Sep 9, 2019
@pommedeterresautee
Contributor

What task are you trying to perform?
LM training should be around 100%.
For NER training, there are a few slow operations left to replace to improve performance (some memory transfers that can be reduced, some operations that can be vectorized, etc.).
I have not yet worked on classification.

Work is ongoing:
#1074
#1068
#1053
#1038

A few more are coming.
With all of them applied, my own 2080 Ti goes up to 50%.
More optimizations may help reduce inference time further; please don't hesitate to help!
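
To make those two fixes concrete, here is a rough sketch (illustrative only, not actual flair code) contrasting a per-token loop with many small host-to-GPU transfers against one bulk transfer and one vectorized operation:

import torch

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
embeddings = torch.randn(1000, 300)  # e.g. one 300-dim vector per token, on the CPU

# Slow: one small memory transfer and one tiny kernel launch per token
scaled_slow = [embeddings[i].to(device) * 2.0 for i in range(embeddings.size(0))]

# Fast: one bulk transfer, then one vectorized kernel over all tokens
scaled_fast = embeddings.to(device) * 2.0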

@tombburnell
Author

tombburnell commented Sep 10, 2019

Hi,
Sorry, I'm very new to this so not familiar with the terminology.
I'm trying to create a model that predicts multiple relevant topics given a text document.

NER is named entity recognition? I'm not doing that. What is LM?

Once trained, I run a script ./predict "Some text" and get back a list of labels and their scores.

I have created my corpus in FastText Format. I have eliminated labels that occur infrequently.
I notice that more labels seem to make it exponentially slower.

My training code is:

import torch
import flair
from flair.data import Corpus
from flair.datasets import ClassificationCorpus
from flair.embeddings import WordEmbeddings, FlairEmbeddings, DocumentRNNEmbeddings
from flair.models import TextClassifier
from flair.trainers import ModelTrainer

if torch.cuda.is_available():
    flair.device = torch.device('cuda:0')
else:
    flair.device = torch.device('cpu')

FASTTEXT_EMBEDDINGS = 'en'
# this is the folder in which train, test and dev files reside
data_folder = 'corpus'
# load corpus containing training, test and dev data
corpus: Corpus = ClassificationCorpus(data_folder,
                                      test_file='test.txt',
                                      dev_file='dev.txt',
                                      train_file='train.txt')
# 2. create the label dictionary
label_dict = corpus.make_label_dictionary()
# 3. make a list of word embeddings
word_embeddings = [
                   #WordEmbeddings('glove'),
                   WordEmbeddings('en')
                   # comment in flair embeddings for state-of-the-art results
                   # FlairEmbeddings('news-forward'),
                   # FlairEmbeddings('news-backward'),
                   ]
# 4. initialize document embedding by passing list of word embeddings
# Can choose between many RNN types (GRU by default, to change use rnn_type parameter)
document_embeddings: DocumentRNNEmbeddings = DocumentRNNEmbeddings(word_embeddings,
                                                                     hidden_size=512,
                                                                     reproject_words=True,
                                                                     reproject_words_dimension=256,
                                                                     )
# 5. create the text classifier
classifier = TextClassifier(document_embeddings, label_dictionary=label_dict)

# 6. initialize the text classifier trainer
trainer = ModelTrainer(classifier, corpus)
trainer.train('model',
              embeddings_storage_mode = 'gpu',
              learning_rate=0.1,
              mini_batch_size=32,
              anneal_factor=0.5,
              patience=5,
              max_epochs=200)

@pommedeterresautee
Contributor

LM -> language model.
You are right about NER; and yes, you are trying to perform classification.

I have not yet worked on classification optimization. What I can tell you is that the best way to reduce inference time is to increase GPU use by spotting and removing as many slow operations as possible. Parallelization may be an answer for some parts of the process, but so far I have had better results by vectorizing and avoiding memory copies.

If you are curious, you can check what slows things down using a profiler such as cProfile or the PyTorch profiler (easy to use, and it produces an easy-to-read report).
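
For example, a minimal sketch (the train_step() below is a hypothetical stand-in for whatever you want to time, e.g. a few mini-batches of trainer.train(...)):

import cProfile
import pstats

import torch

def train_step():
    # Hypothetical stand-in for the call you want to profile.
    ...

# cProfile: shows where the Python-side (CPU) time goes
cProfile.run('train_step()', 'train.prof')
pstats.Stats('train.prof').sort_stats('cumulative').print_stats(20)

# PyTorch autograd profiler: per-operator CPU and CUDA timings
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    train_step()
print(prof.key_averages().table(sort_by='cuda_time_total'))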

@tombburnell
Author

Thanks, I'll take a look at that.

Also, for this kind of classification task, should I be using Word Embeddings or Document Embeddings?

@alanakbik
Collaborator

Your code looks good, i.e. you use document embeddings built from a stack of word embeddings, which is the right setup for this kind of classification.
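
For illustration, a small sketch (using the same flair API as the code above) of what the document embedding produces, namely one fixed-size vector per text, which is exactly what the classifier consumes:

from flair.data import Sentence
from flair.embeddings import WordEmbeddings, DocumentRNNEmbeddings

# A stack of word embeddings pooled by an RNN into a single document vector
doc_embeddings = DocumentRNNEmbeddings([WordEmbeddings('glove')], hidden_size=512)

sentence = Sentence('A short example document.')
doc_embeddings.embed(sentence)
print(sentence.get_embedding().shape)  # one vector for the whole text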

@tombburnell
Author

I'm running a training job right now using the latest code and it only appears to be using 3-10% of my GPU.

2019-10-02 10:48:29,136 ----------------------------------------------------------------------------------------------------
2019-10-02 10:48:29,137 Model: "TextClassifier(
(document_embeddings): DocumentRNNEmbeddings(
(embeddings): StackedEmbeddings(
(list_embedding_0): WordEmbeddings('glove')
(list_embedding_1): WordEmbeddings('en')
)
(word_reprojection_map): Linear(in_features=400, out_features=256, bias=True)
(rnn): GRU(256, 512, batch_first=True)
(dropout): Dropout(p=0.5, inplace=False)
)
(decoder): Linear(in_features=512, out_features=27, bias=True)
(loss_function): BCEWithLogitsLoss()
)"
2019-10-02 10:48:29,137 ----------------------------------------------------------------------------------------------------
2019-10-02 10:48:29,137 Corpus: "Corpus: 188540 train + 23567 dev + 23567 test sentences"
2019-10-02 10:48:29,137 ----------------------------------------------------------------------------------------------------
2019-10-02 10:48:29,137 Parameters:
2019-10-02 10:48:29,137 - learning_rate: "0.1"
2019-10-02 10:48:29,137 - mini_batch_size: "32"
2019-10-02 10:48:29,137 - patience: "5"
2019-10-02 10:48:29,137 - anneal_factor: "0.5"
2019-10-02 10:48:29,137 - max_epochs: "200"
2019-10-02 10:48:29,137 - shuffle: "True"
2019-10-02 10:48:29,137 - train_with_dev: "False"
2019-10-02 10:48:29,137 - batch_growth_annealing: "False"

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50 Driver Version: 430.50 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2080 Off | 00000000:02:00.0 On | N/A |
| 27% 52C P8 22W / 215W | 2972MiB / 7979MiB | 3% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1658 G /usr/lib/xorg/Xorg 24MiB |
| 0 1696 G /usr/bin/gnome-shell 58MiB |
| 0 1881 G /usr/lib/xorg/Xorg 226MiB |
| 0 2030 G /usr/bin/gnome-shell 169MiB |
| 0 4705 G ...uest-channel-token=16982847936099033133 75MiB |
| 0 10912 G ...uest-channel-token=17263537349738535565 62MiB |
| 0 13452 C python 1363MiB |
| 0 18113 G ...pycharm-community-2019.2.1/jbr/bin/java 8MiB |
| 0 30650 C python 977MiB |
+-----------------------------------------------------------------------------+

A colleague ran some lower-level torch code on the same machine yesterday and was getting 99% GPU utilisation.
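
For reference, the kind of pure-PyTorch loop that keeps every tensor resident on the GPU (a rough sketch, not the exact code my colleague ran) looks like this:

import torch

device = torch.device('cuda:0')
model = torch.nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(4096, 1024, device=device)
y = torch.randn(4096, 1024, device=device)

for _ in range(1000):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()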

@pommedeterresautee
Contributor

I have worked on the TextClassifier these last few days.
The optimizations are not yet pushed because I am waiting for a fix to the sentiment model to be merged first (#1165); otherwise I can't check that I haven't broken model accuracy. Anyway, soon it should run much faster.

@tombburnell
Author

OK, thanks - looking forward to it.

@stale

stale bot commented Apr 29, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label on Apr 29, 2020
@stale stale bot closed this as completed on May 6, 2020