inconsistency in the notation / term epoch #89
Yes, you are right that the notation is inconsistent here, since the parameter "epochs" is used to count training-data splits, which is not intuitive. So until we fix this you are correct: use 100 * number_of_train_files as max_epochs if you want to do 100 epochs. Generally, our advice is to set the max epochs to an extremely high number and run the training until the learning rate has annealed twice. The learning rate starts annealing when training yields few improvements, so once it has annealed a few times the model is as good as it will get. Also, we would recommend grouping the training data into about 20-50 files so that you do not lose too much time validating at the end of each split, and setting patience to perhaps half the number of your training splits!
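To make that advice concrete, here is a minimal sketch (an editorial illustration, not part of this thread), assuming a flair LanguageModelTrainer called `trainer` has already been built on a corpus grouped into roughly 50 training splits; the split count is an assumed number, and the parameter names follow the call shown later in this thread.

```python
# Minimal sketch, assuming `trainer` is a flair LanguageModelTrainer and the
# corpus has been grouped into roughly 50 training splits (an assumed number).
number_of_train_splits = 50

trainer.train('resources/taggers/language_model_es_forward',
              sequence_length=250,
              mini_batch_size=100,
              # "max_epochs" counts splits here, so set it far higher than the
              # number of full passes you expect and stop once the learning
              # rate has annealed a couple of times.
              max_epochs=100 * number_of_train_splits,
              # patience of about half the number of splits, per the advice above
              patience=number_of_train_splits // 2)
```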
Thanks @alanakbik for the clarification. At the moment the loss is not decreasing very quickly, and the perplexity (ppl) also seems to be stuck somehow.
Hi, yes it looks like the learning rate has annealed too quickly; it is 0.00 in the output. This happens because your training splits are too small, giving the learning rate too many opportunities to anneal. Either increase the size of your training splits or increase the patience. Or even better: both :) Try:

```python
trainer.train('resources/taggers/language_model_es_forward',
              sequence_length=250,
              mini_batch_size=100,
              max_epochs=2000,
              patience=100)
```
@alanakbik Yes, I have tried it with a larger data size (and also increased the number of hidden neurons, just in case), and it already seems better! Thanks @alanakbik for the suggestion, I will try with patience=100. By the way, does the language model require a specific input format? I have used one sentence per line. In addition, do I need to tokenize the text or apply any normalization?
Term/epoch notation fixed in release-0.3. |
Hello,
I am training a language model. Before testing with the whole text, I was running it with a smaller text.
From my understanding of epoch vs. batch vs. mini-batch (based on a post on Stack Exchange): an epoch is one full pass over the entire training set, while a mini-batch is the subset of training examples used for a single gradient update.
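To make those terms concrete, here is a small illustrative calculation with purely hypothetical numbers (not my actual corpus):

```python
# Illustrative only: hypothetical sizes, not my actual corpus.
dataset_size = 100_000     # training examples in the whole corpus
mini_batch_size = 100      # examples consumed per gradient update

updates_per_epoch = dataset_size // mini_batch_size   # 1,000 updates = 1 epoch
full_passes = 100                                      # "100 epochs"
total_updates = full_passes * updates_per_epoch        # 100,000 updates in total
print(updates_per_epoch, total_updates)
```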
However, when I train the language model with the following parameters, I get the following output.
Currently there are 51 input files. (I see that the mini_batch_size is larger than the number of training files, so this might be a problem.)
However, in the output the epoch number does not change; it only says end of split (1 / 51), and training finishes after (5 / 51).
I wonder if this is due to a different use of terminology?
Or does it in this case really only go through 5 files and stop?
If I want to go through my entire dataset 100 times, for example, do I have to use 100 * number_of_train_files as max_epochs?
(The screenshot might be confusing, but the training finished after 5 / 51.)

The original dataset that I have (the Spanish Wikipedia dump) currently consists of 2300 files (each about 1 MB).
I intend to put about 5 files together as a validation set, and about 5 files together as a test set.
The rest of the files (about 2290), I will use to train the model.
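For illustration, here is a hedged sketch of how the files could be arranged; the corpus layout (a train/ folder of splits plus valid.txt and test.txt) is my assumption about how flair's language-model training is usually set up, and all directory names are made up.

```python
# Hedged sketch with assumed paths: arrange the wiki dump files into
#   corpus/train/   - many training split files
#   corpus/valid.txt
#   corpus/test.txt
from pathlib import Path
import shutil

source_files = sorted(Path('wikidump_es').glob('*.txt'))  # hypothetical source dir
corpus = Path('corpus')
(corpus / 'train').mkdir(parents=True, exist_ok=True)

# First 5 files -> validation, next 5 -> test, the rest -> training splits.
with open(corpus / 'valid.txt', 'w', encoding='utf-8') as f:
    for p in source_files[:5]:
        f.write(p.read_text(encoding='utf-8'))
with open(corpus / 'test.txt', 'w', encoding='utf-8') as f:
    for p in source_files[5:10]:
        f.write(p.read_text(encoding='utf-8'))
# Per the advice above, it may be better to concatenate the remaining ~2290
# files into 20-50 larger splits instead of copying them one by one.
for p in source_files[10:]:
    shutil.copy(p, corpus / 'train' / p.name)
```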
If I want to pass over the data multiple times, what value should I use for epochs?
What is a good number of passes over the data? How many did you use when you trained your language models for English and German?