Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix mxnet ctc_loss bug #11834

Merged
merged 3 commits into from
Jul 27, 2018
Merged

Fix mxnet ctc_loss bug #11834

merged 3 commits into from
Jul 27, 2018

Conversation

HawkAaron
Copy link
Contributor

@HawkAaron HawkAaron commented Jul 20, 2018

  1. fix ctc_loss GPU bug
    2. add blank label for gluon CTCLoss
    edit @szha: crossed out second item

@@ -409,6 +409,8 @@ class CTCLoss(Loss):
length respectively.
weight : float or None
Global scalar weight for loss.
blank_label : {'first', 'last'}, default 'last'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was intentional not to expose this option in gluon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means I need to revert this commit and resend a pull request ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can simply add a commit that removes the change in blank_label

@chinakook
Copy link
Contributor

Is the speed of big vocab size solved of this op?

@HawkAaron
Copy link
Contributor Author

@chinakook its looks good to me

@szha
Copy link
Member

szha commented Jul 25, 2018

@Jerryzcn could you give this a try?

@Jerryzcn
Copy link
Contributor

will do

@Jerryzcn
Copy link
Contributor

Jerryzcn commented Jul 26, 2018

Thanks @HawkAaron.

it seems removing the max subtraction negatively affect convergence. Do you observe similar result?

In the original baidu ctc, there is a section that do max subtraction. I suspect the broadcast is too slow here. Maybe we should write a function for this.

for(int r = 0; r < alphabet_size_; ++r) {
    probs[r + col_offset] = std::exp(activations[r + col_offset] - max_activation);
    denom += probs[r + col_offset];
}

Copy link
Contributor

@Jerryzcn Jerryzcn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subtracting max is missing

denoms_handle = reduce_with_axis<red::sum, false>(
F<mxnet::op::mshadow_op::exp>(
log_probs_handle -
broadcast<0>(reduce_with_axis<red::maximum, false>(log_probs_handle, 1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max is necessary here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@Jerryzcn Jerryzcn Jul 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Did not look at other parts of the code thanks! Found a bug on my end.

@szha szha merged commit 2bddf6f into apache:master Jul 27, 2018
@szha
Copy link
Member

szha commented Jul 27, 2018

@HawkAaron thanks for the fix, and @Jerryzcn thanks for the review

XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request Aug 29, 2018
* fix ctc_loss GPU bug

* add blank_label parameter for CTCLoss

* Revert "add blank_label parameter for CTCLoss"

This reverts commit aab11f7.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants