
Add matmul #49

Merged
merged 2 commits into webmachinelearning:master on Apr 2, 2020
Conversation

huningxin
Contributor

@huningxin huningxin commented Mar 11, 2020

This PR is based on the matmul signature proposal and compatibility table.

@wchao1115 @nsthorat @BenjaminPoulain and @anssiko, please take a look. Thanks!


Preview | Diff

@wchao1115 wchao1115 self-requested a review March 19, 2020 04:56
@wchao1115
Collaborator

@huningxin numpy behavior is that if only one of the two operands is 1D, the promotion of the 1D tensor to 2D takes place. But if both are 1D, the operation is a dot product, returning a scalar tensor.

  1. It might be more clear to add the case when both operands are 1D.
  2. As for the scalar output, can Operand be a scalar value? What's the convention?

@huningxin
Contributor Author

huningxin commented Mar 19, 2020

@wchao1115 , thanks for the review.

  1. It might be more clear to add the case when both operands are 1D.

This case can be covered by

  • If a is 1-D, it is converted to a 2-D tensor by prepending a 1 to its dimensions.
  • If b is 1-D, it is converted to a 2-D tensor by appending a 1 to its dimensions.

But it is still a good idea to be clearer. I'll add this case.

  2. As for the scalar output, can Operand be a scalar value? What's the convention?

An Operand can be a scalar value. In that case, its corresponding OperandDescriptor has a scalar type, e.g. float32.

For example, to create a scalar float32 constant with value 0:

const tensor = nn.constant({type: 'float32'}, new Float32Array([0.0]));
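As an aside, the convention above can be illustrated with a plain-JavaScript sketch (not the WebNN API; the helper name is hypothetical): a scalar is effectively a rank-0 tensor, i.e. a descriptor with no dimensions, holding exactly one element.

```javascript
// Hypothetical helper: the element count of a shape is the product of
// its dimensions. An empty (absent) dimensions list describes a scalar
// and therefore yields exactly one element.
function elementCount(dimensions = []) {
  return dimensions.reduce((count, d) => count * d, 1);
}

console.log(elementCount([]));     // scalar: 1
console.log(elementCount([2, 3])); // 2x3 matrix: 6
```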

@gramalingam
Contributor

Note that numpy does not allow inputs to be scalar. https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html

@huningxin
Contributor Author

huningxin commented Mar 19, 2020

Thanks @gramalingam for the comment.

Note that numpy does not allow inputs to be scalar. https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html

Yes. It follows numpy's behavior. Do you think we need to mention it specifically?

@gramalingam
Contributor

@huningxin , that's good. I was just confused by the previous two comments into thinking that "Operand can be a scalar" meant the input could be a scalar. It may be worth being explicit in the documentation, thanks.

index.bs Outdated
Comment on lines 307 to 319
**Returns:** an {{Operand}}. The output N-D tensor that contains the matrix
product of two input tensors.

Computes the matrix product of two input tensors. It behaves as follows:
- If both *a* and *b* are 2-D, they are multiplied like conventional
matrices.
- If either *a* or *b* is N-D, N > 2, it is treated as a stack of
matrices (rank-3) with dimensions corresponding to the last two
indices. The matrix multiplication will be broadcasted accordingly.
- If *a* is 1-D, it is converted to a 2-D tensor by prepending a 1 to
its dimensions.
- If *b* is 1-D, it is converted to a 2-D tensor by appending a 1 to
its dimensions.
Collaborator

Please consider this wording instead. Hopefully it spells out more fully how the API would behave.

**Returns:** an {{Operand}}. The output N-D tensor that contains the matrix
product of two input tensors.

Computes the matrix product of two input tensors. It behaves as follows:
    - If both *a* and *b* are 2-D, they are multiplied like conventional
        matrices and produce a 2-D tensor as the output.
    - If either *a* or *b* is N-D, N > 2, it is treated as a stack of matrices
        (rank-3) with dimensions corresponding to the last two indices. The
        other input broadcasts its dimensions accordingly to allow for matrix 
        multiplications of the last two dimensions of both inputs. The output
        is an N-D tensor of the same shape as the N-D input.
    - If only *a* is 1-D, it is converted to a 2-D tensor by prepending a 1 to
        its dimensions.
    - If only *b* is 1-D, it is converted to a 2-D tensor by appending a 1 to
        its dimensions.
    - If both *a* and *b* are 1-D, the operation is a vector dot-product, which
        produces a scalar output.
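To make the five cases concrete, here is a plain-JavaScript sketch (not the WebNN API; the function name is hypothetical) that computes the output shape from the two input shapes, following the rules above plus the numpy convention that a dimension introduced by 1-D promotion is removed from the result:

```javascript
// Hypothetical shape-inference sketch for matmul. Shapes are arrays of
// dimension sizes; a scalar is represented by the empty array.
function matmulOutputShape(aDims, bDims) {
  // Both 1-D: vector dot-product, scalar (rank-0) output.
  if (aDims.length === 1 && bDims.length === 1) {
    if (aDims[0] !== bDims[0]) throw new Error('incompatible vectors');
    return [];
  }
  // Promote a 1-D operand to 2-D: prepend 1 to a, append 1 to b.
  const a = aDims.length === 1 ? [1, aDims[0]] : aDims.slice();
  const b = bDims.length === 1 ? [bDims[0], 1] : bDims.slice();
  const [m, kA] = a.slice(-2);
  const [kB, n] = b.slice(-2);
  if (kA !== kB) throw new Error('inner dimensions do not match');
  // Broadcast the leading (batch) dimensions, numpy-style:
  // align right, a size of 1 stretches to match the other side.
  const aBatch = a.slice(0, -2);
  const bBatch = b.slice(0, -2);
  const rank = Math.max(aBatch.length, bBatch.length);
  const batch = [];
  for (let i = rank; i >= 1; --i) {
    const dA = aBatch[aBatch.length - i] ?? 1;
    const dB = bBatch[bBatch.length - i] ?? 1;
    if (dA !== dB && dA !== 1 && dB !== 1) throw new Error('cannot broadcast');
    batch.push(Math.max(dA, dB));
  }
  const out = [...batch, m, n];
  // Remove the dimension that 1-D promotion introduced.
  if (aDims.length === 1) out.splice(out.length - 2, 1);
  if (bDims.length === 1) out.pop();
  return out;
}

console.log(matmulOutputShape([2, 3, 4, 5], [5, 4])); // [2, 3, 4, 4]
console.log(matmulOutputShape([3], [3]));             // [] (scalar)
console.log(matmulOutputShape([3], [3, 4]));          // [4]
```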

Contributor

@gramalingam gramalingam Mar 19, 2020

What does "rank-3" mean above? Shouldn't it be (rank-2)? How about saying

... it is treated as a stack of matrices of rank (N-2) with matrix dimensions
corresponding to the last two dimensions

Collaborator

What does "rank-3" mean above? Shouldn't it be (rank-2)? How about saying

... it is treated as a stack of matrices of rank (N-2) with matrix dimensions
corresponding to the last two dimensions

That was the original wording from @huningxin. I believe it refers to the stacking part of the sentence and not the matrix part, i.e. the stacking is considered a third rank. But I agree it's confusing. I like your suggestion but also think the sentence is already clear without it. How about we just remove the '(rank-3)' part?

Contributor

Since this is the first use of broadcasting, we should also include some general description of broadcasting. The description above is a little imprecise if the broadcasting is bidirectional ... I assume it is bidirectional?

Contributor

@gramalingam gramalingam Mar 19, 2020

How about replacing the line

The output is an N-D tensor of the same shape as the N-D input.

by

If the two tensors have dimensions [a1, ..., ap, M, N] and [b1, ..., bq, N, K],
then the output is a tensor of dimensions [c1, ..., cr, M, K] where [c1, ..., cr]
is the result of broadcasting [a1, ... ap] and [b1, ..., bq].
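A small plain-JavaScript sketch of that formula (the helper name is hypothetical): given the leading dimensions [a1, ..., ap] and [b1, ..., bq], the broadcast result [c1, ..., cr] is computed by aligning the two lists on the right and letting a size of 1 stretch to match the other side, per the numpy rule.

```javascript
// Hypothetical helper: numpy-style broadcasting of two batch-dimension
// lists. Dimensions are aligned on the right; a missing dimension is
// treated as 1, and a size of 1 stretches to match the other side.
function broadcastBatchDims(a, b) {
  const r = Math.max(a.length, b.length);
  const out = new Array(r);
  for (let i = 1; i <= r; ++i) {
    const dA = a[a.length - i] ?? 1;
    const dB = b[b.length - i] ?? 1;
    if (dA !== dB && dA !== 1 && dB !== 1) {
      throw new Error(`cannot broadcast ${dA} with ${dB}`);
    }
    out[r - i] = Math.max(dA, dB);
  }
  return out;
}

// e.g. inputs of shapes [8, 1, M, N] and [3, N, K]:
console.log(broadcastBatchDims([8, 1], [3])); // [8, 3] -> output [8, 3, M, K]
```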

Contributor

Removing the (rank-3) part is also fine.

Collaborator

Since we decided to follow the numpy broadcasting rule, I think it's a good idea to add a link referencing the numpy rule here in the API spec. The ONNX broadcasting rule is compatible with the numpy rule but further classifies broadcasting into two categories -- either uni- or multi-directional. In this case, since it only concerns 2 operands and either one of them can be N-D, this distinction, I believe, is not relevant here, i.e. the broadcasting behavior of the N-D operand simply follows the numpy rule as previously stated.

Contributor Author

@huningxin huningxin Mar 20, 2020

@wchao1115 @gramalingam thanks for your comments.

What does "rank-3" mean above?

'(rank-3)' refers to the stack. I agree it is confusing. I updated the PR to remove it.

Since we decided to follow the numpy broadcasting rule, I think it's a good idea to add a link referencing the numpy rule here in the API spec.

It sounds good to me. I will add that link as a normative reference to the PR. How about saying

The matrix multiplication will be broadcasted accordingly by
following [numpy-broadcasting-rule]. 

The output is an N-D tensor of the same shape as the N-D input.

This may not be true. For example, if a has shape [2, 3, 4, 5] and b has shape [5, 4], the resulting tensor will have a shape of [2, 3, 4, 4].

As we refer to the numpy broadcasting rule, how about we say

The output is an N-D tensor whose rank is the maximum rank of the input tensors. 
For each dimension of the output tensor, its size is the maximum size along that 
dimension of the input tensors.
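For the [2, 3, 4, 5] x [5, 4] example above, the resulting shape can be checked by hand with a small plain-JavaScript calculation (shapes taken from the comment; the batch/matrix split is an illustrative assumption):

```javascript
// a: [2, 3, 4, 5] -> batch dims [2, 3], matrix dims [4, 5] (M=4, N=5)
// b: [5, 4]       -> no batch dims,     matrix dims [5, 4] (N=5, K=4)
const aShape = [2, 3, 4, 5];
const bShape = [5, 4];
// Broadcasting [2, 3] with b's empty batch leaves [2, 3]; the matrix
// product of [4, 5] and [5, 4] contributes [M, K] = [4, 4].
const outShape = [
  ...aShape.slice(0, -2),     // broadcast batch dims: [2, 3]
  aShape[aShape.length - 2],  // M = 4
  bShape[bShape.length - 1],  // K = 4
];
console.log(outShape); // [2, 3, 4, 4]
```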

Contributor

Regarding the output dimension: yes, we need to change "The output is an N-D tensor of the same shape as the N-D input." @huningxin 's proposal looks good, except that "the maximum size along that dimension of the input tensors" does not apply to the last two dimensions, which are subject to the matrix-multiplication rules. (Which is why I suggested a slightly different wording above; but we can adapt Ningxin's words accordingly as well.)

Contributor Author

@gramalingam, your suggestion sounds good. I'll incorporate that. Thanks.

@huningxin
Contributor Author

@wchao1115 @gramalingam , I pushed a new commit 2fd2baf to address your comments. Please take another look. Thanks!

@huningxin
Contributor Author

@wchao1115 @gramalingam , thanks for the review and approval.

@anssiko
Member

anssiko commented Apr 2, 2020

This PR has received adequate review and all the feedback has been addressed.

@huningxin @wchao1115 feel free to merge this PR.

@huningxin huningxin merged commit b341ec2 into webmachinelearning:master Apr 2, 2020
4 participants