Add matmul #49
Conversation
@huningxin numpy behavior is that if only one of the two operands is 1D, the promotion of the 1D tensor to 2D takes place. But if both are 1D, the operation is a dot product, returning a scalar tensor.
@wchao1115 , thanks for the review.
This case can be covered by
But it is still a good idea to be clearer. I'll add this case.
An Operand can be a scalar value. In that case, its corresponding OperandDescriptor has a scalar type. For example, to create a scalar float32 constant with value 0:
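A minimal sketch of what that could look like, assuming a builder-style `constant(descriptor, value)` entry point; the `builder` name and the convention that omitting `dimensions` denotes a scalar are assumptions of this example, not normative spec text:

```js
// Illustrative only: create a scalar float32 constant with value 0.
// Assumes a builder-style constant(descriptor, value) entry point and that a
// descriptor without dimensions describes a scalar operand.
const desc = {type: 'float32'};                        // no dimensions => scalar
const zero = builder.constant(desc, new Float32Array([0]));
```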
Note that numpy does not allow inputs to be scalar. https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.matmul.html
Thanks @gramalingam for the comment.
Yes. It follows numpy's behavior. Do you think we need to mention it specifically?
@huningxin, that's good. I was just confused because the previous two comments saying "Operand can be a scalar" suggested the input could be a scalar. It may be worth being explicit in the documentation, thanks.
index.bs (Outdated)
**Returns:** an {{Operand}}. The output N-D tensor that contains the matrix product of two input tensors.

Computes the matrix product of two input tensors. It behaves as follows:
- If both *a* and *b* are 2-D, they are multiplied like conventional matrices.
- If either *a* or *b* is N-D, N > 2, it is treated as a stack of matrices (rank-3) with dimensions corresponding to the last two indices. The matrix multiplication will be broadcasted accordingly.
- If *a* is 1-D, it is converted to a 2-D tensor by prepending a 1 to its dimensions.
- If *b* is 1-D, it is converted to a 2-D tensor by appending a 1 to its dimensions.
Please consider this wording instead. Hopefully it spells out more fully how the API would behave.
**Returns:** an {{Operand}}. The output N-D tensor that contains the matrix product of two input tensors.

Computes the matrix product of two input tensors. It behaves as follows:
- If both *a* and *b* are 2-D, they are multiplied like conventional matrices and produce a 2-D tensor as the output.
- If either *a* or *b* is N-D, N > 2, it is treated as a stack of matrices (rank-3) with dimensions corresponding to the last two indices. The other input broadcasts its dimensions accordingly to allow for matrix multiplications of the last two dimensions of both inputs. The output is an N-D tensor of the same shape as the N-D input.
- If only *a* is 1-D, it is converted to a 2-D tensor by prepending a 1 to its dimensions.
- If only *b* is 1-D, it is converted to a 2-D tensor by appending a 1 to its dimensions.
- If both *a* and *b* are 1-D, the operation is a vector dot-product, which produces a scalar output.
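For illustration only, a sketch of how the cases above map to output shapes, assuming a builder-style graph API; the `builder.input`/`builder.matmul` names are assumptions here, and the shape comments follow the wording above and the numpy rule rather than normative spec text:

```js
// Hypothetical builder-style usage; shapes in the comments are illustrative.
const a = builder.input('a', {type: 'float32', dimensions: [3, 4]});
const b = builder.input('b', {type: 'float32', dimensions: [4, 5]});
const c = builder.matmul(a, b);            // 2-D x 2-D -> [3, 5]

const aStack = builder.input('aStack', {type: 'float32', dimensions: [2, 3, 4]});
const cStack = builder.matmul(aStack, b);  // 3-D x 2-D -> [2, 3, 5]; b is broadcast over the stack

const v = builder.input('v', {type: 'float32', dimensions: [4]});
const w = builder.input('w', {type: 'float32', dimensions: [4]});
const dot = builder.matmul(v, w);          // 1-D x 1-D -> scalar (vector dot-product)
```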
What does "rank-3" mean above? Shouldn't it be (rank-2)? How about saying
... it is treated as a stack of matrices of rank (N-2) with matrix dimensions
corresponding to the last two dimensions
What does "rank-3" mean above? Shouldn't it be (rank-2)? How about saying
... it is treated as a stack of matrices of rank (N-2) with matrix dimensions corresponding to the last two dimensions
That was the original wording from @huningxin. I believe it refers to the stacking part of the sentence and not the matrix part of the sentence i.e. the stacking is considered a third rank. But I agree it's confusing. I like your suggestion but also think the sentence is already clear without it. How about we just remove the '(rank-3)' part?
Since this is the first use of broadcasting, we should also include some general description of broadcasting. The description above is a little imprecise if the broadcasting is bidirectional ... I assume it is bidirectional?
How about replacing the line

> The output is an N-D tensor of the same shape as the N-D input.

by

> If the two tensors have dimensions [a1, ..., ap, M, N] and [b1, ..., bq, N, K], then the output is a tensor of dimensions [c1, ..., cr, M, K] where [c1, ..., cr] is the result of broadcasting [a1, ..., ap] and [b1, ..., bq].
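As a non-normative sketch of that rule, the output shape could be computed as follows for inputs of rank >= 2; the helper name and the numpy-style broadcast of the leading dimensions are assumptions of this example:

```js
// Hypothetical helper: output shape of matmul(a, b) for inputs of rank >= 2,
// where aShape = [a1, ..., ap, M, N] and bShape = [b1, ..., bq, N, K].
function matmulOutputShape(aShape, bShape) {
  const [m, n1] = aShape.slice(-2);
  const [n2, k] = bShape.slice(-2);
  if (n1 !== n2) throw new Error('inner dimensions do not match');
  // Broadcast the leading (batch) dimensions numpy-style: align from the
  // right; each pair of sizes must be equal or one of them must be 1.
  const aBatch = aShape.slice(0, -2);
  const bBatch = bShape.slice(0, -2);
  const rank = Math.max(aBatch.length, bBatch.length);
  const batch = [];
  for (let i = 0; i < rank; ++i) {
    const da = aBatch[aBatch.length - 1 - i] ?? 1;
    const db = bBatch[bBatch.length - 1 - i] ?? 1;
    if (da !== db && da !== 1 && db !== 1) throw new Error('not broadcastable');
    batch.unshift(Math.max(da, db));
  }
  return [...batch, m, k];
}

matmulOutputShape([2, 3, 4, 5], [5, 4]);   // -> [2, 3, 4, 4]
```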
Removing the (rank-3) part is also fine.
Since we decided to follow the numpy broadcasting rule, I think it's a good idea to add a link referencing the numpy rule here in the API spec. The ONNX broadcasting rule is compatible with the numpy rule but further classifies broadcasting into two categories, either uni- or multi-directional. In this case, since it only concerns two operands and either one of them can be N-D, this distinction, I believe, is not relevant here, i.e. the broadcasting behavior of the N-D operand simply follows the numpy rule as previously stated.
@wchao1115 @gramalingam, thanks for your comments.

> What does "rank-3" mean above?

'(rank-3)' refers to the stack. I agree it is confusing. I updated the PR to remove it.

> Since we decided to follow the numpy broadcasting rule, I think it's a good idea to add a link referencing the numpy rule here in the API spec.

It sounds good to me. I will add that link as a normative reference into the PR. How about saying

> The matrix multiplication will be broadcasted accordingly by following [numpy-broadcasting-rule].

> The output is an N-D tensor of the same shape as the N-D input.

This may not be true. For example, if *a* has shape [2, 3, 4, 5] and *b* has shape [5, 4], the resulting tensor will have a shape of [2, 3, 4, 4].

Since we follow the numpy broadcasting rule, how about we say

> The output is an N-D tensor whose rank is the maximum rank of the input tensors. For each dimension of the output tensor, its size is the maximum size along that dimension of the input tensors.
Regarding the output dimension: yes, we need to change "The output is an N-D tensor of the same shape as the N-D input." @huningxin's proposal looks good, except that "the maximum size along that dimension of the input tensors" does not apply to the last two dimensions, which are subject to the matrix-multiplication rules. (Which is why I suggested a slightly different wording above; but we can also adapt Ningxin's wording accordingly.)
@gramalingam, your suggestion sounds good. I'll incorporate that. Thanks.
@wchao1115 @gramalingam, I pushed a new commit 2fd2baf to address your comments. Please take another look. Thanks!
@wchao1115 @gramalingam, thanks for the review and approval.
This PR has received adequate review and all the feedback has been addressed. @huningxin @wchao1115 feel free to merge this PR.
This PR is based on the matmul signature proposal and compatibility table.
@wchao1115 @nsthorat @BenjaminPoulain and @anssiko, please take a look. Thanks!