Doc only change to CUTLASS 3.3 changelog #1180

Merged on Nov 13, 2023 (1 commit)


manishucsd
Contributor

This doc-only PR addresses the discussion at #1170

Summary and rationale for the suggested changes

  1. Rename Mixed-Precision to Mixed-Input. The term Mixed-Precision is already taken by GEMMs whose inputs share a data type (DataType(operandA) == DataType(operandB)) but accumulate in a different one (F16*F16+F32 and BF16*BF16+F32). The code uses the cutlass::arch::OpMultiplyAddMixedInputUpcast tag to signal that the input data types are mixed. It would be good to settle on a consistent nomenclature that distinguishes the Mixed-Precision use case from the Mixed-Input one.

  2. Update the hyperlink for Mixed Precision Ampere GEMMs to point to PR #1084, which has a detailed description, steps to compile only the Ampere mixed-input GEMMs and reproduce the performance results, and a performance graph.
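
The distinction in item 1 can be sketched as a compile-time check. This is only an illustration, not CUTLASS code: `half_t` below is a hypothetical stand-in struct, `GemmKind` and `classify_gemm` are names invented for this sketch, and the F16 with 8-bit integer pairing is an assumed example of a mixed-input case.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for a 16-bit floating-point element type
// (not the real cutlass::half_t).
struct half_t { std::uint16_t bits; };

// Hypothetical labels for the two terms discussed in this PR.
enum class GemmKind { Uniform, MixedPrecision, MixedInput };

// Classify a GEMM by its element types:
//  - MixedInput:     operand A and operand B have different data types.
//  - MixedPrecision: A and B share a data type, but the accumulator differs
//                    (e.g. F16*F16+F32).
//  - Uniform:        all three types are the same.
template <class ElementA, class ElementB, class ElementAccumulator>
constexpr GemmKind classify_gemm() {
  if constexpr (!std::is_same_v<ElementA, ElementB>) {
    return GemmKind::MixedInput;
  } else if constexpr (!std::is_same_v<ElementA, ElementAccumulator>) {
    return GemmKind::MixedPrecision;
  } else {
    return GemmKind::Uniform;
  }
}

// F16*F16+F32: same input types, wider accumulator.
static_assert(classify_gemm<half_t, half_t, float>() == GemmKind::MixedPrecision);

// Assumed example: F16 operand A with an 8-bit integer operand B.
static_assert(classify_gemm<half_t, std::int8_t, float>() == GemmKind::MixedInput);
```

Under this reading, the input-type check comes first, so a GEMM with differing A and B types is Mixed-Input regardless of its accumulator type.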

@manishucsd manishucsd force-pushed the doc_only_change_changelog_3.3 branch from 2e578f9 to a87fb92 Compare November 9, 2023 00:22
@hwu36
Collaborator

hwu36 commented Nov 9, 2023

Could you please also change CHANGELOG.md?

@manishucsd manishucsd force-pushed the doc_only_change_changelog_3.3 branch from a87fb92 to 56fb032 Compare November 9, 2023 19:37
@manishucsd
Contributor Author

Could you please also change CHANGELOG.md?

Done.

@hwu36 hwu36 merged commit 5ae8133 into NVIDIA:main Nov 13, 2023
@manishucsd manishucsd deleted the doc_only_change_changelog_3.3 branch June 24, 2024 15:50