ORT 1.19.2 Release: Cherry Pick Round 1 #21861
Merged
Conversation
### Description

Fixed #21775.

### Motivation and Context

The DLLs should be signed with Keycode CP-230012; by default, the test code-signing key is used.
### Motivation and Context

The size limit for the training wheel should be 400 MB.
Softmax (formula 1) is defined as follows:
```math
y_{i} = \frac{exp(x_{i})}{\sum_{j} exp(x_{j})}
```
After applying softmax, each element is in the range $(0, 1)$ and the elements sum to 1, so they can be interpreted as probabilities. However, in language models, softmax has two issues:
* When all elements are -inf (for example, when a whole row is masked because a query token is padding), the result is undefined: exp(-inf)=0, so the formula above divides by zero.
* Why should we normalize in a way that treats every query word as equally important (each row sums to 1)?

**Smooth Softmax** (formula 2) is a modified version that introduces a smoothing term in the denominator:
```math
s_{i} = \frac{exp(x_{i})}{1 + \sum_{j} exp(x_{j})}
```
This formula addresses both issues:
* It handles the special case where all elements are -inf: in that case $s_{i}$ is 0 for every element.
* The sum of all elements $\sum_{i}{s_{i}} = \frac{\sum_{i}{exp(x_{i})}}{1 + \sum_{i} exp(x_{i})}$ is in the range $(0, 1)$, so the model can learn to assign different importance to different query words.

Since the exponential is prone to overflow or underflow, formula 3 can be used to get a numerically stable result:
```math
s_{i} = \frac{exp(x_{i} + c)}{exp(c) + \sum_{j} exp(x_{j} + c)}
```
In theory, c can be any value. In practice, c should be chosen so that $exp(c)$ and $exp(x_{i} + c)$ do not overflow (or underflow) at the same time. A reasonable choice is formula 4:
```math
c = -\max_{i} \{ x_i \}
```
or, with the constraint that c <= 0, formula 5:
```math
c = -\max(0, \max_{i} \{ x_i \})
```
The latter (formula 5) ensures that $s_{i}$ falls back to formula 2 when all elements are negative.

For the CPU provider, smooth softmax is implemented in MLAS using formula 5. @wangyems implemented smooth softmax in flash attention for CUDA, which requires an Ampere or newer GPU; the flash attention implementation uses formula 4.

---------

Co-authored-by: Ye Wang
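For reference, here is a minimal NumPy sketch of smooth softmax using the stable form of formula 5. This is an illustrative reimplementation, not the actual MLAS or flash attention kernel:

```python
import numpy as np

def smooth_softmax(x: np.ndarray) -> np.ndarray:
    """Smooth softmax over the last axis, using formula 5: c = -max(0, max_i x_i)."""
    c = -np.maximum(0.0, np.max(x, axis=-1, keepdims=True))
    e = np.exp(x + c)
    # exp(c) in the denominator is the "+1" of formula 2 after the shift by c.
    return e / (np.exp(c) + np.sum(e, axis=-1, keepdims=True))

# A fully masked row (all -inf): regular softmax hits 0/0, smooth softmax returns all zeros.
print(smooth_softmax(np.full(4, -np.inf)))  # [0. 0. 0. 0.]
```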
### Description

Refer to #21867.
tianleiwu force-pushed the prathikrao/cherry-pick-r1-1.19.2 branch from 18002b5 to 37f896d on August 29, 2024 at 18:07
### Description

Enable causal attention in the MultiHeadAttention CUDA operator. All formats (Q_K_V_BSNH_BSNH_BSNH, Q_K_V_BSNH_BNSH_BNSH, Q_KV_BSNH_BSN2H and QKV_BSN3H) now support causal. Internally, causal attention is dispatched to the flash attention, efficient attention, or unfused attention kernel.

### Motivation and Context

Currently, MultiHeadAttention has causal enabled in the CPU EP but not in the CUDA EP. This can cause issues in ONNX conversion, for example a model that runs on CPU but not on CUDA. Enabling causal in CUDA reduces the difference between the CPU and CUDA support matrices.
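To make the semantics concrete, here is a minimal NumPy sketch of single-head causal attention (a plain reimplementation for illustration, not the actual flash, efficient, or unfused CUDA kernels):

```python
import numpy as np

def causal_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Single-head causal attention: query position i attends only to positions j <= i."""
    scores = q @ k.T / np.sqrt(q.shape[-1])                   # (seq_len, seq_len)
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)   # strictly upper triangle
    scores = np.where(future, -np.inf, scores)                # mask out future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

q = k = v = np.random.rand(5, 8).astype(np.float32)
out = causal_attention(q, k, v)  # row i depends only on v[:i + 1]
```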
### Description

Fix a bug with num splits in flash attention: the heuristic was not being applied properly because the sequence length was passed incorrectly to the heuristic function.

### Motivation and Context

This misconfiguration caused significant performance problems with flash attention at long sequence lengths.
Upgrade pytorch_lightning to fix orttraining_amd_gpu_ci_pipeline (#21789)

### Description

Upgrade pytorch_lightning to fix orttraining_amd_gpu_ci_pipeline:

```
#24 1.838 WARNING: Ignoring version 1.6.0 of pytorch_lightning since it has invalid metadata:
#24 1.838 Requested pytorch_lightning==1.6.0 from https://files.pythonhosted.org/packages/09/18/cee67f4849dea9a29b7af7cdf582246bcba9eaa73d9443e138a4172ec786/pytorch_lightning-1.6.0-py3-none-any.whl has invalid metadata: .* suffix can only be used with `==` or `!=` operators
#24 1.838     torch (>=1.8.*)
#24 1.838           ~~~~~~^
#24 1.838 Please use pip<24.1 if you need to use this version.
#24 1.838 ERROR: Ignored the following versions that require a different python version: 1.14.0 Requires-Python >=3.10; 1.14.0rc1 Requires-Python >=3.10; 1.14.0rc2 Requires-Python >=3.10; 2.1.0 Requires-Python >=3.10; 2.1.0rc1 Requires-Python >=3.10
#24 1.838 ERROR: Could not find a version that satisfies the requirement pytorch_lightning==1.6.0 (from versions: 0.0.2, 0.2, 0.2.2, 0.2.3, 0.2.4, 0.2.4.1, 0.2.5, 0.2.5.1, 0.2.5.2, 0.2.6, 0.3, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.4.1, 0.3.5, 0.3.6, 0.3.6.1, 0.3.6.3, 0.3.6.4, 0.3.6.5, 0.3.6.6, 0.3.6.7, 0.3.6.8, 0.3.6.9, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.4.4, 0.4.5, 0.4.6, 0.4.7, 0.4.8, 0.4.9, 0.5.0, 0.5.1, 0.5.1.2, 0.5.1.3, 0.5.2, 0.5.2.1, 0.5.3, 0.5.3.1, 0.5.3.2, 0.5.3.3, 0.6.0, 0.7.1, 0.7.3, 0.7.5, 0.7.6, 0.8.1, 0.8.3, 0.8.4, 0.8.5, 0.9.0, 0.10.0, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.0.6, 1.0.7, 1.0.8, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.1.6, 1.1.7, 1.1.8, 1.2.0rc0, 1.2.0rc1, 1.2.0rc2, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5, 1.2.6, 1.2.7, 1.2.8, 1.2.9, 1.2.10, 1.3.0rc1, 1.3.0rc2, 1.3.0rc3, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4, 1.3.5, 1.3.6, 1.3.7, 1.3.7.post0, 1.3.8, 1.4.0rc0, 1.4.0rc1, 1.4.0rc2, 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 1.4.5, 1.4.6, 1.4.7, 1.4.8, 1.4.9, 1.5.0rc0, 1.5.0rc1, 1.5.0, 1.5.1, 1.5.2, 1.5.3, 1.5.4, 1.5.5, 1.5.6, 1.5.7, 1.5.8, 1.5.9, 1.5.10, 1.6.0rc0, 1.6.0rc1, 1.6.0, 1.6.1, 1.6.2, 1.6.3, 1.6.4, 1.6.5, 1.7.0rc0, 1.7.0rc1, 1.7.0, 1.7.1, 1.7.2, 1.7.3, 1.7.4, 1.7.5, 1.7.6, 1.7.7, 1.8.0rc0, 1.8.0rc1, 1.8.0rc2, 1.8.0, 1.8.0.post1, 1.8.1, 1.8.2, 1.8.3, 1.8.3.post0, 1.8.3.post1, 1.8.3.post2, 1.8.4, 1.8.4.post0, 1.8.5, 1.8.5.post0, 1.8.6, 1.9.0rc0, 1.9.0, 1.9.1, 1.9.2, 1.9.3, 1.9.4, 1.9.5, 2.0.0rc0, 2.0.0, 2.0.1, 2.0.1.post0, 2.0.2, 2.0.3, 2.0.4, 2.0.5, 2.0.6, 2.0.7, 2.0.8, 2.0.9, 2.0.9.post0, 2.1.0rc0, 2.1.0rc1, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.1.4, 2.2.0rc0, 2.2.0, 2.2.0.post0, 2.2.1, 2.2.2, 2.2.3, 2.2.4, 2.2.5, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0)
#24 1.838 ERROR: No matching distribution found for pytorch_lightning==1.6.0
```
### Description

Fix `Orttraining Linux Lazy Tensor CI Pipeline`:
- Remove an unused import of `torch.onnx._internal.exporter`, whose path changed in newer torch (pytorch/pytorch#132429).
- Move the import of `register_custom_op_symbolic` from `torch.onnx` into a local function, since importing it at module level caused a circular import when running `import torch.onnx` (at least in the CI environment); see the sketch after this list.
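The circular-import fix follows the standard deferred-import pattern. A minimal sketch under assumed names (the wrapper function `_register_symbolic` is hypothetical, not the actual code):

```python
def _register_symbolic(symbolic_name: str, symbolic_fn, opset_version: int) -> None:
    # Importing torch.onnx lazily, inside the function, means merely importing
    # this module no longer triggers the circular import seen in the CI environment.
    from torch.onnx import register_custom_op_symbolic

    register_custom_op_symbolic(symbolic_name, symbolic_fn, opset_version)
```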
### Description

Disable Abseil's symbolize functionality in Windows non-debug builds.

### Motivation and Context

Solves #21826 by avoiding a dependency on dbghelp.dll.
baijumeswani approved these changes on Aug 30, 2024
MaanavD approved these changes on Aug 30, 2024
LGTM! Let’s have the pipelines decide 🙏
tianleiwu approved these changes on Aug 30, 2024
edgchen1 approved these changes on Aug 30, 2024
Approved cherry picks for ORT 1.19.2 release.