Pull requests: pytorch/FBGEMM
Enable preshuffled mixed dtype Cutlass Gemm
cla signed
fb-exported
#3722
opened Feb 21, 2025 by
jwfromm
torch.ops.fbgemm.scatter_add_along_first_dim
..
cla signed
fb-exported
#3720
opened Feb 21, 2025 by
levendlee
Implement generate_vbe_metadata cpu
cla signed
fb-exported
#3715
opened Feb 19, 2025 by
spcyppt
[fbgemm_gpu] Add benchmark workflows
cla signed
module: rocm
#3713
opened Feb 19, 2025 by
q10
update to tune for small `m`s and quantized gemv
cla signed
fb-exported
#3712
opened Feb 19, 2025 by
YUNQIUGUO
Unifying TBE API using List (Frontend)
cla signed
fb-exported
#3711
opened Feb 19, 2025 by
spcyppt
Add NEON implementation of Fused8BitRowwiseQuantizedSBFloatToFloatOrHalf
cla signed
fb-exported
#3707
opened Feb 18, 2025 by
Nicoshev
Add fp_rowwise_gemm configurations that can invoke the Ping Pong Scheduler on AMD
cla signed
fb-exported
#3703
opened Feb 18, 2025 by
njriasan
Refactor stacked version of FP8 Grouped Gemm for reduced overhead
cla signed
fb-exported
#3699
opened Feb 17, 2025 by
jwfromm
Add D_folded support for jagged_to_padded_dense_backward meta function
cla signed
fb-exported
#3670
opened Feb 8, 2025 by
brad-mengchi
Adding Missing includes and explicitly declaring Tensor in aten namespace.
cla signed
fb-exported
#3638
opened Jan 30, 2025 by
pradeepfn
Partial revert of D66986498 (Optimized backward pass for ROCm devices, pt 1), 2nd attempt
ciflow/rocm
cla signed
fb-exported
module: rocm
#3637
opened Jan 29, 2025 by
q10
avoid using warning tensor in cpu tbe op
cla signed
fb-exported
#3631
opened Jan 29, 2025 by
842974287
Update bf16i4 gemm with new cutlass version
cla signed
fb-exported
#3630
opened Jan 29, 2025 by
jwfromm
finish #1808 cherry-pick, adjust interface
cla signed
fb-exported
#3627
opened Jan 28, 2025 by
coconutruben
Re-land D67407935 (Optimized backward pass for ROCm devices, pt 2)
ciflow/rocm
cla signed
fb-exported
module: rocm
#3619
opened Jan 27, 2025 by
q10
Performance Optimization: Optimized TileShape Configuration for f8
cla signed
#3617
opened Jan 27, 2025 by
MatrixAssembler