my_gemm

Goals:
- Basic GEMM on the CPU
- GEMM in CUDA
- GEMM optimization
- Test and compare against modern libraries such as cuBLAS and CUTLASS
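
As a starting point for the first goal, here is a minimal sketch of a naive CPU GEMM that could also serve as the correctness baseline when testing the CUDA kernels. It is not taken from this repository: the names `gemm_cpu_ref` and `max_abs_diff` are hypothetical, and the row-major, densely packed layout and the `C = alpha * A * B + beta * C` convention are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference GEMM: C = alpha * A * B + beta * C.
// Assumes A is M x K, B is K x N, C is M x N, all row-major and densely packed.
void gemm_cpu_ref(std::size_t M, std::size_t N, std::size_t K,
                  float alpha, const std::vector<float>& A,
                  const std::vector<float>& B,
                  float beta, std::vector<float>& C) {
    assert(A.size() == M * K);
    assert(B.size() == K * N);
    assert(C.size() == M * N);
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            // Dot product of row i of A with column j of B.
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}

// Hypothetical helper: largest absolute element-wise difference between
// two results, e.g. this reference versus a GPU kernel's output.
float max_abs_diff(const std::vector<float>& x, const std::vector<float>& y) {
    assert(x.size() == y.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        float d = std::fabs(x[i] - y[i]);
        if (d > worst) worst = d;
    }
    return worst;
}
```

When comparing against a GPU result, a small tolerance on `max_abs_diff` is usually more appropriate than exact equality, because a different accumulation order changes floating-point rounding.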