Implement matmul_veclib() with cblas #800

Open
ThreeMonth03 wants to merge 1 commit into solvcon:master from ThreeMonth03:accel_matmul

Collaborator

@ThreeMonth03 ThreeMonth03 commented May 21, 2026

As mentioned in issue #789, we need to implement and benchmark matmul_veclib(), which uses the Accelerate framework. This pull request therefore adds the new API matmul_veclib() and profiles its run time.

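The following charts and tables show the profiling results. In nearly every case matmul_veclib() is at least as fast as NumPy; the exceptions are the (81, 81) and (1024, 1024) float64 cases, where it is marginally slower (ratios of 1.030 and 1.003).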
image
image

2D x 2D shape: (4, 4) x (4, 4) dtype:float32

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 4.495E-02 | 1.000 |
| naive_sa | 1.679E-03 | 0.037 |
| veclib_sa | 1.333E-03 | 0.030 |
| fast_sa_16_16_16 | 2.271E-03 | 0.051 |
| fast_sa_32_32_32 | 1.962E-03 | 0.044 |
| fast_sa_64_64_64 | 1.863E-03 | 0.041 |

2D x 2D shape: (16, 16) x (16, 16) dtype:float32

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 2.717E-03 | 1.000 |
| naive_sa | 4.142E-03 | 1.525 |
| veclib_sa | 1.792E-03 | 0.660 |
| fast_sa_16_16_16 | 4.408E-03 | 1.623 |
| fast_sa_32_32_32 | 3.863E-03 | 1.422 |
| fast_sa_64_64_64 | 3.766E-03 | 1.386 |

2D x 2D shape: (64, 64) x (64, 64) dtype:float32

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 3.650E-03 | 1.000 |
| naive_sa | 2.362E-01 | 64.724 |
| veclib_sa | 2.708E-03 | 0.742 |
| fast_sa_16_16_16 | 8.988E-02 | 24.624 |
| fast_sa_32_32_32 | 1.044E-01 | 28.597 |
| fast_sa_64_64_64 | 1.381E-01 | 37.832 |

2D x 2D shape: (256, 256) x (256, 256) dtype:float32

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 6.320E-02 | 1.000 |
| naive_sa | 2.452E+01 | 387.947 |
| veclib_sa | 5.225E-02 | 0.827 |
| fast_sa_16_16_16 | 5.240E+00 | 82.913 |
| fast_sa_32_32_32 | 6.279E+00 | 99.350 |
| fast_sa_64_64_64 | 8.465E+00 | 133.947 |

2D x 2D shape: (1024, 1024) x (1024, 1024) dtype:float32

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 3.183E+00 | 1.000 |
| naive_sa | 2.074E+03 | 651.681 |
| veclib_sa | 3.082E+00 | 0.968 |
| fast_sa_16_16_16 | 3.299E+02 | 103.632 |
| fast_sa_32_32_32 | 4.126E+02 | 129.608 |
| fast_sa_64_64_64 | 5.562E+02 | 174.738 |

2D x 2D shape: (4, 4) x (4, 4) dtype:float64

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 3.775E-03 | 1.000 |
| naive_sa | 1.504E-03 | 0.398 |
| veclib_sa | 1.321E-03 | 0.350 |
| fast_sa_16_16_16 | 2.325E-03 | 0.616 |
| fast_sa_32_32_32 | 1.942E-03 | 0.514 |
| fast_sa_64_64_64 | 1.892E-03 | 0.501 |

2D x 2D shape: (16, 16) x (16, 16) dtype:float64

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 3.471E-03 | 1.000 |
| naive_sa | 4.225E-03 | 1.217 |
| veclib_sa | 1.754E-03 | 0.505 |
| fast_sa_16_16_16 | 5.271E-03 | 1.519 |
| fast_sa_32_32_32 | 3.950E-03 | 1.138 |
| fast_sa_64_64_64 | 3.767E-03 | 1.085 |

2D x 2D shape: (64, 64) x (64, 64) dtype:float64

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 5.554E-03 | 1.000 |
| naive_sa | 2.359E-01 | 42.471 |
| veclib_sa | 5.000E-03 | 0.900 |
| fast_sa_16_16_16 | 9.395E-02 | 16.914 |
| fast_sa_32_32_32 | 1.167E-01 | 21.007 |
| fast_sa_64_64_64 | 1.641E-01 | 29.552 |

2D x 2D shape: (256, 256) x (256, 256) dtype:float64

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 1.548E-01 | 1.000 |
| naive_sa | 2.472E+01 | 159.666 |
| veclib_sa | 1.520E-01 | 0.982 |
| fast_sa_16_16_16 | 5.487E+00 | 35.441 |
| fast_sa_32_32_32 | 7.064E+00 | 45.629 |
| fast_sa_64_64_64 | 1.026E+01 | 66.257 |

2D x 2D shape: (1024, 1024) x (1024, 1024) dtype:float64

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 1.123E+01 | 1.000 |
| naive_sa | 2.187E+03 | 194.808 |
| veclib_sa | 1.126E+01 | 1.003 |
| fast_sa_16_16_16 | 4.237E+02 | 37.749 |
| fast_sa_32_32_32 | 4.965E+02 | 44.233 |
| fast_sa_64_64_64 | 6.784E+02 | 60.437 |

2D x 2D shape: (9, 9) x (9, 9) dtype:float32

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 3.279E-03 | 1.000 |
| naive_sa | 1.896E-03 | 0.578 |
| veclib_sa | 1.437E-03 | 0.438 |
| fast_sa_16_16_16 | 2.737E-03 | 0.835 |
| fast_sa_32_32_32 | 2.496E-03 | 0.761 |
| fast_sa_64_64_64 | 2.433E-03 | 0.742 |

2D x 2D shape: (27, 27) x (27, 27) dtype:float32

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 2.738E-03 | 1.000 |
| naive_sa | 1.384E-02 | 5.056 |
| veclib_sa | 1.992E-03 | 0.727 |
| fast_sa_16_16_16 | 1.247E-02 | 4.554 |
| fast_sa_32_32_32 | 1.349E-02 | 4.928 |
| fast_sa_64_64_64 | 1.343E-02 | 4.905 |

2D x 2D shape: (81, 81) x (81, 81) dtype:float32

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 6.837E-03 | 1.000 |
| naive_sa | 5.737E-01 | 83.912 |
| veclib_sa | 5.875E-03 | 0.859 |
| fast_sa_16_16_16 | 1.910E-01 | 27.936 |
| fast_sa_32_32_32 | 2.129E-01 | 31.138 |
| fast_sa_64_64_64 | 2.641E-01 | 38.632 |

2D x 2D shape: (243, 243) x (243, 243) dtype:float32

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 5.136E-02 | 1.000 |
| naive_sa | 2.031E+01 | 395.389 |
| veclib_sa | 5.108E-02 | 0.995 |
| fast_sa_16_16_16 | 4.714E+00 | 91.776 |
| fast_sa_32_32_32 | 5.579E+00 | 108.621 |
| fast_sa_64_64_64 | 7.386E+00 | 143.803 |

2D x 2D shape: (729, 729) x (729, 729) dtype:float32

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 1.236E+00 | 1.000 |
| naive_sa | 7.211E+02 | 583.261 |
| veclib_sa | 1.066E+00 | 0.862 |
| fast_sa_16_16_16 | 1.252E+02 | 101.236 |
| fast_sa_32_32_32 | 1.513E+02 | 122.409 |
| fast_sa_64_64_64 | 2.010E+02 | 162.580 |

2D x 2D shape: (9, 9) x (9, 9) dtype:float64

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 4.579E-03 | 1.000 |
| naive_sa | 1.942E-03 | 0.424 |
| veclib_sa | 1.546E-03 | 0.338 |
| fast_sa_16_16_16 | 2.750E-03 | 0.601 |
| fast_sa_32_32_32 | 2.346E-03 | 0.512 |
| fast_sa_64_64_64 | 2.309E-03 | 0.504 |

2D x 2D shape: (27, 27) x (27, 27) dtype:float64

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 4.008E-03 | 1.000 |
| naive_sa | 1.445E-02 | 3.605 |
| veclib_sa | 2.642E-03 | 0.659 |
| fast_sa_16_16_16 | 1.192E-02 | 2.974 |
| fast_sa_32_32_32 | 1.155E-02 | 2.882 |
| fast_sa_64_64_64 | 1.160E-02 | 2.894 |

2D x 2D shape: (81, 81) x (81, 81) dtype:float64

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 1.257E-02 | 1.000 |
| naive_sa | 5.741E-01 | 45.651 |
| veclib_sa | 1.296E-02 | 1.030 |
| fast_sa_16_16_16 | 2.019E-01 | 16.059 |
| fast_sa_32_32_32 | 2.372E-01 | 18.866 |
| fast_sa_64_64_64 | 3.165E-01 | 25.168 |

2D x 2D shape: (243, 243) x (243, 243) dtype:float64

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 1.562E-01 | 1.000 |
| naive_sa | 2.040E+01 | 130.667 |
| veclib_sa | 1.531E-01 | 0.980 |
| fast_sa_16_16_16 | 4.957E+00 | 31.745 |
| fast_sa_32_32_32 | 6.278E+00 | 40.207 |
| fast_sa_64_64_64 | 8.717E+00 | 55.821 |

2D x 2D shape: (729, 729) x (729, 729) dtype:float64

| func | per call (ms) | cmp to np |
| --- | --- | --- |
| np | 5.104E+00 | 1.000 |
| naive_sa | 7.320E+02 | 143.421 |
| veclib_sa | 4.762E+00 | 0.933 |
| fast_sa_16_16_16 | 1.306E+02 | 25.584 |
| fast_sa_32_32_32 | 1.681E+02 | 32.930 |
| fast_sa_64_64_64 | 2.379E+02 | 46.610 |

Member

yungyuc commented May 21, 2026

Please convert this back to a draft, then leave inline annotations and get the linter clean before marking it ready for review.

@ThreeMonth03 ThreeMonth03 marked this pull request as draft May 21, 2026 18:26
@ThreeMonth03 ThreeMonth03 force-pushed the accel_matmul branch 3 times, most recently from 54201f8 to a05324b Compare May 22, 2026 04:36
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

@yungyuc Please take a look.

Comment thread cpp/modmesh/buffer/SimpleArray.hpp Outdated
Comment on lines +292 to +313
/**
 * Perform matrix multiplication using Accelerate/CBLAS when available.
 */
template <typename A, typename T>
A SimpleArrayMatmulHelper<A, T>::matmul_veclib()
{
    if (m_lhs.ndim() == 1 && m_rhs.ndim() == 1)
    {
        return matmul_vec_vec();
    }
    if (m_lhs.ndim() == 1)
    {
        return matmul_vec_mat();
    }
    if (m_rhs.ndim() == 1)
    {
        return matmul_mat_vec();
    }

    return matmul_mat_mat_veclib();
}

Collaborator Author

This adds the new API entry point.

Comment thread cpp/modmesh/buffer/SimpleArray.hpp Outdated
Comment on lines +453 to +467
template <typename A, typename T>
A SimpleArrayMatmulHelper<A, T>::matmul_mat_mat_veclib()
{
    if (!m_lhs.is_c_contiguous() || !m_rhs.is_c_contiguous())
    {
        return matmul_mat_mat();
    }

    size_t const m = m_result.shape(0);
    size_t const n = m_result.shape(1);
    size_t const k = m_lhs.shape(1);
    simd::matmul(m, n, k, m_lhs.data(), m_rhs.data(), m_result.data());
    return std::move(m_result);
}

Collaborator Author

The Accelerate library is used only when both SimpleArray operands are C-contiguous; otherwise this falls back to matmul_mat_mat().
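
To illustrate why the contiguity check matters: an interface that takes only (m, n, k) and raw pointers has to assume a dense row-major layout, because it has no stride information. The following is a minimal reference sketch of such a routine. The name matmul_row_major is hypothetical and is not the project's simd::matmul or a CBLAS call; it only shows the indexing that a flat-pointer GEMM relies on.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference GEMM for C-contiguous (row-major) buffers:
// lhs is m x k, rhs is k x n, result is m x n.
template <typename T>
void matmul_row_major(std::size_t m, std::size_t n, std::size_t k,
                      T const * lhs, T const * rhs, T * result)
{
    for (std::size_t i = 0; i < m; ++i)
    {
        for (std::size_t j = 0; j < n; ++j)
        {
            T acc = T(0);
            for (std::size_t p = 0; p < k; ++p)
            {
                // Flat row-major indexing; only valid for contiguous data.
                acc += lhs[i * k + p] * rhs[p * n + j];
            }
            result[i * n + j] = acc;
        }
    }
}
```

A strided or transposed view would need explicit strides (or a copy) before it could be passed to a routine of this shape, which is why the non-contiguous case takes the generic path.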

Member

Good point.

Comment on lines +265 to +283
template <typename T>
inline constexpr bool is_std_complex_layout_compatible_v = std::is_standard_layout_v<Complex<T>> &&
    sizeof(Complex<T>) == sizeof(std::complex<T>) &&
    alignof(Complex<T>) == alignof(std::complex<T>);

static_assert(is_std_complex_layout_compatible_v<float>);
static_assert(is_std_complex_layout_compatible_v<double>);

template <typename T>
std::complex<T> const * as_std_complex_pointer(Complex<T> const * ptr)
{
    return reinterpret_cast<std::complex<T> const *>(ptr); // NOLINT(cppcoreguidelines-pro-type-reinterpret-cast)
}

template <typename T>
std::complex<T> * as_std_complex_pointer(Complex<T> * ptr)
{
    return reinterpret_cast<std::complex<T> *>(ptr); // NOLINT(cppcoreguidelines-pro-type-reinterpret-cast)
}
Collaborator Author

@ThreeMonth03 ThreeMonth03 May 22, 2026

To support matrix multiplication of Complex, we need to ensure that Complex<T> and std::complex<T> have the same layout (standard layout, same size, same alignment), which the static_asserts check.
Given that, the helpers reinterpret_cast a pointer to Complex<T> into a pointer to std::complex<T>.
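
As a self-contained illustration of the layout check, here is a minimal sketch using a hypothetical stand-in type, DemoComplex<T>, in place of the project's Complex<T> (whose definition is not shown in this thread). To keep the example strictly well-defined, it converts with std::memcpy rather than the reinterpret_cast used in the pull request; that substitution is a choice made for this sketch only.

```cpp
#include <complex>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for the project's Complex<T>: real part, then imaginary part.
template <typename T>
struct DemoComplex
{
    T real_v;
    T imag_v;
};

// Same three conditions as the trait in the pull request, applied to the stand-in.
template <typename T>
inline constexpr bool is_layout_compatible_v =
    std::is_standard_layout_v<DemoComplex<T>> &&
    sizeof(DemoComplex<T>) == sizeof(std::complex<T>) &&
    alignof(DemoComplex<T>) == alignof(std::complex<T>);

static_assert(is_layout_compatible_v<float>);
static_assert(is_layout_compatible_v<double>);

// Byte-wise conversion; relies on the layout check above and on std::complex
// storing the real part first, followed by the imaginary part.
template <typename T>
std::complex<T> to_std_complex(DemoComplex<T> const & value)
{
    static_assert(std::is_trivially_copyable_v<DemoComplex<T>>);
    static_assert(std::is_trivially_copyable_v<std::complex<T>>);
    std::complex<T> out;
    std::memcpy(&out, &value, sizeof(out));
    return out;
}
```

The static_asserts make a layout mismatch a compile-time error instead of a silent data corruption at the vendor-library boundary.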

Member

Discussion: it sounds like our own Complex is unnecessary and we may directly use std::complex?

@j8xixo12 , what do you think?

Collaborator

I think we can use std::complex directly; it should reduce the maintenance effort.

Collaborator Author

@ThreeMonth03 ThreeMonth03 May 24, 2026

The API difference between std::complex and Complex is the comparison operators. Complex implements ordering comparisons based on lexicographic order, but std::complex provides no ordering operators (<, >, <=, >=); it only supports == and !=.

Without ordering operators, some APIs, such as min() or max(), would not support complex types. Therefore, we might need to create some helper functions if Complex is replaced.
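
A minimal sketch of what such helper functions could look like if std::complex were used directly. The names complex_less and complex_min are hypothetical, and the ordering shown (real part first, then imaginary part) is an assumption about what "lexicographic order" means here; the thread does not spell it out.

```cpp
#include <algorithm>
#include <complex>
#include <vector>

// Hypothetical helper: lexicographic ordering, real part first, then imaginary part.
template <typename T>
bool complex_less(std::complex<T> const & a, std::complex<T> const & b)
{
    if (a.real() != b.real())
    {
        return a.real() < b.real();
    }
    return a.imag() < b.imag();
}

// Hypothetical helper: smallest element under complex_less.
// Precondition: values must not be empty.
template <typename T>
std::complex<T> complex_min(std::vector<std::complex<T>> const & values)
{
    return *std::min_element(values.begin(), values.end(), complex_less<T>);
}
```

Every algorithm that needs an ordering would have to be passed such a comparator explicitly, whereas a project-owned Complex can define operator< once.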

Member

Right, having Complex allows us to customize operators without worrying about compatibility with other code that uses the STL std::complex. It is a big win.

Let's keep our Complex for a while.


#include <modmesh/simd/accelerate/accelerate.hpp>

#if defined(__APPLE__) && defined(__arm64__)
Collaborator Author

The contents of this file are compiled only when targeting macOS on the arm64 architecture.

Member

It is not right to put it under simd/ because we did not add SIMD code for it. It is a vendor library. Use a distinct file under buffer/ for it.

Collaborator

I concur.

Comment thread cpp/modmesh/simd/gemm.hpp Outdated
Comment on lines +46 to +70
enum class GemmBackend : uint8_t
{
    Generic,
    Accelerate,
};

template <typename T>
inline constexpr GemmBackend gemm_backend_v = accelerate::supports_matmul_v<T>
    ? GemmBackend::Accelerate
    : GemmBackend::Generic;

} /* namespace detail */

template <typename T>
void matmul(size_t m, size_t n, size_t k, T const * lhs, T const * rhs, T * result)
{
    if constexpr (detail::gemm_backend_v<T> == detail::GemmBackend::Accelerate)
    {
        detail::accelerate::matmul(m, n, k, lhs, rhs, result);
    }
    else
    {
        generic::matmul(m, n, k, lhs, rhs, result);
    }
}
Collaborator Author

This function selects the backend at compile time, based on the element type.
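
The same compile-time dispatch pattern can be shown in a self-contained sketch. All names here (Backend, vendor_supports_v, backend_v, scale) are hypothetical, and the "vendor" branch simply reuses the plain loop instead of calling a real library, so this only demonstrates how if constexpr discards the unused branch per element type.

```cpp
#include <cstddef>
#include <type_traits>

enum class Backend
{
    Generic,
    Vendor,
};

// Hypothetical trait: pretend the vendor library handles only float and double.
template <typename T>
inline constexpr bool vendor_supports_v =
    std::is_same_v<T, float> || std::is_same_v<T, double>;

template <typename T>
inline constexpr Backend backend_v =
    vendor_supports_v<T> ? Backend::Vendor : Backend::Generic;

// Scale an array in place and report which branch was compiled in for T.
template <typename T>
Backend scale(std::size_t n, T * data, T factor)
{
    if constexpr (backend_v<T> == Backend::Vendor)
    {
        // A real build would call the vendor routine here; this sketch reuses the loop.
        for (std::size_t i = 0; i < n; ++i)
        {
            data[i] *= factor;
        }
        return Backend::Vendor;
    }
    else
    {
        for (std::size_t i = 0; i < n; ++i)
        {
            data[i] *= factor;
        }
        return Backend::Generic;
    }
}
```

Because the choice is made per instantiation, a type the vendor library cannot handle never references the vendor symbols at all.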

(std::is_same_v<T, float> ||
std::is_same_v<T, double> ||
std::is_same_v<T, Complex<float>> ||
std::is_same_v<T, Complex<double>>);
Collaborator Author

CBLAS supports only these types.

Member

Yes. They are what we need for now.

Comment thread cpp/modmesh/simd/CMakeLists.txt Outdated
Comment on lines +32 to +39
${CMAKE_CURRENT_SOURCE_DIR}/gemm.hpp
${CMAKE_CURRENT_SOURCE_DIR}/gemm_generic.hpp
${CMAKE_CURRENT_SOURCE_DIR}/accelerate/accelerate.hpp
CACHE FILEPATH "" FORCE)

set(MODMESH_SIMD_SOURCES
${CMAKE_CURRENT_SOURCE_DIR}/simd_support.cpp
${CMAKE_CURRENT_SOURCE_DIR}/accelerate/accelerate.cpp
Collaborator Author

The path and file names are not ideal, but I have no idea where to put these files.

Comment thread tests/test_matrix.py
Collaborator Author

Test the correctness of matmul_veclib().

Collaborator Author

Profile matmul_veclib().

Member

Nice

@ThreeMonth03 ThreeMonth03 marked this pull request as ready for review May 22, 2026 07:34
Collaborator Author

ThreeMonth03 commented May 22, 2026

By the way, I plan to check the profiler's latency when I have time, because the profiling results look odd when the NumPy arrays are small. If I find anything, I will open a pull request.

Member

@yungyuc yungyuc left a comment

  • Keep the wrappers matmul, matmul_veclib, and matmul_fast close.
  • Discussion: Maybe imatmul_fast should also be wrapped to Python?
  • Discussion: Do we need our own Complex or should we directly use std::complex?
  • Use a distinct file under buffer/ for the veclib matmul wrapper.
  • Add a FIXME and a follow-up issue to track fixing silently passed failures for missing veclib.

{ return self.div(scalar); })
.def("matmul", &wrapped_type::matmul)
.def("__matmul__", &wrapped_type::matmul)
.def("matmul_veclib", &wrapped_type::matmul_veclib)
Member

Good name (matmul_veclib). It should be placed after matmul, and fast_matmul should be renamed as matmul_fast for consistent names.

{ self.idiv(scalar); })
.def("imatmul", [](wrapped_type & self, wrapped_type const & other)
{ self.imatmul(other); })
.def("imatmul_veclib", [](wrapped_type & self, wrapped_type const & other)
Member

Discussion: Maybe imatmul_fast should also be wrapped to Python?

Collaborator Author

Yes, it should be wrapped to Python. We have already implemented this API.

Comment thread cpp/modmesh/simd/gemm.hpp Outdated
Member

Files you added under simd/ should all go to buffer/.

Comment thread tests/test_matrix.py
try:
    veclib_result = lhs.matmul_veclib(rhs)
except RuntimeError as exc:
    self.assertEqual(str(exc), veclib_unavailable)
Member

This is tricky: failures pass silently. When veclib is not available, the veclib tests should be marked as expected failures, while the non-veclib tests should still pass. That is, we need distinct test functions for veclib only.

Please add a FIXME here and create a follow-up issue to track the fix.

@yungyuc yungyuc added the array Multi-dimensional array implementation label May 22, 2026
@yungyuc yungyuc moved this from Todo to In Progress in tensor operations May 22, 2026
@yungyuc yungyuc requested review from KHLee529 and j8xixo12 May 22, 2026 12:20
@ThreeMonth03 ThreeMonth03 force-pushed the accel_matmul branch 3 times, most recently from ac1981f to 188ce0e Compare May 24, 2026 15:10
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

  • Keep the wrappers matmul, matmul_veclib, and matmul_fast close.
  • Discussion: Maybe imatmul_fast should also be wrapped to Python?
  • Discussion: Do we need our own Complex or should we directly use std::complex?
  • Use a distinct file under buffer/ for the veclib matmul wrapper.
  • Add a FIXME and a follow-up issue to track fixing silently passed failures for missing veclib.

@yungyuc Please take a look. Thanks.

Comment on lines 381 to +384
.def("matmul", &wrapped_type::matmul)
.def("__matmul__", &wrapped_type::matmul)
.def("matmul_veclib", &wrapped_type::matmul_veclib)
.def(
"fast_matmul",
"matmul_fast",
Collaborator Author

Renamed fast_matmul to matmul_fast and updated the places that use it.

Comment on lines +439 to +450
.def(
    "imatmul_fast",
    [](wrapped_type & self,
       wrapped_type const & other,
       size_t tile_x,
       size_t tile_y,
       size_t tile_z)
    { self.imatmul_fast(other, tile_x, tile_y, tile_z); },
    py::arg("other"),
    py::arg("tile_x") = 16,
    py::arg("tile_y") = 16,
    py::arg("tile_z") = 16)
Collaborator Author

imatmul_fast is now wrapped to Python.

Collaborator Author

Move helper class SimpleArrayMatmulHelper to matmul.hpp.

Comment thread tests/test_matrix.py
Comment on lines +95 to +100
except RuntimeError as exc:
    # FIXME: Split veclib backend coverage into dedicated tests and
    # mark unsupported platforms as expected failures once a follow-up
    # issue is filed.
    self.assertEqual(str(exc), veclib_unavailable)
    return
Collaborator Author

Added a FIXME comment temporarily. I will open a new issue later.

Collaborator Author

By the way, matmul_veclib has been modified: it now behaves like matmul when it is not used on an Apple platform. I'm not sure whether we still need the try/except.

Labels

array Multi-dimensional array implementation

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

4 participants