Implement matmul_veclib() with cblas by ThreeMonth03 · Pull Request #800 · solvcon/modmesh

ThreeMonth03 · 2026-05-21T14:17:36Z

As mentioned in issue #789, we need to implement matmul_veclib(), which uses the Accelerate framework, and benchmark its speed. This pull request therefore creates a new API, matmul_veclib(), and profiles its run time.

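The profiling results are listed below. In the "cmp to np" column, each function's per-call time is divided by NumPy's per-call time, so values below 1.000 mean faster than NumPy. matmul_veclib() (veclib_sa) is faster than NumPy in 18 of the 20 cases and within about 3% of NumPy in the remaining two (float64 (1024, 1024): 1.003, and float64 (81, 81): 1.030).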

2D x 2D shape: (4, 4) x (4, 4) dtype:`float32`

func	per call (ms)	cmp to np
np	4.495E-02	1.000
naive_sa	1.679E-03	0.037
veclib_sa	1.333E-03	0.030
fast_sa_16_16_16	2.271E-03	0.051
fast_sa_32_32_32	1.962E-03	0.044
fast_sa_64_64_64	1.863E-03	0.041

2D x 2D shape: (16, 16) x (16, 16) dtype:`float32`

func	per call (ms)	cmp to np
np	2.717E-03	1.000
naive_sa	4.142E-03	1.525
veclib_sa	1.792E-03	0.660
fast_sa_16_16_16	4.408E-03	1.623
fast_sa_32_32_32	3.863E-03	1.422
fast_sa_64_64_64	3.766E-03	1.386

2D x 2D shape: (64, 64) x (64, 64) dtype:`float32`

func	per call (ms)	cmp to np
np	3.650E-03	1.000
naive_sa	2.362E-01	64.724
veclib_sa	2.708E-03	0.742
fast_sa_16_16_16	8.988E-02	24.624
fast_sa_32_32_32	1.044E-01	28.597
fast_sa_64_64_64	1.381E-01	37.832

2D x 2D shape: (256, 256) x (256, 256) dtype:`float32`

func	per call (ms)	cmp to np
np	6.320E-02	1.000
naive_sa	2.452E+01	387.947
veclib_sa	5.225E-02	0.827
fast_sa_16_16_16	5.240E+00	82.913
fast_sa_32_32_32	6.279E+00	99.350
fast_sa_64_64_64	8.465E+00	133.947

2D x 2D shape: (1024, 1024) x (1024, 1024) dtype:`float32`

func	per call (ms)	cmp to np
np	3.183E+00	1.000
naive_sa	2.074E+03	651.681
veclib_sa	3.082E+00	0.968
fast_sa_16_16_16	3.299E+02	103.632
fast_sa_32_32_32	4.126E+02	129.608
fast_sa_64_64_64	5.562E+02	174.738

2D x 2D shape: (4, 4) x (4, 4) dtype:`float64`

func	per call (ms)	cmp to np
np	3.775E-03	1.000
naive_sa	1.504E-03	0.398
veclib_sa	1.321E-03	0.350
fast_sa_16_16_16	2.325E-03	0.616
fast_sa_32_32_32	1.942E-03	0.514
fast_sa_64_64_64	1.892E-03	0.501

2D x 2D shape: (16, 16) x (16, 16) dtype:`float64`

func	per call (ms)	cmp to np
np	3.471E-03	1.000
naive_sa	4.225E-03	1.217
veclib_sa	1.754E-03	0.505
fast_sa_16_16_16	5.271E-03	1.519
fast_sa_32_32_32	3.950E-03	1.138
fast_sa_64_64_64	3.767E-03	1.085

2D x 2D shape: (64, 64) x (64, 64) dtype:`float64`

func	per call (ms)	cmp to np
np	5.554E-03	1.000
naive_sa	2.359E-01	42.471
veclib_sa	5.000E-03	0.900
fast_sa_16_16_16	9.395E-02	16.914
fast_sa_32_32_32	1.167E-01	21.007
fast_sa_64_64_64	1.641E-01	29.552

2D x 2D shape: (256, 256) x (256, 256) dtype:`float64`

func	per call (ms)	cmp to np
np	1.548E-01	1.000
naive_sa	2.472E+01	159.666
veclib_sa	1.520E-01	0.982
fast_sa_16_16_16	5.487E+00	35.441
fast_sa_32_32_32	7.064E+00	45.629
fast_sa_64_64_64	1.026E+01	66.257

2D x 2D shape: (1024, 1024) x (1024, 1024) dtype:`float64`

func	per call (ms)	cmp to np
np	1.123E+01	1.000
naive_sa	2.187E+03	194.808
veclib_sa	1.126E+01	1.003
fast_sa_16_16_16	4.237E+02	37.749
fast_sa_32_32_32	4.965E+02	44.233
fast_sa_64_64_64	6.784E+02	60.437

2D x 2D shape: (9, 9) x (9, 9) dtype:`float32`

func	per call (ms)	cmp to np
np	3.279E-03	1.000
naive_sa	1.896E-03	0.578
veclib_sa	1.437E-03	0.438
fast_sa_16_16_16	2.737E-03	0.835
fast_sa_32_32_32	2.496E-03	0.761
fast_sa_64_64_64	2.433E-03	0.742

2D x 2D shape: (27, 27) x (27, 27) dtype:`float32`

func	per call (ms)	cmp to np
np	2.738E-03	1.000
naive_sa	1.384E-02	5.056
veclib_sa	1.992E-03	0.727
fast_sa_16_16_16	1.247E-02	4.554
fast_sa_32_32_32	1.349E-02	4.928
fast_sa_64_64_64	1.343E-02	4.905

2D x 2D shape: (81, 81) x (81, 81) dtype:`float32`

func	per call (ms)	cmp to np
np	6.837E-03	1.000
naive_sa	5.737E-01	83.912
veclib_sa	5.875E-03	0.859
fast_sa_16_16_16	1.910E-01	27.936
fast_sa_32_32_32	2.129E-01	31.138
fast_sa_64_64_64	2.641E-01	38.632

2D x 2D shape: (243, 243) x (243, 243) dtype:`float32`

func	per call (ms)	cmp to np
np	5.136E-02	1.000
naive_sa	2.031E+01	395.389
veclib_sa	5.108E-02	0.995
fast_sa_16_16_16	4.714E+00	91.776
fast_sa_32_32_32	5.579E+00	108.621
fast_sa_64_64_64	7.386E+00	143.803

2D x 2D shape: (729, 729) x (729, 729) dtype:`float32`

func	per call (ms)	cmp to np
np	1.236E+00	1.000
naive_sa	7.211E+02	583.261
veclib_sa	1.066E+00	0.862
fast_sa_16_16_16	1.252E+02	101.236
fast_sa_32_32_32	1.513E+02	122.409
fast_sa_64_64_64	2.010E+02	162.580

2D x 2D shape: (9, 9) x (9, 9) dtype:`float64`

func	per call (ms)	cmp to np
np	4.579E-03	1.000
naive_sa	1.942E-03	0.424
veclib_sa	1.546E-03	0.338
fast_sa_16_16_16	2.750E-03	0.601
fast_sa_32_32_32	2.346E-03	0.512
fast_sa_64_64_64	2.309E-03	0.504

2D x 2D shape: (27, 27) x (27, 27) dtype:`float64`

func	per call (ms)	cmp to np
np	4.008E-03	1.000
naive_sa	1.445E-02	3.605
veclib_sa	2.642E-03	0.659
fast_sa_16_16_16	1.192E-02	2.974
fast_sa_32_32_32	1.155E-02	2.882
fast_sa_64_64_64	1.160E-02	2.894

2D x 2D shape: (81, 81) x (81, 81) dtype:`float64`

func	per call (ms)	cmp to np
np	1.257E-02	1.000
naive_sa	5.741E-01	45.651
veclib_sa	1.296E-02	1.030
fast_sa_16_16_16	2.019E-01	16.059
fast_sa_32_32_32	2.372E-01	18.866
fast_sa_64_64_64	3.165E-01	25.168

2D x 2D shape: (243, 243) x (243, 243) dtype:`float64`

func	per call (ms)	cmp to np
np	1.562E-01	1.000
naive_sa	2.040E+01	130.667
veclib_sa	1.531E-01	0.980
fast_sa_16_16_16	4.957E+00	31.745
fast_sa_32_32_32	6.278E+00	40.207
fast_sa_64_64_64	8.717E+00	55.821

2D x 2D shape: (729, 729) x (729, 729) dtype:`float64`

func	per call (ms)	cmp to np
np	5.104E+00	1.000
naive_sa	7.320E+02	143.421
veclib_sa	4.762E+00	0.933
fast_sa_16_16_16	1.306E+02	25.584
fast_sa_32_32_32	1.681E+02	32.930
fast_sa_64_64_64	2.379E+02	46.610

yungyuc · 2026-05-21T15:06:11Z

Please convert this back to a draft before leaving inline annotations, and make the linter clean.

ThreeMonth03

@yungyuc Please take a look.

ThreeMonth03 · 2026-05-22T07:06:17Z

+/**
+ * Perform matrix multiplication using Accelerate/CBLAS when available.
+ */
+template <typename A, typename T>
+A SimpleArrayMatmulHelper<A, T>::matmul_veclib()
+{
+    if (m_lhs.ndim() == 1 && m_rhs.ndim() == 1)
+    {
+        return matmul_vec_vec();
+    }
+    if (m_lhs.ndim() == 1)
+    {
+        return matmul_vec_mat();
+    }
+    if (m_rhs.ndim() == 1)
+    {
+        return matmul_mat_vec();
+    }
+
+    return matmul_mat_mat_veclib();
+}
+


Create a new API interface.
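The dispatch above routes on the operand ranks. As a hedged sketch, assuming modmesh follows NumPy's `matmul` shape rules for 1D and 2D operands (this PR does not confirm it), the result shape of each branch can be computed as follows. `shape_type` and `matmul_result_shape` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical shape type and helper for illustration; not part of modmesh.
using shape_type = std::vector<std::size_t>;

// Result shape of lhs @ rhs for 1D/2D operands under NumPy's matmul rules.
shape_type matmul_result_shape(shape_type const & lhs, shape_type const & rhs)
{
    if (lhs.empty() || rhs.empty() || lhs.size() > 2 || rhs.size() > 2)
    {
        throw std::invalid_argument("only 1D and 2D operands are handled");
    }
    // Contracted axis: the last axis of lhs against the first axis of rhs.
    if (lhs.back() != rhs.front())
    {
        throw std::invalid_argument("inner dimensions do not match");
    }
    if (lhs.size() == 1 && rhs.size() == 1)
    {
        return {}; // vec (k) @ vec (k): a scalar, shape ()
    }
    if (lhs.size() == 1)
    {
        return {rhs[1]}; // vec (k) @ mat (k, n) -> (n)
    }
    if (rhs.size() == 1)
    {
        return {lhs[0]}; // mat (m, k) @ vec (k) -> (m)
    }
    return {lhs[0], rhs[1]}; // mat (m, k) @ mat (k, n) -> (m, n)
}
```

Only the last case reaches `matmul_mat_mat_veclib()`, which is why the vendor GEMM path needs to handle only the 2D x 2D shape.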

ThreeMonth03 · 2026-05-22T07:08:18Z

+template <typename A, typename T>
+A SimpleArrayMatmulHelper<A, T>::matmul_mat_mat_veclib()
+{
+    if (!m_lhs.is_c_contiguous() || !m_rhs.is_c_contiguous())
+    {
+        return matmul_mat_mat();
+    }
+
+    size_t const m = m_result.shape(0);
+    size_t const n = m_result.shape(1);
+    size_t const k = m_lhs.shape(1);
+    simd::matmul(m, n, k, m_lhs.data(), m_rhs.data(), m_result.data());
+    return std::move(m_result);
+}
+


Only use the Accelerate library when both SimpleArray operands are C-contiguous; otherwise, fall back to matmul_mat_mat().

Good point.
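The contiguity check matters because a row-major GEMM call describes each operand only by a base pointer and a leading dimension (the element distance between consecutive rows), with unit stride assumed within each row. The sketch below is a hypothetical naive reference (`reference_gemm_row_major` and `contiguous_matmul` are illustrative names, not modmesh's `generic::matmul`) using the same parameter meaning as the row-major, no-transpose form of `cblas_?gemm`. For C-contiguous operands the leading dimensions are simply `k`, `n`, and `n`, which is consistent with `simd::matmul(m, n, k, ...)` taking no stride arguments.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical naive reference, not modmesh's generic::matmul.
// Computes C = alpha * A * B + beta * C for row-major A (m x k), B (k x n),
// and C (m x n). lda, ldb, ldc are the element distances between consecutive
// rows; elements within a row are assumed to be adjacent (unit stride).
template <typename T>
void reference_gemm_row_major(std::size_t m, std::size_t n, std::size_t k,
                              T alpha, T const * a, std::size_t lda,
                              T const * b, std::size_t ldb,
                              T beta, T * c, std::size_t ldc)
{
    assert(lda >= k && ldb >= n && ldc >= n);
    for (std::size_t i = 0; i < m; ++i)
    {
        for (std::size_t j = 0; j < n; ++j)
        {
            T sum{};
            for (std::size_t p = 0; p < k; ++p)
            {
                sum += a[i * lda + p] * b[p * ldb + j];
            }
            T const scaled = alpha * sum;
            // As in BLAS, C is not read when beta is zero, so it may be uninitialized.
            c[i * ldc + j] = (beta == T{0}) ? scaled : scaled + beta * c[i * ldc + j];
        }
    }
}

// C-contiguous operands have packed rows: lda = k, ldb = n, ldc = n.
template <typename T>
void contiguous_matmul(std::size_t m, std::size_t n, std::size_t k,
                       T const * lhs, T const * rhs, T * result)
{
    reference_gemm_row_major(m, n, k, T{1}, lhs, k, rhs, n, T{0}, result, n);
}
```

A view whose rows are packed but spaced apart could still be described with a larger leading dimension, but a view with non-unit stride inside each row does not fit this form, so `is_c_contiguous()` is the simplest safe condition for taking the vendor path.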

ThreeMonth03 · 2026-05-22T07:13:04Z

+template <typename T>
+inline constexpr bool is_std_complex_layout_compatible_v = std::is_standard_layout_v<Complex<T>> &&
+                                                           sizeof(Complex<T>) == sizeof(std::complex<T>) &&
+                                                           alignof(Complex<T>) == alignof(std::complex<T>);
+
+static_assert(is_std_complex_layout_compatible_v<float>);
+static_assert(is_std_complex_layout_compatible_v<double>);
+
+template <typename T>
+std::complex<T> const * as_std_complex_pointer(Complex<T> const * ptr)
+{
+    return reinterpret_cast<std::complex<T> const *>(ptr); // NOLINT(cppcoreguidelines-pro-type-reinterpret-cast)
+}
+
+template <typename T>
+std::complex<T> * as_std_complex_pointer(Complex<T> * ptr)
+{
+    return reinterpret_cast<std::complex<T> *>(ptr); // NOLINT(cppcoreguidelines-pro-type-reinterpret-cast)
+}


To support matrix multiplication of Complex, we need to ensure that Complex<T> and std::complex<T> have compatible layouts (standard layout, same size, and same alignment).
With that guaranteed, a pointer to Complex<T> can be reinterpret_cast to a pointer to std::complex<T>.

Discussion: it sounds like our own Complex is unnecessary, and we could use std::complex directly?

@j8xixo12, what do you think?

I think we can use std::complex directly; it should reduce the maintenance effort.

The API difference between std::complex and Complex is the comparison operators. Complex implements comparison operators based on lexicographic order, whereas std::complex provides only equality comparison and no ordering operators such as < and >.

Without ordering operators, some APIs, such as min() and max(), would not support complex types. Therefore, we might need to create helper functions if Complex is replaced.

Right, having Complex allows us to customize operators without worrying about compatibility with other code that uses the STL std::complex. It is a big win.

Let's keep our Complex for a while.
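If Complex were ever replaced by std::complex, min() and max() would need an explicit ordering. A minimal sketch of such a helper, assuming the lexicographic order compares the real part first and the imaginary part second (the thread only says "lexicographic"); `ComplexLexLess` is a hypothetical name, not a modmesh API:

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <vector>

// Hypothetical comparator, not part of modmesh: orders std::complex values
// lexicographically, by real part first and imaginary part second.
struct ComplexLexLess
{
    template <typename T>
    bool operator()(std::complex<T> const & lhs, std::complex<T> const & rhs) const
    {
        if (lhs.real() != rhs.real())
        {
            return lhs.real() < rhs.real();
        }
        return lhs.imag() < rhs.imag();
    }
};
```

As a comparator object it plugs into `std::min_element`, `std::max_element`, `std::sort`, and the three-argument `std::min`/`std::max` without adding operators to `std::complex` itself. Values containing NaN do not form a strict weak ordering under it, so such inputs would need separate handling.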

ThreeMonth03 · 2026-05-22T07:15:30Z

+
+#include <modmesh/simd/accelerate/accelerate.hpp>
+
+#if defined(__APPLE__) && defined(__arm64__)


The contents of this file are compiled only when targeting macOS on the arm64 architecture.

It is not right to put it under simd/ because we did not add SIMD code for it. It is a vendor library. Use a distinct file under buffer/ for it.

ThreeMonth03 · 2026-05-22T07:17:51Z

+enum class GemmBackend : uint8_t
+{
+    Generic,
+    Accelerate,
+};
+
+template <typename T>
+inline constexpr GemmBackend gemm_backend_v = accelerate::supports_matmul_v<T>
+                                                  ? GemmBackend::Accelerate
+                                                  : GemmBackend::Generic;
+
+} /* namespace detail */
+
+template <typename T>
+void matmul(size_t m, size_t n, size_t k, T const * lhs, T const * rhs, T * result)
+{
+    if constexpr (detail::gemm_backend_v<T> == detail::GemmBackend::Accelerate)
+    {
+        detail::accelerate::matmul(m, n, k, lhs, rhs, result);
+    }
+    else
+    {
+        generic::matmul(m, n, k, lhs, rhs, result);
+    }
+}


The function selects the backend at compile time.
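The backend choice is made entirely at compile time. A self-contained sketch of the same pattern follows, with hypothetical names (`Backend`, `vendor_supports_v`, `backend_v`, `dispatch`) and `std::complex` standing in for modmesh's `Complex`. Unlike the PR, it has no platform check, and the kernels only report which branch was taken.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <type_traits>

// Hypothetical names throughout; std::complex stands in for modmesh's Complex.
enum class Backend : std::uint8_t
{
    Generic,
    Vendor,
};

// Element types a vendor GEMM (e.g. CBLAS) can handle.
template <typename T>
inline constexpr bool vendor_supports_v = std::is_same_v<T, float> ||
                                          std::is_same_v<T, double> ||
                                          std::is_same_v<T, std::complex<float>> ||
                                          std::is_same_v<T, std::complex<double>>;

// Backend chosen per element type at compile time.
template <typename T>
inline constexpr Backend backend_v = vendor_supports_v<T> ? Backend::Vendor : Backend::Generic;

// Stand-in kernels that only report which path ran.
template <typename T>
Backend vendor_kernel()
{
    static_assert(vendor_supports_v<T>, "vendor kernel instantiated for an unsupported type");
    return Backend::Vendor;
}

template <typename T>
Backend generic_kernel()
{
    return Backend::Generic;
}

template <typename T>
Backend dispatch()
{
    // The branch not taken is discarded, so vendor_kernel<T> is never
    // instantiated for unsupported T.
    if constexpr (backend_v<T> == Backend::Vendor)
    {
        return vendor_kernel<T>();
    }
    else
    {
        return generic_kernel<T>();
    }
}
```

Because the non-taken branch of `if constexpr` is discarded inside a template, calling `dispatch<int>()` never instantiates `vendor_kernel<int>`, so its `static_assert` does not fire and unsupported types compile cleanly onto the generic path.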

ThreeMonth03 · 2026-05-22T07:22:13Z

+                                          (std::is_same_v<T, float> ||
+                                           std::is_same_v<T, double> ||
+                                           std::is_same_v<T, Complex<float>> ||
+                                           std::is_same_v<T, Complex<double>>);


CBLAS only supports these types.

Yes. They are what we need for now.
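The four types correspond to the four BLAS precisions (s, d, c, z). As an illustrative sketch (the helper `cblas_gemm_name` is hypothetical, and `std::complex` stands in for `Complex`), this maps each element type to the CBLAS GEMM routine that would serve it:

```cpp
#include <cassert>
#include <complex>
#include <string_view>
#include <type_traits>

// Hypothetical helper: names the CBLAS GEMM routine for each element type.
// CBLAS provides GEMM only in four precisions: s (float), d (double),
// c (single-precision complex), and z (double-precision complex).
template <typename T>
constexpr std::string_view cblas_gemm_name()
{
    if constexpr (std::is_same_v<T, float>)
    {
        return "cblas_sgemm";
    }
    else if constexpr (std::is_same_v<T, double>)
    {
        return "cblas_dgemm";
    }
    else if constexpr (std::is_same_v<T, std::complex<float>>)
    {
        return "cblas_cgemm";
    }
    else if constexpr (std::is_same_v<T, std::complex<double>>)
    {
        return "cblas_zgemm";
    }
    else
    {
        return ""; // no CBLAS GEMM for this type; a generic kernel is needed
    }
}
```

Any other element type (integers, long double, and so on) has no CBLAS routine and must stay on the generic kernel.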

ThreeMonth03 · 2026-05-22T07:31:59Z

+    ${CMAKE_CURRENT_SOURCE_DIR}/gemm.hpp
+    ${CMAKE_CURRENT_SOURCE_DIR}/gemm_generic.hpp
+    ${CMAKE_CURRENT_SOURCE_DIR}/accelerate/accelerate.hpp
    CACHE FILEPATH "" FORCE)

 set(MODMESH_SIMD_SOURCES
    ${CMAKE_CURRENT_SOURCE_DIR}/simd_support.cpp
+    ${CMAKE_CURRENT_SOURCE_DIR}/accelerate/accelerate.cpp


These path names and file names are not ideal, but I have no idea where to put these files.

ThreeMonth03 · 2026-05-22T07:32:45Z

Test the correctness of matmul_veclib().

ThreeMonth03 · 2026-05-22T07:32:59Z

Profile matmul_veclib().

ThreeMonth03 · 2026-05-22T07:47:15Z

By the way, I plan to check the latency of the profiler when I have time, because the profiling results look odd for small NumPy arrays. If there is any update, I will open a pull request.

yungyuc

Keep the wrappers matmul, matmul_veclib, and matmul_fast close.
Discussion: Maybe imatmul_fast should also be wrapped to Python?
Discussion: Do we need our own Complex, or should we use std::complex directly?
Use a distinct file under buffer/ for the veclib matmul wrapper.
Add a FIXME and a follow-up issue to track fixing silently passed failures for missing veclib.

yungyuc · 2026-05-22T12:07:35Z

                { return self.div(scalar); })
            .def("matmul", &wrapped_type::matmul)
            .def("__matmul__", &wrapped_type::matmul)
+            .def("matmul_veclib", &wrapped_type::matmul_veclib)


Good name (matmul_veclib). It should be placed after matmul, and fast_matmul should be renamed to matmul_fast for consistent naming.

yungyuc · 2026-05-22T12:09:02Z

                { self.idiv(scalar); })
            .def("imatmul", [](wrapped_type & self, wrapped_type const & other)
                 { self.imatmul(other); })
+            .def("imatmul_veclib", [](wrapped_type & self, wrapped_type const & other)


Discussion: Maybe imatmul_fast should also be wrapped to Python?

Yes, it should be wrapped to Python. We have already implemented this API.

yungyuc · 2026-05-22T12:09:42Z

+template <typename A, typename T>
+A SimpleArrayMatmulHelper<A, T>::matmul_mat_mat_veclib()
+{
+    if (!m_lhs.is_c_contiguous() || !m_rhs.is_c_contiguous())
+    {
+        return matmul_mat_mat();
+    }
+
+    size_t const m = m_result.shape(0);
+    size_t const n = m_result.shape(1);
+    size_t const k = m_lhs.shape(1);
+    simd::matmul(m, n, k, m_lhs.data(), m_rhs.data(), m_result.data());
+    return std::move(m_result);
+}
+


Good point.

yungyuc · 2026-05-22T12:11:02Z

+template <typename T>
+inline constexpr bool is_std_complex_layout_compatible_v = std::is_standard_layout_v<Complex<T>> &&
+                                                           sizeof(Complex<T>) == sizeof(std::complex<T>) &&
+                                                           alignof(Complex<T>) == alignof(std::complex<T>);
+
+static_assert(is_std_complex_layout_compatible_v<float>);
+static_assert(is_std_complex_layout_compatible_v<double>);
+
+template <typename T>
+std::complex<T> const * as_std_complex_pointer(Complex<T> const * ptr)
+{
+    return reinterpret_cast<std::complex<T> const *>(ptr); // NOLINT(cppcoreguidelines-pro-type-reinterpret-cast)
+}
+
+template <typename T>
+std::complex<T> * as_std_complex_pointer(Complex<T> * ptr)
+{
+    return reinterpret_cast<std::complex<T> *>(ptr); // NOLINT(cppcoreguidelines-pro-type-reinterpret-cast)
+}


Discussion: it sounds like our own Complex is unnecessary, and we could use std::complex directly?

@j8xixo12, what do you think?

yungyuc · 2026-05-22T12:13:36Z

+
+#include <modmesh/simd/accelerate/accelerate.hpp>
+
+#if defined(__APPLE__) && defined(__arm64__)


It is not right to put it under simd/ because we did not add SIMD code for it. It is a vendor library. Use a distinct file under buffer/ for it.

yungyuc · 2026-05-22T12:14:49Z

+                                          (std::is_same_v<T, float> ||
+                                           std::is_same_v<T, double> ||
+                                           std::is_same_v<T, Complex<float>> ||
+                                           std::is_same_v<T, Complex<double>>);


Yes. They are what we need for now.

yungyuc · 2026-05-22T12:15:24Z

Files you added under simd/ should all go to buffer/.

yungyuc · 2026-05-22T12:16:03Z

yungyuc · 2026-05-22T12:18:48Z

+        try:
+            veclib_result = lhs.matmul_veclib(rhs)
+        except RuntimeError as exc:
+            self.assertEqual(str(exc), veclib_unavailable)


This is tricky. Failures are silently passed. When veclib is unavailable, the veclib tests should be marked as expected failures, while the non-veclib tests should still pass. That is, we will need distinct test functions only for veclib.

Please add a FIXME here and create a follow-up issue to track the fix.

ThreeMonth03

Keep the wrappers matmul, matmul_veclib, and matmul_fast close.
Discussion: Maybe imatmul_fast should also be wrapped to Python?
Discussion: Do we need our own Complex, or should we use std::complex directly?
Use a distinct file under buffer/ for the veclib matmul wrapper.
Add a FIXME and a follow-up issue to track fixing silently passed failures for missing veclib.

@yungyuc Please take a look. Thanks.

ThreeMonth03 · 2026-05-24T14:02:54Z

                { self.idiv(scalar); })
            .def("imatmul", [](wrapped_type & self, wrapped_type const & other)
                 { self.imatmul(other); })
+            .def("imatmul_veclib", [](wrapped_type & self, wrapped_type const & other)


Yes, it should be wrapped to Python. We have already implemented this API.

ThreeMonth03 · 2026-05-24T15:30:29Z

            .def("matmul", &wrapped_type::matmul)
-            .def("__matmul__", &wrapped_type::matmul)
+            .def("matmul_veclib", &wrapped_type::matmul_veclib)
            .def(
-                "fast_matmul",
+                "matmul_fast",


Rename fast_matmul to matmul_fast and replace the corresponding API bindings.

ThreeMonth03 · 2026-05-24T15:31:32Z

+            .def(
+                "imatmul_fast",
+                [](wrapped_type & self,
+                   wrapped_type const & other,
+                   size_t tile_x,
+                   size_t tile_y,
+                   size_t tile_z)
+                { self.imatmul_fast(other, tile_x, tile_y, tile_z); },
+                py::arg("other"),
+                py::arg("tile_x") = 16,
+                py::arg("tile_y") = 16,
+                py::arg("tile_z") = 16)


I have now wrapped imatmul_fast to Python.

ThreeMonth03 · 2026-05-24T15:32:13Z

Move helper class SimpleArrayMatmulHelper to matmul.hpp.

ThreeMonth03 · 2026-05-24T15:33:35Z

+        except RuntimeError as exc:
+            # FIXME: Split veclib backend coverage into dedicated tests and
+            # mark unsupported platforms as expected failures once a follow-up
+            # issue is filed.
+            self.assertEqual(str(exc), veclib_unavailable)
+            return


Added a FIXME comment temporarily. I will open a new issue later.

By the way, matmul_veclib has been modified: it behaves like matmul when it is not used on an Apple platform. I'm not sure whether we still need the try/except now.

ThreeMonth03 force-pushed the accel_matmul branch from 20023ff to c91cecd May 21, 2026 14:29

ThreeMonth03 marked this pull request as draft May 21, 2026 18:26

ThreeMonth03 force-pushed the accel_matmul branch 3 times, most recently from 54201f8 to a05324b May 22, 2026 04:36

ThreeMonth03 commented May 22, 2026

ThreeMonth03 marked this pull request as ready for review May 22, 2026 07:34

yungyuc requested changes May 22, 2026

yungyuc assigned ThreeMonth03 May 22, 2026

yungyuc added the array Multi-dimensional array implementation label May 22, 2026

yungyuc added this to tensor operations May 22, 2026

github-project-automation Bot moved this to Todo in tensor operations May 22, 2026

yungyuc moved this from Todo to In Progress in tensor operations May 22, 2026

yungyuc requested review from KHLee529 and j8xixo12 May 22, 2026 12:20

ThreeMonth03 force-pushed the accel_matmul branch 3 times, most recently from ac1981f to 188ce0e May 24, 2026 15:10

ThreeMonth03 commented May 24, 2026

ThreeMonth03 mentioned this pull request May 24, 2026

Refactor testcase in pull request#800 to ensure expected failures on different platform #827

Closed

ThreeMonth03 force-pushed the accel_matmul branch from 188ce0e to 4f467d9 May 24, 2026 15:55

Implement matmul_veclib() with cblas

663a693

ThreeMonth03 force-pushed the accel_matmul branch from 4f467d9 to 663a693 May 24, 2026 16:56

