Recently I have been looking at the Ogre Matrix class, which has a fairly unoptimized but straightforward implementation that you can see here.
I was wondering how it compares.
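To make "straightforward" concrete, here is a minimal sketch of what a naive 4x4 add and multiply typically look like. This is not Ogre's actual code: the `Mat4` struct and the `add`, `mul` and `identity` helpers are hypothetical names chosen for illustration only.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4x4 float matrix for illustration; not Ogre's actual class.
struct Mat4 {
    std::array<std::array<float, 4>, 4> m{};  // zero-initialized
};

// Element-wise addition: 16 scalar adds, no explicit SIMD.
inline Mat4 add(const Mat4& a, const Mat4& b) {
    Mat4 r;
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            r.m[i][j] = a.m[i][j] + b.m[i][j];
    return r;
}

// Textbook triple loop: r[i][j] is the sum over k of a[i][k] * b[k][j].
inline Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r;  // starts at zero, so we can accumulate into it
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            for (std::size_t k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Identity matrix: ones on the diagonal, zeros elsewhere.
inline Mat4 identity() {
    Mat4 r;
    for (std::size_t i = 0; i < 4; ++i) r.m[i][i] = 1.0f;
    return r;
}
```

Code like this leaves any vectorization entirely to the compiler, which is the kind of implementation the comparison below treats as the naive baseline.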
Of course, somebody had asked a similar question before: Martin Foot. While his discussion still applies today, I felt the results could have changed since 2012, as libraries and compilers have moved on.
So I forked his code to update the libs to the latest versions and came up with the following results:
| Library | add (x86_64, SSSE3) | mult (x86_64, SSSE3) | add (armeabi-v7a, NEON) | mult (armeabi-v7a, NEON) |
|---|---|---|---|---|
| Eigen3 | 17 ms | 53 ms | 173 ms | 399 ms |
| GLM | 50 ms | 186 ms | 232 ms | 399 ms |
| Ogre | 50 ms | 184 ms | 232 ms | 399 ms |
| CML1 | 116 ms | 348 ms | 178 ms | 489 ms |
The compiler used was gcc at optimization level -O2.
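For readers who want to reproduce numbers of this kind, a timing loop can be sketched roughly as follows. This is not the code from the forked benchmark: `FlatMat4`, `time_ms` and `bench_add` are hypothetical names, and the iteration count is left up to the caller.

```cpp
#include <array>
#include <chrono>
#include <cstddef>

// Hypothetical flat 4x4 float matrix used only for this sketch.
using FlatMat4 = std::array<float, 16>;

// Calls `op` `iterations` times and returns the elapsed wall time in milliseconds.
template <typename Op>
double time_ms(std::size_t iterations, Op op) {
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) op(i);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Times repeated element-wise addition. One element of the accumulated
// result is written to `sink` so the work is not removed as dead code.
inline double bench_add(std::size_t iterations, float& sink) {
    FlatMat4 a{};
    FlatMat4 acc{};
    a.fill(1.0f);
    const double ms = time_ms(iterations, [&](std::size_t) {
        for (std::size_t k = 0; k < 16; ++k) acc[k] += a[k];
    });
    sink = acc[0];
    return ms;
}
```

`steady_clock` is used because it is monotonic, so the measurement is not affected by system clock adjustments. Swapping the lambda body for a library's own add or mult call would give a comparable per-library timing.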
As we can see, Eigen3 clearly outperforms the rest on x86_64, being roughly three times faster than GLM and Ogre, probably due to its explicit vectorization. Notably, CML1 is having some issues and even falls behind the naive implementations.
On ARM the results are tighter, with Eigen3 and CML1 taking about 25% less time than GLM and Ogre at addition. However, CML1 again has some issues with the mult test.
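The relative figures quoted here follow directly from the table. As a small sketch (the helper names `time_saved_percent` and `speedup` are my own, not part of any of the libraries):

```cpp
// Percentage of run time saved by the faster result relative to the slower one.
constexpr double time_saved_percent(double fast_ms, double slow_ms) {
    return (1.0 - fast_ms / slow_ms) * 100.0;
}

// How many times faster the faster result is.
constexpr double speedup(double fast_ms, double slow_ms) {
    return slow_ms / fast_ms;
}
```

For the ARM add column, `time_saved_percent(173.0, 232.0)` comes out at about 25% for Eigen3 and `time_saved_percent(178.0, 232.0)` at about 23% for CML1. On x86_64, `speedup(17.0, 50.0)` gives roughly 2.9x for Eigen3 over GLM and Ogre.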
We end up with Eigen3 as the overall winner and GLM in second place (Ogre does not count, as it is not a math library).
Also, you should migrate away from CML1, as the development focus has shifted to CML2 and the issues found above are probably not going to be resolved.